Intuition is a vital component of intelligence; few of our daily activities actually require logic, and we solve most of our daily problems using intuition-based methods. This includes body movement, understanding and using language, and learning new skills. Artificial Intuition may well be a stepping stone to full-blown AI, should be much easier to implement, and has many economically important applications. At the 2007 Foresight Vision Weekend, AI researcher Monica Anderson provided an analysis of the problem domains of Artificial Intelligence and discussed the advantages and disadvantages of intuition-based approaches.
The following transcript of the Foresight Vision Weekend presentation by Monica Anderson entitled “Artificial Intuition” has been corrected and approved by the presenter. Video and audio are also available.
Most of our actions do not require logic. That includes some pretty sophisticated actions, such as language understanding and generation, body movements, solving everyday problems, etc. Intuition, which is a term I will define, is a simple mechanism with sufficient power to perform such actions.
Artificial intuition is likely much easier to implement in computers than artificial intelligence is. I am going to use the term AN for artificial intuition; it is analogous to the N in Myers-Briggs. In spite of being radically different from artificial intelligence in general, I expect this to be a path to AI.
I’m going to be talking about the research project I’ve been doing for a while. I’m going to talk about the purpose of intelligence, then cover two background concepts, emergence and complexity, very lightly. Then I am going to talk about problem types that require intuition and/or intelligence. The bulk of the talk is about intuition versus logic. Finally, I am going to talk about the cost of using intuitive methods.
First, a couple disclaimers. This is all unproven theory. You should compare it to all other unproven theory about artificial intelligence. I have running code that shows some promise. Also, others have used the term “artificial intuition” for other concepts, but I am going to use it anyway because it covers exactly what I want to say.
I have some experience with weak, logical AI. I’ve written three expert systems for Cisco. I’ve also done natural language processing for a couple of companies. Around 1998, I had, along with lots of other people, been waiting over a decade for Doug Lenat‘s CYC project to be released. It had been much hyped as the be-all and end-all of intelligence. I got a chance to play with it for five months in a job that I had, and it was a disappointment to me. So I lost faith in strong logical AI. Around 2000, I got inspired to look at subsymbolic methods instead of symbolic methods. The difference is basically that in strong logical AI you are building models of the world, whereas in subsymbolic methods you are building models of the brain, so that you can figure out how brains learn things.
In 2001, I had the best idea I’ve ever had. It opened up a rich field of possibilities to explore about these issues of artificial intelligence. I’ve been exploring it full time ever since then, except for two years at Google in 2004 through 2006 when I was refilling my coffers because I ran out of money. In January this year, I built a personal supercomputer and I will be funding myself beyond 2010.
The project strategy is basically to create intuition-based electric gray matter, if you will. If it works, it will be useful either stand-alone or as a building block in other artificial intelligence. The application domain I am aiming at is document understanding, which means automatic discovery of the semantics of text. This is a pretty advanced goal. I expect this to work for any language, not just English. The goal is to learn language from raw text and from web pages using unsupervised learning, starting from a blank slate.
The first goal will be to demonstrate measurably better-than-state-of-the-art performance on multiple tasks, such as topic shift detection, which is exactly what it sounds like. Chinese segmentation is an economically important problem, because Chinese does not put spaces between words. In order to do anything meaningful with, for instance, Chinese web pages, such as index them, you need to be able to put the spaces in. The quality of Chinese indexing is to a large degree dependent on how good your segmentation is. I don’t know Chinese, so I’m going to work in English. If it works in English, it’s going to work in Chinese.
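To make the segmentation task concrete, here is a minimal sketch of one standard unsupervised baseline (not the approach described in this talk): collect character co-occurrence statistics from raw text and insert a word boundary wherever two adjacent characters rarely appear together relative to chance. The corpus, the threshold, and the use of de-spaced English as a stand-in for Chinese are all illustrative assumptions.

```python
# Minimal unsupervised segmentation sketch: split wherever two adjacent
# characters have low pointwise mutual information (PMI). This is a generic
# baseline idea, not the speaker's method; numbers are illustrative.
import math
from collections import Counter

def train_counts(corpus: str):
    """Collect unigram and bigram character counts from raw, unsegmented text."""
    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    return unigrams, bigrams

def segment(text: str, unigrams, bigrams, threshold: float = 0.0):
    """Insert a boundary between characters whose PMI falls below the threshold.
    Assumes every character in `text` was seen in the training corpus."""
    n_uni = sum(unigrams.values())
    n_bi = max(sum(bigrams.values()), 1)
    words, current = [], text[0]
    for a, b in zip(text, text[1:]):
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        p_ab = bigrams.get((a, b), 0.5) / n_bi   # crude smoothing for unseen pairs
        pmi = math.log(p_ab / (p_a * p_b))
        if pmi < threshold:
            words.append(current)
            current = b
        else:
            current += b
    words.append(current)
    return words

# Usage: English text with the spaces stripped stands in for unsegmented Chinese.
corpus = "the cat sat on the mat the cat ate the rat " * 200
uni, bi = train_counts(corpus.replace(" ", ""))
print(segment("thecatsatonthemat", uni, bi))
```

A real system would smooth the counts and tune the threshold far more carefully; the point is only that plausible boundaries can be discovered from raw text with no dictionary at all.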
A breakthrough in this area would be AI-complete. Basically, if you can do one of these things, you can do all of them. It opens up hundreds of important applications. Let’s look at a few of those. In the future you could imagine doing things like semantic search, which is, of course, the Holy Grail of web search. Semantic search means that you get sites that are matched to what you mean by your query, not to the words in the query. Perfect speech recognition, perfect language generation, perfect language translation. These should all be possible if you can get this kind of stuff to work.
The purpose of intelligence
I have speculated on the purpose of intelligence. What’s it for? My theory is that the first brains evolved as nerve clusters to coordinate movements of body parts. For instance, if you have multiple legs you don’t want to step on your own toes all the time. The earliest walkers probably relied on feedback for where to put their feet down, because you don’t want to have too many feet in the air at the same time. Insects probably still use this kind of feedback-based locomotion. But if you want to move fast, if you want to run, then all legs might be off the ground at some point, because I think that is the definition of running. If you are doing that, you must be able to predict your limb positions, the leg impact locations, and the impact times. Otherwise, you can’t run. Also, nerve impulses are too slow to provide feedback for certain tasks, such as precision throwing. If you are throwing darts at a dart board or a rock at a rabbit, the window of angles within which you must release the projectile is very small. You can’t rely on feedback to release at the right point in time. For high precision throwing, you need to do prediction.
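A rough back-of-envelope calculation (with illustrative numbers, not figures from the talk) shows why feedback is too slow here and prediction is required:

```python
# Illustrative arithmetic only: compare the release-timing window for an
# accurate throw against a typical sensorimotor feedback delay.
arm_angular_velocity = 20.0      # rad/s, rough speed of a throwing arm
allowed_angular_error = 0.01     # rad (~0.6 degrees) to hit a small target
release_window = allowed_angular_error / arm_angular_velocity
print(f"release window: {release_window * 1000:.1f} ms")     # ~0.5 ms

sensorimotor_feedback_loop = 0.100   # s, order-of-magnitude feedback latency
print("feedback fast enough:", sensorimotor_feedback_loop < release_window)  # False
```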
Once you start doing prediction, once you can learn how to run, you get an evolutionary pressure to be able to do it better. You want to make longer-term predictions, higher precision predictions. You also want to include sensory inputs, such as seeing where you are running so that you don’t run into objects. After a while, you can start to model the world. You remember safe places to be and dangerous places to be, for instance, safe places to put your feet down. After a while you start to model other agents, such as predators and prey, and other members of your tribe, such as potential mates and potential rivals.
Evolution rarely throws anything away, so I believe that prediction, and not logic, is still the most important low-level operation in the brain. You can extend predictions by nesting and cascading these low-level predictions. What you do is you start by predicting low-level things, like how you impact the ground. Then you start predicting the neural impulses in your brain that observe that. So, in essence, you are predicting your own predictions. That’s the way you can extend the prediction horizon.
Let’s do a quick experiment. “And the rocket’s red glare…” what comes after that? “The bombs bursting in air.” What comes before that…? Okay, you don’t know. The brain is basically built for prediction. It wants to look forward in time. It does not care what’s behind. I’ve been told that Italian sports cars don’t have rear-view mirrors because if you need to know what is behind you, you’re driving too slow. Science is also for prediction. You want to predict experiments and phenomena. You can predict planetary orbits years in advance. You want to plan the safe loading of a bridge when you build one.
Now, let’s talk about emergence quickly here. Emergence is a strange concept because to some people it is obvious that it exists, and some people deny its existence. I like to explain it with an example. I say that a single water molecule does not have a temperature. It’s just floating in space; you can’t say it has a temperature. Temperature is only defined for aggregates of molecules. It is a statistical measure of the relative motion in the aggregate and the behavior of aggregates changes depending on the temperature. For instance, you have ice, water, and steam. They behave very differently.
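A tiny illustration of the same point in code (units and numbers are arbitrary): “temperature” here is just a statistic over an aggregate of molecular velocities, and applying the same formula to a single molecule yields a number that no longer measures anything about relative motion within an aggregate.

```python
# Emergent/statistical property illustration: temperature as mean kinetic
# energy of an aggregate, in arbitrary units with mass folded in.
import random

def temperature(velocities):
    """Mean kinetic energy per particle."""
    return sum(v * v for v in velocities) / (2 * len(velocities))

aggregate = [random.gauss(0, 1.0) for _ in range(10_000)]
print(temperature(aggregate))   # a stable, meaningful statistic (~0.5)
print(temperature([0.3]))       # a number, but not a temperature in any useful sense
```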
Emergent properties then are system level properties that do not exist at the lower level, like the molecular level for example. Sometimes these higher level properties cannot be derived from lower level properties. Sometimes they could, in theory, but it’s not worth the trouble. Let’s look at some examples.
I talked about ice, water and steam, where you have properties like wetness, steam pressure, and temperature. How about quality, reliability, top speed and lifespan? The quality of a car isn’t in any single component of the car. It is basically there in every component, the way it is assembled, and the design. It is a system-level property. Radio reception is one of the canonical examples of emergence. The radio reception is not there in any component. It is basically a function of everything put together right.
There is nothing magical about emergence. The designer of a radio knew what he was doing when he designed it. Semantics of language from mere text, life from non-living atoms, and intelligence from mere neurons: these are other good examples, and I like the last one a lot. If you are building artificial intelligence systems, you had better build them out of non-intelligent components, because if you try to build them from intelligent components you have just pushed the problem one level down. Emergence is the only thing that works to get you intelligence.
Complexity is complex. I’m going to talk about my take on this. Other people have similar ideas; they just express them a bit differently.
This, I claim, is an example of component-dominated complexity. It’s how we like to build software systems, and engineered systems in general. If you want to build an opera house or a 747, this is what you try to do. We basically have a system which we try to break down into components, and we break those down into sub-components. We try to keep down the number of interfaces between the parts. We try to keep them at least regular, if we cannot keep them simple. All of the functionality is inside one or another of these components. All of the complexity is in the components. That is why it is called “component-dominated complexity.”
We should contrast that with what we find in life, which is basically richly interconnected simple parts. You find this in neurons in the brain, societies, ecologies, life in general. Also in things like concepts, ideas, abstractions, and language. You have simple parts, that can be basically identical if you want, that are richly interconnected. Here, it is not what you know but whom you know that matters, as they say in Hollywood.
Let’s look at the problem domain, if you will, of artificial intelligence. We start with the interaction-dominated complexity that I just explained. If you think about those nodes, and imagine them having memory, then it gets even worse. If they have non-linear responses, it gets worse still.
When you have these three together, we basically talk about chaotic systems. If you have one of those in sufficient quantity, or you have all three of them, you end up with a system that you cannot predict long-term at all. You may be able to predict it short-term. Then we have the curse of dimensionality. Its formal definition involves hypercubes and hyperspheres, but for these purposes I like to take the example of building a world model. You have objects in the world model and types of objects. The types have properties and the properties have multiple values. So if you have a number of these objects, you end up with a Cartesian explosion of all of these properties and property values interacting with each other, and it very quickly blows up in your face. Anyone who has tried to do some serious world modeling knows that this is a major problem.
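Some toy arithmetic (with made-up numbers) shows how quickly this Cartesian explosion gets out of hand:

```python
# Toy arithmetic for the "Cartesian explosion" in world modeling.
# All numbers are invented for illustration.
properties_per_object = 20    # properties each object carries
values_per_property = 10      # distinct values each property can take

# Distinct states of a single object:
states_per_object = values_per_property ** properties_per_object
print(f"states per object: {states_per_object:.2e}")               # 1.00e+20

# Joint states of a scene with just 5 interacting objects:
print(f"joint states of 5 objects: {states_per_object ** 5:.2e}")  # 1.00e+100
```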
You have “systems,” so to speak, which are inseparable from the environment. You can’t take a squirrel out of its natural habitat, study it in the lab, and expect it to behave the same as it does in its habitat. You also have systems that never stop changing because they are continuously responding to their environment. These three together I call systems which require a holistic stance. You basically cannot analyze them by taking them apart.
Then we have ambiguity of various kinds: ambiguity from incomplete information, from incorrect information. These are ambiguities in your input data. In computers we call it “garbage in, garbage out.” If you can’t trust your input data, you basically can’t do anything useful. It is really a killer for representational systems. Then we have self-reference, apparent paradoxes, and strange loops. Kurt Gödel showed that in any sufficiently strong representational system you will be able to express paradoxes. Doug Hofstadter has written a lot about Gödel’s theorem in the book Gödel, Escher, Bach. He also talks about self-reference and strange loops in his other books. Strange loops are things like this: you have a type-based system, and an object is of type A, which is a subtype of B, which is a subtype of A. It loops. That is the kind of loop that you might end up with. Your system, if it is a representational system, had better know how to deal with those without getting stuck.
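Here is a small, hypothetical illustration of that kind of strange loop in a type-based representation, along with the bookkeeping a system needs so that a query over the loop does not run forever:

```python
# Toy strange loop: A is a subtype of B, and B is a subtype of A. A naive
# "walk up the parents" query would loop forever; tracking visited types
# lets the system notice the cycle instead of getting stuck.
subtype_of = {"A": "B", "B": "A", "Dog": "Mammal", "Mammal": "Animal"}

def ancestors(type_name):
    """Return the chain of supertypes and a flag saying whether a loop was found."""
    seen, chain, current = set(), [], type_name
    while current in subtype_of:
        parent = subtype_of[current]
        if parent in seen or parent == type_name:
            return chain, True          # strange loop detected
        seen.add(parent)
        chain.append(parent)
        current = parent
    return chain, False

print(ancestors("Dog"))   # (['Mammal', 'Animal'], False)
print(ancestors("A"))     # (['B'], True)  -- A <: B <: A
```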
Then we have the necessity of keeping multiple parallel representations at once. You must believe five contradictory things before breakfast. You have multiple points of view, and hypotheses that you might want to throw away if they don’t work out. Or you might be given a hypothesis from the outside. All of these I lump together and call systems that contain ambiguity.
Finally, in the fourth quadrant, we have emergent properties, which we have already talked about. So here we have sixteen kinds of mess. Basically, each one of these is outside what we can do with our current science. Some of them we can beat to death with lots of computing power, but in many situations we have things from each quadrant. People in logical AI and elsewhere go ahead and try anyway by modeling the world in general (the CYC project), modeling the global economy and stock markets, computer vision, and the discovery of semantics in text.
All of these problems have all sixteen of the problem types that we looked at. They have a lot in common. We find systems and problem domains that are at once chaotic, require a holistic stance, contain ambiguity, and exhibit emergent properties. I’m going to be talking about these problem domains a lot, so I need a name for this.
I didn’t pick the name, but these are known in some circles as “bizarre systems.” When I say bizarre systems, you have at least one from each quadrant, but you may have all sixteen. Typically, these travel in packs. The term was first used by Dr. Steven Kercel at the ANNIE conference in ’99. You can find some papers about it on the web if you look. There are several almost equivalent terms: impredicative processes, closed loops of causality, and complex systems. You might see why I prefer “bizarre systems” when you see the alternatives. If you look up complex systems in Wikipedia, you will find roughly the same list of things that I have already shown in those sixteen. Which one of these terms you use depends on whether you come from systems theory, chaos theory, or whatnot. Either way, you discover that these problems are important.
Intuition vs. logic
Now I’m going to compare intuition and logic. That’s the bulk of the talk. Intuition and logic are two strategies for prediction and problem solving. Logic is not better, it’s just different. You can use either, or both. The choice is often dictated by the problem domain. Sometimes we choose poorly.
What can logic do? We use it in hard sciences such as mathematics. We use it in physics. We build logic-based models of causality. In computers we use it both to build them and to program them. We use it when we do puzzle problems. It is taught in schools. We use logical rules to manipulate logical statements, such as when you are doing algebra and so on. When you are doing this kind of algebraic manipulation of formulas, innovation is in theory unnecessary. It is basically just syntax manipulation. I have a lot to say about that, but let’s just say for now that innovation is unnecessary here; innovation is something we will come back to when we get to intuition.
Logic has some advantages. It can be used for long-term predictions, such as planetary orbits. It can do high precision predictions, such as the masses of elementary particles, which we can predict to four or five decimals. Logical methods are productive, which means that you can use the logical theories you have to create more theories by the kind of manipulation I talked about earlier. Again, no innovation is required for extending science in this manner.
Logical methods have an excellent track record. They also have their limits. They require theory. Some of you might say, “That’s no limitation; you must have theory, otherwise you can’t do anything.” It turns out that we can actually do things without theory, so this is a limitation on what kinds of problems you can approach. You must have a high-level model of the problem before you can solve it. They also require idealized conditions. Everyone who has cracked open a physics book has seen the “All else being constant…” statement. In the diagram of sixteen, up in the corner we had “separate from the environment.” That is basically what they are saying.
Logical methods cannot handle bizarre problem domains. If you have bizarre-domain problems, meaning you have something from each quadrant of the previous diagram, logic is stumped. You cannot use it. This means you cannot attack many problems in the life sciences. Nor can you solve everyday problems, including the semantics of language.
Let’s look at intuition, the competitor. Intuition handles the everyday problems. Intuition allows short-term prediction in bizarre problem domains. This is exactly my definition of what intuition is. Sometimes you do it wrong, but that’s okay. You do these short-term predictions in these bizarre domains in spite of having ambiguous information, and that’s quite a feat. That basically covers things like prediction and control of body movements, and pattern recognition such as language and melodies. It also gives you innovation and novelty. At this point, I’m just going to put that up there and not talk about it too much. That will be another talk.
It has advantages, like it’s fast. You may have read Blink by Malcolm Gladwell. It is theory-free. As I said, you do not need a high level logical model to be able to use it. If you are building artificial intelligence, this should be a major feature. Intuition is immune to complexity, chaos, constantly changing conditions, paradoxes, and ambiguity. There is no logical model that can get confused.
Intuition has limits. It cannot do long-term predictions. It cannot do high precision predictions. It is not productive. If you have an intuition about how to do something, you cannot generate new knowledge by mechanical manipulation of an existing theory, because you ain’t got one. It requires prior experience and valid generalizations. You can generalize, but you cannot extend it the way we do with logical knowledge. I will get into what that generalization means in detail.
Here is the really cute thing. Performance on intuition-based tasks improves with experience and practice. Logic does not improve with practice. Logic-based tasks should in theory be teachable by just stating the logic once… perhaps to Mr. Spock. Intuitive tasks can be taught using coaching-style methods. In school we practiced doing arithmetic. One of the most logical things we were doing, and we actually had to practice it. Isn’t that weird? So you can use this as a test: whether or not a task requires practice tells you whether it is based on intuition or logic.
So, how does intuition work? Intuition operates at the level below logic. You observe the world, you observe events. These events get converted into nerve impulses before they reach the brain. There is preprocessing going on here and there, but you get nerve impulses to the brain. Once in the brain, you get further signaling of nerve impulses going back and forth. It doesn’t matter where an impulse comes from. The logic of the brain, I believe, is that it tracks preceding and subsequent events; it keeps track of what precedes what. Then you start assuming that the preceding events are predictive of the later events. You try to generalize the triggering event type, and then you adjust the details whenever you are wrong. This is basically the brute-force version of the algorithm. Over millions of years, evolution has discovered some very elegant shortcuts… as have I, and I’m not going to tell you about them. But I will say that there is no need to model causality, because you can stay at the event level and do your predictions.
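As a minimal sketch only, and emphatically not the speaker’s system (the elegant shortcuts are not revealed in the talk), the brute-force version described above might look something like this: record which events follow which, then predict the most frequent follower of the current event. Generalization over event types and correction-on-error are omitted.

```python
# Brute-force precedence tracking: a toy stand-in for the idea described
# above, not the actual system. All names and structure are illustrative.
from collections import Counter, defaultdict

class PrecedencePredictor:
    def __init__(self):
        self.followers = defaultdict(Counter)   # event -> Counter of next events
        self.previous = None

    def observe(self, event):
        """Track which event followed which as the stream comes in."""
        if self.previous is not None:
            self.followers[self.previous][event] += 1
        self.previous = event

    def predict(self, event):
        """Predict the most frequent follower of `event`, or None if unseen."""
        nxt = self.followers.get(event)
        return nxt.most_common(1)[0][0] if nxt else None

# Usage: learn from a stream of events, then make a short-term prediction.
p = PrecedencePredictor()
for e in ["wake", "coffee", "work", "lunch", "work", "home",
          "wake", "coffee", "work", "lunch", "work", "home"]:
    p.observe(e)
print(p.predict("coffee"))   # 'work'
print(p.predict("skydive"))  # None -> no prior experience, no intuition
```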
There are more limits to intuition. For instance, it does not have a good reputation. It’s a mystical power attributed to women. Intuition is “unscientific.” That is because intuitions are often wrong, and science does not like to be wrong. Intuitions being wrong typically comes from two sources: insufficient prior experience, or a bad generalization of the triggering event.
“Failure is the only option.” This is a joke from Brad Templeton when he saw these slides. Logical AI tends to model the world. But the world is bizarre, by the technical definition of bizarre. Logical sciences insist on correctness most of the time. This is a fatal conflict for logical AI. You cannot make an infallible, godlike artificial intelligence. You just can’t. I think artificial intelligence really should have been a soft science. Artificial intuition systems fail softly at the edges of their confidence. They basically make human-like mistakes. They are not brittle like logic-based artificial intelligence systems.
Let’s look a little more at the fallibility. Neurons are fallible. They are based on electrochemical processes and they involve thresholding of the concentration of neurotransmitters. There are basically indefinite delays; you cannot say how long a delay is. Because there are so many neurons in the brain, you have race conditions all the time. You can’t expect the brain to always get the same result twice. Artificial intuition systems, the way I build them, are fallible all the way down. They are basically nondeterministic.
Failed predictions indicate either trouble or novelty. It’s basically a situation you have not seen before, or something else went wrong. What you get is emergent reliability against these internal errors. If you have a system that is resilient against internal failures, it implies it can be robust against failures in input data. You basically take the holistic view and say, “I am one with my environment. It doesn’t matter where the problem lies, I can correct for it.” In other words, the same mechanism that handles internal failure handles ambiguity and misinformation. We call that “emergent robustness.”
Logic is uncommon. Most people never learn logic. Logic is a recent invention, and yet children are intelligent, most people are intelligent. We were intelligent in 1000 BC, and some animals are somewhat intelligent. Intelligence therefore cannot require logic. It is my belief that intelligence requires intuition.
“No, wait,” some people say. “That can’t be right. I’m a logical thinker.” Well, maybe you are, but most of us are not. Intuition is unscientific. We don’t need logic and science most of the time. I believe intelligence is like 99% intuition. I think of logic as a calculator sitting on my desk. When I need logic, or a calculator, I reach for it and use it. Most of the time I don’t. The reason we think that logic is so important and intuition is so unimportant is that there is no glory in solving everyday problems. We understand language all the time. We speak, we walk, we do all of these things. They are all problems in bizarre domains, but we are so used to dealing with them. Every now and then, science manages to crack another problem using logic, and you get Nobel Prizes for that. This is kind of unfair. We solve bizarre-domain problems all the time, but it is the logical solutions that get noticed. So there is a skew between the frequency of solutions and the type of problem being solved.
Logical methods are used in the cracks between the problems that they cannot handle. Some problems cannot be solved at all. Some problems require both intuition and logic. We don’t always classify problems right. Whether or not a problem also requires logic, you must use intuition-based methods to attack problems that require intuition, or you will waste a lot of time and effort.
In particular, artificial intelligence requires artificial intuition. “No, wait. That can’t be right. Computers are logical, not intuitive.” Well, programming intuition into computers is quite straightforward. I have been exploring this since 2001. There may even be more than one way to do it. Artificial intelligence might also require some logic, but I think it’s likely 99% intuition.
Let’s return to predictions. There are two kinds of predictions. The hard sciences use logic for long term and precise predictions. In daily life we use intuition for fallible, approximate short term predictions. Both work, but have different advantages and apply in different situations.
Here I will plot the complexity of a problem against the range of prediction that you can make.
Simple problems sit close to the corner. Over here are predictable problems. Next to those are the complex and bizarre problems. The rest of the chart is called the “absurd” region because you can make no models whatsoever about anything in the rest of the chart. The absurd area is basically pushing down, making it impossible to do anything, except close to the axes. Here you have either too complex a problem or expect too long a range of prediction or both.
How do we solve problems? Current logic-based science can solve reasonably complex problems for the short term, and it can do very long-range predictions. Here are pool balls, weather, and planetary orbits.
Future logic-based science will of course improve upon that, making longer-term predictions and handling slightly more complex problems.
Here is intuition. Intuition can solve really complex problems for really short periods of time forward, but it cannot make very long-term predictions.
Where they overlap, we can use either one. We can use pool balls as an example from a mechanics textbook and compute what will happen when they collide, or we can go down and shoot pool. We can use satellites, radar imagery, thousands of sensors for barometric pressure and supercomputers to compute the weather, or we can open a window and say, “It looks like rain.” So, in this corner we have a choice.
Over here, intuition outperforms logic. Typically, humans outperform computers in this region. Semantics of text is something we do on a regular basis that computers cannot do, because it is over this line.
So different kinds of problems call for different approaches. Science is over here, and here is artificial intuition. It can give us a sliver of improvement over what we can do already. But the main benefit of artificial intuition is that we can do the things that humans do, such as understand language. It does not have to be better than us. If you had something that was as smart as a teenager and understood English, it would be worth millions and millions.
Let’s look at the overlap area in some more detail.
Here is good old-fashioned artificial intelligence, as it’s been called for decades now. Using science-style methods, logic, it tries to push upwards in complexity.
Sometimes they overreach and you get brittle artificial intelligence.
My research starts down here. I’m using the intuition-based technologies in this area, and I conduct experiments, but you can do the same things with logical methods. Only when I get up there will I have a useful demonstration. But I’m messing around down here at the moment using methods that I believe will scale all the way up to consciousness, if you like.
Logic vs. intuition: the tradeoff
There is a cost to using these methods. It requires that you think entirely differently about most things that you do with a computer. One could call it a paradigm shift, but the truth is, this is actually worse than a paradigm shift. Historically, advances in science have increased the precision of previous methods: with planetary orbits, Ptolemy’s model gave way to Copernicus, then Newton, and Einstein. This time it is different.
In order to handle bizarre problem domains, we must trade in many principles of logic and hard science. This is the logic/intuition tradeoff. Logical methods provide the best answer, all answers, the same answer every time, and the answer in bounded time. The simplest theory is often the correct one. You understand how you got the answer and you can understand the answer when you get it. This is what we get with logical science.
The brain needs none of those. If you are a programmer, you will especially curse this one, because it means you cannot debug. If you imagine having a brain in your computer and you watch it think, you will have pointer activations mushrooming all over the place, all the time. There is no way you can stop it. It’s really, really hard to debug these things. This means that you may get a working brain that behaves as you want it to, it understands English and does wonderful things, but you can’t figure out how it works. It’s as opaque as a regular brain.
If you are trying to use intuitive methods and you say, “I can be just a little bit more optimal over here if I tweak the thing a little bit,” you are very likely to throw out the baby with the bath water. Your stuff will not work. Those logical guarantees are poison to intuitive systems. We hope to get something in return when we give them up. Intuitive methods provide things that are absolutely essential to artificial intelligence. You get to use theory-free systems. You don’t have to provide a high-level model of why this thing goes down; you have just watched it so many times that you know that it will.
You get novelty. Again, I’m not going to talk about why, but you definitely can get novelty and innovation in these systems. The systems organize themselves, which means that you get emergent reliability, emergent robustness, hypotheses, multiple representations, etc. The systems are scale-free; they will scale indefinitely. And they give you nearly constant-time access to your data. There are two caveats with this. The first is that when you are learning, it takes a bit longer. It would be perfect constant-time access if it were not for garbage collection, which is linear in the size of the database. In general, you don’t think slower just because you happen to know twice as much as you did a few years ago. These things are straightforward to implement if you are using intuition-based systems, but you want to go beyond that. I’m hoping that if I do these things, and I do them right, I will get certain emergent properties.
Mechanical discovery of semantics, which is of course the main reason to do this. You want to automatically be able to discover semantics if you are given a piece of text. Discovery and reuse of abstractions. Generalizable solutions: you want to be able to make one model that understands English and use it in hundreds of different applications. True learning, which is the ability to learn any kind of spatiotemporal sequence that you want, starting from a blank slate.
I’ve been talking about these things at the AI meetup in Menlo Park on alternating Sundays. I also have a website that I’m working on called artificial-intuition.com. I’ll take questions.