Superstition, Forgetfulness & Artificial General Intelligence
Posted by Jeriaska on October 26th, 2007
Sam Adams is an IBM Distinguished Engineer within IBM’s Research Division. He argued in his 2007 Singularity Summit presentation that if we are ever to achieve computer systems that construct and maintain their own sense of meaning about the world, we must discover a way to imbue computing systems with real understanding, human-style semantics, and the commonsense reasoning of the human child at the very least. His talk presents evidence for its attainment in Artificial General Intelligence, as well as discussing its implications.
The following transcript of Sam Adams’ 2007 Singularity Summit presentation “Superstition and Forgetfulness: Two Essentials for Artificial General Intelligence” has not been approved by the author. An audio version of the talk is available at the Singularity Institute website.
Superstition and Forgetfulness: Two Essentials for Artificial General Intelligence
I want to join all of our other speakers in expressing our appreciation for everyone who’s come and all the fascinating discussion we’ve already had. We’re looking forward to a lot more of that. My talk is going to focus on what I believe are major challenges and pathways as we try to approach artificial general intelligence, but I am also going to talk about a project that I was privileged enough to be a part of at IBM between 2000 and 2003 where we were allowed to actually go for it, and say let’s take all the blinders off and all the chokes off and try to build one. But first I want to start with a quote from Rodney Brooks. This quote has been one of my inspirations over the years. He made it around the time that people were saying that to do robots that can walk you are going to have to do some really heavy-duty, monstrous stuff. Then he came along with Genghis, which not only could walk but taught itself how to walk, could adapt to its environment, could figure out what was going on with itself, even though it was very simply coded. I think the goal here is the last statement there. What my work was about was something that works. How do you get to something that works? The previous speakers have talked about the difficulty of trying to predict what it’s going to be like if we have a superhuman intelligence. I’m not really worried about that. The only proven pathway to human levels of intelligence we have is the one we walked ourselves. So let’s go look at that, learn from it, and see if we can build something like it.
I think we have some major challenges in trying to do some understanding, meaning, and commonsense in computing. This is a quote by Walter Freeman, a distinguished professor of neurobiology and neurophysiology that I think says it very well. If a system is actually going to be able to really understand what it can do and interact in that environment, it’s going to have to create a sense of meaning, it’s going to have to have real understanding. That is something that we really have not focused on in the AI community and our endeavors. So I played around with the words a little bit to try to give you some of our perspective. Understanding requires an internal system of meanings. Meanings imply outcomes for situations, and all outcomes are subjective, based on experience. Now, commonsense, we’re the company that built Deep Blue, and we had a thirty-year project working on that. IBM has been involved in AI continually ever since people started thinking about it. We’ve always had projects going on in this space, as we think it is valuable, one to track it, two to see as advances come along how we can apply it to business. This quote was a good one. We can beat Garry Kasparov, but no computer has the commonsense of a six-year-old child. The question is, why not? Why aren’t we focusing on that? See what a child can do.
Commonsense implies common sensation, producing common experience. Common experience requires a common embodiment and a common environment. Common environments require common events in a common time frame. When you start talking about computers that can “think a million times faster than humans,” just think for a minute of how much our perception and our cognition is entirely bound to the rate of time we experience. If it was twice as fast, well, you’ve all done this. Those of us that had record players as children, you know, we took the 78 thing and we turned it on 45 and saw, isn’t that funny? And then we realized we couldn’t understand what they were saying. We are bound to this particular rate of temporality and if we increase that too dramatically, to think that something that does that is going to be able to think like us a million times faster, I really don’t know what that means. I don’t think we really understand it.
All experience is subjective, because all sensation is subjective. So, if you look at those things that are innately human, at what human intelligence, human common sense, human understanding, and human meaning are about, what does that tell us about what we’re going to have to do if we want to reach some sort of general intelligence capability in a machine? So, the project that I was one of the investigators on, along with Dr. Nancy Alvarado from UCSD and Steve Burbeck from IBM, was something we called Joshua Blue. It was to develop a computer system patterned after the human mind that could autonomously learn to successfully function in any number of embodiments or environments. The approach that we took was to model research findings in developmental psychology and developmental neuroscience, take those things seriously, and look for architectural clues in how human cognition actually develops from early stages, basically emulating the functional bootstrapping of a human mind, and then, going up to about age three. The reason we were interested in stopping around age three was that if you look at what a three-year-old or a toddler can generally do, they can do all the stuff pretty effortlessly that classical AI says is incredibly hard. They can do all kinds of fancy motor control with their bodies, they can speak and understand natural language, they can learn lots of different languages. They understand naive physics, they can reason, they do all kinds of things. They haven’t learned how to read yet. They haven’t learned how to write yet, except for some really bright kids. But they do all this cool stuff that we think is incredibly hard. So we said at that point, somewhere between conception and three years old, lots of magic happens, lots of juju. So, let’s go look at that and see what we can find.
To test that, we proposed doing something that we called a “toddler Turing test,” which we think is a much better measure of this kind of attainment of intelligence. It’s based on standard psychological tests for childhood development. From a little more technical perspective, this is the breadth of what we were shooting for. To be able to process a network that represents observations, experiences, perspectives, ideals, goals, plans, actions, and emotions. To be able to autonomously acquire commonsense knowledge by meaningful self-discovery of the characteristics of the system itself. It has to figure out what it can do and what it’s about, the environment that it’s in, other systems like itself and all the relationships between them. And the other aspect, since we’re very focused on a developmental learning model, we had to come up with a way to scale that kind of learning, because you take a system and you say, “Great, we’re going to build a developmental robot and teach it like an infant, and in twenty years you’re going to have something interesting.” You’ve got one. You could clone its brain, and then you’d have a thousand that were exactly alike. How do you scale that up so you get massive amounts of learning that then you can compress into the same mind?
From an engineering perspective, we were looking at creating a general purpose mind, a universal engine, that was independent of embodiments, that was self-bootstrapping with minimal a priori knowledge. Effectively, the goal would be to put it in a new body and it would bootstrap up and figure out how to run the body and how to deal with its environment. To have specialized embodiments that had limited interfaces between the mind element and the body, using the models that humans have. Sensor effectors, hedonic events like pain and pleasure, affect, emotional response. Then, as I mentioned, to have this kind of mind meld capability so you could take two of these systems, train them in different environments, and then be able to collect and compose the learning that they had into a new individual that could continue learning.
This is even more detailed but just to point out a couple of things. One of the things we said was critical was that if the system is going to attain general intelligence, it has to learn both about what its self can do and have a model of that, as well as what its environment does. That’s when you have two kinds of semiotic feedback for the creation of meaning. You have to get a view of what your self is capable of doing. Our body is full of sensors that give our minds constant feedback about where my joints are, what’s going on, how I’m feeling, am I hungry, am I tired? As well as getting a model of what’s going on out in the world. Another aspect of this that we found really important was really a deep integration of emotion as a control system. The systems based on emotional cognitive principles that we developed and applied drew on lots of work done by researchers in this area around the world. We found that that was a very fruitful way to add human style behavior into one of these systems, at least on the level that we were working with, which arguably was very primitive.
Now, I said that we went and we looked at the data from developmental psychology and neuroscience. This is a very busy chart, and in fact, the neuron density curve showed up in black, which doesn’t help. We went and looked over the literature and then we tried to map it all onto a chart like this. On the vertical axis we have human-level. So, neuron density at birth is roughly adult neuron density. We don’t have massively different amounts of neurons as we grow older. Almost all the other parameters we were looking at - language acquisition, eye focus and tracking, the bringing online of various sensory modalities and the integration of touch and balance and everything else - followed kind of a normal growth curve, where it started low and it went up high. We found a couple of things that didn’t, and that really gave us an interesting insight. There are several researchers that have looked at synaptic densities, so how many synapses are there at any one time in life and what goes on? These things didn’t follow that curve, so we said something interesting is going on.
Peter Huttenlocher did a couple of interesting studies that are reflected in two curves on this chart. One was looking at synaptic density in the visual cortex. The general idea right now is that the synapse is where the action is. In effect groups of synapses, connections across neurons, form the basis for memory and engrams and things like that. So we said, if there are a lot of synapses, then there is a lot of information, memories, or something. And if those go away, you lose some. He found, by measuring synaptic density in the visual cortex, that in the first few months of life, about the first year, there are huge numbers of synapses over and above what an adult has, and then it drops off. Why is that? Even more fascinating was what happened to the synaptic density in the neocortex, where he said that at birth there’s roughly the same density of synapses in a child as in an adult, but within about two to three months, 50% of them are gone. Now, why in the world would that be? If that’s where the knowledge is stored, if that’s where the information is, what’s the engineering view on having that work that way? We started looking at this and we said, let’s take a different view of cognitive development and we’ll look at synapses like they’re an entity as opposed to just a place where two neuron trees kind of rub together and form a knob.
So we started thinking of them in terms of a life cycle. We said, let’s talk about synaptic birth. Synaptic birth occurs when two neurons fire in close proximity temporally and physically. So, they are rubbing up against each other, they fire, and they start forming a connection. Without any additional stimulation, without any co-occurrent firing, that connection weakens and dissolves. Over time, that connection grows stronger if those two neurons keep firing coincidentally within the same time periods. The synapse ages over time based on the firing experience of the neurons that are there and also its chemical environment around it. There are all kinds of issues when we look at things like psychoactive drugs, they affect that chemical environment, they affect what’s going on with the neurons. Some synapses live as long as there are neurons. We’re really glad for that. You don’t forget your name, hopefully, throughout your life, though in the end you might have some problem with that. Some are actually killed off in global plasticity events, global events where lots and lots of these synapses go away.
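The life cycle described above can be sketched in a few lines of Python. This is only a toy illustration, not the Joshua Blue implementation: the class name `SynapsePool`, its methods, and every numeric parameter are assumptions made for the example, and physical proximity is ignored, so any two neurons that fire in the same time window are treated as neighbors.

```python
from itertools import combinations


class SynapsePool:
    """Toy synapse life cycle: birth, strengthening, decay, death, global pruning.

    All parameter values are illustrative assumptions, not figures from the talk.
    """

    def __init__(self, birth=0.2, gain=0.3, decay=0.1, floor=0.05):
        self.birth = birth    # strength given to a newborn synapse
        self.gain = gain      # share of remaining headroom gained per co-firing
        self.decay = decay    # strength lost in each window without co-firing
        self.floor = floor    # below this strength the synapse dissolves
        self.strength = {}    # (neuron_a, neuron_b) -> current strength

    def step(self, fired):
        """Advance one time window; `fired` is the set of neurons that fired."""
        co_fired = set(combinations(sorted(fired), 2))
        for pair in co_fired:
            if pair in self.strength:
                # Repeated coincident firing strengthens an existing synapse.
                s = self.strength[pair]
                self.strength[pair] = s + self.gain * (1.0 - s)
            else:
                # First coincident firing: synaptic birth.
                self.strength[pair] = self.birth
        for pair in list(self.strength):
            if pair not in co_fired:
                # No co-firing this window: the connection weakens...
                self.strength[pair] -= self.decay
                if self.strength[pair] < self.floor:
                    # ...and eventually dissolves.
                    del self.strength[pair]

    def global_plasticity_event(self, keep_fraction=0.5):
        """Kill off the weakest synapses all at once, keeping the strongest."""
        ranked = sorted(self.strength, key=self.strength.get, reverse=True)
        survivors = ranked[: int(len(ranked) * keep_fraction)]
        self.strength = {pair: self.strength[pair] for pair in survivors}


if __name__ == "__main__":
    pool = SynapsePool()
    pool.step({"a", "b"})      # birth of the a-b synapse
    pool.step({"a", "b"})      # reinforced by repeated co-firing
    pool.step({"c", "d"})      # a one-off coincidence creates c-d
    pool.step(set())
    pool.step(set())           # c-d has dissolved by now, a-b is still alive
    print(pool.strength)
```

A synapse that is reinforced even twice outlives one that was born from a single coincidence, which is the behavior the life cycle model is meant to capture.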
We applied this model not only to synapses but large assemblies of neurons and said if that’s kind of what’s going on architecturally in the information creation and, effectively, garbage collection space, for you programmers out there, what does that mean? What can we use it for? In the Joshua Blue system we took a special kind of semantic network and we applied this same kind of life cycle model in the context of emotional cognition, in something we called a meta-semantic network. So, superstition and forgetfulness: what did we learn? A lot of counterintuitive things. We were really surprised by just looking as an engineer at how human cognition comes about, and then trying to find analogues of doing something similar in a computer system. One of the things we found was superstition. Stevie Wonder is great, but he’s wrong. We believe all kinds of stuff that we don’t understand. In fact, everything you ever learned starts out as a first-time experience. You have no clue if it’s ever going to repeat. If you think about what it takes to learn how to function in an environment and learn how to be successful in meeting your own goals, you have to build a pattern, a predictive model, of your space. That means you have to understand what patterns you are going to experience are going to recur, and thus you can start mapping cause and effect relationships and affective relationships. Is this going to be good for me, bad for me, should I avoid it, should I approach it? It’s all based on that.
At the synaptic level, these are coincidental events. These two neurons just fire. There is nothing up there saying right off the bat, “That’s a really good one, we should keep that one.” The old view of a homunculus, the little man in the head that made all the real decisions about what was important, it doesn’t work. But this is a problem. Here, you’re creating information, you’re creating these synapses, with no knowledge of whether it is going to be interesting or useful. This is superstition. You don’t have any proof but you keep the information around. In fact, you use it. But it’s grounded superstition, and this is a really important point. The symbol grounding problem in AI has been around for a long time. The problem is building a symbolic system like a rule-processing system that uses natural language words but is really doing logical inference on the symbol. How does that apply back to a real system and how do you keep that mapping going? In this system, and we believe in humans, every one of those synapses that forms does so because two neurons actually fired. It’s real experience, so it’s grounded. It’s not a lie, it just may not be very predictive.
If you look at all that, how is a system that is built this way going to function when it has this tsunami of superstition flying around? The way that we believe it happens is by pulling the plug and opening the drain. I mentioned these two effects that were going on. Thatcher came up with this notion of cyclical cortical reorganization, and from his studies and looking at synaptic density in youths, he found that about every four years there was a major reorganization going on. So, what’s going on? Well, when my body was this big, it worked real differently than when it was this big. I’m glad that I occasionally forget my motor control theory internally because my body changed. This evidence shows that this kind of thing is happening not only in a continual way, because obviously we don’t remember every bit of input that hits our brain. There is roughly somewhere between half a petabyte and ten petabytes of visual information that hits your visual cortex every day. We don’t remember all that. You won’t remember half of this tomorrow, not even close to that.
You take these two things together, does this mean general intelligence? Not really, but we’re getting there a lot faster than expected. We applied these things to the meta-semantic network in Joshua. This grounded superstition notion became a very hyperproductive source of new concepts, associations and, effectively, content. At the same time, we vented that combinatorial explosion of superstitious information with very aggressive forgetting. What you end up with is a race between this very optimistic view of your experience and very quickly culling everything that does not meet a certain threshold. This kind of continuous flow creates a very interesting system that allows you to get a view to autonomously adapt in this context of experience and emotional control, and without a homunculus. It can do it on its own, based on its experience.
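To make the race between optimistic creation and aggressive culling concrete, here is a small experiment in Python. It is a sketch under stated assumptions, not the meta-semantic network from the project: the function `run`, the symbol names, and the values of `birth`, `decay`, and `threshold` are all invented for illustration. Every pair of symbols that co-occurs in an experience becomes an association immediately (the superstition), and with forgetting switched on, any association that is not seen again quickly drops below the threshold and is removed.

```python
from itertools import combinations


def run(stream, forget, birth=0.5, decay=0.2, threshold=0.05):
    """Return the association weights left after replaying `stream`.

    stream    -- iterable of experiences, each a collection of symbols
    forget    -- if False, nothing is ever culled
    birth     -- weight credited each time a pair co-occurs (capped at 1.0)
    decay     -- weight lost per experience in which the pair is absent
    threshold -- associations below this weight are culled
    (All values are hypothetical and chosen only to keep the example small.)
    """
    weights = {}
    for observed in stream:
        seen = set(combinations(sorted(set(observed)), 2))
        # Superstition: believe every co-occurrence right away.
        for pair in seen:
            weights[pair] = min(1.0, weights.get(pair, 0.0) + birth)
        # Forgetfulness: cull whatever experience does not keep confirming.
        if forget:
            for pair in list(weights):
                if pair not in seen:
                    weights[pair] -= decay
                    if weights[pair] < threshold:
                        del weights[pair]
    return weights


# One pattern recurs in every experience; each experience also carries a
# one-off coincidence that never repeats.
stream = [("bottle", "milk", f"noise{t}") for t in range(50)]

hoarded = run(stream, forget=False)
pruned = run(stream, forget=True)

if __name__ == "__main__":
    print(len(hoarded), "associations without forgetting")
    print(len(pruned), "associations with forgetting")
    print(("bottle", "milk") in pruned)
```

Without forgetting, the network grows with every new coincidence. With forgetting, it stays at a handful of associations, and the recurring pattern is the one that reliably survives, with no homunculus deciding what to keep.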
The project was actually in full-time development between 2000 and 2002. We created a lot of interesting questions. We had some interesting algorithms as a result. We definitely need hardware to come up beneath our algorithms to run these things closer to real time. The question earlier about multi-core processors is in fact something I’m going to be working on in the next couple of years because right now with all this new architecture we have, we don’t know how to program it effectively. To be able to actually exploit that technology, we’ve got to go solve some intermediate problems in programming, but we also looked in this developmental model at not doing it in physical robotics but in virtual systems. The idea there is that you need a high fidelity enough virtual system where you can have a high fidelity body representation with lots of sensors and actuators to be able to put one of these systems in and let it learn. The other thing we came up with is a lot of interesting questions. One of the big ones is, we found we needed our system to have a much more flexible model of experience that accounts for the temporal warping that goes on in human cognition when it’s thinking about its own processes. When it’s looking at some cause-and-effect chain that it’s experienced, and our system could do that, how does it allow that to stretch and compress and still let you do the kind of pattern matching that allows you to figure out that maybe you ought to try this thing again?
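The talk leaves the temporal warping question open. Purely as an illustration of what "stretch and compress and still match" can mean computationally, here is a sketch of dynamic time warping, a standard sequence alignment technique. It is swapped in for the example and is not something the talk says Joshua Blue used; the function name `dtw_distance` and the sample sequences are assumptions.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Elements of one sequence may be aligned with several consecutive
    elements of the other, so a pattern played slower or faster can
    still match the original at low cost.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b holds (b is stretched)
                cost[i][j - 1],      # b advances, a holds (a is stretched)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


original = [0, 1, 2, 1, 0]
stretched = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]   # same shape at half speed
different = [2, 1, 0, 1, 2]                  # a different shape

if __name__ == "__main__":
    print(dtw_distance(original, stretched))
    print(dtw_distance(original, different))
```

The half-speed copy aligns with the original at zero cost, while the differently shaped sequence does not. Whether a measure of this kind fits the experience model the speaker has in mind is exactly the open question he raises.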
To sum up, the pathway to artificial general intelligence, everyone has their favorite one. This is mine: following the child, looking at childhood development, taking it seriously. Go across campus and talk to the psychology department, talk to the neurophysiology department, find out what they’ve been learning, and then use your engineering skill to try to build analogs of that in the system. Thank you.


