Basics of Friendly AI Saturday, Mar 31 2007
friendly ai 8:43 pm
What is Friendly AI? From the glossary of Creating Friendly AI:
Friendly AI: 1: The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals. The term “Friendly AI” was chosen not to imply a particular internal solution, such as duplicating the human friendship instincts, but rather to embrace any set of external behaviors that a human would call “friendly”. In this sense, “Friendly AI” can be used as an umbrella term for multiple design methodologies. Usage: “The field of Friendly AI.”
2: An AI which was designed to be Friendly. Within the context of Creating Friendly AI, an AI having the architectural features and content described in this document. Usage: “A Friendly AI would have probabilistic supergoals.”
3: Friendly AI: An AI which is currently Friendly. See Friendliness. Usage: “The first AI to undergo a hard takeoff had better be a Friendly AI.”
And what, one might ask, is Friendliness?
Friendliness: Intuitively: The set of actions, behaviors, and outcomes that a human would view as benevolent, rather than malevolent; nice, rather than malicious; friendly, rather than unfriendly; good, rather than evil. An AI that does what you ask ver to, as long as it doesn’t hurt anyone else, or as long as it’s a request to alter your own matter/space/property; an AI which doesn’t cause involuntary pain, death, alteration, or violation of personal environment.
This definition is only intuitive because a precise definition has to be stated in terms of math: math that gets programmed into the AI’s algorithms.
Why does the first AI matter so much? Why not ignore the first and just try to do a good job on the second, or the third?
Hard takeoff: The Singularity scenario in which a mind makes the transition from prehuman or human-equivalent intelligence to strong transhumanity or superintelligence over the course of days or hours.
Whatever you believe about AI improvement speeds, it’s best to assume a hard takeoff. This is because the costs of being wrong on this point are so very high.
One more distinction, this time between “Friendship content” and “Friendship structure”:
Friendliness content: Defined in 1: Challenges of Friendly AI. The zeroth-order and first-order problems of Friendly AI; correct decisions and the cognitive complexity used to make correct decisions. The complex of beliefs, memories, imagery, and concepts that is used to actually make decisions. Specific subgoal content, supergoal content, shaper content, and so on. See 1.4: Content, acquisition, and structure; see Friendship acquisition and Friendship structure.
Friendliness structure: Defined in 1: Challenges of Friendly AI. The third-order problem of building a Friendly AI that wants to learn Friendliness (engage in Friendship acquisition of Friendship content). The structural problem that is unique to Friendly AI. The challenge of building a funnel through which a certain kind of complexity can be poured into the AI, such that the AI sees that pouring as desirable at every point along the way. The challenge of creating a bounded amount of Friendship complexity that can grow to handle open-ended philosophical problems. See 1.4: Content, acquisition, and structure.
One of the most common errors in initially approaching the idea of Friendly AI is to confuse Friendship content with Friendship structure. Instead of transferring over a fixed set of rules a la Asimov laws (1. Thou shalt not kill, 2. Thou shalt have no gods other than me, etc.), the challenge is to create a dynamic process that generates the “rules” we want automatically. The idea is to create a moral philosopher whose statements and beliefs garner reactions like, “wow, I wish I’d thought of that”, not a mindless machine that we have to be constantly worried is going to interpret “make humans happy” as “recycle all organic matter on the surface of the Earth into constantly stimulated hominid pleasure centers”. Successful Friendly AI is supposed to be a self-guiding arrow - a threshold of confidence at which there’s no reason to worry that you “forgot something”, because the AI is on your side and will implement whatever safeguards you would think of, and more.
For the questions you’re thinking of, like “isn’t all morality relative?”, see the CFAI Indexed FAQ.

April 1st, 2007 at 2:18 am
This is interesting, I’ve thought about it a lot but never knew it had a name.
The Asimov Laws seem rather stiff - Friendly AI is a much better infrastructure to work with.
What concerns me, though (and I may be in a minority), is that creating an AI (especially one more intelligent than we are) to serve humans is akin to slavery. This isn’t to say I’d expect some sort of AI uprising as an inevitable result of doing so, but most humans have a tendency to rank species (and an AI would be included) in a hierarchy with humans at the top. Obviously, this has more to do with evolutionary competitiveness than with some absolute categorization.
Would it not be more prudent to construct AI that do not compete for resources with us, and are more than just friendly, but mutually cooperative? Maybe I’m splitting hairs. It’s good to know there’s a community of people thinking about our relationship with the future. Cool!
April 1st, 2007 at 5:53 am
I was going to post this on an earlier post, but it is relevant here.
From the earlier post:
“Humans need to realize that everything we consider “natural” and “normal” about certain psychological patterns is entirely contingent on our historical experiences in a pin-sized corner of the totality of mindspace.”
I agree that most people anthropomorphize and that Asimov’s Laws are risible (if they worked, then Asimov wouldn’t have had any drama to write about). It also seems highly plausible that the space of possible intelligences is much larger than the space of human intelligences (e.g. from low-IQ to Newton).
However, I’m not sure how much we can say beyond that. For instance, given that we don’t know what intelligence is (in any detailed way), it is hard to say exactly how diverse the space of intelligent minds is. I will substantiate this point by analogy. Without knowledge of computational complexity theory, it is easy to assume that the space of conceivable algorithms is the same as the space of physically realizable algorithms. In other words, that all algorithms we can devise are such that we could (with huge amounts of computing power) actually implement those algorithms. However, we now know that this is completely wrong. There are simple problems which can only be solved via algorithms that cannot be implemented in our universe.
Now, it is easy to imagine all sorts of different intelligences. For example, some intelligences without any emotions, some with ranges of emotions bigger than the aggregate of all humans who’ve ever lived, some with a simple utility function (e.g. generate further digits of pi), some with a messy utility function that depends on all sorts of (often conflicting) drives/desires, some that have a utility function that selects at random from 10 billion different possible things to maximize (e.g. find counterexamples to Goldbach’s conjecture, or test possible solutions to protein folding, or try different chemicals for efficacy as smart drugs, or try devising technology for relativistic space travel, or try searching for hidden numeric codes in the Bible, etc.), and some that have basically flawless Chess strategy but are constrained by always having to make the same first move, etc. etc. Yet imaginability is different from physical possibility. I can imagine an algorithm that solves NP-hard problems in polynomial time, in a way that I can’t imagine 2+2 being 5 tomorrow. The only reason I believe otherwise is that I have seen the math (assuming, for the sake of simplicity, that P!=NP). But this math depends on an understanding of the nature of computation, both classical and quantum. Because I know in a mathematically precise way what computation is, I can prove that certain computations are not possible in our universe. But I don’t know in a mathematically precise way what intelligence is. Therefore, I can’t say with confidence how big mindspace is.
It might be that some of the intelligences that I imagined above are impossible in our universe. Maybe emotions are tied to creativity in some necessary way. Maybe the human ability for self-reflection is a fundamental part of man’s higher intellectual abilities. Maybe it is not possible to get >human level intelligence with very simple utility functions (though this seems implausible). Now, I don’t have any arguments for the plausibility of these possibilities. Some people (e.g. Damasio) have linked emotions to human intelligence. There may also be (it seems to me) connections between an ability to understand morality and beauty, and the possession of emotions, feelings, self-reflectiveness, or human-style qualia. But, we don’t understand intelligence, and so we don’t know how it is connected to other facets of human cognition. (Yes, we understand, via Bayesianism and formal learning theory, lots more about inductive reasoning than we did before. But do we really have much idea how a great mathematician like Euler or Poincare devised whole new sub-branches of mathematics, how Newton built the whole edifice of his physics, how Descartes came up with his famous arguments, how Darwin came up with the theory of evolution? Bayesianism is a great model of confirmation theory, but I don’t see how it sheds much light on how humans do their most creative thinking. It would be great if intelligence was describable in quite a simple formalism (like Bayesian probability theory). But there isn’t much reason, methinks, to think it will be.)
This brings me to suggest a possible defense of a very weak form of justified anthropomorphism. In the case of computational complexity, an early 20th Century mathematician would have done best to anthropomorphize a little in making predictions about the space of all algorithms implementable in our universe. The anthropomorphism argument would go as follows:
1. There are simple algorithms such that if we could implement them, then lots of math (which the smartest humans found very hard) would become utterly trivial. For example, Euclidean geometry is decidable, as shown by Tarski in the 1930s. In other words, there is a Turing machine that will tell you, for an arbitrary statement of Euclidean geometry, whether or not it is true (in a finite number of steps). See http://www.ams.org/notices/199607/marker.pdf
2. The laws of nature are incredibly restrictive. They impose completely strict limits on what is possible in our universe. For instance, the speed of light, the conservation of mass and energy, etc. So we should expect some limits of what is computable in our universe, since computation depends on physics (or maybe vice versa, but let’s not go there).
3. Human minds are in an abstract sense akin to Turing machines.
4. The things that are hard for humans will tend to be the things that are hard for any Turing machines in our universe. (Restrictions on humans will correlate with restrictions on all Turing machines).
5. Math is very hard for humans.
6. Therefore, there will be restrictions on the ability of all Turing machines to do math. (i.e. Inferences can be made about all Turing machines on the basis of anthropomorphism).
This argument is very rough, and in any case, it only would justify very mild inferences about mindspace from the evidence of human minds. (One might augment the argument with some evolutionary reasoning: If very simple algorithms for things like Geometry or Bayesian updating were implementable in our universe, then why wouldn’t evolution have hit upon them at some point? The fact that nature hasn’t hit upon them (despite those algorithms being pretty useful) suggests that they don’t exist. Though maybe evolution has design constraints I don’t know about that would rule this out. Any evolutionists out there?) However, further research in Cog Sci might give us a better idea of how tightly certain human abilities are tied together (e.g. creativity and emotion, math ability and language ability, etc.)
These sorts of considerations may make an FAI a very different proposition to the way Yudkowsky discusses it. If you can only make AIs that are not miles away from humans then (a) Friendliness seems much more unlikely a possibility, and (b) if it is possible then it might be much harder than it already looks (and boy it looks hard), e.g. requiring a deep understanding of how to get stability under self-improvement and a guarantee of commitment to Friendliness from a messier cognitive architecture than a simple classical utility maximizer.
(Of course, all sorts of constraints are put on mindspace by physical laws (e.g. computational complexity), but there may well be further constraints on minds that go beyond what is in our current science. I invoke computational complexity only as a loose analogy. The constraints that complexity puts on minds are not very significant. We might not be able to build an AI that does Euclidean geometry proofs quickly and completely algorithmically, but we could run Euclid’s mind billions of times faster and give him access to all of modern mathematics. That would result in pretty quick theorems. And that is very conservative (i.e. the top human math ability can almost certainly be improved upon, not just sped up.))
(I should note that my argument for making inferences about mindspace based on evidence of humans does not really exculpate most anthropomorphizers. My argument depends on the fact that the laws of nature have all sorts of restrictive limits that make big imagined possibility spaces turn out pretty small. It would be very surprising if the limits were so restrictive that you couldn’t have a human+ AI which didn’t want to fight for its civil rights a la women and minorities, and trade with other humans, and battle humans in support of its fellow AIs. And in any case, people still anthropomorphize even if you say “OK, assume for the sake of argument that there are no restrictive limits on mindspace and that AIs without emotions and feelings and drives are possible.”)
(Great blog, BTW.)
April 1st, 2007 at 8:11 am
Great comment, thanks. I understand the gist of your argument. We can use evolutionary psychology and cognitive science to explain why most human psychological features are byproducts of our evolutionary history. Even tiny steps forward in AI, like Deep Blue, show us that excellence in some niche can be achieved without all the anthropomorphic baggage we always assumed was necessary. The systematic tendency of humans to assume that any mind requires humanlike characteristics in order to do X, when there is little specific reason for it to be so, is a strong argument against the whole idea.
Even though actual Bayesian updating is unimplementable on physical hardware, evolution barely even tried. There are numerous reasons why Bayesian minds are neither possible nor desirable from the standpoint of evolution. The heuristics and biases literature goes into the reasons in detail.
There is little evidence that creativity and emotion are tied together. There is neurological evidence that we get a ping of happy-juice whenever we come up with a good idea, but this happens after the idea is already generated. Emotions as we know them are large and complex modules with quintillions of conceivable variations on the theme. Most variations would be considered “not emotions” or “pseudo-emotions” to most people. Most, if not all, of the features of known emotions can be explained in terms of their evolutionary utility in some particular niche.
To justify a very weak form of anthropomorphism, we have to show why some mind configuration we can imagine is not physically implementable. The laws of physics are quite permissive about configurations of small-ish, low-temperature objects as long as basic chemical parameters are met. (I.e., the configuration is chemically stable.)
What are the basic requirements for a mind? Memory, senses, ability to predict the future and plan, inductive selectivity, and the like. They are very basic. Within these requirements, we can imagine a huge number of variations that are consistent with the laws of physics. When it comes to pushing the ultimate limits - condensing a GW power plant in a cubic cm, or traveling extremely fast - physics comes into play. But when it has to do with specific configurations on low levels, it seems like chemistry is the main rule-setter.
A mind is a physical system. With a big enough computer you could model it using molecular dynamics. When considering the size of the space of minds, we can use the mechanistic language of atoms and molecules to dispel common mystical notions about intelligence. Ask not “is this mind possible?”, but, “is this atomic configuration allowed by chemistry?” If the atomic configuration is isomorphic to a given mindstate, then that mind is indeed possible.
I understand that without a more detailed theory of minds in general, we cannot set specific limits. However, cognitive science and evolutionary psychology, as well as experimental evidence from AI, tell us a tremendous amount.
April 1st, 2007 at 2:17 pm
Thanks for your response.
“Even tiny steps forward in AI, like Deep Blue, show us that excellence in some niche can be achieved without all the anthropomorphic baggage we always assumed was necessary.”
I’d say that Deep Fritz is an even better example. It is better than any human and runs on a few tanked-up PCs, rather than a supercomputer. It relies less on brute-force search and more on strategy, and yet again is totally non-anthropomorphic. This seems more relevant, since some reasoning problems may never (or not until we have an AI making our computers) be susceptible to brute-force search, and so we will need to implement the human-style ability of exploiting regularities in search spaces. For instance, a mathematician looking for a proof will usually only consider a tiny proportion of the utterly enormous space of possible proofs of a given statement. His experience of doing similar problems (and his intelligence) is what allows him to discount so many possible proofs without even considering them for a second.
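The contrast between brute-force search and exploiting regularities in a search space can be sketched in a few lines. What follows is only a toy illustration, not a description of how Deep Fritz or Deep Blue works: it compares plain minimax with alpha-beta pruning on a tiny hand-made game tree. The tree (`TOY_TREE`) and the helper names are made up for the example. Both searches return the same value, but the pruning version skips branches that cannot change the answer.

```python
# Toy comparison of exhaustive minimax vs. alpha-beta pruning.
# A tree is either an int (leaf score) or a list of subtrees.
# TOY_TREE and all function names here are hypothetical, for illustration only.

def minimax(node, maximizing, counter):
    """Exhaustive search: visits every node in the tree."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, counter,
              alpha=float("-inf"), beta=float("inf")):
    """Same result as minimax, but stops exploring a branch once it is
    clear the opponent would never allow play to reach it."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # remaining children cannot affect the result
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # remaining children cannot affect the result
    return best


# Depth-2 tree: the root maximizes, its three children minimize.
TOY_TREE = [[3, 5, 6], [2, 9, 1], [0, 7, 4]]


def compare(tree):
    """Return (minimax value, alpha-beta value, nodes visited by each)."""
    full, pruned = [0], [0]
    v_full = minimax(tree, True, full)
    v_pruned = alphabeta(tree, True, pruned)
    return v_full, v_pruned, full[0], pruned[0]


if __name__ == "__main__":
    print(compare(TOY_TREE))
```

On a tree this small the saving is modest, but the gap between the two node counts grows quickly with depth and branching factor, which is the sense in which a searcher that can discount possibilities without examining them does far less work than one that cannot.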
“The systematic tendency of humans to assume that any mind requires humanlike characteristics in order to do X, when there is little specific reason for it to be so, is a strong argument against the whole idea.”
As I said in my original comment, I am not making the standard crude mistake of anthropomorphization. I am merely suggesting that mindspace may be more limited than you think, in ways that are probably irrelevant to the possibility of creating intelligences totally different from humans, but that may be relevant to the possibility of building FAI. The fact that humans tend to anthropomorphize does not touch my reasoning at all. My reasoning was based on the assumption that the space of “minds possible in our universe” will turn out to be significantly smaller than the space of “minds that we can imagine”. This assumption is based on the pattern of physics and computation theory, where nature imposes significant limits on what is possible in our universe that conflict with human pre-theoretical intuitions. That is, humans assumed that all sorts of things would be computable that turned out not to be. The human imagination (of some of the best mathematicians and logicians of the early 20th Century) did not rule out the possibility of certain algorithms being implementable in our universe. But then once we knew what computation was and started doing proofs of computational complexity, we found (surprisingly, at least to me) that many of the possibilities we imagined were actually impossible in our universe. Similarly, we may find out that many of the minds that we now imagine to be possible are not in fact possible in our universe.
“We can use evolutionary psychology and cognitive science to explain why most human psychological features are byproducts of our evolutionary history”
Of course, EP and cog sci can do this to some extent. Human sexual desire seems like something which very likely coincides with human general intelligence for completely contingent reasons. There are many similar examples. However, I don’t think EP and cog sci are sufficiently developed to be able to tell us about the more general features of human psychology. For instance, suppose you give some evolutionary explanation for human self-consciousness (or ability of self-reflection, or whatever), something that other primates generally lack (though chimps are better than most). Then the evolutionary explanation is not itself sufficient to show that self-consciousness is only contingently tied to general intelligence. EP would just tell us that self-consciousness was evolutionarily advantageous to proto-humans, but it doesn’t tell us whether self-consciousness is necessary for general intelligence or merely compatible with it.
Consider the analogy with flight. Suppose you don’t understand flight at all (i.e. your physics is very limited) but you know some evolutionary biology. Suppose also, for the sake of argument, that you live somewhere that is cold enough for there to be no flying insects around. Thus your only example of flight comes from birds. Now, you might wonder (incorrectly as it turns out) whether feathers are necessary for flight. In considering this possibility, you might take into account evolutionary evidence that feathers could have been evolutionarily advantageous independent of their contribution to flight. But this evolutionary evidence alone would not confirm that flight and feathers only co-occur contingently in the case of the bird. To confirm this fact you would want one of the following. Either:
(a) You gain a very sharp theoretical grasp of the physics of flight. From this theoretical grasp, you become (justifiably) very confident that feathers are not necessary for flight.
(b) You have an empirical demonstration of something that flies but lacks feathers. For example, you see an insect or a pterodactyl or a helicopter.
(In this example, inferring too much from the case of the bird about flight would be a mistake. But it wouldn’t be that hard to cook up an example where two properties of the animal turn out to be necessarily linked because of physics.)
Going back to the case of minds, I would say that cognitive science plays the analogous role to physics. Thus, as with (a) above, we might get good theoretical reasons from cognitive science to think that general intelligence is independent of human psychological trait X. Likewise, we might show by example (as in (b)) that a certain human trait is unnecessary for some aspect of intelligence, by (e.g.) building an AI that plays great chess but shares no other non-trivial trait with humans. Sometimes, the theoretical reasons will come before the concrete examples. In the case of chess, lots of people had good ideas about making chess algorithms before Deep Blue actually beat Kasparov. So we had good reason to believe that chess didn’t depend on a wide range of human traits before we had a concrete example of a human-beating, non-anthropomorphic chess program. Thus, if someone put forward a convincing model of general intelligence, and that model showed the independence of general intelligence from all sorts of human psychological traits (emotions, self-consciousness, multiple antagonistic goals and desires), then I would accept this as strong evidence of a big mindspace. But we currently lack such a convincing model of human intelligence. (We’re not totally clueless, but cog sci is not a mature science.) So all we can say is things like: “Well, it seems (intuitively) that general intelligence is independent of emotions and other human traits, and we can give evolutionary explanations for why we’d have emotions even if they are independent of general intelligence.” But saying this is not a strong justification for the assumptions that you make about mindspace. I agree with most of what you say about mindspace, but I think that making some caveats about what you know about mindspace is important when talking about FAI.
It is misleading to speak as if you have a detailed, well-justified knowledge of mindspace, since you lack both strong theoretical reasons and concrete examples that demonstrate the independence of general intelligence from other human traits.
“A mind is a physical system. With a big enough computer you could model it using molecular dynamics. When considering the size of the space of minds, we can use the mechanistic language of atoms and molecules to dispel common mystical notions about intelligence. Ask not “is this mind possible?”, but, “is this atomic configuration allowed by chemistry?” If the atomic configuration is isomorphic to a given mindstate, then that mind is indeed possible.”
Couldn’t you say similar things about computation? For example: A computer is a physical system. ‘The laws of physics are quite permissive about configurations of small-ish, low-temperature objects as long as basic chemical parameters are met.’ So we should be able (with enough expertise) to arrange atoms in such a way as to give us computers that can implement any possible algorithm. But this turns out to be false.
Of course, the number of possible minds depends on the number of possible atomic configurations. But not many of those configurations are minds, and there might be significant restrictions on the space of configurations that are minds. Moreover, most of those minds lack general intelligence. Dogs have minds, but they aren’t what we’re after. As I’ve said above, these restrictions on mindspace may still be very permissive. We have some notion of morality, and we are a sample configuration. This suggests that other configurations will be able to understand morality even better than us. But unless we know what general intelligence is (formally) then it is hard to say how many configurations of atoms give rise to it.
April 1st, 2007 at 8:10 pm
I want a robot to entertain me. I actually bought a ‘toy’ robot the other day, the robo-reptile. It is quite stupid (yes Michael, dumber than me with my average IQ), but still fun. It takes commands from a remote, and has an autonomous mode, but is limited in actions and movement. I expect consumer robotics will advance quite quickly, especially now that the ethics code has been initiated, and robo-rights are being established. I understand your comments on robots being able to implement safeguards we would have thought of, but there should definitely be a human override to anything the AI decides. I love this site, it is so interesting to read of things emerging from the ‘transhumanist’ movement. One question, though. With stem cell enhancement technology extending life expectancy, how far away is senescent expansion, because I want to live to be 180, but remain cognitive and able-minded.
April 2nd, 2007 at 3:16 pm
I think it will be quite some time before we have to worry about robo-ethics or robo-rights. At present even the most advanced robotics technology is at a very primitive stage compared to human abilities. When watching demonstrations of humanoid robots on TV or the internet, such as Japanese robots recently shown pouring coffee or washing cups, you should always be cautious about taking these developments purely at face value. Often these are merely very carefully choreographed displays, where the robot itself has no real awareness of its surroundings, and is not “thinking” in any meaningful sense which you or I would recognise.
Robotics technology is certainly advancing, and I think we will see some very interesting developments over the course of the next decade, but I expect these will fall short of human-like (or super-human) cognitive abilities.
April 3rd, 2007 at 6:48 am
Bob
What you say is true, but you must also remember that those ‘interesting developments’ over the next decade could turn into human or near human intelligence equivalence in another decade or two beyond that…not a lot of time to prepare for this type of intelligence. So we better get it right the first time, we may not get another chance.
April 4th, 2007 at 4:10 pm
Raphael,
I understand what you are saying about the space of possible minds being smaller than the space of minds we can imagine. It does make sense. All of your arguments are persuasive.
I can imagine a diversity among minds much greater than the current biodiversity of animals with complex nervous systems, for instance. Without a theory, I can’t say exactly which types of minds would be possible, and which not, but one thing I can do is put forth guesses and ask what people think of them.
Most of the minds imagined in the theoretical AI possibility space sound quite plausible to me, but I could just be being naive. I’ll write a post on “other kinds of minds” and welcome your input on it.
April 13th, 2007 at 3:06 am
[…] Philosophy of mind, cognitive science, and computational complexity theory may help too. On a past post that went over some basics of the nascent field of Friendly AI - building positive AIs - a […]