Response to “What is friendly?” Sunday, Oct 8 2006
friendly ai 10:25 am
Over at the Streeb-Greebling diaries, Bob Mottram watched the Google video on the Risks of AGI panel and writes,
In this video a panel of luminaries discuss the future risks which advanced forms of AI might pose. Much hinges upon the idea of “friendliness”, and trying to ensure that decisions made by powerful intelligences will always be somewhat in tune with human desires. The elephant in the room here though is that there really is no good definition for what qualifies as “friendly”. What’s a good decision for me, might not be a good decision for someone else. When humans make decisions they’re almost never following Asimov’s zeroth law.
Asimov’s zeroth law is “A robot may not injure humanity, or, through inaction, allow humanity to come to harm.”
It’s not really “an elephant in the room”. There is a common definition for “friendly”, and it is accepted by many in the field:
“A “Friendly AI” is an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile.”
Not too difficult. Then comes the objection, “there can be no such thing”. Well, then you’d want to build an AI that is as close to that as possible.
So should future AIs always be engineered to follow the zeroth law?
Yes… not really as a “law”, but as an innate part of its motivations.
If an AI could override all human political decision making and impose an equitable world food distribution network I think that would be a very positive development. But would national leaders be willing to have their own self-interested agendas overridden by automation? I suspect they would be unhappy about that, owing to the inherently tribal nature of human psychology.
If you can’t please everyone all the time, then try to please as many people as possible most of the time. Again, there’s no dilemma here. I do suspect that a superintelligence with advanced nanotechnology would be able to go a long way towards appeasing the national leaders even if their people are fed.
One fallacy in my opinion is that it will be possible to control and predict the decision making quality of very complex AGIs. It’s already hard for us to predict how existing, relatively simple, computer programs will operate under all possible conditions.
Well, a complex AGI will be built by a simpler AGI that human programmers write. Of course we cannot predict anything with 100% accuracy, but I do think that we can build an AGI such that we can place more confidence in it crossing the line to the superintelligent regime than we would in any particular human or combination of humans.
An AGI with a supergoal of maximizing the number of black objects in the universe will not be convinced to change its goal system through any learning experience, however anti-black it may be.
A problem in the conceptualization of Friendly AI is that some people think that we are aiming for perfection. Not so – we’re just aiming for the best we can do, and something better than the alternatives. We can’t ask for anything more.
Once you introduce general learning capabilities into the equation it soon becomes impossible to say what the system will do in the long run.
Not necessarily. A static human cognitive architecture will always do the same things in the long run – humanlike things. A humanlike brain has humanlike goal attractors, which are preserved in the abstract regardless of any amount of learning. In the space of all possible goal attractors, the human mind stays within a very constrained area.
It will be possible to write utility functions that remain invariant regardless of new knowledge that is acquired, or remain invariant within certain constraints. New information can change the particulars of subgoals while doing nothing to change the supergoal. “Learning” implies acquiring knowledge, but does not necessarily imply changing goals.
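To make the distinction between learning and goal change concrete, here is a minimal toy sketch. It is not taken from any Friendly AI design: the outcome names, the `Agent` class, and the probabilities are all hypothetical, chosen only for illustration. The agent's utility function (the supergoal) is fixed, while its beliefs about what each action leads to are updated by learning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def supergoal_utility(outcome: str) -> float:
    """Fixed supergoal (hypothetical): scores outcomes, never touched by learning."""
    scores = {"people_fed": 1.0, "nothing_changes": 0.0, "people_harmed": -1.0}
    return scores[outcome]


@dataclass
class Agent:
    """Toy agent: fixed utility function, learnable beliefs about actions."""
    utility: Callable[[str], float]
    # beliefs maps an action to {outcome: probability}
    beliefs: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def learn(self, action: str, outcome_probs: Dict[str, float]) -> None:
        """Acquire knowledge: update beliefs only, leave the utility alone."""
        self.beliefs[action] = dict(outcome_probs)

    def expected_utility(self, action: str) -> float:
        return sum(p * self.utility(o) for o, p in self.beliefs[action].items())

    def choose_action(self) -> str:
        """The current subgoal: the action with the highest expected utility."""
        return max(self.beliefs, key=self.expected_utility)


agent = Agent(utility=supergoal_utility)
agent.learn("ship_grain", {"people_fed": 0.9, "nothing_changes": 0.1})
agent.learn("build_farms", {"people_fed": 0.5, "nothing_changes": 0.5})
first_choice = agent.choose_action()

# New knowledge arrives: shipping grain turns out to be risky.
agent.learn(
    "ship_grain",
    {"people_fed": 0.3, "nothing_changes": 0.3, "people_harmed": 0.4},
)
second_choice = agent.choose_action()

print(first_choice, second_choice)
```

In this sketch the chosen action (the subgoal) changes after the agent learns something new, but the function that scores outcomes is the same object before and after. Whether this kind of invariance can be preserved in a system that rewrites its own code is exactly what the discussion here is about; the sketch only shows that learning and goal change are separable in principle.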
Another assumption made by Goertzel and others is that there will only be a few powerful AGIs in the world.
The assumption is that there will be a hard takeoff whereby the first AGI to engage in recursive self-improvement is basically leagues ahead of any other AGI. Given that silicon transistors have switching speeds millions of times greater than biological neurons, this is indeed plausible.
Goertzel and others are not saying that there will only be a few powerful AGIs. Just that there will be a first mover, and that the existence of all future AGIs will be contingent upon the first AGI accepting their existence. The future space of all created AGIs will be limited by lines drawn by the first AGI, or human wishes channeled through that AGI.
Then there’s also the Gates scenario, where there will be a super-powerful AGI on every desktop and in every home. In this situation anybody will be able to have AGIs do whatever they wish, with no guarantees on friendliness.
The first AGI that reaches superintelligence is likely to become so powerful as to qualify for near-omnipotence. It would be trivial to prevent the creation of AGIs antithetical to its goals.
Further material on Friendly AI:
What is Friendly AI?
Creating Friendly AI
Anyone interested in the field of Friendly AI should read that last one from start to finish. Also: note that ‘friendly’, the English word, is not the same thing as “Friendly”, which is extremely complex and subtle.




“A static human cognitive architecture will always do the same things in the long run – humanlike things.”
Well, no. There are and will be many humans whom we think of as crazy, but who look at their own actions and find them perfectly logical. Hitler probably thought that he was doing his country a favor by getting rid of all the “trash DNA”. When we can’t even decide what is friendly, how can an AI? Should it stop abortion because it’s “killing”? These are debates on which there is no global agreement. Destroying all the guns on earth might seem logical for an AI to do, but should we allow it to do that?
It’s true that human-like brains are constrained within their goal attractors, but these constraints aren’t especially narrow. The range of human goal systems is fairly wide, from Mother Teresa to Adolf Hitler and everything in between. Admittedly these are the statistical outliers and most people fit within the hump of a normal distribution of dispositions.
It’s possible that some AIs may be constructed with inflexible supergoals or fixed utility programs. However, these machines will probably not be able to graduate to AGI status, and will remain sophisticated but narrowly circumscribed optimisation systems. Such systems will be like the guy at cocktail parties whose conversation always gravitates towards some monotonous pet subject, no matter what the initial topic may have been. For a true AGI, capable of self-modification, it’s hard to see how such fixed algorithms would remain unchanged for very long.
I agree with Goertzel and others that there will be a first mover advantage for whoever manages to come up with a workable AGI design. However, I’m not sure that the takeoff will necessarily be hard. The first AGIs created will almost certainly be of sub-human level intelligence. It may take some time to bring these up to speed, and there may be technical difficulties involved in supplying them with knowledge or having them directly experience the world (the notorious grounding problem). If the initial advantage is significant but still not of global proportions, it’s likely that we’ll see various AGI designs produced by different research groups around the world for varying reasons and put to various uses. I think this more heterogeneous “Gates scenario” is more likely in the short term than a single all-powerful AGI presiding omnipotently like Deep Thought over world affairs, if only for the reason that human tribalism will tend to work against such a situation.
Tribalism is bound to play a role in the development of any future superintelligences. An AGI constructed by well-intentioned first movers to obey the zeroth law may generate problems for the human population which were unforeseen at design time. Does the good of the many always outweigh the interests of the few? Should production of a new vaccine cease so that resources can be diverted to poverty relief? Even if there were an all-powerful AGI calling the shots, it’s unlikely that all human leaders would agree with its “friendly” decisions.
Imagine you’re in a field with a long-range sniper rifle, one kilometer from a fence. On the fence, every meter, is a beer can. Over your eyes is a blindfold with a very complex knot. You have one bullet. If you hit a beer can on your first shot, the human race is put in a future determined by which beer can you hit and where exactly on the can the bullet pierces. If you miss, we all die. Worrying about what exactly we want an FAI to do before figuring out how to make an AI that will do what we tell it is like debating which can is better to hit before untying the knot on the blindfold.
Even if we decided that CEV or some competing proposal is /the/ thing that we want an FAI to do, we don’t have an AI that can do it. There are a lot of optimization targets for a superintelligence that result in the immediate termination of the human race. There needn’t be specific mention of the human race; it just goes poof by default. That’s half the problem. The other half is that we have no way of aiming the superintelligence at any particular target. Put them together and we die.
I think in communicating FAI, Singularitarians tend to stress how hard it is to choose the right goal, but the more fundamental problem gets ignored. The examples of the solar system being turned into paperclips or nanoscale smiley faces unfortunately imply that the problem was just the poor specification of the target: The problem isn’t that the AI didn’t maximize paperclips, it’s that it didn’t understand the tacit assumption that not overwriting the galaxy takes precedence. A perhaps better example: If we were to create an AI that responded positively to smiling faces and sought to make people smile, then, upon reaching superintelligence, it would rearrange every atom in the solar system into Alka-Seltzer (except, obviously, those atoms currently composing introductory economics textbooks) and spend the next several million years arranging the Milky Way into a scale model of Texas State Highway 82.
Michael: Why do you point to CFAI rather than AI and global risks? Is the former a better introduction?
randpost: The default is for the AI to do nothing. Only where it can find some sort of coherence in human wishes need it do anything e.g. protect against rogue AIs. Defining “coherence in human wishes” is a fairly complex technical problem, but is orthogonal to contemporary moral disputes. It’s a different kind of problem.
Riley: You’re quite right that you need to solve both problems. This is explicitly mentioned in the original CEV proposal (http://www.singinst.org/friendly/extrapolated-volition.html , see the list of 3 life-or-death problems).
What is Friendly?
Bob Mottram argues in a post on the Streeb-Greebling Diaries that Friendly AI suffers from a critical flaw. The definition of Friendly is lacking:
The elephant in the room here though is that there really is no good definition for what qualifies as …
Nick: I prefer the way that anthropomorphism is dispelled in CFAI, in the section, “Beyond anthropomorphism”. CFAI talks about “conservative for AI” while “AGI and Global Risk” doesn’t. CFAI presents the important Sysop idea, while AGIGR doesn’t. CFAI has “the story of a blob”, AGIGR doesn’t. CFAI talks explicitly about observer-biased beliefs and observer-centric goal systems, whereas AGIGR does not. CFAI is more technical, proposing design characteristics for Friendly AI, while AGIGR does not. CFAI covers a range of design challenges barely even broached in AGIGR. CFAI is more theoretically rich, conceptualizing humans as three layers of functional complexity, and discussing strategies we might use to ground a normatively altruistic AI in that complexity. CFAI describes normatively altruistic Friendly AI, and people approaching Friendly AI for the first time should be able to answer the question, “Do you agree or disagree with the following statement? ‘Creating Friendly AI as a normative altruist is a good idea.’ Explain why or why not”. CFAI is longer, framing the problem in more detail. Anyone who wants to be qualified to talk about Friendly AI will need to read CFAI sooner or later anyway, and they might as well read it first and AGIGR afterwards, rather than just reading AGIGR and saying, “hey, I’m done, I can go argue my point of view now”.
The existing literature on Friendly AI is not large. It’s something like 500 pages, and can be read in a week or two by anyone who thinks the problem is important enough to spend some time on. Might as well start with the oldest, largest piece of work first, then work your way down to the newer, shorter stuff.