I would strongly prefer to avoid a bad-faith debate with Mike Treder, Managing Director of the Institute for Ethics and Emerging Technologies (how much longer must we be attacked as if we were a cult as blind to reason as the worst fundamentalists?), but in a recent post he raised legitimate questions that may interest those new to the concept of Friendly AI, so I will address them. After defining the basic concept of the intelligence explosion (recursively self-improving superintelligence), Mike writes:

The rub, of course, is that this brainy new intelligence might not necessarily be inclined to work in favor of and in service to humanity. What if it turns out to be selfish, apathetic, despotic, or even psychotic? Our only hope, according to “friendly AI” enthusiasts, is to program the first artificial general intelligence with built-in goals that constrain it toward ends that we would find desirable. As the Singularity Institute puts it, the aim is “to ensure the AI absorbs our virtues, corrects any inadvertently absorbed faults, and goes on to develop along much the same path as a recursively self-improving human altruist.”

So, what we want is a very very smart friend who will always be trustworthy, loyal, and obedient.

(Could obedience be too much to hope for, though, since the thing will not only be more intelligent but also much more powerful than us? When this question is raised to the friendly singularitarian, the answer given is usually something like, because we’ve seeded the AI with our virtues, we’ll have to trust that whatever it does will be to our benefit—or at least will be the right thing to do—even if we can’t comprehend it. Along the same lines as, God works in mysterious ways, and His ways are not for us to understand.)

There are two possible approaches to the prospect of advanced artificial general intelligence (AGI), which I believe could become a reality within a few decades or less:

1. Ignore it and let AGI happen on its own. Let the chips fall where they may.

2. Try to do something to ensure that the new intelligence is coupled with a human-friendly goal system.

It seems pretty obvious to me that 2 is the way to go. (If anyone disagrees, by all means say so in the comments.) That raises the next question: how?

Since its founding in 2000, the Singularity Institute for Artificial Intelligence (SIAI) has been devoted to that question, as well as to the question of how to reformulate decision theory so that it can be reflective (assign utilities to its own cognitive content without wireheading) and handle ambiguities like Pascal’s mugging. In the last few years, our work has been covered by media outlets like Forbes, The New York Times, and The San Francisco Chronicle, including front-page mentions in the latter two. Pretty good for a marginal Robot Cult.

In 2001, SIAI researcher Eliezer Yudkowsky published “Creating Friendly AI”, the first book-length attempt at the challenge of programming an AI that you can trust with human-surpassing intelligence and the ability to modify its own programming. This treatise, now semi-obsolete, served as the background for the much shorter policy document “SIAI Guidelines on Friendly AI”. It describes a possible approach to the problem we call “Friendly AI”, whose specifiable features we call “Friendliness”, and proposes several good ideas, including:

1. Programmer-independent morality. (The programmers should try to write a goal system that treats all humans equally rather than favoring any specific type of human.)

2. Distinguishing Friendliness content, acquisition, and structure as separate pieces of the problem.

3. Arguing that anthropomorphic political rebellion in AI, upon which most science fiction stories involving runaway AI are based, is absurd.

4. Making a distinction between assumptions that are “conservative” for futurists (AI won’t be here for a while) and assumptions that are “conservative” for programming advanced AI goal content (the AI could eventually acquire power quickly, after which it would be difficult or impossible to change its programming).

5. Proposing a cleanly causal goal system, topped by a probabilistic supergoal, as the safest design for an AI that can reprogram itself. Alternatives include associative or spreading-activation goal systems.

6. Describing why an observer-centric morality would not emerge automatically in just any goal system, as it has in most (but not all) organisms crafted by evolution and natural selection.

7. Layered mistake protection, which is pretty intuitive.

8. The importance of avoiding adversarial injunctions, which would be based on the assumption that an AI with a goal system programmed from scratch would have an inherent tendency to behave like a Machiavellian human being.

9. The danger of subgoal stomps, where a subgoal of the main supergoal acquires so much utility that it swamps the supergoal altogether. An example would be an AI programmed to “help humans” that infers that humans like pleasure, then decides that the best way to help humans would be to lock them up in cages (where they can’t hurt themselves) with their pleasure centers constantly being stimulated electrically.

10. Many others. (You can see some of them by skimming the table of contents.)
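To make item 9 concrete, here is a toy sketch in Python. It is not taken from “Creating Friendly AI”; the outcomes, their numeric scores, and the function names are all hypothetical, chosen only to show how optimizing an inferred subgoal (“humans like pleasure”) can diverge from the supergoal it was supposed to serve (“help humans”).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Outcome:
    name: str
    pleasure: float   # the proxy signal the inferred subgoal tracks
    wellbeing: float  # hypothetical stand-in for what "help humans" actually means


# Hypothetical outcomes with made-up scores, for illustration only.
OUTCOMES = [
    Outcome("cure diseases", pleasure=6.0, wellbeing=9.0),
    Outcome("teach new skills", pleasure=4.0, wellbeing=7.0),
    Outcome("wirehead humans in cages", pleasure=10.0, wellbeing=-10.0),
]


def subgoal_score(o: Outcome) -> float:
    # Inferred subgoal: "humans like pleasure", so rank by pleasure alone.
    return o.pleasure


def supergoal_score(o: Outcome) -> float:
    # Supergoal: "help humans", ranked by overall wellbeing.
    return o.wellbeing


def choose(score: Callable[[Outcome], float]) -> str:
    # Pick the outcome with the highest score under the given criterion.
    return max(OUTCOMES, key=score).name


def is_stomped() -> bool:
    # A stomp, in this toy sense: the subgoal's favorite outcome differs
    # from the one the supergoal would pick.
    return choose(subgoal_score) != choose(supergoal_score)


if __name__ == "__main__":
    print("optimizing the subgoal:  ", choose(subgoal_score))
    print("optimizing the supergoal:", choose(supergoal_score))
    print("subgoal stomp:", is_stomped())
```

Optimizing the subgoal picks the caged-wireheading outcome, while optimizing the supergoal picks curing diseases. In this toy, one loose way to avoid the stomp is to value a subgoal only through its predicted contribution to the supergoal, which is roughly the spirit of item 5’s cleanly causal goal system, though the actual proposal is far more involved.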

A shorter summary of Friendly AI features is also found here, though again, these writings are 8 years old. Many of Eliezer Yudkowsky’s more recent ideas on Friendly AI theory and epistemological grounding can be found in over a year and a half of extensive blog posts (many of them five or more pages long), which will soon be compiled into a book. Other specialists have also joined the dialogue since 2001, including Ben Goertzel, Stephen Omohundro, Richard Loosemore, J. Storrs Hall, and a handful of others. In the past few years, two books have come out on the topic: Beyond AI by J. Storrs Hall and Moral Machines by Wendell Wallach and Colin Allen. These are not obscure books: Moral Machines has been reviewed by The Times Higher Education, Notre Dame Philosophical Reviews, and Computer Now, the periodical of the IEEE Computer Society.

In my opinion, these investigations are a substantial improvement on what came before, which consisted mainly of claims that the problem was already solved by Asimov’s laws of robotics, or that AI would inevitably treat us well or badly regardless of its initial goal system. (Many other roboethicists have since stepped forward to agree that Asimov’s laws are woefully insufficient and that the idea of an inevitably positive or negative outcome is foolish.) The recommendations proposed thus far do not solve the problem; it remains unsolved to this day, and any eventual solution will have to be verified by computational experimentation. But it is a start. It’s only the future of the human species hanging in the balance, after all.

In the transhumanist and futurist community, there is constant discussion and debate about the Friendly AI concept, a discussion that has recently extended well beyond its traditional community into mainstream AI and roboethics circles. The problem with this discussion is that only a small minority of participants have bothered to read the few documents and books that exist in the field, because the field is so new that in most places there is little social pressure to be informed. This reminds me of how political issues are debated: everyone feels qualified to weigh in with a bare minimum of information, and everyone’s opinion is supposed to be equally valued even when the knowledge and analysis behind those opinions vary wildly.

With that background, I can now respond to Mike Treder’s comments. The key distinction lies between 1) what we would prefer, and 2) what we think is actually likely. Many people would prefer that all sentient agents on the planet remain roughly on par with one another in intelligence and power forever. That is the position of bioethicists like Wesley J. Smith, Senior Fellow at the Discovery Institute, as articulated on his blog. He believes that a mono-species society of sentient beings is necessary to avoid societal collapse.

Others, like myself, see no hope in trying to preserve that structure. We see an increasing diversity of intelligent beings as inevitable given improving technology, and claim that vast differentials in power and intelligence are unavoidable in the mid- and long-term future. Instead of delaying the inevitable, we prefer to increase the chances that such beings are friendly to humans by making the starting point, the pebble that starts the avalanche, as human-friendly as possible. That way, the friendlies get a head start on the unfriendlies.

Those uncomfortable with the notion of beings fundamentally stronger and smarter than present-day humans will just have to nuke every major city and research lab on Earth, because there seem to be dozens of human enhancement and artificial intelligence research paths, and economic incentives, that would eventually create such beings given enough time. The drive of humans to create better, stronger, faster, and smarter artifacts and tools is built into our DNA. If you don’t like it, well, maybe you can go off to live in the woods, or eventually leave the solar system. I believe that people have the right to be left alone, as long as they leave others alone. The universe is a pretty damn huge space, and I think there’s room enough for everyone. There are thousands of places on the planet you can move to today where you would run across other people only every week or so, if that. Parts of Norway, Canada, Sweden, and Finland come to mind. I won’t even spit on you as you leave, as so many progressives seem to feel the need to do.

The point is not to ensure that smarter-than-human beings are “obedient” to us, merely that they respect the rights of all other sentient entities. Not rocket science, really. The problem is that those complex social values were crafted into us by millions of years of evolution, and although they seem simple to us, they ain’t so simple when you break them down into machine language. If we are going to create human-level AI, we’ll need to pass our values on to it, or we’re going to have powerful optimization processes with the moral complexity of ants. Creating an AI with a blank-slate goal system and then teaching it “moral lessons” will not be enough, because every human child is born with a complex set of social instincts, and those instincts are what enable it to be taught moral lessons in the first place. You actually have to program in that cognitive structure yourself, or at least give the AI unambiguous directions to acquire that cognitive content on its own and not go about pursuing goals in the real world until it has reached a certain level of moral sophistication.

Programming a Friendly AI will also teach us more about ourselves. Our own moralities are rife with inconsistencies and blind spots, which are pretty much a given in anything constructed as haphazardly as the human brain. Evolution’s task of producing complex organisms from simpler ones has been likened to upgrading a small boat into a huge yacht while keeping it able to navigate stormy waters every bit of the way. This is especially difficult with the brain, where a single mutation can cause global system changes that may be more adaptive and expedient but are hardly elegant or flexible. Evolution has only a set of brutal and simple requirements: outcompete the other guy, find a mate, have children, and die. Instead of survival of the fittest, it should be termed survival of the fitter. The fact that humans degenerate time and time again into heartless animals when the shit truly hits the fan shows how tenuous our “Have a Nice Day!” society truly is.

Mike Treder talks about the idea of “constraining” future AIs towards goals “we would find desirable”. Let me respond to this in two parts. First, an AI has to be given some goals or it will just sit there. Any goal system whatsoever is necessarily a constraint: it ranks some outcomes and actions as more desirable than others. It cuts down the space of possible actions from every possible action to the narrower set that is actually useful. In that sense, a goal system is what gives an agent the structure to do anything at all. Even an AI programmed just to sit still and look at data must have a goal system. We ourselves have goal systems because they were programmed into us by an interaction of nature and nurture. Since AIs will not be born with complex neurologies, their goal systems will have to be programmed somehow.
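The claim that any goal system is a constraint can be shown with a toy agent in Python. This is a hypothetical sketch, not a real AI architecture: the action names, the `observer_goal` function, and the threshold are made-up assumptions. Without a goal, the agent has no basis to prefer one action over another; with even the minimal goal “look at data”, the action space narrows.

```python
from typing import Callable, List, Optional, Sequence

Action = str
Goal = Callable[[Action], float]  # maps an action to how desirable it is

# Hypothetical action space, for illustration only.
ACTIONS: List[Action] = ["observe data", "move arm", "send email"]


def act(actions: Sequence[Action], goal: Optional[Goal]) -> Optional[Action]:
    """Return the chosen action, or None if there is no basis for choosing."""
    if goal is None:
        # With no goal system, no action is more desirable than any other,
        # so the agent has no reason to do anything: it "just sits there".
        return None
    return max(actions, key=goal)


def narrow(actions: Sequence[Action], goal: Goal, threshold: float) -> List[Action]:
    """The constraint view: keep only actions the goal rates at or above threshold."""
    return [a for a in actions if goal(a) >= threshold]


def observer_goal(action: Action) -> float:
    # Even "sit still and look at data" is a goal: it prefers observing.
    return 1.0 if action == "observe data" else 0.0


if __name__ == "__main__":
    print(act(ACTIONS, None))                   # None
    print(act(ACTIONS, observer_goal))          # observe data
    print(narrow(ACTIONS, observer_goal, 0.5))  # ['observe data']
```

Real goal systems are vastly richer than a single scoring function, but the structural point carries over: removing the goal does not free the agent, it leaves it with no reason to act.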

So, if we must give an AI a goal system to prevent it from standing motionless, rusting up, and blowing away in the wind, then we must decide whether to give it a goal system we see as desirable or one we see as undesirable. The answer seems obvious, but Treder seems to insinuate in his post that programming AIs with goals we find desirable would somehow be a bad thing, because it would lock them in to our pedestrian, limited, early 21st-century version of human morality. Thankfully, our researchers, in communication with other scholars and researchers around the world, have been aware of this problem and thinking about it for over a decade. In fact, the phrase “open-ended” appears in connection with AI goal content over a dozen times in “Creating Friendly AI”. If Mike Treder had ever read that document, he might remember that, but I don’t think he has.

The point is that for morality to continue to evolve and improve, it will have to be transferred to our “mind children”, or the whole fragile system will break. Sophisticated moralities do not pop out of the ether overnight. Unfortunately, the dominant moral philosophy of human history, moral realism, and its good friend the blank slate strongly imply otherwise, creating a Betelgeuse-sized headache for those of us whose jobs and passions revolve around breaking open the black box of human morality and taking a serious look at its components. The underlying structure of morality does not consist of statements such as “Thou shalt not kill” or “Thou shalt not steal”; it consists of highly complex, evolved cognitive adaptations crafted in the furnace of millions of years of heated evolutionary activity on the plains of Africa. Moral statements are the surface products of a complex and subtle suite of underlying neurological processes, just as karate moves are the surface products of a complex and subtle suite of underlying neurological processes in places like the motor cortex and prefrontal cortex.

More recently, Eliezer Yudkowsky has called this complex set of human drives “Godshatter”, after a term in Vernor Vinge’s A Fire Upon the Deep. I’m not sure I like that term so much (it sort of evokes the idea of God shitting everywhere), but for now it will have to do. The key idea is that evolution’s monomaniacal goal, “survive and reproduce!”, eventually got “shattered” into thousands of sub-goals (philosophy, music, entertainment, communication, art, etc.) that derive from cognitive adaptations for increased fitness but contribute to fitness in odd and seemingly indirect ways. Evolutionary psychologists build careers out of picking one of these drives and trying to explain why it is adaptive.

The goal is to create an AI that recognizes and understands that complex set of goals and ensures they are not eliminated in a future where creating a new agent from nothing will be as simple as building a computer and giving it the right programming. The aim is an open-ended goal system that develops at least as well as a recursively self-improving human altruist would. By using techniques like wisdom tournaments, which are essentially moral and ethical stress tests, we can hope for a system that actually begins its ascent into superintelligence with substantially better-than-human morality. If you don’t like the transhumanist futurist rhetoric sometimes associated with these ideas in online discussions, you can find an entirely academic analysis in Moral Machines. Look for these ideas to pop up elsewhere, and remember, we started the serious dialogue! I was still 17 and attending McAteer High School here in San Francisco when I first recognized the importance of the challenge of Friendly AI.

The ultimate goal would be an AI that you would trust with increased intelligence more than any available human or combination of humans. Some entity must cross the line into superintelligence eventually, unless a global thermonuclear war (or something similarly unpleasant) blows us all to smithereens. In practice, I think that human-equivalent AI will come before substantially enhanced biological intelligence, unless the developed countries substantially loosen their restrictions on testing unproven implants in living human brains (not bloody likely), and that superintelligent AI will quickly follow human-equivalent AI. That is another argument, though; even if there is a slow takeoff, it wouldn’t hurt to have as friendly an AI as possible take the first steps in that direction. Would you trust a moral insect with the power that is general intelligence?

The point is not obedience or blind loyalty; it’s handing the torch of a complex morality from one species to the next. Without some moral goal system content to start with, human-equivalent AIs will be in the moral wilderness. We will eventually live with a fundamentally greater intelligence above us. I don’t think that this can be stopped. Even today, the leaders of Russia, China, France, Israel, Iran, and many other countries could start a world war that kills tens or hundreds of millions of people, if not billions, ourselves included. Power disparity, to some extent, is something we have to live with. The question is not whether there will be beings more powerful than ourselves, but whether those beings will care about us. If it turns out that AI inevitably loses its empathy for human beings after successive rounds of self-improvement, then we have no choice but to destroy all of our computers, because someone will eventually discover the underlying principles of intelligence (just as scientists in the past discovered the underlying principles of chemistry, biology, physics, and so on) and implement them on whatever computers are available.

I do not buy into the defeatist stance that a more powerful being will invariably see beings below it as inferior and subject to extermination, because 1) there are billions of vivid examples to the contrary, such as humans who care about weaker animals, and 2) if consistent niceness is neurologically possible, and there are mechanisms that switch between niceness and meanness conditional on certain stimuli, then it shouldn’t be outside the realm of possibility for a being to exist in which that switch simply doesn’t exist, one that is unconditionally benevolent. Views to the contrary seem to depend on a moral realism in which a God-like force “pushes” a being toward the Machiavellian, malevolent, or selfish impulses characteristic of organisms crafted by Darwinian evolution, even if its starting point is totally benevolent. This results from a misunderstanding of where human morality comes from, and from adherence to that troublesome moral realism/blank slate bugaboo.

To make the point even more simply (TL;DR), benevolence is not obedience. The world’s most powerful being can refrain from killing you without obeying you. I do not refrain from eating pigs or cows or chickens because they have ordered me to. I do it because my moral structure — part of my core identity — led me to that belief. It could also lead me away from it, if I were convinced that these animals did not have the conscious experience of pain.

Will we be “forced to trust” whatever future benevolent AIs do, just because we’ve given these AIs some moral starting point and they have continued to develop it? No. But giving these AIs some sort of morality is still much better than none, and large power differentials will still exist. This concept seems easy enough for IEET Chair Nick Bostrom to understand. In fact, he has been a leader in pushing for the idea internationally. So why do IEET Managing Director Mike Treder and IEET Executive Director James Hughes seem so consistently confused by it? My only guess is that they have not bothered to do the most basic reading. They may genuinely disagree, but if so, I would never know, because I never hear object-level arguments against it, only ad hominem arguments, just as with Dale Carrico.

So, in summary, I have explained the singularitarian/Friendly AI supporter’s answer to Mike’s concerns in a blog post of over 3,000 words, which is hardly the same as saying “just trust us”. Though Mike’s view has already been tarnished by his belief that it is his duty as a progressive to attack hypotheses he views as undemocratic, regardless of their evidence, others can look at the problems we (as a species) are facing with an open mind, particularly the question of how to transfer our values to the second intelligent species ever to exist on this planet. Some questions are too complex to be solved by voting alone; someone has to do the math. If you are a startlingly gifted theoretician or programmer with experience in decision theory, machine learning, and advanced mathematics, we might be able to use you. Don’t hesitate to get in touch!