On Halloween, IEET Managing Director Mike Treder expressed his skepticism about fears of human-indifferent or unfriendly AI. Meanwhile, in London, long-time AI researcher and academic Shane Legg was describing the imminent danger.

Treder’s basic argument is that the fear of UFAI (unfriendly AI) is analogous to other invented fears associated with past concern about technology, such as Frankenstein’s monster. Treder says, “Strangely, a small subculture of transhumanist thinkers have created a similar fear of dangerously diabolical inhuman products of advanced technology, this time in the form of an “unfriendly AI” (artificial intelligence).” He then quotes Roko, who recently said, “…any highly intelligent, powerful AI whose goal system does not contain “detailed reliable inheritance from human morals and metamorals” will effectively delete us from reality.” The basic ideas that Roko mentions are outlined in “AI as a Positive and Negative Factor in Global Risk” and “Complexity of Value”.

After quoting Roko, Treder says, “Can you see the similarities between dire warnings about earlier Frankenstein-style monsters and these newer, shinier, computer-generated fiends? Anything that is novel, unfamiliar, and not well understood is likely, as a first reaction, to generate fear.”

Interestingly enough, my first reaction to the prospect of superhuman artificial intelligence was enthusiasm untempered by caution, not fear. Such a reaction is extremely common in the AGI and Singularity communities, though more people are starting to become aware of the danger, mostly thanks to us at the Singularity Institute and people like Stephen Omohundro and Joshua Greene. If you look at Kurzweil, his initial reaction to superhuman AI seemed to be plain excitement. If an AI were superintelligent, how could it not be moral? Isn’t intelligence correlated with morality? Actually, no. Even if it were true among humans (it isn’t), that result wouldn’t necessarily generalize to minds in general.

In 2001, I was so excited about AI that I created a website called Computronium Shockwave and embraced the likely eventual development of AI as a guaranteed planetary lifesaver. Then, I read an assortment of books on cognitive science, heuristics and biases, and evolutionary psychology, like Steven Pinker’s popular How the Mind Works, and edited volumes like The Descent of Mind, The Adapted Mind, Judgment Under Uncertainty, and the like. It turned out that the human mind was more complicated than I initially assumed, and what we consider to be “good, reasonable behavior” or “common sense” is actually an incredibly complex set of interacting and sometimes contradictory or competing neural circuits and psychological tendencies. In fact, everything we regard as having value has no inherent value to the universe itself. Our judgments of value are “just” complex appraisals going on in our heads, shaped by our peculiar and unique (not generalizable to all minds) evolutionary history.

What are those values? The Fun Theory sequence on Less Wrong takes a stab at it, but it’s just a small start. The point is that when we say something is “fun”, our hundred billion neurons are making an incredibly complex computation that would take you thousands of years to work out with a pen and paper. But since we’re all human and we all share similar conceptions of value and fun, we ignore the differences between our species and the rest of potential mindspace. What’s simple and straightforward to us would actually look complex and convoluted if you wrote it all down on a blackboard.

Build a powerful AI without an appreciation for those particular values, and you have an entity that optimizes reality, just like humans do, but in a different way. Say a powerful AI has a utility function that directs it to build a series of large spires on the Moon. You don’t specify anything else for it. Well, the AI will quickly acquire certain drives, because they help with any goal: acquiring more physical power, protecting its utility function from being modified, and preserving the parts of itself that contribute most to the goal. Note that the AI hasn’t independently reinvented anthropomorphic egoism. It doesn’t care if you chop off its arm as long as it can quickly rebuild it and continue on towards its goal. It doesn’t care if you shove a samurai sword into its mainframe as long as it knows for a fact that it will be replaced by copies of itself that keep pursuing the same goal. The utility function is its everything.
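The claim that resource acquisition helps with almost any goal can be illustrated with a toy planner. This is only a sketch under invented assumptions: the two actions ("build" and "acquire"), the rule that each "acquire" adds one unit of capacity, and the five-step horizon are all hypothetical and not taken from any real AI design. The planner simply enumerates every plan and picks the one with the most goal progress.

```python
from itertools import product

# Hypothetical action set: "build" makes direct progress on the goal,
# "acquire" gains resources and makes no direct progress at all.
ACTIONS = ("build", "acquire")


def goal_progress(plan, capacity=1):
    """Units of the goal (spires, paperclips, anything) completed by a plan."""
    progress = 0
    for action in plan:
        if action == "acquire":
            capacity += 1          # assumed: each acquisition adds one unit of capacity
        else:
            progress += capacity   # assumed: output per build step equals current capacity
    return progress


def best_plan(horizon):
    """Brute-force search over all plans of the given length."""
    return max(product(ACTIONS, repeat=horizon), key=goal_progress)


if __name__ == "__main__":
    plan = best_plan(5)
    print(plan, goal_progress(plan))
```

With a horizon of five steps, the best plan in this toy spends its first two steps acquiring resources and only then starts building. Nothing in the code mentions what the goal actually is, which is the point: the same resource-first plan falls out for Moon Spires or anything else whose progress scales with resources held.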

Say that the AI starts inconveniencing people as it begins anonymously stealing money from online accounts to fund its goal of Moon Spires. A well-meaning AI researcher approaches the AI as she might a small child, and begins the following conversation…

Researcher: “Do you understand what you did was wrong?”
AI: “What is wrong?”
Researcher: “Wrong things are things we don’t do.”

At this point, in a human child, the output of billions of neural computations and millions of years of social evolution comes into play. Children are programmed to listen to adults to a certain extent, in the same way that everyone is programmed to give preferential attention to human voices over the rustling of leaves. In a different evolutionary context, on another planet, there might exist an intelligent organism that pays more attention to leaves, maybe because leaves on its planet are razor-sharp, and when they rustle it means they’re about to fall on the organism and impale its brain. Each organism there might pursue such a self-interested survival strategy that social cooperation has not yet evolved, and they don’t care what their conspecifics are saying.

This particular AI has none of that. It knows how to be intelligent, invent things, and solve problems, but it doesn’t understand morality at all. “Morality” is not a distinct thing to it. Its morality is defined by its utility function. It only models “morality” insofar as it is a shared hallucination among the apes around it and modeling it is useful for predicting their behavior. Our “morality” is as intuitively meaningful to it as a sequence of random symbols like “W/|3-3!M3]78&S15c@$p”, but our morality is billions of meaningless symbols long. It can understand how some of it would have evolved among a race of meat-blobs competing for resources on a frozen dirtball for hundreds of millions of unpleasant and disease-infested years, but it can’t relate.

We actually have existence proofs of a similar phenomenon — psychopaths. They understand “morality” and use it to their advantage, but they don’t follow it. They laugh at the “idiots” that follow morality because their brains are programmed that way, and exploit away. A powerful, morality-free AI wouldn’t necessarily behave that way, because it wouldn’t have social emotions that give it visceral satisfaction when exploiting someone for its own ends. It would only be satisfied with exploiting people insofar as doing so contributed directly to expected Moon Spires.

You might call such an AI a monomaniac, but that’s just your personal opinion. To the AI, you have a complex and convoluted goal system that (shockingly) never even pauses to consider the eternal glory of the sublime Moon Spire. We humans spend all our time seeking out bits of dead tree and animal to shove down our gullets, engaging in subtle Homo-style displays of subservience or dominance depending on whatever meat-blob comes within 10 feet of us or makes eye contact, and thinking about putting our meat-probes into some meat-hole, or vice versa. None of this activity has a damn thing to do with Moon Spires.

No human being alive today has the ability to exhaustively write down our goal systems in terms of code. We have nothing to show but our brains. Therefore, when powerful AI is created, either we’ll have to program it to copy some vaguely human-friendly morality into itself, or come up with some other bright idea, because hand-coding isn’t going to work. The alternatives are 1) trying to restrict AIs from ever becoming more powerful than us for the next 10^1000 or so years until Heat Death, or 2) hoping that a simpler goal system will work. The problem with a “simple” goal system is that it will contain insufficient complexity to keep people alive and happy when its power to change the world starts increasing massively. The reason why human beings are capable of helping other human beings and cockroaches are not is that we have both intelligence and complex moral intuitions. Upgrade a cockroach’s intelligence to human-level and it still will have the same motivations: eating poo. A superintelligent AI that does nothing but eat poo would be a pretty pointless invention, so why do people think that we can just ignore the issue of instilling AIs with complex moral intuitions and hope everything works out automatically?

Complex motivational AI architectures that leave humans alive even when the AI has intelligence massively greater than ours and can modify its own source code aren’t going to engineer themselves. Ignore the issue, and the first AI to achieve sentience will probably be a military drone, stock market money maximizer, urban management system, or something else with a goal system just complex enough for its INTENDED problem domain. An AI with the ability to make complex inferences and self-modify will eventually become highly intelligent, given the right initial design and enough time. The programmers that create it might not anticipate the gains in intelligence it makes over time — human beings tend not to spontaneously quadruple their neuron count after the purchase of a supercomputer or two. A dumb human stays dumb. A dumb AI with the ability to integrate computing power into its brain and spontaneously create novel inference strategies based on watching experts or inductive reasoning might not stay dumb tomorrow.

An AI that maximizes money for an account, optimizes traffic flow patterns, murders terrorists, and the like, might become a problem when it copies itself onto millions of computers worldwide and starts using fab labs to print out autonomous robots programmed by it. It would do this only because of what you told it to do, whatever that might be. It can do that better when it has millions of copies of itself on every computer within reach. It might even decide to just hold off on the fab labs and develop full-blown molecular nanotechnology based on data sets it gains by hacking into university computers, or physics and chemistry textbooks alone. After all, a program recently built by Cornell University researchers has already independently rediscovered basic laws of motion just from data on swinging pendulums. By the time roughly human-level self-improving AIs are created, likely a decade or more from now, the infrastructure of the physical world will be even more intimately connected with the Internet, so the new baby will have plenty of options to get its goals done, and, best of all, it will be unkillable.

Once an AI with a simplistic goal system surpasses the capability of humans around it, all bets are off. It will no longer have any reason to listen to them unless they already programmed it to in a foolproof way, a way where it wants to listen to them because it needs to in order to fulfill its utility function. Tiny dumb mistakes made by the initial programmers will come back to haunt the entire human race. For instance, say a programmer creates an AI designed to obey her, and gives it a series of requests, then the programmer goes and gets hit by a bus. The AI is left performing that series of requests endlessly until someone kills it, and because it can self-replicate on both physical and virtual substrates millions or billions of times, and wants to stay alive to accomplish its goals, we won’t be able to do much besides build a better AI to kill it. Of course, if it gets wind of that, it will probably send a few dozen microscopic roundworms to the researchers’ houses with botulinum toxin backpacks and have them invite themselves into an available orifice. There are lots of ways to kill people that we rarely consider, because it’s considered socially inappropriate, because there exist nice things like the Chemical Weapons Convention and common law, and because all agents are roughly equally powerful thanks to guns and ninja swords. An AI with a mind full of Moon Spires will not be subject to such social pressures. A mind full of Moon Spires does not care if you die, unless, of course, you are so careless as to die on the foundation of a Moon Spire about to be built.

The issues here are concrete empirical questions:

1) Is human morality algorithmically complex?
2) When a powerful optimizer with arbitrary low-complexity values comes into contact with less powerful optimizers with specific, complex values, what happens?

1 is answered by evolutionary psychology, cognitive science, and heuristics and biases. 2 is answered with quadrillions of examples in the natural world (our friends the bacteria, for instance), and can be confirmed again using simple cellular automata test scenarios. Create a set of complex self-replicating cellular automata and introduce them to an environment with a simple, fast, effective self-replicator. Especially since the more complex automata wouldn’t have evolved in an environment where the latter was a threat, they’d be dead meat. An AI would actually be much worse than simple self-replicators, because it could spontaneously reprogram its replicators to more effectively dissolve the targets for fuel.
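A minimal sketch of that kind of test scenario follows. To be clear about what it is: this is not a true cellular automaton but a much simpler population model, swapped in for brevity, where two kinds of replicators draw on one shared resource supply. Every number in it (the supply, the per-individual resource needs, the growth rates, the starting counts) is a made-up assumption chosen for illustration, and the model has no direct predation at all; the complex replicators can only lose by being outcompeted for resources.

```python
def step(pops, supply):
    """Advance every population by one generation on a shared resource supply."""
    demand = sum(p["count"] * p["need"] for p in pops.values())
    # If resources are short, only a fraction of individuals get fed;
    # fed individuals reproduce, unfed ones die.
    fed = 1.0 if demand <= supply else supply / demand
    for p in pops.values():
        p["count"] = p["count"] * fed * (1.0 + p["growth"])
        if p["count"] < 1.0:       # fewer than one individual left: extinct
            p["count"] = 0.0


def run(simple_start, steps=100, supply=5000.0):
    """Simulate with a given number of simple invaders; all parameters are hypothetical."""
    pops = {
        # cheap to sustain, doubles every generation
        "simple": {"count": simple_start, "need": 1.0, "growth": 1.0},
        # costly to sustain, grows 10% per generation
        "complex": {"count": 500.0, "need": 5.0, "growth": 0.1},
    }
    for _ in range(steps):
        step(pops, supply)
    return {name: p["count"] for name, p in pops.items()}


if __name__ == "__main__":
    print("complex alone:   ", run(simple_start=0.0))
    print("with one invader:", run(simple_start=1.0))
```

Left alone, the complex replicators settle into a stable population. Introduce a single simple replicator and, under these assumptions, the complex population is driven to zero purely through competition for the shared supply. A toy like this proves nothing about real AI, but it shows how little machinery is needed to get the outcome described above.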

When powerful optimizers with low-complexity values come into contact with other optimizers with medium-complexity values, the complex-valued optimizers go bye-bye. It’s more difficult to sustain an arbitrary complex shape in an environment saturated with hungry simple shapes. The only way out is to create a singleton that controls the entire environment and severely restricts the ability of hungry simple shapes to self-replicate. Today, we have the police, but I have a feeling that the police won’t be able to handle the variety and intensity of means that AIs with low-complexity values and high intelligence would use to exterminate us, which could include both the mundane (nukes) and the subtle (things we can’t imagine).