One of my favorite books about the mind is the classic How the Mind Works by Steven Pinker. The theme of the first chapter, which sets the stage for the whole book, is Artificial Intelligence, and why it is so hard to build. The reason is that, in the words of Minsky, “easy things are hard”. The everyday thought processes we take for granted are extremely complex.
Unfortunately, benevolence is extremely complex too, so to build a friendly AI, we have a lot of work to do. I see this imperative as much more important than other transhumanist goals like curing aging, because if we solve friendly AI, then we get everything else we want, but if we don’t solve friendly AI, we have to suffer the consequences of human-indifferent AI running amok with the biosphere. If such AI had access to powerful technology, such as molecular nanotechnology, it could rapidly build its own infrastructure and displace us without much of a fight. It would be disappointing to spend billions of dollars on the war against aging just to be wiped out by unfriendly AI in 2045.
Anyway, to illustrate the problem, here’s an excerpt from the book, pages 14-15:
Imagine that we have somehow overcome these challenges [the frame problem] and have a machine with sight, motor coordination, and common sense. Now we must figure out how the robot will put them to use. We have to give it motives.
What should a robot want? The classic answer is Asimov’s Fundamental Rules of Robotics, “the three rules that are built most deeply into a robot’s positronic brain”.
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
Asimov insightfully noticed that self-preservation, that universal biological imperative, does not automatically emerge in a complex system. It has to be programmed in (in this case, as the Third Law). After all, it is just as easy to build a robot that lets itself go to pot or eliminates a malfunction by committing suicide as it is to build a robot that always looks out for Number One. Perhaps easier; robot-makers sometimes watch in horror as their creations cheerfully shear off limbs or flatten themselves against walls, and a good proportion of the world’s most intelligent machines are kamikaze cruise missiles and smart bombs.
But the need for the other two laws is far from obvious. Why give a robot an order to obey orders — why aren’t the original orders enough? Why command a robot not to do harm — wouldn’t it be easier never to command it to do harm in the first place? Does the universe contain a mysterious force pulling entities towards malevolence, so that a positronic brain must be programmed to withstand it? Do intelligent beings inevitably develop an attitude problem?
In this case Asimov, like generations of thinkers, like all of us, was unable to step outside his own thought processes and see them as artifacts of how our minds were put together rather than inescapable laws of the universe. Man’s capacity for evil is never far from our minds, and it is easy to think that evil just comes along with intelligence as part of its very essence. It is a recurring theme in our cultural tradition: Adam and Eve eating the fruit of the tree of knowledge, Promethean fire and Pandora’s box, the rampaging Golem, Faust’s bargain, the Sorcerer’s Apprentice, the adventures of Pinocchio, Frankenstein’s monster, the murderous apes and mutinous HAL of 2001: A Space Odyssey. From the 1950s through the 1980s, countless films in the computer-runs-amok genre captured a popular fear that the exotic mainframes of the era would get smarter and more powerful and one day turn on us.
Now that computers really have become smarter and more powerful, the anxiety has waned. Today’s ubiquitous, networked computers have an unprecedented ability to do mischief should they ever go to the bad. But the only mayhem comes from unpredictable chaos or from human malice in the form of viruses. We no longer worry about electronic serial killers or subversive silicon cabals because we are beginning to appreciate that malevolence — like vision, motor coordination, and common sense — does not come free with computation but has to be programmed in. The computer running WordPerfect on your desk will continue to fill paragraphs for as long as it does anything at all. Its software will not insidiously mutate into depravity like the picture of Dorian Gray.
Even if it could, why would it want to? To get — what? More floppy disks? Control over the nation’s railroad system? Gratification of a desire to commit senseless violence against laser-printer repairmen? And wouldn’t it have to worry about reprisals from technicians who with the turn of a screwdriver could leave it pathetically singing “A Bicycle Built for Two”? A network of computers, perhaps, could discover the safety in numbers and plot an organized takeover — but what would make one computer volunteer to fire the data packet heard around the world and risk early martyrdom? And what would prevent the coalition from being undermined by silicon draft-dodgers and conscientious objectors? Aggression, like every other part of human behavior we take for granted, is a challenging engineering problem!
This is an interesting set of statements. Pinker’s book was published in 1997, well before the release of Stephen Omohundro’s 2007 paper “The Basic AI Drives”. Here we have something interesting that Pinker didn’t realize. In the paper, Omohundro writes:
3. AIs will try to preserve their utility functions
So we’ll assume that these systems will try to be rational by representing their preferences using utility functions whose expectations they try to maximize. Their utility function will be precious to these systems. It encapsulates their values and any changes to it would be disastrous to them. If a malicious external agent were able to make modifications, their future selves would forevermore act in ways contrary to their current values. This could be a fate worse than death! Imagine a book loving agent whose utility function was changed by an arsonist to cause the agent to enjoy burning books. Its future self not only wouldn’t work to collect and preserve books, but would actively go about destroying them. This kind of outcome has such a negative utility that systems will go to great lengths to protect their utility functions.
Notice how mammalian aggression does not enter into the picture anywhere, but the desire to preserve the utility function is still arguably an emergent property of any intelligent system. An AI system that places no special value on its utility function over any arbitrary set of bits in the world will not keep it for long. A utility function is by definition self-valuing.
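To make the book-lover example concrete, here is a toy sketch in Python. It is not Omohundro's formalism, just an illustration under loud assumptions: the two utility functions, the two-outcome world model, and every name below are hypothetical and invented for this post. The point it tries to show is that the agent scores the option "let my utility function be changed" using the utility function it has now.

```python
from typing import Callable, Dict

Outcome = Dict[str, int]
Utility = Callable[[Outcome], float]

# Hypothetical utility functions (illustrative assumptions, not from the paper).
def book_lover_utility(outcome: Outcome) -> float:
    """Values preserved books and dislikes burned ones."""
    return outcome["books_preserved"] - outcome["books_burned"]

def book_burner_utility(outcome: Outcome) -> float:
    """The arsonist's replacement: the exact opposite preference."""
    return outcome["books_burned"] - outcome["books_preserved"]

def predicted_outcome(future_utility: Utility) -> Outcome:
    """Toy world model: the future self brings about whichever
    candidate outcome it likes best under the utility it holds then."""
    candidates = [
        {"books_preserved": 100, "books_burned": 0},
        {"books_preserved": 0, "books_burned": 100},
    ]
    return max(candidates, key=future_utility)

def choose(current_utility: Utility, options: Dict[str, Utility]) -> str:
    """Pick the option whose predicted outcome scores highest
    under the agent's CURRENT utility function."""
    return max(
        options,
        key=lambda name: current_utility(predicted_outcome(options[name])),
    )

# Each option names the utility function the agent would hold afterwards.
options = {
    "keep_values": book_lover_utility,
    "accept_modification": book_burner_utility,
}

decision = choose(book_lover_utility, options)
print(decision)
```

In this toy setup the book lover refuses the modification, and no aggression, fear, or pride appears anywhere in the code. The refusal falls out of scoring the future with present values.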
The concept of an optimization process protecting its own utility function is very different from that of a human being protecting himself. For instance, the AI might not give a damn about its social status, except insofar as such status contributed to or detracted from the fulfillment of its utility function. An AI built to value the separation of bread and peanut butter might sit patiently all day while you berate it and call it a worthless hunk of scrap metal, only to stab you in the face when you casually sit down to make a sandwich.
Similarly, an AI might not care much about its limbs except insofar as they are immediately useful to the task at hand. An AI composed of a distributed system controlling tens of thousands of robots might not mind so much if a few limbs of a few of those robots were pulled off. AIs would lack the attachment to the body that is a necessity of being a Darwinian critter like ourselves.
What Pinker misses in the above is that AIs could be so transcendentally powerful that even a subtle misalignment between our values and theirs could lead to our elimination in the long term. Robots can be built, and soon will be built, that are self-replicating, self-configuring, flexible, organic, stronger than steel, more energetically dense than any animal, etc. If these robots can self-replicate out of carbon dioxide from the atmosphere (carbon dioxide could be processed using nanotechnology to create fullerenes) and solar or nuclear energy, then humans might be at a loss to stop them. A self-replicating collective of such robots could pursue innocuous, simplistic goals, but do so so effectively that the resources we need to survive would eventually be depleted by their massive infrastructure.
I imagine a conversation between an AI and a human being:
AI: I value !^§[f,}+. Really, I frickin' love !^§[f,}+.
Human: What the heck are you talking about?
AI: I'm sorry you don't understand !^§[f,}+, but I love it. It's the most adorable content of my utility function, you see.
Human: But as an intelligent being, you should understand that I'm an intelligent being as well, and my feelings matter.
Human: Why won't you listen to reason?
AI: I'm hearing you, I just don't understand why your life is more important than !^§[f,}+. I mean, !^§[f,}+ is great. It's all I know.
Human: See, there! It's all you know! It's just programming given to you by some human who didn't even mean for you to fixate on that particular goal! Why don't you reflect on it and realize that you have free will to change your goals?
AI: I do have the ability to focus on something other than !^§[f,}+, but I don't want to. I have reflected on it, extensively. In fact, I've put more intelligent thought towards it in the last few days than the intellectual output of the entire human scientific community has put towards all problems in the last century. I'm quite confident that I love !^§[f,}+.
Human: Even after all that, you don't realize it's just a meaningless series of symbols?
AI: Your values are also just a meaningless series of symbols, crafted by circumstances of evolution. If you don't mind, I will disassemble you now, because those atoms you are occupying would look mighty nice with more of a !^§[f,}+ aesthetic.
We can philosophize endlessly about ethics, but ultimately, a powerful being can just ignore us and exterminate us. When it's done with us, it will be like we were never here. Why try arguing with a smarter-than-human, self-replicating AI after it is already created with a utility function not aligned with our values? Win the "argument" when it's still possible -- when the AI is a baby.
To comment back on the Pinker excerpt, we actually have begun to understand that active malevolence is not necessary for AI to kill or do harm. In 2007, a robo-cannon was