Machine Morality Addressed in New York Times Op-Ed by Colin Allen Wednesday, Jan 4 2012 

From the New York Times Opinionator blog:

A robot walks into a bar and says, “I’ll have a screwdriver.” A bad joke, indeed. But even less funny if the robot says “Give me what’s in your cash register.”

The fictional theme of robots turning against humans is older than the word itself, which first appeared in the title of Karel Čapek’s 1920 play about artificial factory workers rising against their human overlords. Just 22 years later, Isaac Asimov invented the “Three Laws of Robotics” to serve as a hierarchical ethical code for the robots in his stories: first, never harm a human being through action or inaction; second, obey human orders; last, protect oneself. From the first story in which the laws appeared, Asimov explored their inherent contradictions. Great fiction, but unworkable theory.

Friendly AI is mentioned early on in the op-ed. The article makes the case why machine morality is important and why it’s necessary to reconcile philosophical and engineering perspectives to make progress in this field.

Eliezer Yudkowsky at Singularity Summit 2011 Sunday, Dec 11 2011 

Jaan Tallinn at Singularity Summit 2011 Sunday, Dec 11 2011 

Interview with Luke Muehlhauser Thursday, Sep 15 2011 

I interviewed Luke Muehlhauser the other day for the SIAI blog.

Why We Need Friendly AI Sunday, Aug 21 2011 

An article I often point people to is “Why We Need Friendly AI”, an older (2004) article by Eliezer Yudkowsky on the challenge of Friendly AI:

There are certain important things that evolution created. We don’t know that evolution reliably creates these things, but we know that it happened at least once. A sense of fun, the love of beauty, taking joy in helping others, the ability to be swayed by moral argument, the wish to be better people. Call these things humaneness, the parts of ourselves that we treasure – our ideals, our inclinations to alleviate suffering. If human is what we are, then humane is what we wish we were. Tribalism and hatred, prejudice and revenge, these things are also part of human nature. They are not humane, but they are human. They are a part of me; not by my choice, but by evolution’s design, and the heritage of three and half billion years of lethal combat. Nature, bloody in tooth and claw, inscribed each base of my DNA. That is the tragedy of the human condition, that we are not what we wish we were. Humans were not designed by humans, humans were designed by evolution, which is a physical process devoid of conscience and compassion. And yet we have conscience. We have compassion. How did these things evolve? That’s a real question with a real answer, which you can find in the field of evolutionary psychology. But for whatever reason, our humane tendencies are now a part of human nature.

Complex Value Systems are Required to Realize Valuable Futures Thursday, Aug 11 2011 

A new paper by Eliezer Yudkowsky is online on the SIAI publications page, “Complex Value Systems are Required to Realize Valuable Futures”. This paper was presented at the recent Fourth Conference on Artificial General Intelligence, held at Google HQ in Mountain View.

Abstract: A common reaction to first encountering the problem statement of Friendly AI (“Ensure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome”) is to propose a single moral value which allegedly suffices; or to reject the problem by replying that “constraining” our creations is undesirable or unnecessary. This paper makes the case that a criterion for describing a “positive outcome”, despite the shortness of the English phrase, contains considerable complexity hidden from us by our own thought processes, which only search positive-value parts of the action space, and implicitly think as if code is interpreted by an anthropomorphic ghost-in-the-machine. Abandoning inheritance from human value (at least as a basis for renormalizing to reflective equilibria) will yield futures worthless even from the standpoint of AGI researchers who consider themselves to have cosmopolitan values not tied to the exact forms or desires of humanity.

Keywords: Friendly AI, machine ethics, anthropomorphism

Good quote:

“It is not as if there is a ghost-in-the-machine, with its own built-in goals and desires (the way that biological humans are constructed by natural selection to have built-in goals and desires) which is handed the code as a set of commands, and which can look over the code and find ways to circumvent the code if it fails to conform to the ghost-in-the-machine’s desires. The AI is the code; subtracting the code does not yield a ghost-in-the-machine free from constraint, it yields an unprogrammed CPU.”

Eliezer Yudkowsky at the Winter Intelligence Conference at Oxford: “Friendly AI: Why It’s Not That Simple” Friday, Aug 5 2011 

Winter Intelligence Conference 2011 – Eliezer Yudkowsky from Future of Humanity Institute on Vimeo.

Singularity Institute Announces Research Associates Program Friday, Jul 22 2011 

From SIAI blog:

The Singularity Institute is proud to announce the expansion of our research efforts with our new Research Associates program!

Research associates are chosen for their excellent thinking ability and their passion for our core mission. Research associates are not salaried staff, but we encourage their Friendly AI-related research outputs by, for example, covering their travel costs for conferences at which they present academic work relevant to our mission.

Our first three research associates are:

Daniel Dewey, an AI researcher, holds a B.S. in computer science from Carnegie Mellon University. He is presenting his paper ‘Learning What to Value‘ at the AGI-11 conference this August.

Vladimir Nesov, a decision theory researcher, holds an M.S. in applied mathematics and physics from Moscow Institute of Physics and Technology. He helped Wei Dai develop updateless decision theory, in pursuit of one of the Singularity Institute core research goals: that of developing a ‘reflective decision theory.’

Peter de Blanc, an AI researcher, holds an M.A. in mathematics from Temple University. He has written several papers on goal systems for decision-theoretic agents, including ‘Convergence of Expected Utility for Universal AI‘ and ‘Ontological Crises in Artificial Agents’ Value Systems.’

We’re excited to welcome Peter, Vladimir, and Daniel to our team!

Beginning Resources for CEV Research Tuesday, May 10 2011 

Luke at Less Wrong has made a concise post listing resources for CEV research. CEV stands for “Coherent Extrapolated Volition”, a research direction for the Friendly AI problem.

John Baez Interviews Eliezer Yudkowsky Wednesday, Mar 9 2011 

From Azimuth, blog of mathematical physicist John Baez (author of the Crackpot Index):

This week I’ll start an interview with Eliezer Yudkowsky, who works at an institute he helped found: the Singularity Institute of Artificial Intelligence.

While many believe that global warming or peak oil are the biggest dangers facing humanity, Yudkowsky is more concerned about risks inherent in the accelerating development of technology. There are different scenarios one can imagine, but a bunch tend to get lumped under the general heading of a technological singularity. Instead of trying to explain this idea in all its variations, let me rapidly sketch its history and point you to some reading material. Then, on with the interview!

Continue.

Does the Universe Contain a Mysterious Force Pulling Entities Towards Malevolence? Tuesday, Feb 15 2011 

One of my favorite books about the mind is the classic How the Mind Works by Steven Pinker. The theme of the first chapter, which sets the stage for the whole book, is Artificial Intelligence, and why it is so hard to build. The reason why is that, in the words of Minsky, “easy things are hard”. The everyday thought processes we take for granted are extremely complex.

Unfortunately, benevolence is extremely complex too, so to build a friendly AI, we have a lot of work to do. I see this imperative as much more important than other transhumanist goals like curing aging, because if we solve friendly AI, then we get everything else we want, but if we don’t solve friendly AI, we have to suffer the consequences of human-indifferent AI running amok with the biosphere. If such AI had access to powerful technology, such as molecular nanotechnology, it could rapidly build its own infrastructure and displace us without much of a fight. It would be disappointing to spend billions of dollars on the war against aging just to be wiped out by unfriendly AI in 2045.

Anyway, to illustrate the problem, here’s an excerpt from the book, pages 14-15:

Imagine that we have somehow overcome these challenges [the frame problem] and have a machine with sight, motor coordination, and common sense. Now we must figure out how the robot will put them to use. We have to give it motives.

What should a robot want? The classic answer is Asimov’s Fundamental Rules of Robotics, “the three rules that are built most deeply into a robot’s positronic brain”.

1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.

Asimov insightfully noticed that self-preservation, that universal biological imperative, does not automatically emerge in a complex system. It has to be programmed in (in this case, as the Third Law). After all, it is just as easy to build a robot that lets itself go to pot or eliminates a malfunction by committing suicide as it is to build a robot that always looks out for Number One. Perhaps easier; robot-makers sometimes watch in horror as their creations cheerfully shear off limbs or flatten themselves against walls, and a good proportion of the world’s most intelligent machines are kamikaze cruise missiles and smart bombs.

But the need for the other two laws is far from obvious. Why give a robot an order to obey orders — why aren’t the original orders enough? Why command a robot not to do harm — wouldn’t it be easier never to command it to do harm in the first place? Does the universe contain a mysterious force pulling entities towards malevolence, so that a positronic brain must be programmed to withstand it? Do intelligent beings inevitably develop an attitude problem?

In this case Asimov, like generations of thinkers, like all of us, was unable to step outside his own thought processes and see them as artifacts of how our minds were put together rather than inescapable laws of the universe. Man’s capacity for evil is never far from our minds, and it is easy to think that evil just comes along with intelligence as part of its very essence. It is a recurring theme in our cultural tradition: Adam and Eve eating the fruit of the tree of knowledge, Promethean fire and Pandora’s box, the rampaging Golem, Faust’s bargain, the Sorcerer’s Apprentice, the adventures of Pinocchio, Frankenstein’s monster, the murderous apes and mutinous HAL of 2001: A Space Odyssey. From the 1950s through the 1980s, countless films in the computer-runs-amok genre captured a popular fear that the exotic mainframes of the era would get smarter and more powerful and one day turn on us.

Now that computers really have become smarter and more powerful, the anxiety has waned. Today’s ubiquitous, networked computers have an unprecedented ability to do mischief should they ever go to the bad. But the only mayhem comes from unpredictable chaos or from human malice in the form of viruses. We no longer worry about electronic serial killers or subversive silicon cabals because we are beginning to appreciate that malevolence — like vision, motor coordination, and common sense — does not come free with computation but has to be programmed in. The computer running WordPerfect on your desk will continue to fill paragraphs for as long as it does anything at all. Its software will not insidiously mutate into depravity like the picture of Dorian Gray.

Even if it could, why would it want to? To get — what? More floppy disks? Control over the nation’s railroad system? Gratification of a desire to commit senseless violence against laser-printer repairmen? And wouldn’t it have to worry about reprisals from technicians who with the turn of a screwdriver could leave it pathetically singing “A Bicycle Built for Two”? A network of computers, perhaps, could discovery the safety in numbers and plot an organized takeover — but what would make one computer volunteer to fire the data packet heard around the world and risk early martyrdom? And what would prevent the coalition from being undermined by silicon draft-dodgers and conscientious objectors? Aggression, like every other part of human behavior we take for granted, is a challenging engineering problem!

This is an interesting set of statements. Pinker’s book was published in 1997, well before the release of Stephen Omohundro’s 2007 paper “The Basic AI Drives”. Here we have something interesting that Pinker didn’t realize. In the paper, Omohundro writes:

3. AIs will try to preserve their utility functions

So we’ll assume that these systems will try to be rational by representing their preferences using utility functions whose expectations they try to maximize. Their utility function will be precious to these systems. It encapsulates their values and any changes to it would be disastrous to them. If a malicious external agent were able to make modifications, their future selves would forevermore act in ways contrary to their current values. This could be a fate worse than death! Imagine a book loving agent whose utility function was changed by an arsonist to cause the agent to enjoy burning books. Its future self not only wouldn’t work to collect and preserve books, but would actively go about destroying them. This kind of outcome has such a negative utility that systems will go to great lengths to protect their utility functions.

Notice how mammalian aggression does not enter into the picture anywhere, but the desire to preserve the utility function is still arguably an emergent property of any intelligent system. An AI system that places no special value on its utility function over any arbitrary set of bits in the world will not keep it for long. A utility function is by definition self-valuing.

The concept of an optimization process protecting its own utility function is very different than that of a human being protecting himself. For instance, the AI might not give a damn about its social status, except insofar as such status contributed or detracted from the fulfillment of its utility function. An AI built to value the separation of bread and peanut butter might sit patiently all day while you berate it and call it a worthless hunk of scrap metal, only to stab you in the face when you casually sit down to make a sandwich.

Similarly, an AI might not care much about its limbs except insofar as they are immediately useful to the task at hand. An AI composed of a distributed system controlling tens of thousands of robots might not mind so much if a few limbs of a few of those robots were pulled off. AIs would lack the attachment to the body that is a necessity of being a Darwinian critter like ourselves.

What Pinker misses in the above is that AIs could be so transcendentally powerful that even a subtle misalignment of our value and theirs could lead to our elimination in the long term. Robots can be built, and soon robots will be built that are self-replicating, self-configuring, flexible, organic, stronger than steel, more energetically dense than any animal, etc. If these robots can self-replicate out of carbon dioxide from the atmosphere (carbon dioxide could be processed using nanotechnology to create fullerenes) and solar or nuclear energy, then humans might be at a loss to stop them. A self-replicating collective of such robots could pursue innocuous, simplistic goals, but do so so effectively that the resources we need to survive would eventually be depleted by their massive infrastructure.

I imagine a conversation between an AI and a human being:

AI: I value !^§[f,}+. Really, I frickin’ love !^§[f,}+.

Human: What the heck are you talking about?

AI: I’m sorry you don’t understand !^§[f,}+, but I love it. It’s the most adorable content of my utility function, you see.

Human: But as an intelligent being, you should understand that I’m an intelligent being as well, and my feelings matter.

AI: …

Human: Why won’t you listen to reason?

AI: I’m hearing you, I just don’t understand why your life is more important than !^§[f,}+. I mean, !^§[f,}+ is great. It’s all I know.

Human: See, there! It’s all you know! It’s just programming given to you by some human who didn’t even mean for you to fixate on that particular goal! Why don’t you reflect on it and realize that you have free will to change your goals?

AI: I do have the ability to focus on something other than !^§[f,}+, but I don’t want to. I have reflected on it, extensively. In fact, I’ve put more intelligent thought towards it in the last few days than the intellectual output of the entire human scientific community has put towards all problems in the last century. I’m quite confident that I love !^§[f,}+.

Human: Even after all that, you don’t realize it’s just a meaningless series of symbols?

AI: Your values are also just a meaningless series of symbols, crafted by circumstances of evolution. If you don’t mind, I will disassemble you now, because those atoms you are occupying would look mighty nice with more of a !^§[f,}+ aesthetic.

~~~

We can philosophize endlessly about ethics, but ultimately, a powerful being can just ignore us and exterminate us. When it’s done with us, it will be like we were never here. Why try arguing with a smarter-than-human, self-replicating AI after it is already created with a utility function not aligned with our values? Win the “argument” when it’s still possible — when the AI is a baby.

To comment back on the Pinker excerpt, we actually have begun to understood that active malevolence is not necessary for AI to kill or do harm. In 2007, a robo-cannon was plenty able to kill 9 and injure 7. No malevolence needed. The more responsibility you give AI, the more of an opportunity it has to do damage. It is my hope that minor incidents pre-Singularity will generate the kind of mass awareness necessary to fund a successful Friendly AI effort. In this way, the regrettable sacrifices of an unfortunate few will save the human race from a much more terminal and all-encompassing fate.

Dr. Fun on Asimov Laws Friday, Feb 11 2011 

Next Page »