My Talk at Foresight 2010: “Don’t Fear the Singularity, but Be Careful: Friendly AI Design” Thursday, Mar 4 2010 

Michael Anissimov: “Don’t Fear the Singularity, but Be Careful: Friendly AI Design” at Foresight 2010 Conference from Foresight Institute on Vimeo.

Here’s my talk from Foresight! If you read this blog, there won’t be much new to you. I probably should have summarized the talk at the beginning. Unfortunately I got cut off at around slide 40 out of 55 due to schedule problems, so I missed the opportunity to summarize some of SIAI’s recent work and ended up mainly talking about 1) generic progress in AI, 2) media coverage of AI and Singularity, 3) the intelligence explosion idea, 4) the AI advantage, and 5) the inherent unconnectedness of morality and intelligence (Hume). Ignore the title; I didn’t really get into Friendly AI design at all. It was more of an introduction to why Friendly AI may be required. (I’m not sure I would have even used the term “Friendly AI” if I were making up the talk title again, because it’s been argued by a number of people that the term sounds silly and unserious.)

If I could redo this talk (I plan to do so on video) I would focus a little more on ideas and less on AI advancements, and throw out all the quotes, just quickly summarizing them instead. I would also try even harder to avoid looking down at my laptop during the talk, and would have removed my nametag. I need to buy one of those remote clicker things. I realize I spent a fair amount of time summarizing other people’s AI research rather than ideas unique to me or SIAI, but at the time it seemed necessary because I assumed that few people in the audience would be familiar with the range of advances in AI over the last year alone. People have to understand that AI is making steady progress, otherwise why worry about more advanced AI? If I thought AI really were stuck in the mud, then I wouldn’t be as frantic about the need for safe AI.

Several people pointed out to me that the talk title also seems odd because I am all about getting people to “fear the Singularity” — or fear a negative Singularity where humanity gets steamrolled by indifferent superintelligence. My idea here was that we don’t have to fear the Singularity if we’re careful. I often get the impression that people’s minds just shut down when considering the prospect of an AI Singularity, even if they don’t object to the plausibility of human-level AI in principle, just because they see it as extremely alien in comparison to a human-sparked Singularity. Part of the idea I was going for was that an AI-sparked Singularity can be managed effectively, but as I mentioned, I didn’t even get around to talking about that.

Thinking about my comment on the superficial mundaneity of analyzing the genetic expression of baker’s yeast, I realize that some scientists may not consider it that mundane, but I can’t be sure because I’m not a biologist who researches microbial genetic expression. I just figured that since yeast is a model organism, we already know a fair amount about its patterns of genetic expression and that the experiments were mainly for show.

You can follow along with the talk with my slides here.

Kevin Warwick: Terminator Scenario “Realistic”, Singularity Likely in “Not Too Distant Future” Thursday, Feb 18 2010 

Kevin Warwick, though obviously a Singularitarian, displays the same adversarial stance toward AI as other human chauvinists, such as James Hughes. I paraphrase it as: “If there’s an entity around that’s smarter and more powerful than me, then I’m going to equate that with me being subservient and freak the fuck out!”

My suggestion: calm down. Let’s do what we can to develop AIs that are nice people. There is no way we are going to outrace AI in the long run, so we have to pursue this path, whether we like it or not. We are not going to eliminate all computers in the world, or keep power in the hands of humans forever. The question is not, “will the most powerful and capable entities in the world eventually be AIs?” (the answer is yes); the question is, “what the heck can we do to ensure our continued survival and prosperity once these entities inevitably become more capable than us?”

Sooner or later, positive experiences with AI programs or robots will cause these AI adversaries to understand that AIs could potentially become people too: worthy of our trust and love. The longer they keep up their adversarial attitude, the more time is wasted ignoring the challenge of engineering Friendly AIs. The year is 2010 and the clock is ticking.

Revisiting ‘Beyond Anthropomorphism’ Tuesday, Feb 16 2010 

My understanding of the concept of anthropomorphism really “clicked” when I first read “Beyond anthropomorphism”, part of Creating Friendly AI, an early (2000) Singularity Institute document. I strongly recommend it for those who are interested in better understanding the concept of non-anthropomorphic artificial intelligence. Here is the opening:

If you punch a human in the nose, he or she will punch back. If the human doesn’t punch back, it’s an admirable act of self-restraint, something worthy of note.

Imagine, for a moment, that you walk up and punch an AI in the nose. Does the AI punch back? Perhaps and perhaps not, but punching back will not be instinctive. A sufficiently young AI might stand there and think: “Hm. Someone’s fist just bumped into my nose.” In a punched human, blood races, adrenaline pumps, the hands form fists, the stance changes, all without conscious attention. For a young AI, focus of attention shifts in response to an unexpected negative event - and that’s all.

As the AI thinks about the fist that bumped into vis nose, it may occur to the AI that this experience may be a repeatable event rather than a one-time event, and since a punch is a negative event, it may be worth thinking about how to prevent future punches, or soften the negativity. An infant AI - one that hasn’t learned about social concepts yet - will probably think something like: “Hm. A fist just hit my nose. I’d better not stand here next time.”

The more I study nature and biology, the more I see that anthropomorphism gets in the way of understanding animals as well. Certain birds, cats, dogs, and even rodents are intelligent, but thinking of their intelligence merely as inferior to humans is not the whole story. Different forms of intelligence have to be understood on their own terms — not through starting with an archetype of human intelligence and making incremental modifications to that archetype. That sort of thinking can lead to anchoring.

Friendly AI Discussion with James Hughes Wednesday, Feb 10 2010 

James Hughes was gracious enough to post a comment on my recent post on the disagreements between pro-Friendly AI and anti-Friendly AI transhumanists, so I thought I would repost it here along with my response. Hughes said:

Michael

If your example of a purely selfless creature is a worker drone then we are indeed talking past one another on several levels.

I do believe it is possible for there to be expert systems which facilitate human communication and decision-making without imposing any goals of their own.

I do not believe that is what is intended when your group talks about “artificial general intelligence” which is supposed to be not only self-aware at a human level, but inconceivably more complex and powerful.

Your proposal is that if you start with “kernel code” that is as selfless as a worker drone or iPhone app that it will remain so when it becomes godlike.

I don’t buy it, and neither do most other people who hear the idea. It is, as I’ve said, a form of displaced religious faith in the purity and immutability of good code. In the beginning was the Code, and the Code was good…

Here is my response:

James,

Say that you place some amount of probability on a hard takeoff from the first superintelligence, say 5%.

Say that you aren’t sure that the superintelligence will lead to a hard takeoff, but to be “conservative”, you assume that it will, so you take as many precautions as you can.

You nominally have two choices: AI or IA?

I tentatively welcome either transition as long as the first superintelligence has human interests deeply in mind.

I so happen to think that AI superintelligence is probably easier than IA superintelligence, so it is in my best interest to maximize the probability that said AI superintelligence at least starts off human-friendly.

Even if we have no long-term control, we have control over the starting point.

I applaud anyone who is interested in making human-friendly IA superintelligence, but I don’t see that strong a movement in that direction, currently. Many people in SIAI are interested in IA and keep a close eye on it, so it’s the best place to be for those concerned about both IA and AI superintelligence.

Maybe the “Code” will fail, and will lead to our destruction. The goal of the Friendly AI movement is to increase our understanding as much as possible and promote the creation of seed AI with human-friendly initial motivations. If there is some cosmic force that automatically transforms human-friendly motivations into human-unfriendly motivations during the self-improvement process, then we are doomed either way. But, if human-friendly motivations give rise to self-modification choices that preserve the human-friendly utility function, then we will be in good shape.

The question boils down to: which would you rather have the first superintelligence be, AI or IA? Either you think the question doesn’t matter all that much, or you may have some preference. My preference is for AI, for a lot of reasons, but it’s unfair to imply that we are traitors to the human race just because we are working towards an AI Singularity. The very reason we want an AI Singularity to begin with is that we consider it the easiest way to preserve human values across the transition.

The background perspective for all of this is Bostrom’s “Future of Human Evolution” paper.

Your proposal is that if you start with “kernel code” that is as selfless as a worker drone or iPhone app that it will remain so when it becomes godlike.

I don’t buy it, and neither do most other people who hear the idea.

Why not? We know why humans get more selfish when they get power: because humans are programmed to pass on their genes, be subservient when weak, and destroy their enemies when they have the chance. Hence beta males in chimp clans sometimes band together and kill the alpha. The “more power = more selfishness” connection makes sense for Darwinian organisms, but we have specific mental routines that drive this behavior. Why do you think these mental routines would emerge de novo in an AI specifically uninterested in them? Wouldn’t they have to be deliberately programmed in? Otherwise, where would they come from? Remember that the AI has complete control over its own source code — it can enforce tyrannical control over its own mental content. Do you think it would just sublimely slip into another state of mind without even knowing it?

Are you familiar with the idea of the Blank Slate? I think that you, Mike Treder, and some others in H+ might have a Blank Slate view of intelligence, where mental properties unique to human minds are assumed to be properties of minds-in-general.

Another disagreement of ours seems to be around the ethics of building a selfless superintelligent Transition Guide to begin with. We don’t see it as ethically troublesome to build a selfless superintelligence, but you seem to imply that it’s both 1) unethical, and 2) extremely difficult. If it’s so difficult as to be impossible, why bother with condemning it ethically? Please clarify.

Disagreements Between Pro- and Anti-Friendly AI Transhumanists Tuesday, Feb 9 2010

One of the greatest divisions between pro-Friendly AI transhumanists and anti-Friendly AI transhumanists may be a disagreement about whether unconditional kindness is physically possible. In a recent comment at the IEET, James Hughes said in response to Kaj Sotala:

I also do not believe in the possibility of a super-AI of the type you imagine capable of doing these tasks which did not have some kind of self-interest, or was not programmed to serve the interests of some group more than others. I think the notion of such a purely altruistic creature is sublimated religion.

I’m not so convinced, but do note that SIAI threw out the idea of normative altruism as a goal system for Friendly AI some time ago, and replaced it with Coherent Extrapolated Volition (CEV). Still, I consider it plausible that the CEV output will result in some version of an unconditionally altruistic agent, so the question is important.

In a comment I made to the IEET that never appeared on the page (due to spam filter issues), I pointed out that our level of altruism towards other beings is roughly contingent upon how much shared genetic material we have with them. This is called kin selection, and is best expressed by the population genetics joke where one person asks another, “Would you give your life for your brother?”, and the other responds, “No, but I would give my life for four nephews or for eight cousins.”

From the Wikipedia page on inclusive fitness:

From the gene’s point of view, evolutionary success ultimately depends on leaving behind the maximum number of copies of itself in the population. Until 1964 it was generally believed that genes only achieved this by causing the individual to leave the maximum number of viable offspring possible. However, in 1964 W. D. Hamilton proved mathematically that because close relatives of an organism have some replica genes, the gene can also increase its evolutionary success by promoting the reproduction and survival of these related or otherwise similar individuals. This leads individuals to behave in a manner maximizing their inclusive fitness, rather than their individual fitness.
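The arithmetic behind the joke can be made explicit with Hamilton's rule, which says an altruistic act is favored when r × B > C, where r is the coefficient of relatedness, B the benefit to the recipients, and C the cost to the actor. Below is a minimal Python sketch of that rule. It assumes the standard textbook relatedness values for diploid organisms (1/2 for a full sibling, 1/4 for a nephew, 1/8 for a first cousin) and measures cost and benefit in "lives"; the function names are made up for illustration.

```python
from fractions import Fraction

# Assumed textbook coefficients of relatedness for diploid organisms.
RELATEDNESS = {
    "sibling": Fraction(1, 2),
    "nephew": Fraction(1, 4),
    "cousin": Fraction(1, 8),
}


def inclusive_benefit(kind, count, benefit_each=1):
    """Relatedness-weighted benefit (r * B) of helping `count` relatives."""
    return RELATEDNESS[kind] * count * benefit_each


def favored(kind, count, cost=1, benefit_each=1):
    """Hamilton's rule: the act is favored only if r * B strictly exceeds C."""
    return inclusive_benefit(kind, count, benefit_each) > cost


def break_even_count(kind, cost=1, benefit_each=1):
    """Number of relatives at which r * B exactly equals C."""
    return cost / (RELATEDNESS[kind] * benefit_each)


if __name__ == "__main__":
    for kind in ("sibling", "nephew", "cousin"):
        print(kind, break_even_count(kind))
```

Under these assumptions, four nephews or eight cousins is exactly the break-even point for one's own life, which is why the joke's numbers land where they do; strictly speaking the rule only favors the sacrifice for one relative more than that.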

I suspect that Dr. Hughes may be almost a half-century behind the times in that he considers the notion of a purely altruistic being to be “sublimated religion”. After all, eusocial insects, and the two eusocial mammals, two species of mole rats, have taken kin selection to such an extreme that they will engage in self-sacrificial behavior to protect their colony. This is because genes only “care” about perpetuating copies of themselves — if a unique copy is only found within one individual, then individuals in that species will be self-interested, but if genetic material is shared to a high extent among the group, then selfless social behavior will evolve.

At the moment, I can conceive of several objections for why this behavior could not extend, even in principle, to superintelligence for humanity.

1. Superintelligence and humanity would be on two different levels, whereas eusocial insects and the like are of the same species.

2. For some reason, for organisms more intelligent than mole rats, eusociality cannot evolve, even in principle. There is just something about high intelligence that is inherently antagonistic to unconditional kindness. Perhaps it has to do with unconditional kindness being inherently “dumb” in some way.

3. Humans are inherently nasty creatures such that even a being programmed to love us would find it impossible to do so.

4. AI cannot do X, because AI lacks the magic juice that makes altruism possible in some humans and other organisms.

Maybe there are better objections than the above, but that’s just what I came up with off the top of my head. Note that at the moment I’m just addressing the feasibility of universal altruism question rather than whether it is practical to program. I have actually heard some version of all the above objections, so they are not straw men, though perhaps I am rewording them uncharitably for what I consider to be clarification.

Let me respond to each in turn. Claim #1 has evidence both for and against. Obviously, we have no superintelligence in front of us, so testing the claim explicitly is impossible. But I disagree with Vinge that no evidence can ever change our opinion on the issue. That is a topic worthy of another post, and I encourage anyone who is reading to think about predictions of the behavior or needs of superintelligence that we can make with a relatively high confidence. For instance, that superintelligence would need energy. Therefore, superintelligence would have to engage in some energy-seeking behavior. Bam, the “unpredictability horizon” hypothesis is false.

There is at least one piece of evidence mildly against claim #1, that too large of a gap would render eusociality impossible. Notice the gap of power between soldier ants and drones or queens. The soldier ants in a colony could easily kill drones or queens, but don’t, because of evolutionary motivations sculpted by kin selection. The same can apply to the power differential between parents and children. Parents could kill their children, but mostly don’t, because of evolutionary motivations. Observe how step-parents are more likely to abuse their children: why? Probably because the child is not genetically derived from the adult and therefore the adult has less evolutionary motivation to preserve or help the child.

The flow of kindness in ecological systems is obviously crafted by evolution, and can be estimated quantitatively using the concept of inclusive fitness. Some aggressive fish refrain from eating other, smaller fish that clean them because they have a symbiotic relationship. Symbiotic relationships evolve in all sorts of places. They exist because of cognitive programming and evolutionary pressures. The multiplicity of examples of “kindness” in nature makes a strong case that an agent that is universally altruistic is probably possible. The way that such kindness arose through evolution also provides a strong argument that we will eventually duplicate it on our own, through reverse-engineering or abstracting the appropriate dynamics.

I will refrain from going into #2-4 because I am afraid of them being called straw arguments. If any of the commenters believe in any of these points, or have other ones to share, I encourage you to do so.

Roko Mijic on “Strong moral realism, meta-ethics and pseudo-questions” Friday, Feb 5 2010 

At Less Wrong, Roko Mijic claims that despite survey results, most philosophers are not really “strong” moral realists, and in fact their “non-realist” moral stance is often anti-realist for all practical purposes. Perhaps unsurprisingly, I totally agree.

Roko Mijic on the Friendly AI Problem at UKH+ Tuesday, Feb 2 2010 

Singularity Institute for Artificial Intelligence 2009 Accomplishments Saturday, Dec 26 2009 

Here is a summary of the Singularity Institute’s 2009 accomplishments that other SIAI staff, friends, and I compiled recently in preparation for our 2010 Singularity Research Challenge, where every dollar donated up to $100,000 will be matched. You can also select which research you choose to support, if you like. We compiled almost 20 grants to choose from. Without further ado, here is the summary, and be sure to visit SIAI’s website for proper formatting and links:

2009 has been a year of growth and new horizons for the Singularity Institute for Artificial Intelligence (SIAI). We achieved a number of milestones relevant to our mission — pursuing dialogue, research, and activism to promote a beneficial Singularity. The response we’ve received has been considerable — SIAI is more high-profile and frequently-mentioned now than it has ever been.

Our key accomplishments in 2009 were holding the Singularity Summit in New York, hiring three new employees (Michael Vassar, Michael Anissimov, and Amy Willey), establishing a continuous SIAI Visiting Fellows Program, delivering eight presentations across four conferences, improving cooperation with allied organizations such as the Future of Humanity Institute, and establishing the Less Wrong web community, which receives thousands of visitors per day and fosters many high-quality discussions on philosophical and practical issues related to decision theory and rationality. The Uncertain Future, an interactive web application for quantitatively modeling future possibilities such as human-level AI, human intelligence enhancement, and global catastrophic risk, was also released as a beta version in December.

In April, Eliezer Yudkowsky completed two years of posting sequences on Less Wrong (which will be edited into a book on rationality and Singularity-relevant topics like reductionism and decision theory), drafting strategy documents for improving internal organization and long-term planning. Throughout the year, we continued consolidating SIAI staff, Visiting Fellows, volunteers, and interns in the San Francisco Bay Area. SIAI Visiting Fellow Peter de Blanc revised a paper on unbounded utility functions. The Singularity Institute received media coverage for its work in The New York Times, Popular Mechanics, Popular Science, Forbes, and many other venues. An article by SIAI President Michael Vassar, “Machine Minds”, made it into the Forbes special “The AI Report”.

The Singularity Institute’s long-term mission is to maximize the probability of a beneficial Singularity, through dialogue, research, and activism. All of our activities are ultimately chosen to further this purpose. The Singularity Institute particularly focuses on the possibility of a Singularity through artificial general intelligence, but also analyzes other potential pathways, including whole brain emulation and human cognitive enhancement.

To summarize our major accomplishments over the past year:

1. Singularity Summit 2009 in New York. Our fourth annual Singularity Summit was the first Singularity-focused conference ever held on the East Coast. Held October 3-4, the Singularity Summit featured 25 excellent speakers on topics including biotechnology, futurism, decision theory, artificial intelligence, quantum computing, the scientific method, cognitive ability, philosophy, computer science, and even synthetic neurobiology. Over 800 people attended, and the conference attracted reporters from over two dozen news organizations, including the New York Times. Coverage was provided by Popular Mechanics, Popular Science, Forbes, and many other media venues. Speakers this year included venture capitalist Peter Thiel, Wired magazine contributing editor Gary Wolf, AI researchers Juergen Schmidhuber, Marcus Hutter, and Itamar Arel, SIAI employees Anna Salamon, Ben Goertzel, and Eliezer Yudkowsky, philosopher David Chalmers, futurist Ray Kurzweil, Stephen Wolfram of Mathematica and Wolfram Alpha fame, and many others. Videos from the Summit are online at Vimeo. After the Summit, SIAI held an in-depth workshop, which allowed the speakers and SIAI staff to share ideas and brainstorm about the risks and benefits of a possible Singularity.

2. Hiring of new employees. Early in the year, Executive Director Tyler Emerson departed the Singularity Institute and his role was filled by a new President, Michael Vassar. Mr. Vassar holds a B.S. in biochemistry from Penn State and an MBA from Drexel University, and was previously Founder and Chief Strategist at Sir Groovy, an online music licensing firm. Prior to that, he held positions with Aon, the Peace Corps, and the National Institute of Standards and Technology. Throughout the year, he participated in numerous interviews and podcasts on behalf of SIAI, including interviews at Accelerating Future, The Futurist, Future Blogger, and h+ magazine.

Two new research fellows, Anna Salamon and Steve Rayhawk, were hired by SIAI in late 2008. Salamon and Rayhawk had previously participated in the 2008 SIAI Summer Program, which was led by Salamon. Salamon holds degrees in mathematics from UC Santa Barbara and Great Books from St. John’s, and Rayhawk holds a degree in mathematics from UC Santa Barbara. Salamon and Rayhawk are both focusing on applying computational Bayesian decision theory to problems in technological forecasting, risk management policy, and social epistemology, and form the core of our Visiting Fellows Program, bringing visiting scholars up to speed on the work that SIAI does. In early 2009, SIAI also hired a Media Director, Michael Anissimov, responsible for compiling, distributing, and promoting SIAI media materials including our writing, websites, and videos, and communicating the activities of SIAI to the public. Anissimov is author of Accelerating Future, a popular blog focused on science and futurism. Most recently, in December, SIAI hired Amy Willey, who holds a law degree from New York University, as Chief Compliance Officer.

With the addition of these new employees, SIAI brought its total full-time employee count to six, including Research Fellow Eliezer Yudkowsky, who has worked for SIAI since he co-founded the organization in 2000.

3. In 2009, SIAI established a Visiting Fellows Program, based in Silicon Valley. The program began with SIAI’s 2009 Summer Fellows, brought together to work on challenging projects in decision theory, philosophy, technological forecasting, heuristics and biases, and planning for the Singularity Summit 2009. Primarily graduate students, the Fellows came from educational backgrounds in mathematics, computer science, and physics, with the remainder ranging from philosophy to economics and biochemistry. They attend or hold degrees from universities including Harvard, Stanford, Yale, Cambridge, Carnegie Mellon, Auckland University, Moscow Institute of Physics and Technology, and the University of California-Santa Barbara. Fellows traveled to Silicon Valley from throughout the United States and from Russia, Belgium, the Netherlands, Sweden, Australia, New Zealand, the United Kingdom, and Canada. Some of these researchers stayed on past the summer or joined shortly thereafter to work with SIAI as volunteers or Visiting Fellows on a more extended basis. Some of the work that came out of the Visiting Fellows Program has been presented in papers and talks at venues like the European Conference on Computing and Philosophy, the Asia-Pacific Conference on Computing and Philosophy, and a Santa Fe Institute conference on forecasting. The Visiting Fellows Program has been instrumental in fostering a devoted community of Singularity Institute supporters making useful contributions towards SIAI’s ultimate goal, and SIAI recently put out a fresh call for new SIAI Visiting Fellows.

4. SIAI researchers, volunteers, and Visiting Fellows presented the following nine talks and papers throughout 2009:

* “Changing the frame of AI futurism: From storytelling to heavy-tailed, high-dimensional probability distributions”, by Steve Rayhawk, Anna Salamon, Tom McCabe, Rolf Nelson, and Michael Anissimov (presented at the European Conference on Computing and Philosophy (ECAP) in July ’09);
* “Arms Control and Intelligence Explosions”, by Carl Shulman (also presented at ECAP);
* “Machine Ethics and Superintelligence”, by Carl Shulman and Henrik Jonsson (presented at the Asia-Pacific Conference on Computing and Philosophy (APCAP) in October ’09);
* “Which Consequentialism? Machine Ethics and Moral Divergence”, by Carl Shulman and Nick Tarleton (also presented at APCAP);
* “Long-term AI forecasting: Building methodologies that work”, an invited presentation by Anna Salamon at the Santa Fe Institute conference on forecasting;
* “Shaping the Intelligence Explosion” and “How much it matters to know what matters: A back of the envelope calculation”, presentations by Anna Salamon at the Singularity Summit 2009 in October;
* “Pathways to Beneficial Artificial General Intelligence: Virtual Pets, Robot Children, Artificial Bioscientists, and Beyond”, a presentation by SIAI Director of Research Ben Goertzel at Singularity Summit 2009;
* “Cognitive Biases and Giant Risks”, a presentation by SIAI Research Fellow Eliezer Yudkowsky at Singularity Summit 2009;
* “Convergence of Expected Utility for Universal Artificial Intelligence”, a paper by Peter de Blanc, an SIAI Visiting Fellow.

Many more talks and papers are in the works for 2010, including a talk by SIAI Media Director Michael Anissimov at the Foresight 2010 conference in January.

5. One of the primary goals of the Singularity Institute in 2009 was to strengthen our ties to academia and allied organizations, which was accomplished through talks, papers, and direct dialogue. SIAI researchers and representatives built closer ties to organizations such as the Future of Humanity Institute at Oxford University, Santa Fe Institute, American Association for Artificial Intelligence, Foresight Institute, and many others. SIAI researcher Anna Salamon was invited to give a talk at an exclusive conference on technological forecasting held by the Santa Fe Institute. The Singularity Institute has been using videoconferencing, blogs, and mailing lists to keep in contact with our supporters and collaborators around the globe. SIAI more than tripled its representatives through the Visiting Fellows program, allowing it to better interface with a larger network.

6. 2009 saw the founding of the Less Wrong web community. Less Wrong was founded as a rationalist community to “systematically improve on the art, craft, and science of human rationality”. Thousands of people visit the site every day, with hundreds participating regularly in the comments sections. Less Wrong grew out of Overcoming Bias, a blog co-authored by SIAI Research Fellow Eliezer Yudkowsky and George Mason University economist Robin Hanson. Yudkowsky wrote extensively on Overcoming Bias from 2007-2009, and his posts have been ported over to Less Wrong, where they are organized into sequences that address topics such as reductionism, determinism, human rationality, metaethics, mathematics, and many others.

Less Wrong is important to the Singularity Institute’s work towards a beneficial Singularity in providing an introduction to issues of cognitive biases and rationality relevant for careful thinking about optimal philanthropy and many of the problems that must be solved in advance of the creation of provably human-friendly powerful artificial intelligence. At the same time, it has gathered a community that can provide rapid feedback and significant progress on such problems. For instance, Less Wrong participants Wei Dai and Vladimir Nesov proposed decision algorithms that can deal with certain classes of problems where Bayesian updating seems to lead decisionmakers astray. This work was closely related to decision theory work done in-house at SIAI, namely Eliezer Yudkowsky’s timeless decision theory, an algorithm that computes the counterfactual consequences of possible actions using an extension of Judea Pearl’s formalism of causal networks to logical uncertainties, and additional work by Anna Salamon and Steve Rayhawk. These developments have received positive attention from Gary Drescher and philosopher David Chalmers, and will be written up for peer review in the coming year.

Besides providing a home for an intellectual community dialoguing on rationality and decision theory, Less Wrong is also a key venue for SIAI recruitment. Many of the participants in SIAI’s Visiting Fellows Program first discovered the organization through Less Wrong.

7. This year Eliezer Yudkowsky finished his posting sequences on Less Wrong, which attracted thousands of enthusiastic readers and came to serve as the seed of a new community. Yudkowsky used the blogging format to write the substantive content of a book on rationality, enabling that work to be read and receive feedback as it progressed. Throughout the summer, Yudkowsky engaged in Friendly AI research with Marcello Herreshoff, a Stanford mathematics student who previously spent his gap year working for SIAI. Yudkowsky is now converting his blog sequences into the planned rationality book, which he hopes will significantly assist in attracting and inspiring talented individuals to effectively work towards the aims of a beneficial Singularity and reduced existential risk.

8. In December, a subset of SIAI researchers and volunteers finished improving The Uncertain Future web application to officially announce it as a beta version. The Uncertain Future represents a new kind of futurism — futurism with heavy-tailed, high-dimensional probability distributions. The purpose is to provide a tool for use by futurists and the informed public to input probability distributions over quantitative questions like, “how much computing power would be necessary to implement neuromorphic AI?”, combining them into a “picture of the future according to you”. Another goal of the project is to provide an alternative to the futurist methodologies of storytelling and scenario building, which dominate the field even though they often cause futurists to overestimate the probability of precise, vivid stories at the expense of a wider space of neglected possibilities.
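
To make the idea of combining input distributions concrete, here is a minimal Python sketch. It is not how The Uncertain Future is implemented. The function names, the two input questions, and every number in it (the log-normal over required computing power, the growth range, the 2010 starting point) are invented placeholders standing in for whatever beliefs a user would enter.

```python
import random


def sample_arrival_year(rng):
    """Draw one possible future from two user-supplied beliefs (all values assumed)."""
    # Placeholder belief 1: log10 of the ops/sec needed for neuromorphic AI,
    # modeled as a normal distribution (so the requirement itself is log-normal).
    required_log10 = rng.gauss(18.0, 2.0)
    # Placeholder belief 2: available computing power grows by somewhere
    # between 0.1 and 0.2 orders of magnitude per year.
    growth_per_year = rng.uniform(0.1, 0.2)
    # Placeholder starting point: log10 of ops/sec available in 2010.
    current_log10 = 15.0
    gap = max(0.0, required_log10 - current_log10)
    return 2010 + gap / growth_per_year


def probability_by(year, n_samples=20000, seed=0):
    """Estimate P(enough computing power exists by `year`) by Monte Carlo sampling."""
    rng = random.Random(seed)  # fixed seed so repeated calls agree
    hits = sum(sample_arrival_year(rng) <= year for _ in range(n_samples))
    return hits / n_samples


for year in (2030, 2050, 2070):
    print(year, round(probability_by(year), 3))
```

The point of the sketch is only that the output is itself a distribution: changing either input belief shifts the whole curve, rather than producing a single vivid story about one particular year.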

Peter Singer and Agata Sagan’s Roboethics Article Appears in Japan Times Monday, Dec 21 2009 

The roboethics article I linked on the 15th subsequently appeared in the Japan Times on the 17th.

Tim Tyler on the Risks of Caution Sunday, Dec 20 2009 

H/t to Joshua Fox for the link.

I have been watching some of Tim’s videos over the last few months, but I definitely haven’t seen them all. This one is nice because it summarizes a poignant feeling of concern.

In this video, he builds a model of AGI development using construction paper and Post-It notes.

Peter Singer on Roboethics — Mentions the Singularity Institute Tuesday, Dec 15 2009 

Peter Singer, one of the world’s most influential public intellectuals, and his co-author, the independent Warsaw-based ethicist Agata Sagan, have published an article called “Rights for Robots?” at the Project Syndicate website. Project Syndicate is “the world’s foremost provider of original commentaries, bringing distinguished voices from around the planet to readers of 432 newspapers in 150 countries.” Other contributors to the site include Bjørn Lomborg, George Soros, and Mikhail Gorbachev. Here is the excerpt from the article that mentions SIAI and Eliezer Yudkowsky:

A more ominous question is familiar from novels and movies: Will we have to defend our civilization against intelligent machines of our own creation? Some consider the development of superhuman artificial intelligence inevitable, and expect it to happen no later than 2070. They refer to this moment as “the singularity,” and see it as a world-changing event.

Eliezer Yudkowsky, one of the founders of The Singularity Institute for Artificial Intelligence, believes that singularity will lead to an “intelligence explosion” as super-intelligent machines design even more intelligent machines, with each generation repeating this process. The more cautious Association for the Advancement of Artificial Intelligence has set up a special panel to study what it calls “the potential for loss of human control of computer-based intelligences.”

The panel found that the probability of an intelligence explosion was not great. But see Steve Rayhawk’s analysis for reasons why the chance that the panel would have found otherwise was quite low to begin with.

A Short Introduction to Coherent Extrapolated Volition (CEV) Tuesday, Dec 15 2009 

In 2004, Eliezer Yudkowsky of the Singularity Institute presented Coherent Extrapolated Volition (CEV) as a solution to the AI Friendliness Problem. The basic idea is to extrapolate the preferences of all humanity in such a way that we obtain an output that satisfices those preferences. The CEV then shuts down, its role finished. CEV is currently the most promising theory for building a Friendly AI.

A point I haven’t seen advanced before outside this document, though it seems pretty obvious, is that any AI, to be of any use to humans whatsoever, must use some variation of volition to fulfill human directives. Volition is introduced as follows: there are two boxes, box A and box B. One of the boxes has a diamond. Fred wants the diamond, and asks us for box A. We want him to have the diamond. One problem: the diamond is in box B. The document points out the problem with handing Fred box A:

But I do not simply say: “Well, Fred chose box A, and he got box A, so I fail to see why there is a problem.” There are several ways of stating my perceived problem:

1. Fred was disappointed on opening box A, and would have been happier on opening box B.
2. It is possible to predict that if Fred chooses box A, Fred will look back and wish he had chosen box B instead; while if Fred chooses box B, Fred will be satisfied with his choice.
3. Fred wanted “the box containing the diamond”, not “box A”, and chose box A only because he guessed that box A contained the diamond.
4. If Fred had known the correct answer to the question of simple fact, “Which box contains the diamond?”, Fred would have chosen box B.


Hence my intuitive sense that giving Fred box A, as he literally requested, is not actually helping Fred.


If you find a genie bottle that gives you three wishes, it’s probably a good idea to seal the genie bottle in a locked safety box under your bed, unless the genie pays attention to your volition, not just your decision.

A powerful AI, or genie (big difference), must follow our volition, not just our direct decisions, or it would be dangerous. It is easy to imagine even worse failures based on interpreting the letter rather than the spirit of our requests — for instance, a robot chauffeur designed to take one’s children to school would be viewed as an idiotic or evil entity if it took children to school even if the school were on fire, or covered in two feet of snow. Just decisions are never enough — an AI needs an interpretation of volition. I see some connection between the idea of volition and revealed preference — people often say one thing, for social signaling purposes (often subconsciously), when they actually mean something else, which can sometimes be inferred from how they act, not what they say.
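
The difference between decision and volition in the Fred example can be put in a few lines of Python. This is a toy sketch with made-up names (`Request`, `literal_genie`, `volition_genie`), not anything taken from the CEV document, and it assumes away the hard part: here Fred's underlying goal is simply handed to the program, whereas a real system would have to infer it.

```python
from dataclasses import dataclass


@dataclass
class Request:
    chosen_box: str  # what the person literally asked for
    goal: str        # what the person is actually trying to obtain (assumed known)


def literal_genie(request, box_contents):
    """Follow the decision: hand over exactly the box that was named."""
    return request.chosen_box


def volition_genie(request, box_contents):
    """Follow the volition: hand over the box that actually holds the goal.

    Falls back to the literal choice if no box contains the goal.
    """
    for box, contents in box_contents.items():
        if contents == request.goal:
            return box
    return request.chosen_box


box_contents = {"A": "nothing", "B": "diamond"}
fred = Request(chosen_box="A", goal="diamond")

print(literal_genie(fred, box_contents))   # A
print(volition_genie(fred, box_contents))  # B
```

Both functions receive the same request and differ only in which field they treat as authoritative. Everything difficult about volition lives in how the `goal` field gets filled in.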

To me, the question is not whether we’ll use some form of extrapolated volition to pilot and direct AGI, but what kind we choose to use. In his paper, Eliezer proposed the following:

In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

Phew! That’s a mouthful. What does he mean by “cohere”? What about “growing up farther together”? (I think that should read “further” — “farther” refers to physical distance.) How can we model growing up further together without actually modeling all 6 billion humans interacting socially? Not all these questions are answered in the document. (Some are.) I still regard it as a good starting point. It’s superior to the prior idea that Eliezer had, which was to create an AI that is a “normative altruist” and uses various “anchors and shapers” to craft a “normative morality”. CEV “cheats” by sucking the metamoral content out of the entire human race, like a gigantic infovorous vacuum machine.

The alternatives to these sorts of extrapolation schemes all involve a programmer directly dictating the goal content of the AI in one way or another, which leaves you wide open to programmer-biased goal systems. Since the goal system of the first self-improving AI could quite plausibly dictate the fate of the universe from that point on, this is probably a bad idea. Other alternatives, like the one proposed by Bill Hibbard, involve direct feedback where humans essentially push buttons for what they like and the AI is eventually supposed to figure out moral philosophy. (Presumably.) The problem with this is that human metamorality is extremely complex and a system that absorbs the surface features without an eye for deep structure is destined to fail in stupid ways.

Humans can learn more or less what moral behavior is from other humans because much of our metamoral framework is already programmed in from an early age. When a child steals a plate full of cookies that are meant for after dinner, and a parent says, “don’t do that!”, unless the child is extremely young, he or she will generally know what they did wrong and why the adult has a problem with it. A poorly programmed AI, on the other hand, would have no metamoral framework. Was it wrong because cookies are inherently evil? Because the AI did not bake the cookies itself? Because AIs are not meant to have cookies? An AI might know all the facts about cookies and their historical context, but that still won’t give it the background it needs to find out why taking the cookies was “wrong”, and to what extent it was “wrong”. If eating the cookies saved a life, it might not be wrong. What if eating a cookie saved a billion lives a billion years in the future? A purely utilitarian AI might exterminate the human race today if it thought that doing so would create the greatest utility in the long term. AIs with hand-coded value systems may not have the “moral common sense” that humans do. Moral facts do not follow physical facts. In some cases, we are morally biased for meaningless reasons like small changes in the wording of a hypothetical moral dilemma, or other framing effects. How is an AI supposed to make heads or tails of “right” and “wrong”? Giving up is not a choice: we need an AI we can trust with nuclear weapons or worse. More sophisticated extrapolations of revealed preferences seem to be the most sensible pathway.

The moral realists suppose that a sufficiently intelligent AI will figure out “right” and “wrong” because they are self-evident. This is suicide. Right and wrong are not objective things-in-the-world, but human constructions. Murder is not wrong because it’s objectively wrong, but because human moral development over the course of thousands of years has decided that it is wrong most of the time. People worry about this interpretation of morality because they believe it’s a slippery slope, but Joshua Greene’s PhD thesis goes over all the reasons we might be afraid of moral anti-realism and shows that none of them are really compelling. Whether or not we consider moral anti-realism to be good for society, evolutionary psychology and cognitive science show us that it is true, whether we like it or not.

There is a lot of confusion around the idea of Coherent Extrapolated Volition, which I attribute mostly just to people commenting on the concept without reading the easily digestible 28-page document. People will comment on a concept for years without reading a short document actually explaining it. The way this works is that you read the first page, or less, then fill in the gaps with your imagination.

To dispel some of the worst misconceptions about CEV, here is a short list of “6 points about Coherent Extrapolated Volition” that was posted to the SL4 mailing list in July 2005:

1. Coherent Extrapolated Volition is not a majority vote. No human being is asked to actually decide anything.

2. The key word in “Coherent Extrapolated Volition” is “extrapolated”. CEV does not use judgments produced by the sort of human beings that exist today.

3. The CEV writes an AI. This AI may or may not work in any way remotely resembling a volition-extrapolator.

4. The CEV returns one coherent answer. The AI it returns may or may not display any given sort of coherence in how it treats different people, or create any given sort of coherent world.

5. The CEV runs for five minutes before producing an output. It is not meant to govern for centuries.

6. The CEV by itself does not mess around with your life. The CEV just decides which AI to replace itself with.

For a jumping-off point into one discussion about CEV, see this SL4 thread from Oct. 2008: “Just how coherent does CEV have to be?”, which began with a question posed by Alex Bokov. Kaj Sotala points out that the initial question is answered in the CEV document itself.

If you have any burning questions, check out the PAQ (Previously Asked Questions) portion of the CEV document first. For another very short summary of the CEV concept, see its Wikipedia entry.
