Some Singularity, Superintelligence, and Friendly AI-Related Links
This is a good list of links to bring readers up to speed on some of the issues often discussed on this blog.
Nick Bostrom: Ethical Issues in Advanced Artificial Intelligence
http://www.nickbostrom.com/ethics/ai.html
Nick Bostrom: How Long Before Superintelligence?
http://www.nickbostrom.com/superintelligence.html
Yudkowsky: Why is rapid self-improvement in human-equivalent AI possibly likely?
Part 3 of Levels of Organization in General Intelligence: Seed AI
http://singinst.org/upload/LOGI/seedAI.html
Anissimov: Relative Advantages of AI, Computer Programs, and the Human Brain
http://www.acceleratingfuture.com/articles/relativeadvantages.htm
Yudkowsky: Creating Friendly AI: "Beyond anthropomorphism"
http://singinst.org/ourresearch/publications/CFAI/anthro.html
Yudkowsky: "Why We Need Friendly AI" (short)
http://www.preventingskynet.com/why-we-need-friendly-ai/
Yudkowsky: "Knowability of FAI" (long)
http://acceleratingfuture.com/wiki/Knowability_Of_FAI
Yudkowsky: A Galilean Dialogue on Friendliness (long)
http://sl4.org/wiki/DialogueOnFriendliness
Stephen Omohundro -- Basic AI Drives
http://selfawaresystems.com/2007/11/30/paper-on-the-basic-ai-drives/
http://selfawaresystems.com/2009/02/18/agi-08-talk-the-basic-ai-drives/ (video)
Links on Friendly AI
http://www.acceleratingfuture.com/michael/blog/2006/09/consolidation-of-links-on-friendly-ai/
Anissimov: Yes, the Singularity is the Biggest Threat to Humanity
http://www.acceleratingfuture.com/michael/blog/2011/01/yes-the-singularity-is-the-biggest-threat-to-humanity/
Abstract of a talk I'm giving soon
http://www.acceleratingfuture.com/michael/blog/2011/01/my-upcoming-talk-in-texas-anthropomorphism-and-moral-realism-in-advanced-artificial-intelligence/
Most recent SIAI publications:
http://www.acceleratingfuture.com/michael/blog/2010/12/new-singularity-institute-publications-in-2010/
More posts from this blog
http://www.acceleratingfuture.com/michael/blog/2010/06/the-world-the-singularity-creates-could-destroy-all-value/
http://www.acceleratingfuture.com/michael/blog/2010/06/reducing-long-term-catastrophic-artificial-intelligence-risk/
http://www.acceleratingfuture.com/michael/blog/2009/10/answering-popular-sciences-10-questions-on-the-singularity/
http://www.acceleratingfuture.com/michael/blog/2009/09/is-smarter-than-human-intelligence-possible/
http://www.acceleratingfuture.com/michael/blog/2009/04/interview-with-singularity-institute-president-michael-vassar/
http://www.acceleratingfuture.com/michael/blog/2009/03/technological-singularitysuperintelligencefriendly-ai-concerns/
GOOD magazine miniseries on the Singularity
http://www.good.is/post/singularity-101-what-is-the-singularity/
I’m Quoted on Friendly AI in the United Church Observer
This magazine circulates to 60,000 Canadian Christians. It's not a stupid publication just because it's Christian... atheist Digg/Reddit geeks (80% of the audience of this blog, I'd wager) need to broaden their horizons just a bit. Remember that Christians and other theologians can think and say many intelligent things because they compartmentalize their thinking effectively. The topic of the article is friendly AI, and many people have already said that they think this is one of the best mainstream media articles on the topic because it doesn't take a simplistic angle and actually probes the technical issues.
Here's the bit with me in it:
Nevertheless, technologists are busy fleshing out the idea of “friendly AI” in order to safeguard humanity. The theory goes like this: if AI computer code is steeped in pacifist values from the very beginning, super-intelligence won’t rewrite itself into a destroyer of humans. “We need to specify every bit of code, at least until the AI starts writing its own code,” says Michael Anissimov, media director for the Singularity Institute for Artificial Intelligence, a San Francisco think-tank dedicated to the advancement of beneficial technology. “This way, it’ll have a moral goal system more similar to Gandhi than Hitler, for instance.”
Many people who naively talk about AI and superintelligence act like superintelligence will certainly do X or Y (of course there are all sorts of intuitive camps, "they'll just leave us alone and go into space" is a popular sentiment) no matter what the initial conditions, implying that trying to set the initial conditions doesn't matter.
Would you rather have an AI with initial motivations closer to Gandhi or Hitler? If you have any preference, then you've just demonstrated concern for the Friendly AI problem. It's remarkable that I actually have a challenging time arguing on a daily basis that an AI with more in common with Gandhi would be better to build first than one with more in common with Hitler, but it's true.
Some people say, "but, whatever initial programming it has will be gone after many cycles of self-improvement". No, not necessarily, because the AI will be making its own programming changes. It will dictate its goal structure, not outside forces. It would be more like a being creating itself than an evolution-made being with a goal system filled with strange attractors that flip back and forth depending on immediate context (humans).
Setting the initial conditions for AI properly is probably the most important task humanity faces, because AGI seems more likely to reach superintelligence first than human intelligence enhancement, despite the better science fiction movie potential and personal/tribal identification possibilities of the latter. John Smart presents a few good reasons why this is likely in his Limits to Biology essay.
My Upcoming Talk in Texas: Anthropomorphism and Moral Realism in Advanced Artificial Intelligence
I was recently informed that my abstract was accepted for presentation at the Society for Philosophy and Technology conference in Denton, TX, this upcoming May 26 - 29. You may have heard of their journal, Techné. Register now for the exciting chance to see me onstage, talking AI and philosophy. If you would volunteer to film me, that would make me even more excited, and valuable to our most noble cause.
Here's the abstract:
Anthropomorphism and Moral Realism in Advanced Artificial Intelligence
Michael Anissimov
Singularity Institute for Artificial Intelligence
Humanity has attributed human-like qualities to simple automatons since the time of the Greeks. This highlights our tendency to anthropomorphize (Yudkowsky 2008). Today, many computer users anthropomorphize software programs. Human psychology is extremely complex, and most of the simplest everyday tasks have yet to be replicated by a computer or robot (Pinker 1997). As robotics and Artificial Intelligence (AI) become a larger and more important part of civilization, we have to ensure that robots are capable of making complex, unsupervised decisions in ways we would broadly consider beneficial or common-sensical. Moral realism, the idea that moral statements can be true or false, may cause developers in AI and robotics to underestimate the effort required to meet this goal. Moral realism is a false, but widely held belief (Greene 2002). A common notion in discussions of advanced AI is that once an AI acquires sufficient intelligence, it will inherently know how to do the right thing morally. This assumption may derail attempts to develop human-friendly goal systems in AI by making such efforts seem unnecessary.
Although rogue AI is a staple of science fiction, many scientists and AI researchers take the risk seriously (Bostrom 2002; Rees 2003; Kurzweil 2005; Bostrom 2006; Omohundro 2008; Yudkowsky 2008). Arguments have been made that superintelligent AI -- an intellect much smarter than the best human brains in practically every field -- could be created as early as the 2030s (Bostrom 1998; Kurzweil 2005). Superintelligent AI could copy itself, potentially accelerate its thinking and action speeds to superhuman levels, and rapidly self-modify to increase its own intelligence and power further (Good 1965; Yudkowsky 2008). A strong argument can be made that superintelligent machines will eventually become a dominant force on Earth. An "intelligence explosion" could result from communities or individual artificial intelligences rapidly self-improving and acquiring resources.
Most AI rebellion in fiction is highly anthropomorphic -- AIs feeling resentment towards their creators. More realistically, advanced AIs might pursue resources as instrumental objectives in pursuit of a wide range of possible goals, so effectively that humans could be deprived of space or matter we need to live (Omohundro 2008). In this manner, human extinction could come about through the indifference of more powerful beings rather than outright malevolence. A central question is, "how can we design a self-improving AI that remains friendly to humans even if it eventually becomes superintelligent and gains access to its own source code?" This challenge is addressed in a variety of works over the last decade (Yudkowsky 2001; Bostrom 2003; Hall 2007; Wallach 2008) but is still very much an open problem.
A technically detailed answer to the question, "how can we create a human-friendly superintelligence?" is an interdisciplinary task, bringing together philosophy, cognitive science, and computer science. Building a background requires analyzing human motivational structure, including human-universal behaviors (Brown 1991), and uncovering the hidden complexity of human desires and motivations (Pinker 1997) rather than viewing Homo sapiens as a blank slate onto which culture is imprinted (Pinker 2003). Building artificial intelligences by copying human motivational structures may be undesirable because human motivations given capabilities of superintelligence and open-ended self-modification could be dangerous. Such AIs might "wirehead" themselves by stimulating their own pleasure centers at the expense of constructive or beneficent activities in the external world. Experimental evidence of the consequences of direct stimulation of the human pleasure center is very limited, but we have anecdotal evidence in the form of drug addiction.
Since artificial intelligence will eventually exceed human capabilities, it is crucial that the challenge of creating a stable human-friendly motivational structure in AI is solved before the technology reaches a threshold level of sophistication. Even if advanced AI is not created for hundreds of years, many fruitful philosophical questions are raised by the possibility (Chalmers 2010).
References
Bostrom, N. (2002). "Existential Risks: Analyzing Human Extinction Scenarios". Journal of Evolution and Technology, 9(1).
Bostrom, N. (2003). "Ethical Issues in Advanced Artificial Intelligence". Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence.
Bostrom, N. (2006). "How long before superintelligence?". Linguistic and Philosophical Investigations 5 (1): 11–30.
Brown, D. (1991). Human Universals. McGraw Hill.
Chalmers, D. (2010). "The Singularity: a Philosophical Analysis". Presented at the Singularity Summit 2010 in New York.
Good, I. J. (1965). "Speculations Concerning the First Ultraintelligent Machine", Advances in Computers, vol 6, Franz L. Alt and Morris Rubinoff, eds, pp 31-88, Academic Press.
Greene, J. (2002). The Terrible, Horrible, No Good, Very Bad Truth about Morality and What to Do About it. Doctoral Dissertation for the Department of Philosophy, Princeton University, June 2002.
Hall, J.S. (2007). Beyond AI: Creating the Conscience of the Machine. Amherst: Prometheus Books.
Omohundro, S. (2008). "The Basic AI Drives". Proceedings of the First AGI Conference, Volume 171, Frontiers in Artificial Intelligence and Applications, edited by P. Wang, B. Goertzel, and S. Franklin, February 2008, IOS Press.
Pinker, S. (1997). How the Mind Works. Penguin Books.
Pinker, S. (2003). The Blank Slate: the Modern Denial of Human Nature. Penguin Books.
Rees, M. (2003). Our Final Hour: A Scientist's Warning : how Terror, Error, and Environmental Disaster Threaten Humankind's Future in this Century - on Earth and Beyond. Basic Books.
Wallach, W. & Allen, C. (2008). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press.
Yudkowsky, E. (2001). Creating Friendly AI. Publication of the Singularity Institute for Artificial Intelligence.
Yudkowsky, E. (2008). "Artificial Intelligence as a positive and negative factor in global risk". In N. Bostrom and M. Cirkovic (Eds.), Global Catastrophic Risks (pp. 308-343). Oxford University Press.
Artificial Intelligence as a Positive and Negative Factor in Global Risk Now Available in Chinese
Here's the chapter in English, and here's the Chinese version.
Phil Bowermaster on the Singularity
Over at the Speculist, Phil Bowermaster understands the points I made in "Yes, the Singularity is the biggest threat to humanity", which, by the way, was recently linked by Instapundit, who unfortunately probably doesn't get the point I'm trying to make. Anyway, Phil said:
Greater than human intelligences might wipe us out in pursuit of their own goals as casually as we add chlorine to a swimming pool, and with as little regard as we have for the billions of resulting deaths. Both the Terminator scenario, wherein they hate us and fight a prolonged war with us, and the Matrix scenario, wherein they keep us around essentially as cattle, are a bit too optimistic. It's highly unlikely that they would have any use for us or that we could resist such a force even for a brief period of time -- just as we have no need for the bacteria in the swimming pool and they wouldn't have much of a shot against our chlorine assault.
"How would the superintelligence be able to wipe us out?" you might say. Well, there's biowarfare, mass-producing nuclear missiles and launching them, hijacking existing missiles, neutron bombs, lasers that blind people, lasers that burn people, robotic mosquitos that inject deadly toxins, space-based mirrors that set large areas on fire and evaporate water, poisoning water supplies, busting open water and gas pipes, creating robots that cling to people, record them, and blow up if they try anything, conventional projectiles... You could bathe people in radiation to sterilize them, infect corn fields with ergot, sprinkle salt all over agricultural areas, drop asteroids on cities, and many other approaches that I can't think of because I'm a stupid human. In fact, all of the above is likely nonsense, because it's just my knowledge and intelligence that is generating the strategies. A superintelligent AI would be much, much, much, much, much smarter than me. Even the smartest person you know would be an idiot in comparison to a superintelligence.
One way to kill a lot of humans very quickly might be through cholera. Cholera is extremely deadly and can spread very quickly. If there were a WWIII and it got really intense, countries would start breaking out the cholera and other germs to fight each other. Things would really have to go to hell before that happened, because biological weapons are nominally outlawed in war. However, history shows that everyone breaks the rules when they can get away with it or when they're in deep danger.
Rich people living in the West, especially Americans, have forgotten the ways that people have been killing each other for centuries, because we've had a period of relative stability since WWII. Sometimes Americans appear to think like teenagers, who apparently believe they are immortal. This is a quintessentially ultra-modern and American way of thinking, though most of the West thinks this way. For most of history, people have realized how fragile they were and how aggressively they needed to fight to defend themselves from enemies inside and out. With our sophisticated electrical infrastructure (which, by the way, could be eliminated by a few EMP-optimized nuclear weapons detonated in the ionosphere), nearly unlimited food, water, and other conveniences present themselves to us on silver platters. We overestimate the robustness of our civilization because it's worked smoothly so far.
Superintelligences would eventually be able to construct advanced robotics that could move very quickly and cause major problems for us if they wanted to. Robotic systems constructed entirely of fullerenes could be extremely fast and powerful. Conventional bullets and explosives would have great difficulty damaging fullerene-armored units. Buckyballs only melt at roughly 8,500 Kelvin, almost 15,000 degrees Fahrenheit. 15,000 degrees. That's hotter than the surface of the Sun. (Update: Actually, I'm wrong here because the melting point of bulk nanotubes has not been determined and is probably significantly less. 15,000 degrees is roughly the temperature that a single buckyball apparently breaks apart at. However, some structures, such as nanodiamond, would literally be macroscale molecules and might have very high melting points.) Among "small arms", only a shaped charge, which moves at around 10 km/sec, could make a dent in thick fullerene armor. Ideally you'd have a shaped charge made out of a metal with extremely high mass and temperature, like molten uranium. Still, if the robotic system moved fast enough and could simply detect where the charges were, conventional human armies wouldn't be able to do much against it, except for perhaps use nuclear weapons. Weapons like rifles wouldn't work because they simply wouldn't deliver enough energy in a condensed enough space. To have any chance of destroying a unit that moves at several thousands of mph and can dodge missiles, nuclear weapons would likely be required.
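The temperature comparison above is simple arithmetic, and a minimal sketch makes it easy to check. It takes the 8,500 Kelvin figure quoted above at face value (the update notes it applies to a single buckyball breaking apart, not to bulk material). The Sun's surface temperature of about 5,772 Kelvin is not from this post; it is the commonly cited effective temperature, and the function name is just for illustration.

```python
# Assumption: 5,772 K is the commonly cited effective temperature of the
# Sun's surface; it does not appear in the post itself.
SUN_SURFACE_K = 5772.0

# Figure quoted in the post for a single buckyball breaking apart.
BUCKYBALL_K = 8500.0


def kelvin_to_fahrenheit(kelvin):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return kelvin * 9.0 / 5.0 - 459.67


buckyball_f = kelvin_to_fahrenheit(BUCKYBALL_K)
sun_f = kelvin_to_fahrenheit(SUN_SURFACE_K)

print(f"Buckyball figure: {buckyball_f:,.0f} F")
print(f"Sun's surface:    {sun_f:,.0f} F")
print("Hotter than the Sun's surface:", buckyball_f > sun_f)
```

The conversion lands a little under 15,000 degrees Fahrenheit, which matches the "almost 15,000" wording.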
When objects move fast enough, they will be invisible to the naked eye. How fast something needs to move to be unnoticeable varies based on its size, but for an object a meter long it's about 1,100 mph, roughly Mach 1.4 at sea level. There is no reason why engines could not eventually be developed that propel person-sized objects to those speeds and beyond. In this very exciting post, I list a few possible early-stage products that could be built with molecular nanotechnology that could take advantage of high power densities. Google "molecular nanotechnology power density" for more information on the kind of technology a superintelligence could develop and use to take over the world quite quickly.
A superintelligence, not being stupid, would probably hide itself in a quarantined facility while it developed the technologies it needed to prepare for doing whatever it wants in the outside world. So, we won't know anything about it until it's all ready to go.
Here's the benefits of molecular manufacturing page from CRN. Remember this graph I made? Here it is:
We'll still be stuck in the blue region while superintelligences develop robotics in the orange and red regions and have plenty of ability to run circles around us. There will be man-sized systems that move at several times the speed of sound and consume kilowatts of power. Precise design can minimize the amount of waste heat produced. The challenge is swimming through all that air without being too noticeable. There will be tank-sized systems with the power consumption of aircraft carriers. All these things are probably possible; no one has built them yet. People like Brian Wang, who writes one of the most popular science/technology blogs on the Internet, take it for granted that these kinds of systems will eventually be built. The techno-elite know that these sorts of things are physically possible; it's just a matter of time. Many of them might consider technologies like this centuries away, but for a superintelligence that never sleeps, never gets tired, can copy itself tens of millions of times, and parallelize its experimentation, research, development, and manufacturing, we might be surprised how quickly it could develop new technologies and products.
The default understanding of technology is that the technological capabilities of today will pretty much stick around forever, but we'll have spaceships, smaller computers, and bigger televisions, perhaps with Smell-O-Vision. The future would be nice and simple if that were true, but for better or for worse, there are vast quadrants of potential technological development that 99.9% of the human species has never heard of, and vaster domains that 100% of the human species has never even thought of. Superintelligence will happily and casually exploit those technologies to fulfill its most noble goals, whether those noble goals involve wiping out humanity, or maybe healing all disease, aging, and creating robots to do all the jobs we don't feel like doing. Whatever its goals are, a superintelligence will be most persuasive in arguing for how great and noble they are. You won't be able to win an argument against a superintelligence unless it lets you. It will simply be right and you will be wrong. One could even imagine a superintelligence so persuasive that it convinces mankind to commit suicide by making us feel bad about our own existence. In that case it might need no actual weapons at all.
The above could be wild speculation, but the fact is we don't know. We won't know until we build a superintelligence, talk to it, and see what it can do. This is something new under the Sun; no one has the experience to conclusively say what it will or won't be able to do. Maybe even the greatest superintelligence will be exactly as powerful as your everyday typical human (many people seem to believe this), or, more likely, it will be much more powerful in every way. To confidently say that it will be weak is unwarranted -- we lack the information to state this with any confidence. Let's be scientific and wait for empirical data first. I'm not arguing with extremely high confidence that superintelligence will be very strong; I just have a probability distribution over possible outcomes, and doing an expected value calculation on that distribution leads me to believe that the prudent utilitarian choice is to worry. It's that simple.
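For readers who want to see the shape of that expected value calculation, here is a minimal sketch. Every number in it is hypothetical and chosen only for illustration: the three outcome labels, their probabilities, the loss figures, the cost of preparing, and the fraction of loss that preparation averts are all assumptions, not estimates from this post.

```python
# All values below are hypothetical and for illustration only.
# Each entry: (outcome label, probability, loss if we did nothing to prepare)
OUTCOMES = [
    ("superintelligence turns out weak", 0.50, 0.0),
    ("moderately powerful", 0.30, 10.0),
    ("very powerful", 0.20, 1000.0),
]

PREPARATION_COST = 5.0  # assumed cost of taking the risk seriously
MITIGATION = 0.5        # assumed fraction of each loss that preparation averts


def expected_loss(outcomes, mitigation=0.0):
    """Probability-weighted loss, optionally reduced by a mitigation fraction."""
    return sum(p * loss * (1.0 - mitigation) for _, p, loss in outcomes)


loss_if_ignored = expected_loss(OUTCOMES)
loss_if_worried = expected_loss(OUTCOMES, MITIGATION) + PREPARATION_COST

print("Expected loss if we ignore the risk:", loss_if_ignored)
print("Expected loss if we prepare:        ", loss_if_worried)
```

With these made-up inputs, the low-probability, high-loss outcome dominates the sum, so preparing comes out ahead even though the most likely single outcome is harmless. Different inputs would give a different answer, which is the point of writing the distribution down.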
Remember, most transhumanists aren't afraid of superintelligence because they actually believe that they and their friends will personally become the first superintelligences. The problem is that everyone thinks this, and they can't all be right. Most likely, none of them are. Even if they were, it would be rude for them to clandestinely "steal the Singularity" and exploit the power of superintelligence for their own benefit -- possibly at the expense of the rest of us. Would-be mavericks should back off and help build a more democratic solution, a solution that ensures that the benefits of superintelligence are equitably distributed among all humans and perhaps (I would argue) to some non-human animals, such as vertebrates.
Coherent Extrapolated Volition (CEV) is one idea that has been floated for a more democratic solution, but it is by no means the final word. We criticize CEV and entertain other ideas all the time. No one said that AI Friendliness would be easy.
Tallinn-Evans Challenge Grant Successful
As many of you probably know, I'm media director for the Singularity Institute, so I like to cross-post important posts from the SIAI blog here. Our challenge grant was a success -- we raised $250,000. I am extremely grateful to everyone who donated. Without SIAI, humanity would be kind of screwed, because very few others take the challenge of Friendly AI seriously -- at all. The general consensus view on the question is "Asimov laws, right?" No, not Asimov Laws. Many AI researchers still aren't clear on the fact that Asimov laws were a plot device.
Anyway, here's the announcement:
Thanks to the effort of our donors, the Tallinn-Evans Singularity Challenge has been met! All $125,000 contributed will be matched dollar for dollar by Jaan Tallinn and Edwin Evans, raising a total of $250,000 to fund the Singularity Institute's operations in 2011. On behalf of our staff, volunteers, and entire community, I want to personally thank everyone who donated. Keep watching this blog throughout the year for updates on our activity, and sign up for our mailing list if you haven't yet.
Here's to a better future for the human species.
We are preparing a donor page to provide a place for everyone who donated to share some information about themselves if they wish, including their name, location, and a quote about why they donate to the Singularity Institute. If you would like to be included in our public list, please email me.
Again, thank you. The Singularity Institute depends entirely on contributions from individual donors to exist. Money is indeed the unit of caring, and one of the easiest ways that anyone can contribute directly to the success of the Singularity Institute. Another important way you can help is by plugging us into your networks, so please email us if you want to help.
If you're interested in connecting with other Singularity Institute supporters, we encourage joining our group on Facebook. There are also local Less Wrong meetups in cities like San Francisco, Los Angeles, New York, and London.
Yes, The Singularity is the Biggest Threat to Humanity
Some folks, like Aaron Saenz of Singularity Hub, were surprised that the NPR piece framed the Singularity as "the biggest threat to humanity", but that's exactly what the Singularity is. The Singularity is both the greatest threat and greatest opportunity to our civilization, all wrapped into one crucial event. This shouldn't be surprising -- after all, intelligence is the most powerful force in the universe that we know of. Obviously the creation of a higher form of intelligence/power would represent a tremendous threat/opportunity to the lesser intelligences that come before it and whose survival depends on the whims of the greater intelligence/power. The same thing happened with humans and the "lesser" hominids that we eliminated on the way to becoming the #1 species on the planet.
Why is the Singularity potentially a threat? Not because robots will "decide humanity is standing in their way", per se, as Aaron writes, but because robots that don't explicitly value humanity as a whole will eventually eliminate us by pursuing instrumental goals not conducive to our survival. No explicit anthropomorphic hatred or distaste towards humanity is necessary. Only self-replicating infrastructure and the smallest bit of negligence.
Why will advanced AGI be so hard to get right? Because what we regard as "common sense" morality, "fairness", and "decency" are all extremely complex and non-intuitive to minds in general, even if they seem completely obvious to us. As Marvin Minsky said, "Easy things are hard." Even something as simple as catching a ball requires a tremendous amount of task-specific computation. If you read the first chapter of How the Mind Works, the bestselling book by Harvard psychologist Steven Pinker, you'll see that he harps on this for almost 100 pages.
Basic AI Drives
There are "basic AI drives" we can expect to emerge in sufficiently advanced AIs, almost regardless of their initial programming. Across a wide range of top goals, any AI that uses decision theory will want to 1) self-improve, 2) have an accurate model of the world and consistent preferences (be rational), 3) preserve its utility function, 4) prevent counterfeit utility, 5) be self-protective, and 6) acquire resources and use them efficiently. Any AI with a sufficiently open-ended utility function (absolutely necessary if you want to avoid having human beings double-check every decision the AI makes) will pursue these "instrumental" goals (instrumental to us, terminal to an AI without motivations strong enough to override them) indefinitely as long as it can eke out a little more utility from doing so. AIs will not have built-in satiation points where they say, "I've had enough". We have to program those in, and if there's a potential satiation point we miss, the AI will just keep pursuing "instrumental to us, terminal to it" goals indefinitely. The only way we can keep an AI from continuously expanding like an endless nuclear explosion is to make it want to be constrained (entirely possible -- an AI would not have anthropomorphic resentment against limitations unless such resentment were helpful to accomplishing its top goals), or design it to replace itself with something else and shut down.
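The point about satiation can be shown with a toy model. This is a minimal sketch, not a model of any real AI system: the greedy agent, the two utility functions, and the step limit are all hypothetical. The agent takes one more unit of resource whenever doing so raises its utility at all, however slightly.

```python
import math


def run_agent(utility, max_steps=1000):
    """Toy greedy agent: keep acquiring one more unit of resource for as long
    as that raises utility, stopping only at max_steps (an external limit)."""
    resources = 0
    for _ in range(max_steps):
        marginal_gain = utility(resources + 1) - utility(resources)
        if marginal_gain <= 0:
            break  # satiated: one more unit is worth nothing extra
        resources += 1
    return resources


def open_ended_utility(resources):
    """Hypothetical utility with diminishing returns but no satiation point."""
    return math.log1p(resources)


def satiating_utility(resources, cap=10):
    """Hypothetical utility with an explicitly programmed satiation point."""
    return min(resources, cap)


print("Open-ended agent stops at:", run_agent(open_ended_utility))
print("Satiating agent stops at: ", run_agent(satiating_utility))
```

The open-ended agent only stops because the loop runs out of steps, even though each extra unit is worth less than the last. The satiating agent stops at its cap on its own, because that stopping point was written into its utility function.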
The easiest kind of advanced AGI to build would be a type of idiot savant -- a machine extremely good at performing the tasks we want, and which acts reasonably within the domain for which it was intended, but starts to act in unexpected ways when ported into domains outside those that the programmers anticipated. To quote Omohundro:
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.
Goal-Driven Systems Care About Their Goals, Not You
Goal-driven systems strive to achieve their goals. "Common sense", "decency", "respect", "the Golden Rule", and other "intuitive" human concepts, which are extremely complicated black boxes, need not enter into the picture. Again, I strongly recommend the first chapter of How the Mind Works to get a better grasp of how the way we think is not "obvious", but highly contingent on our evolutionary history and the particular constraints of our brains. Our worlds are filled with peculiar sensory and cognitive illusions that our attention is rarely drawn to because we all share the same peculiarities. In the same sense, human "common sense" morality is not something we should expect to pop into existence in AGIs unless explicitly programmed in.
Intelligence does not automatically equal "common sense". Intelligence does not automatically equal benevolence. Intelligence does not automatically equal "live and let live". Human moral sentiments are complex functionality crafted to meet particular adaptive criteria. They weren't handed to us by God or Zeus. They are not inscribed into the atoms and fundamental forces of the universe. They are human constructions, produced by evolving in groups for millions of years where people murdered one another if they didn't follow the rules, or simply for one another's mates. Only in very recent history did a mystical narrative emerge that attempts to portray human morality as something cosmically universal and surely intuitive to any theoretical mind, including ogres, fairies, aliens, interdimensional beings, AIs, etc.
It will be easier and cheaper to create AIs with great capabilities but relatively simple goals, because humans will be in denial that AIs will eventually be able to self-improve more effectively than we can improve them ourselves, and potentially acquire great power. Simple goals will be seen as sufficient for narrow tasks, and even somewhat general tasks. Humans are so self-obsessed that we'd probably continue to avoid regarding AIs as autonomous thinkers even if they beat us on every test of intelligence and creativity that we could come up with.
Combine the non-obvious complexity of common sense morality with great power and you have an immense problem. Advanced AIs will be able to copy themselves onto any available computers, stay awake 24/7, improve their own designs, develop automated and parallelized experimental cycles that far exceed the capabilities of human scientists, and develop self-replicating technologies such as artificially photosynthetic flowers, molecular nanotechnology, modular robotics, machines that draw carbon from the air to build carbon robots, and the like. It's hard to imagine what an advanced AGI would think of, because the first really advanced AGI will be superintelligent, and be able to imagine things that we can't. It seems so hard for humans to accept that we may not be the theoretically most intelligent beings in the multiverse, but yes, there's a lot of evidence that we aren't.
Try Merging With Your Toaster
The sci-fi fantasy of "merging with AI" will not work because self-improving AI capable of reaching criticality (intelligence explosion) will probably emerge before there are brain-computer interfaces invasive enough to truly channel a human "will" into an AI. More likely, an AI will rely upon commands, internal code, and cues that it is programmed to notice. The information bandwidth will be limited. If brain-computer interfaces exist that allow us to "merge" with AI and direct its development favorably, great! But why count on it? If we're wrong, we could all perish, or at least fail to communicate our preferences to the AI and get stuck with it forever.
In The Singularity is Near, Ray Kurzweil briefly addresses the Friendly AI problem. He writes:
Eliezer Yudkowsky has extensively analyzed paradigms, architectures, and ethical rules that may help assure that once strong AI has the means of accessing and modifying its own design it remains friendly to biological humanity and supportive of its values. Given that self-improving strong AI cannot be recalled, Yudkowsky points out that we need to "get it right the first time", and that its initial design must have "zero nonrecoverable errors".
Inherently there will be no absolute protection against strong AI. Although the argument is subtle I believe that maintaining an open free-market system for incremental scientific and technological progress, in which each step is subject to market acceptance, will provide the most constructive environment for technology to embody widespread human values.
Kurzweil's proposal for a solution above is insufficient because even if several stages of AGI are gated by market acceptance, there will come a point at which one AGI or group of AGIs exceeds human intelligence and starts to apply its machine intelligence to self-improvement, resulting in a relatively quick scaling up of intelligence from our perspective. The top-level goals of that AGI or group of AGIs will then be of utmost importance to humanity. To quote Nick Bostrom's "Ethical Issues in Advanced Artificial Intelligence":
Both because of its superior planning ability and because of the technologies it could develop, it is plausible to suppose that the first superintelligence would be very powerful. Quite possibly, it would be unrivalled: it would be able to bring about almost any possible outcome and to thwart any attempt to prevent the implementation of its top goal. It could kill off all other agents, persuade them to change their behavior, or block their attempts at interference. Even a “fettered superintelligence” that was running on an isolated computer, able to interact with the rest of the world only via text interface, might be able to break out of its confinement by persuading its handlers to release it. There is even some preliminary experimental evidence that this would be the case.
It seems that the best way to ensure that a superintelligence will have a beneficial impact on the world is to endow it with philanthropic values. Its top goal should be friendliness. How exactly friendliness should be understood and how it should be implemented, and how the amity should be apportioned between different people and nonhuman creatures is a matter that merits further consideration.
Why must we recoil against the notion of a risky superintelligence? Why can't we see the risk, and confront it by trying to craft goal systems that carry common sense human morality over to AGIs? This is a difficult task, but the likely alternative is extinction. Powerful AGIs will have no automatic reason to be friendly to us! They will be much more likely to be friendly if we program them to care about us, and build them from the start with human-friendliness in mind.
Humans overestimate our robustness. Conditions have to be just right for us to keep living. If AGIs decided to remove the atmosphere or otherwise alter it to pursue their goals, we would be toast. If temperatures on the surface changed by more than a few dozen degrees up or down, we would be toast. If natural life had to compete with AI-crafted cybernetic organisms, it could destroy the biosphere on which we depend. There are millions of ways in which powerful AGIs with superior technology could accidentally make our lives miserable, simply by not taking our preferences into account. Our preferences are not a magical mist that can persuade any type of mind to give us basic respect. They are just our preferences, and we happen to be programmed to take each other's preferences deeply into account, in ways we are just beginning to understand. If we assume that AGI will inherently contain all this moral complexity without anyone doing the hard work of programming it in, we will be unpleasantly surprised when these AGIs become more intelligent and powerful than ourselves.
We probably make thousands of species extinct per year through our pursuit of instrumental goals. Why is it so hard to imagine that AGI could do the same to us?
Part of the reason is that people have a knee-jerk reaction to any form of negativity. Try going to a cocktail party and bringing up anything in the least negative, and most people will stop talking to you. There is a whole mythos around this, to the effect that anyone that ever mentions anything negative must have a chip on their shoulder or otherwise be a negative person in general. Sometimes there actually is a real risk!
New Singularity Institute Publications in 2010
Here's the source.
Basic AI Drives and Catastrophic Risks (Carl Shulman, 2010)
Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics (Nick Tarleton, 2010)
Economic Implications of Software Minds (S. Kaas, S. Rayhawk, A. Salamon and P. Salamon, 2010)
From mostly harmless to civilization-threatening: pathways to dangerous artificial general intelligences (Kaj Sotala, 2010)
Implications of a software-limited singularity (Carl Shulman, Anders Sandberg, 2010)
Superintelligence does not imply benevolence (Joshua Fox, Carl Shulman, 2010)
Timeless Decision Theory (Eliezer Yudkowsky, 2010)
The above are papers; below are presentations:
How intelligible is intelligence? (Anna Salamon, Stephen Rayhawk, János Kramár, 2010)
Whole Brain Emulation and the Evolution of Superorganisms (Carl Shulman, 2010)
What can evolution tell us about the feasibility of artificial intelligence? (Carl Shulman, 2010)
If you value this research, donate to the Singularity Institute via Paypal, and your donation will be matched. At Less Wrong, various users are announcing the level of their contributions. The user "Rain", who donated $2,700, made a comment at the site about why he donates to SIAI.
Stating the Obvious
I get the feeling that this is what it'd be like if I had a debate with Hugo de Garis.
When people get confused about morality and think in terms of a Great Chain of Being where greater physical/computing power necessarily means better "morality", they are forced to come to "counterintuitive" (to say the least) conclusions like being in favor of the massacre of all humanity. From de Garis' Wikipedia page:
It is these two extreme ideologies which de Garis believes may herald a new world war, wherein one group with a 'grand plan' (the Cosmists) will be rabidly opposed by another which feels itself to be under deadly threat from that plan (the Terrans). The factions, he predicts, may eventually war to the death because of this, as the Terrans will come to view the Cosmists as "arch-monsters" when they begin seriously discussing acceptable risks, and the probabilities of large percentages of Earth-based life going extinct. In response to this, the Cosmists will come to view the Terrans as being reactionary extremists, and will stop treating them and their ideas seriously, further aggravating the situation, possibly beyond reconciliation.
Throughout his book, de Garis states that he is ambivalent about which viewpoint he ultimately supports, and attempts to make convincing cases for both sides. He elaborates towards the end of the book that the more he thinks about it, the more he feels like a Cosmist, because he feels that despite the horrible possibility that humanity might ultimately be destroyed, perhaps inadvertently or at least indifferently, by the artilects, he cannot ignore the fact that the human species is just another link in the evolutionary chain, and must go extinct in their current form anyway, whereas the artilects could very well be the next link in that chain and therefore would be excellent candidates to carry the torch of science and exploration forward into the rest of the universe.
Because there is no fundamental connection between goals and intelligence unless we make it so, we can actually build AIs that are very powerful but respect us "puny" humans. There's no fundamental conflict because there is no mystical, spiritual, metaphysical, unscientific force that nudges powerful beings to automatically look down on less powerful beings, in the same way that there's no mystical force that nudges powerful beings to be especially kind to less powerful beings. The morality of a superintelligence will be a function of its initial conditions. In the highly deterministic environment of a computer chip, a seed AI is free to select only those modifications that it knows won't topple or ruin its entire goal system.
Thinking in terms of a Great Chain of Being, cosmic inevitability, "developmentally predetermined outcomes", and the like, which is very much the view presented in The Singularity is Near, makes it seem like we can take our hands off the driving wheel and everything will turn out just fine. It won't.
More Debate on Superintelligent AGI Goals
The discussion at Robin Hanson's blog has continued, with input by Barry Ptolemy and Ben Goertzel. Jonatas Muller shows up with his usual position:
I think that instilling friendly values into AI is bound to be useless, since the AI will be able to question these values and circumvent them, like even humans are able to.
A lot of people have been hung up on this, including myself circa 2001. The reason it's wrong is that in an AI, the values are the AI. We're talking about the entire core of its motivation, its utility function, what someone programs in -- that's the "values".
There seems to be some confusion between "values as utility function" and "values as commonly understood in human society". The latter is supposed to refer to something vaguely deep -- stuff people agree on is important. The problem with human "values" is that they're all flexible and made to be broken under the right conditions. Homo hypocritus comes to mind. The human value system is a constant compromise between various influences and strange attractors.
The thing about "questioning values" is that the questions have to come from somewhere. To put it simply, one part of your brain questions another part, until a threshold of neural voting is reached and changes happen. For instance, the "value" of not hitting on my friend's girlfriend. At some point, the horny part of my brain could "win" over the "value" part, and I break the "value".
Neither the "value" part nor the "horny" part is at all magical. They both correspond to mushy pieces of tissue competing with each other. Sometimes people present conflict between motivations as if it were some mystical, cosmic thing, where we "realize the right answer", channeling from the Source of Objective Morality that is hovering somewhere out there, feeding information into our heads. News flash: there is no Right Answer. It's all in our heads, all made up. Evolution gave us some values that drive us to survive up until reproduction and help out our tribes. There is no "right morality" that progressively smarter beings converge to. I used to think there was, but this is really just the Mind Projection Fallacy -- projecting a quality of our minds into the world as some objective aspect of it all. We may even have psychological adaptations designed to reinforce moral realism, just like how there may be a "God module".
Simple AIs are given utility functions that can simply be sectioned off from the rest of the AI. Even if these simple AIs could modify their own utility functions, why would they? Nothing outside the utility function has the power to generate base "motivation". I'm not sure why this is hard to understand. It's not going to magically change when AIs become smarter. People could deliberately build AIs with complex attractor networks of values (like humans), where the means and ends are intertwined, or they could build AIs with hierarchical goal systems and a utility function on top. To think that the latter will spontaneously morph into the former is based on anthropomorphism. Our uncertain, fuzzy values/motivations are not necessarily mind-universal. Simple optimizers work without the universe crashing, so we have experimental evidence of static utility functions.
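To make the "simple optimizer" point concrete, here is a minimal sketch in Python. Everything in it is a hypothetical illustration rather than anyone's actual AI design: the function names, the one-dimensional search space, and the objective (which peaks at x = 3) are assumptions chosen for brevity. The thing to notice is that the search loop only ever reads the utility function to compare candidates; nothing in the loop has any way to rewrite it.

```python
import random


def utility(x: float) -> float:
    """Fixed objective, sectioned off from the search code.

    Hypothetical example: the score is highest at x = 3.
    """
    return -(x - 3.0) ** 2


def hill_climb(start: float, steps: int = 5000,
               step_size: float = 0.1, seed: int = 0) -> float:
    """Simple hill climber: keep a candidate only if utility() rates it higher.

    The loop consults utility() but never modifies it, so the goal stays
    static no matter how long the search runs.
    """
    rng = random.Random(seed)
    x = start
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        if utility(candidate) > utility(x):
            x = candidate
    return x


result = hill_climb(start=-10.0)
print(f"best x found: {result:.3f}")
```

Making the search smarter (larger steps, more iterations, a better proposal strategy) changes how well the optimizer scores on the same objective. It does not change what the objective is.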
If an AI questions its values, the questioning will have to come from somewhere. The only way it could come from outside is if the programmers specifically programmed the AI to change moralities based on outside arguments. There is no guarantee AIs will be built that way. With AI, anything is possible, because "AI" represents a huge class of possible minds much much larger than every mind that has ever evolved on our planet over the last billion years. So, when Jonatas says, "AI will question its values...", he's making a statement just about a certain class of AIs, one which may or may not be constructed.
Even if an AI did "question" its values, who's to say it would question them in the way that humans do? Maybe the AI will value brizzlebraps (an arbitrary mathematical concept), then question its values, and decide to value halopodops. AI values may be entirely alien. Even some human values are alien to other humans (psychopaths, for instance). What makes us automatically assume that AI values will be intuitive to us unless we very carefully program them that way?
If an AI can "question" its values, the programmers will have to create a channel for that to happen. No channel, no questioning. "Questioning one's values" is a complex algorithmic process that works a certain way in humans. Maybe one type of AI with selfish values might "question its values" and decide, through questioning, that the best possible values involve monomaniacally serving humans. The reason we assume the inverse is more likely is partly because observer-biased beliefs evolve in imperfectly deceptive social organisms. But "AI in general" is just chaos. "Questioning" can get you literally anywhere depending on the specifics of the design. Maybe an AI with a utility function containing a terabyte of data will question itself down to a utility function with just 20 bits. Or maybe the opposite. It's all dependent on the design, not outside mystical objective morality forces that apply to every possible mind.
My guess is that Jonatas' position derives from folk psychological notions of the mind rather than a physicalist approach. According to the physicalist approach, things have to be caused by other things. There are no uncaused events. A complex algorithmic process like "questioning one's morality" can only be done by complex, preexisting machinery. "Goal drift" may occur due to feedback between intermediate goals and top goals, but again, only if the AI is programmed that way. If "AI values", a chunk of code, exists, and we define that chunk as the totality of the AI's top-level goals and motivations, then any "questioning" that occurs will go on entirely inside the dynamics of that space.
Some possible goal drift trajectories may just hover around a tiny little portion of the state space, others might zig zag all over the place. There may be some bias towards certain sub-goals (basic AI drives), but these sub-goals only exist to preserve the utility function in the first place. Humans have the quality of taking intermediate subgoals and occasionally elevating them to higher level goals, but this is not necessarily typical. An AI might decide to wipe out all of its subgoals to avoid goal drift. Even an AI composed of trillions of lines of code might decide to cut itself down to very little to preserve its utility function. Any AI built to follow a utility function will naturally put a high priority on conserving it, not "questioning it".
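The claim that a utility-driven AI would conserve its utility function rather than question it can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real system: the three actions, the two utility functions, and the `accepts_modification` rule are all made up for illustration. The idea is that a proposed change to the agent's goals is itself scored by the agent's current goals.

```python
from typing import Callable, Sequence

Utility = Callable[[str], float]

# Hypothetical action set for the toy agent.
ACTIONS: Sequence[str] = ("make_paperclips", "make_staples", "do_nothing")


def paperclip_utility(action: str) -> float:
    """Current goal (hypothetical): only paperclip production has value."""
    return 1.0 if action == "make_paperclips" else 0.0


def staple_utility(action: str) -> float:
    """Proposed replacement goal (hypothetical): only staples have value."""
    return 1.0 if action == "make_staples" else 0.0


def best_action(u: Utility, actions: Sequence[str] = ACTIONS) -> str:
    """The action an agent with utility function u would choose."""
    return max(actions, key=u)


def accepts_modification(current: Utility, proposed: Utility) -> bool:
    """Judge a proposed utility function by the current one.

    The agent predicts what it would do after the change and scores that
    behavior with its present values. It accepts only if the result is at
    least as good, by present values, as what it would do unchanged.
    """
    return current(best_action(proposed)) >= current(best_action(current))


print(accepts_modification(paperclip_utility, staple_utility))     # False
print(accepts_modification(paperclip_utility, paperclip_utility))  # True
```

In this sketch the agent turns down the switch to staples because, judged by its current values, a future self that makes staples is worse than one that keeps making paperclips. An agent built with a different acceptance rule could behave differently, which is the point: the outcome depends on the design.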
Some AGI designers might think AGIs will behave a certain way because of certain "profound truths" they experience while contemplating the universe. Sir, those "profound truths" you thought you saw were merely the firings of a few stray neurons in some corner of your pile of meat.
People sometimes accuse SIAI of trying to insist we can control superintelligences, but other people make far more frivolous claims by conclusively stating that all superintelligences, no matter their initial conditions, will trend in a certain direction, even if that direction is entirely antithetical to their initial programming and specific precautions are taken to avoid goal drift from the get-go.
The way any superintelligence develops will be based on its initial conditions. Nick Bostrom understood this right away:
To the extent that ethics is a cognitive pursuit, a superintelligence could also easily surpass humans in the quality of its moral thinking. However, it would be up to the designers of the superintelligence to specify its original motivations.
Another quote:
Humans are rarely willing slaves, but there is nothing implausible about the idea of a superintelligence having as its supergoal to serve humanity or some particular human, with no desire whatsoever to revolt or to “liberate” itself. It also seems perfectly possible to have a superintelligence whose sole goal is something completely arbitrary, such as to manufacture as many paperclips as possible, and who would resist with all its might any attempt to alter this goal. For better or worse, artificial intellects need not share our human motivational tendencies.
We're right, the critics are wrong. Evidence of varied goal systems in intelligent species abounds in the natural world. There is no correlation between increased intelligence and specific values. There is a correlation between intelligence and what values are specifically adaptive in the ecological context.
Chalmers to Discuss Singularity at Berkeley Tomorrow
Here's the link. Following is the blurb from the organizers. I won't be there because it's too short notice, but let me know how it went if you go.
WHAT: Working Group in Philosophy of Mind
WHEN: Thursday November 4, 1:00-3:00 p.m.
WHERE: 204 Dwinelle Hall
"A Conversation about the Singularity"
David Chalmers
Distinguished Professor of Philosophy and Director of the Centre for Consciousness, Australian National University
Visiting Professor of Philosophy, New York University
In 1993, Vernor Vinge, a professor of mathematics and computer science (though perhaps better-known as a writer of science fiction), presented a paper at a NASA-sponsored symposium titled "The Coming Technological Singularity: How to Survive in the Post-Human Era." The paper's abstract begins: "Within thirty years, we will have the technological means to create superhuman intelligence. Shortly after, the human era will be ended."
Since the publication of Vinge's paper, the idea of the "Singularity" has become a favorite of the popular media (and many members of the internet's lunatic fringe). Perhaps the most notable contribution to the Singularity literature to date is Ray Kurzweil's The Singularity Is Near (2005); it was met with a combination of adulation and alarm in the popular media, while garnering markedly less attention within academia.
Still, the idea of the Singularity itself could be seen as originating within academic philosophy, in the form of I. J. Good's 1965 article “Speculations Concerning the First Ultraintelligent Machine.” Good considers the possibility of the eventual creation of “ultraintelligent machines,” and speculates that “the first ultraintelligent machine [will be] the last invention that man need ever make.”
More recently, David Chalmers (NYU/ANU) has offered us "The Singularity: A Philosophical Analysis" (http://consc.net/papers/singularity.pdf), in which he suggests that philosophy has ignored the questions raised by the Singularity to its own detriment.
Please join us this Thursday as Chalmers leads us in a conversation about the Singularity.

