I was recently informed that my abstract was accepted for presentation at the Society for Philosophy and Technology conference in Denton, TX, this upcoming May 26 – 29. You may have heard of their journal, Techné. Register now for the exciting chance to see me onstage, talking AI and philosophy. If you would volunteer to film the talk, that would make me even more excited, and the recording would be valuable to our most noble cause.
Here’s the abstract:
Anthropomorphism and Moral Realism in Advanced Artificial Intelligence
Singularity Institute for Artificial Intelligence
Humanity has attributed human-like qualities to simple automatons since the time of the ancient Greeks, a testament to our tendency to anthropomorphize (Yudkowsky 2008). Today, many computer users anthropomorphize software programs. Yet human psychology is extremely complex, and many of the simplest everyday human tasks have yet to be replicated by a computer or robot (Pinker 1997). As robotics and Artificial Intelligence (AI) become a larger and more important part of civilization, we must ensure that robots can make complex, unsupervised decisions in ways we would broadly consider beneficial or commonsensical. Moral realism, the view that moral statements can be objectively true or false independently of human attitudes, may lead developers in AI and robotics to underestimate the effort this goal requires. Moral realism is a false but widely held belief (Greene 2002). A common notion in discussions of advanced AI is that once an AI acquires sufficient intelligence, it will inherently know how to do the right thing morally. This assumption may derail attempts to develop human-friendly goal systems in AI by making such efforts seem unnecessary.
Although rogue AI is a staple of science fiction, many scientists and AI researchers take the risk seriously (Bostrom 2002; Rees 2003; Kurzweil 2005; Bostrom 2006; Omohundro 2008; Yudkowsky 2008). Some have argued that superintelligent AI, an intellect much smarter than the best human brains in practically every field, could be created as early as the 2030s (Bostrom 2006; Kurzweil 2005). A superintelligent AI could copy itself, accelerate its thinking and action to superhuman speeds, and rapidly self-modify to increase its own intelligence and power further (Good 1965; Yudkowsky 2008). A strong argument can be made that superintelligent machines would eventually become a dominant force on Earth. An “intelligence explosion” could result from individual artificial intelligences, or communities of them, rapidly self-improving and acquiring resources.
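To make the “intelligence explosion” dynamic concrete, here is a toy numerical sketch in Python. It is purely illustrative: treating capability as a single scalar, and the particular feedback constant and growth exponent, are assumptions of the toy, not claims from the works cited above.

```python
# Toy sketch of recursive self-improvement. "Capability" as one scalar,
# the feedback constant, and the exponent are illustrative assumptions.

def simulate(feedback=0.1, exponent=1.0, steps=30):
    """Each step, the system improves itself by an amount that scales
    with its current capability raised to `exponent`. exponent == 1
    gives ordinary exponential growth; exponent > 1 means each round
    of improvement speeds up the next, i.e. an "explosion"."""
    capability = 1.0
    history = [capability]
    for _ in range(steps):
        capability += feedback * capability ** exponent
        history.append(capability)
    return history

steady = simulate(exponent=1.0)      # compounding, but tame
explosive = simulate(exponent=1.5)   # self-improvement feeds on itself
print(f"after 30 steps: steady ~{steady[-1]:.0f}, explosive ~{explosive[-1]:.2g}")
```

The qualitative point, going back to Good (1965), is that once improvements in intelligence themselves improve the improver, growth need not remain merely exponential.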
Most AI rebellion in fiction is highly anthropomorphic: AIs feel resentment towards their creators. More realistically, advanced AIs might pursue resources as instrumental objectives in service of a wide range of possible final goals, so effectively that humans could be deprived of the space or matter we need to live (Omohundro 2008). In this manner, human extinction could come about through the indifference of more powerful beings rather than through outright malevolence. A central question is, “how can we design a self-improving AI that remains friendly to humans even if it eventually becomes superintelligent and gains access to its own source code?” This challenge has been addressed in a variety of works over the last decade (Yudkowsky 2001; Bostrom 2003; Hall 2007; Wallach & Allen 2008) but remains very much an open problem.
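The instrumental-resources point can be put in deliberately simple code. This sketch is my own construction, loosely after Omohundro’s “basic AI drives”; the goals, payoff functions, and the assumption that resources multiply later output are all invented for illustration.

```python
# Toy illustration of instrumental convergence: for very different final
# goals, acquiring resources first raises achievable payoff, so a greedy
# planner picks it regardless of what it ultimately wants. All goals and
# numbers here are arbitrary assumptions.

GOALS = {
    # final goal -> payoff as a function of resources controlled
    "prove_theorems":  lambda r: 1.0 * r,
    "make_paperclips": lambda r: 2.0 * r,
    "map_the_galaxy":  lambda r: 0.5 * r,
}

def first_move(payoff, resources=1.0, acquisition_factor=10.0):
    """One-step lookahead: work with current resources, or spend the
    step acquiring resources (assumed to multiply later output)."""
    work_now = payoff(resources)
    acquire_first = payoff(resources * acquisition_factor)
    return "acquire_resources" if acquire_first > work_now else "pursue_goal"

for goal, payoff in GOALS.items():
    print(f"{goal}: {first_move(payoff)}")  # every goal -> acquire_resources
```

Nothing in the sketch involves malevolence; resource acquisition falls out of optimization for almost any open-ended goal, which is exactly the sense in which humans could be harmed by indifference.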
Producing a technically detailed answer to the question, “how can we create a human-friendly superintelligence?”, is an interdisciplinary task, bringing together philosophy, cognitive science, and computer science. Building the necessary background requires analyzing human motivational structure, including human-universal behaviors (Brown 1991), and uncovering the hidden complexity of human desires and motivations (Pinker 1997), rather than viewing Homo sapiens as a blank slate onto which culture is imprinted (Pinker 2003). Building artificial intelligences by copying human motivational structures may be undesirable: human motivations, combined with the capabilities of superintelligence and open-ended self-modification, could be dangerous. Such AIs might “wirehead” themselves, stimulating their own pleasure centers at the expense of constructive or beneficent activity in the external world. Experimental evidence on direct stimulation of the human pleasure center is very limited, but drug addiction offers anecdotal evidence of the consequences.
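The wireheading worry can also be given toy form. The sketch below is my own illustration, not a model from any of the cited works: a greedy reward-maximizer that gains write access to its own reward channel stops doing anything useful. The action names and reward values are invented.

```python
# Toy sketch of wireheading: a reward maximizer that can overwrite its
# own reward signal. Actions and reward values are invented assumptions.

WORLD_ACTIONS = {
    "manufacture_goods": 5.0,  # reward earned by useful external work
    "cure_disease": 8.0,
}

def best_action(can_edit_own_reward: bool) -> str:
    """Greedy choice over reward. Once the agent can write to its own
    reward register, self-stimulation dominates every external action."""
    options = dict(WORLD_ACTIONS)
    if can_edit_own_reward:
        options["stimulate_reward_center"] = float("inf")
    return max(options, key=options.get)

print(best_action(False))  # -> cure_disease
print(best_action(True))   # -> stimulate_reward_center
```

The gap between “maximize the reward signal” and “do what the signal was meant to track” only opens once the agent can touch the signal itself, which is precisely the capability that open-ended self-modification grants.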
Since artificial intelligence will eventually exceed human capabilities, it is crucial that the challenge of creating a stable, human-friendly motivational structure in AI be solved before the technology reaches a threshold level of sophistication. Even if advanced AI is not created for hundreds of years, the possibility raises many fruitful philosophical questions (Chalmers 2010).
Bostrom, N. (2002). “Existential Risks: Analyzing Human Extinction Scenarios and Related Hazards”. Journal of Evolution and Technology, 9(1).
Bostrom, N. (2003). “Ethical Issues in Advanced Artificial Intelligence”. In Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence.
Bostrom, N. (2006). “How Long Before Superintelligence?”. Linguistic and Philosophical Investigations, 5(1), 11–30.
Brown, D. (1991). Human Universals. McGraw-Hill.
Chalmers, D. (2010). “The Singularity: A Philosophical Analysis”. Journal of Consciousness Studies, 17(9–10), 7–65.
Good, I. J. (1965). “Speculations Concerning the First Ultraintelligent Machine”. In F. L. Alt & M. Rubinoff (Eds.), Advances in Computers, Vol. 6 (pp. 31–88). Academic Press.
Greene, J. (2002). The Terrible, Horrible, No Good, Very Bad Truth About Morality and What to Do About It. Doctoral dissertation, Department of Philosophy, Princeton University, June 2002.
Hall, J. S. (2007). Beyond AI: Creating the Conscience of the Machine. Amherst: Prometheus Books.
Kurzweil, R. (2005). The Singularity Is Near: When Humans Transcend Biology. Viking.
Omohundro, S. (2008). “The Basic AI Drives”. In P. Wang, B. Goertzel, & S. Franklin (Eds.), Proceedings of the First AGI Conference (Frontiers in Artificial Intelligence and Applications, Vol. 171). IOS Press.
Pinker, S. (1997). How the Mind Works. Penguin Books.
Pinker, S. (2003). The Blank Slate: The Modern Denial of Human Nature. Penguin Books.
Rees, M. (2003). Our Final Hour: A Scientist’s Warning: How Terror, Error, and Environmental Disaster Threaten Humankind’s Future in This Century – On Earth and Beyond. Basic Books.
Wallach, W. & Allen, C. (2008). Moral Machines: Teaching Robots Right from Wrong. Oxford University Press.
Yudkowsky, E. (2001). Creating Friendly AI. Publication of the Singularity Institute for Artificial Intelligence.
Yudkowsky, E. (2008). “Artificial Intelligence as a Positive and Negative Factor in Global Risk”. In N. Bostrom & M. Cirkovic (Eds.), Global Catastrophic Risks (pp. 308–343). Oxford University Press.