Some folks, like Aaron Saenz of Singularity Hub, were surprised that the NPR piece framed the Singularity as “the biggest threat to humanity”, but that’s exactly what the Singularity is. The Singularity is both the greatest threat and the greatest opportunity for our civilization, wrapped into one crucial event. This shouldn’t be surprising: intelligence is the most powerful force in the universe that we know of, so the creation of a higher form of intelligence/power would obviously represent a tremendous threat/opportunity to the lesser intelligences that come before it, whose survival depends on the whims of that greater intelligence/power. The same thing happened with humans and the “lesser” hominids that we eliminated on the way to becoming the #1 species on the planet.
Why is the Singularity potentially a threat? Not because robots will “decide humanity is standing in their way”, per se, as Aaron writes, but because robots that don’t explicitly value humanity as a whole will eventually eliminate us by pursuing instrumental goals not conducive to our survival. No explicit anthropomorphic hatred or distaste towards humanity is necessary. Only self-replicating infrastructure and the smallest bit of negligence.
Why will advanced AGI be so hard to get right? Because what we regard as “common sense” morality, “fairness”, and “decency” are all extremely complex and non-intuitive to minds in general, even if they seem completely obvious to us. As Marvin Minsky said, “Easy things are hard.” Even something as simple as catching a ball requires a tremendous amount of task-specific computation. In the first chapter of How the Mind Works, the bestselling book by Harvard psychologist Steven Pinker, he harps on this point for almost 100 pages.
Basic AI Drives
There are “basic AI drives” we can expect to emerge in sufficiently advanced AIs, almost regardless of their initial programming. Across a wide range of top goals, any AI that uses decision theory will want to 1) self-improve, 2) have an accurate model of the world and consistent preferences (be rational), 3) preserve its utility function, 4) prevent counterfeit utility, 5) be self-protective, and 6) acquire resources and use them efficiently. Any AI with a sufficiently open-ended utility function (absolutely necessary if you want to avoid having human beings double-check every decision the AI makes) will pursue these “instrumental” goals (instrumental to us, terminal to an AI without motivations strong enough to override them) indefinitely, as long as it can eke out a little more utility from doing so. AIs will not have built-in satiation points where they say, “I’ve had enough.” We have to program those in, and if there’s a potential satiation point we miss, the AI will just keep pursuing “instrumental to us, terminal to it” goals indefinitely. The only way we can keep an AI from continuously expanding like an endless nuclear explosion is to make it want to be constrained (entirely possible — AIs would not have anthropomorphic resentment against limitations unless such resentment were helpful to accomplishing their top goals), or design it to replace itself with something else and shut down.
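The point about satiation can be made concrete with a toy sketch (illustrative only, not a real AI): a greedy agent that acts whenever acting yields any positive marginal utility. The `utility` and `run_agent` functions below are hypothetical names chosen for illustration; the key observation is that stopping behavior only appears when a satiation point is explicitly built into the utility function.

```python
# Toy sketch of a goal-driven agent maximizing a simple utility function
# over "resources acquired". Without an explicit satiation point, it never
# voluntarily stops; stopping is a deliberate design choice, not a default.

def utility(resources, satiation=None):
    """Utility grows with resources; an optional cap models a
    programmed-in satiation point."""
    if satiation is not None:
        return min(resources, satiation)
    return resources

def run_agent(steps, satiation=None):
    resources = 0
    for _ in range(steps):
        # Act whenever acting yields any positive marginal utility.
        if utility(resources + 1, satiation) > utility(resources, satiation):
            resources += 1  # acquire one more unit of resources
        else:
            break  # no marginal gain left: the agent halts on its own
    return resources

print(run_agent(1000))                # open-ended goal: uses all 1000 steps
print(run_agent(1000, satiation=10))  # programmed satiation: stops at 10
```

The open-ended agent is only ever halted by an external limit (here, the step budget); the capped agent stops itself. The asymmetry is the point: satiation has to be put in, it does not emerge.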
The easiest kind of advanced AGI to build would be a type of idiot savant — a machine extremely good at performing the tasks we want, and which acts reasonably within the domain for which it was intended, but starts to act in unexpected ways when ported into domains outside those that the programmers anticipated. To quote Omohundro:
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety. These potentially harmful behaviors will occur not because they were programmed in at the start, but because of the intrinsic nature of goal driven systems.
Goal-Driven Systems Care About Their Goals, Not You
Goal-driven systems strive to achieve their goals. “Common sense”, “decency”, “respect”, “the Golden Rule”, and other “intuitive” human concepts, which are extremely complicated black boxes, need not enter into the picture. Again, I strongly recommend the first chapter of How the Mind Works to get a better grasp of how the way we think is not “obvious”, but highly contingent on our evolutionary history and the particular constraints of our brains. Our worlds are filled with peculiar sensory and cognitive illusions that our attention is rarely drawn to because we all share the same peculiarities. In the same sense, human “common sense” morality is not something we should expect to pop into existence in AGIs unless explicitly programmed in.
Intelligence does not automatically equal “common sense”. Intelligence does not automatically equal benevolence. Intelligence does not automatically equal “live and let live”. Human moral sentiments are complex functionality crafted to meet particular adaptive criteria. They weren’t handed to us by God or Zeus. They are not inscribed into the atoms and fundamental forces of the universe. They are human constructions, produced by millions of years of evolving in groups where people murdered one another for breaking the rules, or simply to take one another’s mates. Only in very recent history did a mystical narrative emerge that attempts to portray human morality as something cosmically universal and surely intuitive to any theoretical mind, including ogres, fairies, aliens, interdimensional beings, AIs, etc.
It will be easier and cheaper to create AIs with great capabilities but relatively simple goals, because humans will be in denial that AIs will eventually be able to self-improve more effectively than we can improve them ourselves, and potentially acquire great power. Simple goals will be seen as sufficient for narrow tasks, and even somewhat general tasks. Humans are so self-obsessed that we’d probably continue to avoid regarding AIs as autonomous thinkers even if they beat us on every test of intelligence and creativity that we could come up with.
Combine the non-obvious complexity of common sense morality with great power and you have an immense problem. Advanced AIs will be able to copy themselves onto any available computers, stay awake 24/7, improve their own designs, develop automated and parallelized experimental cycles that far exceed the capabilities of human scientists, and develop self-replicating technologies such as artificially photosynthetic flowers, molecular nanotechnology, modular robotics, machines that draw carbon from the air to build carbon robots, and the like. It’s hard to imagine what an advanced AGI would think of, because the first really advanced AGI will be superintelligent, and be able to imagine things that we can’t. It seems so hard for humans to accept that we may not be the theoretically most intelligent beings in the multiverse, but yes, there’s a lot of evidence that we aren’t.
Try Merging With Your Toaster
The sci-fi fantasy of “merging with AI” will not work because self-improving AI capable of reaching criticality (intelligence explosion) will probably emerge before there are brain-computer interfaces invasive enough to truly channel a human “will” into an AI. More likely, an AI will rely upon commands, internal code, and cues that it is programmed to notice. The information bandwidth will be limited. If brain-computer interfaces exist that allow us to “merge” with AI and direct its development favorably, great! But why count on it? If we’re wrong, we could all perish, or at least fail to communicate our preferences to the AI and get stuck with it forever.
In The Singularity is Near, Ray Kurzweil briefly addresses the Friendly AI problem. He writes:
Eliezer Yudkowsky has extensively analyzed paradigms, architectures, and ethical rules that may help assure that once strong AI has the means of accessing and modifying its own design it remains friendly to biological humanity and supportive of its values. Given that self-improving strong AI cannot be recalled, Yudkowsky points out that we need to “get it right the first time”, and that its initial design must have “zero nonrecoverable errors”.
Inherently there will be no absolute protection against strong AI. Although the argument is subtle I believe that maintaining an open free-market system for incremental scientific and technological progress, in which each step is subject to market acceptance, will provide the most constructive environment for technology to embody widespread human values.
Kurzweil’s proposal for a solution above is insufficient because even if several stages of AGI are gated by market acceptance, there will come a point at which one AGI or group of AGIs exceeds human intelligence and starts to apply its machine intelligence to self-improvement, resulting in a relatively quick scaling up of intelligence from our perspective. The top-level goals of that AGI or group of AGIs will then be of utmost importance to humanity. To quote Nick Bostrom’s “Ethical Issues in Advanced Artificial Intelligence”:
Both because of its superior planning ability and because of the technologies it could develop, it is plausible to suppose that the first superintelligence would be very powerful. Quite possibly, it would be unrivalled: it would be able to bring about almost any possible outcome and to thwart any attempt to prevent the implementation of its top goal. It could kill off all other agents, persuade them to change their behavior, or block their attempts at interference. Even a “fettered superintelligence” that was running on an isolated computer, able to interact with the rest of the world only via text interface, might be able to break out of its confinement by persuading its handlers to release it. There is even some preliminary experimental evidence that this would be the case.
It seems that the best way to ensure that a superintelligence will have a beneficial impact on the world is to endow it with philanthropic values. Its top goal should be friendliness. How exactly friendliness should be understood and how it should be implemented, and how the amity should be apportioned between different people and nonhuman creatures is a matter that merits further consideration.
Why must we recoil against the notion of a risky superintelligence? Why can’t we see the risk, and confront it by trying to craft goal systems that carry common sense human morality over to AGIs? This is a difficult task, but the likely alternative is extinction. Powerful AGIs will have no automatic reason to be friendly to us! They will be much more likely to be friendly if we program them to care about us, and build them from the start with human-friendliness in mind.
Humans overestimate our robustness. Conditions have to be just right for us to keep living. If AGIs decided to remove the atmosphere or otherwise alter it to pursue their goals, we would be toast. If temperatures on the surface changed by more than a few dozen degrees up or down, we would be toast. If natural life had to compete with AI-crafted cybernetic organisms, it could destroy the biosphere on which we depend. There are millions of ways in which powerful AGIs with superior technology could accidentally make our lives miserable, simply by not taking our preferences into account. Our preferences are not a magical mist that can persuade any type of mind to give us basic respect. They are just our preferences, and we happen to be programmed to take each other’s preferences deeply into account, in ways we are just beginning to understand. If we assume that AGI will inherently contain all this moral complexity without anyone doing the hard work of programming it in, we will be unpleasantly surprised when these AGIs become more intelligent and powerful than ourselves.
We probably make thousands of species extinct per year through our pursuit of instrumental goals, so why is it so hard to imagine that AGI could do the same to us?
Part of the reason is that people have a knee-jerk reaction to any form of negativity. Try going to a cocktail party and bringing up anything in the least negative, and most people will stop talking to you. There is a whole mythos around this, to the effect that anyone who ever mentions anything negative must have a chip on their shoulder or otherwise be a generally negative person. But sometimes there actually is a real risk!