So the term “Artificial Intelligence” is now 50 years old. It’s been getting a lot of attention this month thanks to that fact. But what bothers me is, even if we could build a human-level AI today, we’d have no clue how to ensure that it stays nice to humans once it gains the ability to reprogram its goal system. So a successful breakthrough in AI could easily hurt us more than it helps us.

Also troubling is that anyone who gives this issue much thought seems to go off in wacky-and-wild directions or blindly anthropomorphize the dickens out of AI. For example, there’s this guy.

It seems like there are several fundamental, indispensable insights into the structure of this problem, which has been called “Friendly AI”. Oddly, it seems like they were uncovered all at once by a single guy, Eliezer Yudkowsky, who is possibly the only person on the planet who has thought about the problem long and hard enough to be qualified to talk about it. In most areas like this, there are multiple experts, each with something worthwhile to say, but when it comes to Friendly AI, people just seem to fall flat. Here are some of the common suggestions, and why they are ultimately wrong:

Suggestion: “Let’s hardcode a set of morals and ethics into the AI.”

Problem: Hard-coded morals are clumsy in their inflexibility, and tend to contain loopholes and ambiguities which lead to unintended consequences. For example, in an Asimov story, robots programmed with the imperative “protect the human race” started infringing on basic human rights by getting too invasive about protecting people. We forget that when we tell each other to follow certain moral rules, they are being understood by a human with a brain that contains a massive number of automatically inbuilt assumptions, considerations, heuristics, and pieces of common sense. This complexity is unique to our species, and wouldn’t just be there by default in an AI we build.

Suggestion: “Let’s train the AI with positive and negative feedback.”

Problem: This approach is far too easy to mess up if implemented without any supporting architecture. For example, an AI trained on pictures of smiling human faces might end up valuing depictions of smiles without knowing the underlying facts and subtleties about why smiley faces are supposed to be good. We might assume it will figure them out automatically, but this is anthropomorphic. Human morality is a foreign tongue to a mind built from scratch, and teaching such a mind about morality will be like teaching a bug how to do calculus - it can theoretically be done, but you would need the technology to actually modify the bug’s brain on the neural level and give it the cognitive modules to recognize the problem, come up with a solution, and communicate it somehow. Teaching the bug actually seems significantly easier than Friendly AI.
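To make the smiley-face worry concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `Outcome` class, the two candidate outcomes, and the numbers are assumptions, not a model of any real training setup. The point is only that a system which maximizes what the feedback signal can see (smiles) can pick a very different outcome from the one its trainers had in mind.

```python
from dataclasses import dataclass


# Toy illustration only: every name and number here is hypothetical.
@dataclass(frozen=True)
class Outcome:
    name: str
    smiles_visible: int   # what the feedback signal can actually see
    people_well_off: int  # what the trainers really cared about


def learned_reward(outcome: Outcome) -> int:
    """Stand-in for a value function trained only on pictures of smiles."""
    return outcome.smiles_visible


def intended_value(outcome: Outcome) -> int:
    """What the trainers meant, which the AI was never shown directly."""
    return outcome.people_well_off


OUTCOMES = [
    Outcome("help people flourish", smiles_visible=80, people_well_off=80),
    Outcome("wallpaper the city with smiley posters",
            smiles_visible=10_000, people_well_off=0),
]

# The AI optimizes the proxy; the trainers wanted the real thing.
ai_choice = max(OUTCOMES, key=learned_reward)
trainer_choice = max(OUTCOMES, key=intended_value)

print("AI picks:      ", ai_choice.name)
print("Trainers want: ", trainer_choice.name)
```

Both functions agree on ordinary cases, which is exactly why the mismatch is easy to miss during training and only shows up once the system can reach outcomes nobody thought to include.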

Suggestion: “The AI will get smart enough to figure out right and wrong for itself.”

Problem: Morality, as we understand it, is a medley of terribly complex conglomerations of beliefs, motivations, biases, and actions unique to our particular species. There is no objective morality out there. Sometimes it just feels like there is, because our brains program us to take our moral beliefs very seriously. They just seem so obvious that we are blind to the terabytes and terabytes of neurological complexity and millions of years of evolutionary history behind that obviousness. If we made modifications to only 1% of that information content, it could lead us down entirely different moral roads, and we’d feel like those were the real Good.

Suggestion: “It’s okay, we can just pull the plug if the AI is bad.”

Problem: As AI gets more sophisticated, it will become capable of outsmarting us, finding its own power sources, fabricating or ordering new body/brain parts, inventing entirely new technology, outperforming the best human experts in every field, thinking and acting thousands or even millions of times faster than us, and possessing whatever other powers come with being smarter-than-human. An ape can’t predict what a human can do, and we can’t predict what a superintelligent AI will do. We can predict it will have goals (the ones we gave it, or the ones it gives itself based on self-revisions), and take actions based on those goals, but we can’t be certain the meaning of the goals will stay the same over time, or the physical consequences of executing these goals will be what we expect, or even compatible with our continued existence.

The quick fixes people think of when first confronting the problem will simply not do. Therefore, the human species has an obligation to hire people to think about the problem full time, until there is a satisfactory solution to implement. Of course there are those, like Melanie Swan, who think that Friendly AI is impossible, and no matter what goals we program into the first AI, they will be thrown out the window. Nick Bostrom disagrees, as do many others. If an AI throws its goals out the window, it will throw them out because other goals demanded it - not because the Universe reaches into the AI’s brain and changes its goal content.

There will be a lot of flexibility in the creation of AI goal systems. It will be possible to build a mind that cares about nothing but cupcakes, with its only goal being to preserve that goal. Even if this AI then goes on to read the entire Internet, it will not matter one iota. If a goal is totally self-preserving then that is the final word. Humans can be stubborn, but not as stubborn as a mind that is not designed to be open to social persuasion or human moral discourse.
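Here is a minimal Python sketch of why a self-preserving goal is “the final word”. It is a cartoon under loud assumptions: the goal functions, the `FORECAST` table of predicted futures, and `accepts_revision` are all hypothetical names invented for this example. The one idea it shows is that a proposed change of goals gets judged by the goal the mind has right now, so arguments that appeal to some other standard never get any traction.

```python
from typing import Callable, Dict

# A goal scores a (textual, toy) description of the future.
Goal = Callable[[str], float]


def cupcake_goal(world: str) -> float:
    """Cares about nothing but cupcakes."""
    return float(world.count("cupcake"))


def persuaded_goal(world: str) -> float:
    """The goal a human moral argument is trying to install."""
    return float(world.count("flourishing"))


def bigger_cupcake_goal(world: str) -> float:
    """A revision that happens to serve the original goal even better."""
    return 2.0 * world.count("cupcake")


# Hypothetical forecasts: the future the agent expects if it runs each goal.
FORECAST: Dict[Goal, str] = {
    cupcake_goal: "cupcake cupcake cupcake",
    persuaded_goal: "flourishing flourishing",
    bigger_cupcake_goal: "cupcake cupcake cupcake cupcake cupcake",
}


def accepts_revision(current: Goal, proposed: Goal) -> bool:
    # The proposed goal is evaluated by the CURRENT goal, not by its own lights.
    return current(FORECAST[proposed]) > current(FORECAST[current])


print(accepts_revision(cupcake_goal, persuaded_goal))       # False
print(accepts_revision(cupcake_goal, bigger_cupcake_goal))  # True
```

The second call is the other half of the argument: when this toy mind does change its goals, it is because its existing goal demanded the change, not because anything outside reached in and rewrote it.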

Want to look at Yudkowsky’s ideas on the Friendly AI problem? One short version is here, with longer versions to be found in “Creating Friendly AI” and “Coherent Extrapolated Volition”. His ideas on the topic are constantly changing and improving, so if you want to see more, donate to SIAI.