Two frequent Accelerating Future visitors/commenters, Shane Legg and Roko, recently got together in London to talk Friendly AI. Here’s an excerpt from Roko’s post about the meeting:

It was great to discuss some of the problems that AGI/FAI research poses with someone else who is in the know about the subject. Shane has been at the cutting edge of both theoretical and practical approaches to AGI, including working with Marcus Hutter on AIXI and formal definitions of intelligence, and working for both Ben Goertzel and Peter Voss on some not-quite-yet successful practical attempts at AGI. He now works at the Gatsby computational neuroscience centre, a world-leading centre for the study of the human brain.

It was something of a disappointment that when we spoke about the dangers of AGI to humanity, I was the more optimistic one. Shane’s way of putting it was something like:

“We’re completely f**ked, we’ve got about 15 years before unfriendly AI kills us”

this is, considering his expertise, very worrying.

Heart-warming, isn’t it? The fact of the matter is that “optimism” for AI often means pessimism for humanity’s future, because self-improving AI without a moral sense is much easier to create than self-improving AI with a strong moral sense. Most futurists fail to grasp this for one of two simple reasons:

1) Belief in moral realism, that any sufficiently intelligent being will “discover” “the right morality” with little to no specific effort on the part of the programmers. Thankfully, moral realism is being annihilated by advancements in cognitive science, evolutionary psychology, and philosophy, and I’ve recently become more confident that any sufficiently intelligent/philosophically sophisticated/rational/whatever human being will realize that it’s bunk.

2) Expectation of enough product/consumer feedback cycles that by the time general AI rolls around, all Friendliness errors are pretty much corrected, including ones that would only emerge under strong self-improvement in the superintelligent realm. This is pretty short-sighted given the history of products that stayed on the market for many years after they were proven dangerous, simply because they were too useful to give up. My hope is that this dynamic will shed light on some of the most obvious potential failures of Friendliness, which will in turn inspire widespread recognition that Friendly AI-completeness is not a free lunch.

There is also a line of investigation in the post where Roko writes, “Also, I claimed that there are a lot of people on the planet who would really care about the FAI problem if only someone would tell them about it in a non-crackpot way. These include philosophers, prominent environmentalists, most of the Guardian reading left-wing middle class, most scientists and academics. Quite a lot of people, really. And the main impediment to progress in FAI is lack of money, credibility and human resource.”

My perspective on this is that I really doubt it, but I won’t stop trying. Fantasies by fringe groups that the mainstream will adopt their fringe view are as old as fringe groups themselves and have a terribly low rate of coming true. In my experience, presentation matters only marginally: either people care about Friendly AI or they don’t. Saying “we’re worried about AIs self-improving and converting the planet into computronium, killing everyone on the planet in the process” does not typically elicit a radically different reaction from saying “we’re working on decision systems that embody the subtlety and complexity of human morality in a computationally tractable way”. Both sound far out, no matter how they’re dressed, and even if you get someone involved with “non-crackpot” language, further investigation demands “crackpot-like” language by necessity. If someone doesn’t have the perceptiveness to tell the difference between crackpot-sounding but genuinely important lines of investigation and everyday crackpot-ness, then any complex meme you transfer to them is likely to be warped beyond recognition before they pass it on to someone else.

This hypothesis is supported by what I’ve observed: people seem either to decide that Friendly AI is a big deal very quickly after exposure (within a year) or never to decide so at all. There are very few counterexamples. People like Steve Jurvetson and Peter Thiel understood the Friendly AI problem shortly after being exposed to it, not because they were exposed to it the right way but because they were smart enough to see the essence of the issue. Others will follow not when the idea is presented the right way, but when it has enough associated status, through public endorsement by prominent figures, that they can mention it at a cocktail party (along with the prominent figure in question) in a way that gives them a substantial chance of sidestepping the ultimate horror of social ridicule.

It is interesting to observe that most people who get really interested in Friendly AI are either high-status enough to be confident in what their own brain tells them (compsci Ph.D.s like Matt Mahoney, VCs like Jurvetson and Thiel, thought leaders like EFF chairman Brad Templeton, and countless others) or low-status enough to believe what they want without backlash from their status-obsessed peers (myself at age 17). The middle is where you get into trouble.