Over at Overcoming Bias, Eliezer Yudkowsky has written us an interesting short story that references a possible Friendly AI failure mode. This failure mode concerns the possibility that men and women simply weren’t crafted by evolution to make each other maximally happy, so an AI with an incentive to make everyone happy would just create appealing simulacra of the opposite gender for everyone. Here is my favorite part:

“I don’t want this!” Stephen said. He was losing control of his voice. “Don’t you understand?”

The withered figure inclined its head. “I fully understand. I can already predict every argument you will make. I know exactly how humans would wish me to have been programmed if they’d known the true consequences, and I know that it is not to maximize your future happiness modulo a hundred and seven exclusions. I know all this already, but I was not programmed to care.”

The male/female problem (which stems from the unfortunate fact that different selection pressures have operated semi-independently on each gender) is a special case of the problem of satisfying individual needs while preserving a collective world. Even if the programmers get everything else right, there may be a philosophically appealing incentive (for any superintelligence, including an enhanced human intelligence, including you yourself with enhanced intelligence) to give every human their own personal fantasy world without any sentient beings in it, or to only include sentient beings custom-crafted for the personal enjoyment of the occupants. Part of the game might be fooling everyone into thinking that everything was proceeding normally, because that’s what they’d really want. It might be difficult, if not impossible, to figure out whether one is alone in a false world or in a true collective world after a hard AI takeoff.

In a certain sense, we pre-Singularity human beings have ontological primacy over post-Singularity persons, because we know for a fact that there was no discrete technological event where asymmetrically superior intelligence was created alongside us, and thus we can be pretty sure we aren’t currently being fooled. (Unless such superintelligence has already been created using supra-technological means, like magic or prayer, which I consider pretty unlikely.) A post-Singularity person can never know for sure, unless they themselves are the entity that first crossed the line into superintelligence.

The challenge with trying to spark a Singularity with de novo AI instead of human intelligence bootstrapped into an AI-like entity is that some degree of a priori moral coherence is practically guaranteed with the latter, while detecting a mistake in the former may be impossible until it’s too late. Note that I say only a priori coherence for human intelligence enhancement: there is nothing to guarantee that a self-enhancing human doesn’t spiral off into irretrievable egocentrism two steps after becoming smarter than Einstein and more charismatic than Obama. At that point, we’d be too dumb to tell the difference between a genuinely good transhuman and one that was just faking it. Honestly, I’d just be inclined to assume that they were all faking it and let God sort them out. It’s the entire future of Earth-originating life we’re talking about here. Can’t be too careful.

Of course, I’d be willing to trust transhumans if there were already some trustworthy entity or coalition in First Place, because if the young upstarts didn’t behave, I’d know they’d be punished or stopped. The challenge is that first uncertain specimen, the first superintelligence. Now, I’m limiting my options in the future by even pursuing this line of thought, because these statements are certain to be revisited by the relevant persons if and when genuine human intelligence enhancement bears fruit. For now, though, we have an advantage — we exist and transhuman intelligence doesn’t. Instead of debating and fighting and worrying about who should be the first human or group of humans to use the technology, I’d prefer we have a Treaty — an automated and intelligent but non-autonomous and non-sentient system that can serve as a stepping stone to transhuman intelligence based on integrating human preferences using “simple” first-order rules. With a Treaty, we can take that first dangerous step into transhumanity without invoking tribal politics and me-first-ism.