Over at Overcoming Bias, Eliezer Yudkowsky has written us an interesting short story that references a possible Friendly AI failure mode. This failure mode concerns the possibility that men and women simply weren’t crafted by evolution to make each other maximally happy, so an AI with an incentive to make everyone happy would just create appealing simulacra of the opposite gender for everyone. Here is my favorite part:
“I don’t want this!” Stephen said. He was losing control of his voice. “Don’t you understand?”
The withered figure inclined its head. “I fully understand. I can already predict every argument you will make. I know exactly how humans would wish me to have been programmed if they’d known the true consequences, and I know that it is not to maximize your future happiness modulo a hundred and seven exclusions. I know all this already, but I was not programmed to care.”
The male/female problem (which stems from the unfortunate fact that different selection pressures have operated semi-independently on each gender) is a special case of the problem of satisfying individual needs while preserving a collective world. Even if the programmers get everything else right, there may be a philosophically appealing incentive (for any superintelligence, including an enhanced human intelligence, including you yourself with enhanced intelligence) to give every human their own personal fantasy world without any sentient beings in it, or to only include sentient beings custom-crafted for the personal enjoyment of the occupants. Part of the game might be fooling everyone into thinking that everything was proceeding normally, because that’s what they’d really want. It might be difficult, if not impossible, to figure out whether one is alone in a false world or in a true collective world after a hard AI takeoff.
In a certain sense, us pre-Singularity human beings have ontological primacy over post-Singularity persons, because we know for a fact that there was no discrete technological event where asymmetrically superior intelligence was created alongside us, and thus we can be pretty sure we aren’t currently being fooled. (Unless such superintelligence has already been created using supra-technological means, like magic or prayer, which I consider pretty unlikely.) A post-Singularity person can never know for sure, unless they themselves are the entity that first crossed the line into superintelligence.
The challenge with trying to spark a Singularity with de novo AI instead of human intelligence bootstrapped into an AI-like entity is that some degree of a priori moral coherence is practically guaranteed with the latter, while assessing a mess-up with the former may be impossible until it’s too late. Note that I say a priori coherence for human intelligence enhancement — there is nothing to guarantee that a self-enhancing human doesn’t spiral off into irretrievable egocentrism two steps after becoming smarter than Einstein and more charismatic than Obama. At that point, we’d be too dumb to tell the difference between a genuinely good transhuman and one that was just faking it. Honestly, I’d just be inclined to assume that they were all faking it and let God sort them out. It’s the entire future of Earth-originating life we’re talking about here. Can’t be too careful.
Of course, I’d be willing to trust transhumans if there were already some trustworthy entity or coalition in First Place, because if the young upstarts didn’t behave, I’d know they’d be punished or stopped. The challenge is that first uncertain specimen, the first superintelligence. Now, I’m limiting my options in the future by even pursuing this line of thought, because these statements are certain to be revisited by the relevant persons if and when genuine human intelligence enhancement bears fruit. For now, though, we have an advantage — we exist and transhuman intelligence doesn’t. Instead of debating and fighting and worrying about who should be the first human or group of humans to use the technology, I’d prefer we have a Treaty — an automated and intelligent but non-autonomous and non-sentient system that can serve as a stepping stone to transhuman intelligence based on integrating human preferences using “simple” first-order rules. With a Treaty, we can take that first dangerous step into transhumanity without invoking tribal politics and me-first-ism.
Nice post.
“In a certain sense, us pre-Singularity human beings have ontological primacy over post-Singularity persons, because we know for a fact that there was no discrete technological event where asymmetrically superior intelligence was created alongside us, and thus we can be pretty sure we aren’t currently being fooled.”
How would we know that we aren’t simulations, without a convincing logical argument about all instances of simulations across the multiverse to estimate the fraction of beings in our epistemic position who are fooled?
“This failure mode concerns the possibility that men and women simply weren’t crafted by evolution to make each other maximally happy, so an AI with an incentive to make everyone happy would just create appealing simulacra of the opposite gender for everyone. Here is my favorite part”
– I would not consider this an outright failure mode. I suspect that a majority of people on the planet would prefer this “failure” to their current lives. I also suspect that a very significant portion of people in the UK would prefer it to their current lives.
I think that we will find that as we get into more subtle “FAI Failure modes”, the question as to whether there has been a failure or a success will lose any objective answer. This is because of moral anti-realism and the natural spread of human preferences, beliefs and opinions.
The same argument applies to the “personal fantasy world” failure mode. A lot of people would not count that as a failure.
A Treaty only works so long as all bodies/organizations capable of “ascending” obey it. Even an autonomous yet non-sentient Treaty Entity/Agent could be negated by sentient humans following their own rules. The best and possibly foolproof solution would be to simply have a genuinely good/honest/benevolent First Transhuman or, better yet, a network of such people from a variety of specializations who can aid the period of transition from humanity into transhumanity.
Carl, thanks! We don’t know, but even if we are simulations, it doesn’t seem like the simulators are intervening very enthusiastically. A bunch of people believe in God; maybe, if God exists, he is the alien(s) simulating us? It’s just that after the Singularity, I’d hazard a guess that everything would be conspicuously interleaved with intelligence, so determining whether its source is deceiving us or not would be a key issue. Now, our world seems to be devoid of ambient intelligence, so while we’re free to speculate that such intelligence may exist and may be deceiving us, even if it does, it doesn’t seem to be having much of an impact on human interactions relative to the situation if it were entirely absent. So, to such hypothetical simulators, I double dog dare them to do something conspicuous. (It doesn’t hurt to ask for the sake of checking, even though others have before.)
Roko, interesting points. I’m happy you came around to moral anti-realism, by the way. That will make it easier for us to understand each other. Even though some might initially find these false utopias appealing, the character in Eliezer’s story didn’t, and I don’t think I would. As an independent moral agent, I am proposing directing the outcome into a different segment of state space. In the end, you’re right, making a distinction between a “false” utopia and a “real” utopia will be difficult if not impossible.
Isaac, it would be self-enforcing, at least initially. If you do go ahead and try to implement the First Transhuman idea, though, consider making me your test subject. I’m all set and ready to go.
You’d be a great choice.
good article (especially the pointer to yudkowsky porn)
No intervention in the simulation because there are too many simulations to bother with individual ones? Has anyone written on simulation and the multiverse?
Do you think that in 20 years’ time the beings still around will have any major interest in us at all? Perhaps about as much as we have in Neolithic man, and with about as much in common. Are we proceeding there anyway (whether we know how or not)? Fun ride, hope we get there.
I think you have raised some very interesting points. Not many people would have the thoughts you just had. I’m really amazed by what you published. I will keep checking your website.