A Puzzle
Thursday, May 25 2006 · 5:55 am · AI
Meet two intelligent AIs. One of them has a goal system based on Coherent Extrapolated Volition; the other has the supergoal of obtaining the launch codes to the world’s nuclear arsenals and eliminating every major human city. In other words, one is Friendly and one is unFriendly. Which is which?
Protip: their behavior is roughly identical until they know they can accomplish their goals independently.

May 25th, 2006 at 3:02 pm
This might need rewriting, as I think it may be overly critical. But since you brought it up, I figured it’s relevant commentary.
http://n8o.r30.net/dokuwiki/doku.php/blog:democraticcev
As much work as Eli has done on CEV, //he// doesn’t even trust it yet (last I read). So I’m sure that, faced with a scenario like the one you propose, I wouldn’t work with either AI if I believed the stakes were really that high.
May 26th, 2006 at 1:41 am
The linked page is a good attempt at evaluating CEV and touches on some real issues, but it fails for reasons that I could probably sum up by saying ‘the author is not at Shock Level 4’. Direct democracy with real humans is /not/ adequate to control the actions of a highly transhuman AGI, even disregarding all the implementation issues, because an AGI that acts like a wish-granting genie is /incredibly dangerous/. The world’s leading AI researchers are having enormous difficulty recognising, understanding and coping with this problem; the chances of the average human appreciating it are basically zero. Hence the ‘extrapolation’ part of CEV: modelling what humans would want if we were wise enough to understand all the risks and consequences of each choice.
The ‘coherence’ part is essentially Yudkowsky’s idealist notion that we’d (almost) all agree on a lot of things if only we were more intelligent. But if you find standard democracy ethically sound, then presumably you’d accept majority voting as a backup plan if coherence isn’t achieved.
Finally, the idea of ‘running simulations at a lower resolution’ and a ‘resolution threshold’ at which they would become sentient is a serious misunderstanding, though an understandable one given various treatments in sci-fi and the fact that techies tend to be comfortable with variable resolution in physical simulations. The SIAI plan doesn’t involve simulating human nervous systems, where aggregating neural activity at a larger scale would presumably make the simulation ‘non-sentient’. The SIAI plan is to translate human neurology into an abstract form that works as a predictive model of future desires without incorporating the specific /structural features/ that give rise to ‘sentience’. Note that, assuming you accept that AIs can be sentient in the first place, sentience is just a specific kind of computation to which we attach ethical value. No one knows yet whether it’s possible to get usable accuracy under such constraints, but personally I’m optimistic (for reasons which sadly I can’t explain without launching into AGI technical detail).
There’s not much point asking whether we could get everyone to trust the SIAI and sign up to this plan; we couldn’t. No project intending to reshape the world from the ground up is going to get mass support outside the transhumanist fringe; it’s only the fact that the vast majority of people don’t appreciate the possibilities that prevents panic and attempts to ban or misuse AI research. Practically, the question seems best phrased as ‘if I were in charge of the team that developed the first seed AI, what would I do with it and why?’ (hopefully accepting that you wouldn’t really act unless you were damn sure of your reasoning!).
June 15th, 2006 at 10:45 pm
I think the presented situation is interesting, but highly unlikely to happen. If someone wants to commit suicide together with the whole world by creating an intentionally unfriendly AI, he will also give that being everything it needs to accomplish its goal (and it wouldn’t need much).
If the ‘friendly’ AI is created by a different team from the unfriendly one, it’s unlikely that the two will arrive at exactly the same time. Even a difference of a few hours may be important.
I think that even if we have one AI intentionally created as friendly using CEV, there is probably no way of proving that it’s really friendly just by having a conversation with ‘it’. We need to be really sure we know what we are doing before pressing the button that will bring it ‘alive’. (After it is already running, we can just pray a bit ;) that we didn’t make any important mistake.)
June 29th, 2006 at 11:34 am
I think it’s funny that some of you futurists ultimately resort to praying that you didn’t make a mistake. If even one of several AI machines were designed to carry out this nuclear scenario, that’s all it would take. In the meantime, there’s no reason to think that an AI built to maximise profits wouldn’t come up with better ways to enslave others for its owners, increasing the world’s problems for a single beneficiary. It’s not as if our technological system doesn’t already work that way to a great extent, and I see no reason for that to change after any “improvements”.
October 18th, 2006 at 12:09 am
The philosophy behind the AI would have to be incorporated into the programming of that AI.
For example, if the AI were designed to build houses of better than human quality, I would limit it with parameters that allow it only to build houses.
I would not go about creating an intelligence greater than myself without giving it a moral compass that suits my purposes, limiting as that may be. We still have to put in safeguards to prevent any harm to us as a species.
Unfortunately, because we as humans like our devices and machines to perform for us, any AI will be held back by the parameters that we set for it. I can demonstrate that this will always be the case: just look at all the inventions that came before current computer technology. All of them were designed for specific human needs.
October 1st, 2007 at 7:52 pm
The point of Michael Anissimov’s post, of course, is that it’s unlikely we’d know which is which, so we’d better get it right the first time.
Taking the stated problem literally, though, with regard to a possible “last, desperate line of defense” for humanity against a transhuman AI, I am beginning to outline some ideas at my new “AI Beliefs” blog: http://aibeliefs.blogspot.com/