Wiki Interview With Eliezer/Ethics And Friendliness

From The Transhumanist Wiki

Ethics and Friendliness


What is the difference between friendliness and Friendliness?

"friendliness" is an ordinary English word; Friendliness is a technical term. It is very hard to give a simple definition of Friendliness, nearly as hard as giving a simple definition of "mind." Intuitively, a Friendly AI is a nice, benevolent, white-hat AI that has a positive effect on the universe, however you define that. Where you get specific, both about what it means to be Friendly and about what kind of mind is Friendly, is where it becomes technical. From an operational standpoint, a Friendly seed AI is one that is not inferior in any respect to an uploaded altruistic human, in terms of who you would want to send to deal with a Hard Takeoff. A Friendly AI is an AI that occupies the basic moral frame of reference defined by human altruism and human altruistic philosophies, including the ability to choose between moral frameworks and to make improvements to philosophies learned from others. Friendly AI defines a kind of open-ended starting point for nonhuman moral philosophers.

(copied to Friendliness VS friendliness)


Do you believe that a Friendly seed AI will be in many ways superior to an uploaded altruistic human? If so, please elaborate.

The only way I could visualize that a Friendly seed AI could possibly be superior to an uploaded altruistic human is if the human is not completely altruistic, or if the complete altruism collapses to partial or total selfishness as intelligence increases. There would/should be substantially less chance of that happening in a Friendly seed AI because it would absorb altruism more than selfishness.

What would constitute "superiority" in this context?

Since the absorption is subject to a conscious decision by the Friendly AI, "superiority" means that you have a human upload and a Friendly AI whose programs are both suspended, and you decide to run the Friendly AI. It's hard to imagine a Friendly seed AI having capabilities that are in principle inaccessible to an uploaded human; after all, the upload can always just write a Friendly seed AI, assuming a certain native level of intelligence. Actually, I can think of two other reasons to run the Friendly AI. One is that you don't trust the human: s/he might be trustworthy, but less knowably trustworthy than the Friendly AI. The second is that the Friendly AI might have much more experience at self-modification, and hence be less likely to execute a modification that simultaneously makes the self smarter and evil, which is what would be needed to break the obvious uploading protocols, i.e., make a modification of yourself and observe the modified copy running at a slower subjective rate before melding with it. That's three possible reasons to go with a Friendly AI instead of an upload, if you have both. A possible reason to go with an upload instead of a Friendly AI might be that you don't trust the Friendly AI: it might be trustworthy but less knowably trustworthy than the human. All of this is probably entirely moot, since I don't see the technological achievement of uploading in advance of AI as a realistic goal, no matter how many resources you put into it. Uploading requires computing power and advances in Cognitive Science that necessarily enormously exceed those needed to create AI; thus uploading is basically out of the running, in my opinion.


What's the estimated difference between a friendly transhuman and a Friendly transhuman?

Well, going on the difference between the words, a friendly transhuman might be an upload who decides to take 99/100ths of the universe for verself but still leaves the remaining 1% to humanity, rather than exterminating us. I would consider this 'friendly' but not 'Friendly.'


Is attempting to create a Friendly transhuman AI the most important goal? If so, why?

Attempting to create a Friendly/altruistic Singularity is the most important goal, and attempting to create a Friendly seed/transhuman AI is the most achievable road to that goal. The Friendly/altruistic Singularity is the most important goal because on it rests the welfare of six billion humans, the continued existence of six billion humans in a world subject to Existential Risks, and the entire future that depends on the continuing growth and existence of Earth-originating intelligent life, including all the sentient beings who are part of that future. In turn, the Singularity is the critical point of humanity's entire future because humanity's entire current world was created by the advent of human intelligence; everything else consists of forces unleashed by that one differential between human intelligence and primate intelligence. The positive feedback loop of intelligence creating technology that improves intelligence means that we are about to see intelligence differentials of an enormously higher order, with correspondingly greater impacts, which is why the Singularity is the critical point of the entire future of Earth-originating intelligent life.


Do you believe SIAI's Friendliness architecture will be "sufficient" at near-human-level AI? If so, why? What about transhuman-level AI?

The entire point of the Friendliness architecture is that it is open-ended. It wraps up not just everything the programmers thought of, but everything the programmers could have thought of, because the AI is looking at the programmers and understanding its own creation and thinking "What did the programmers not think of?" This sounds like magic but it is no more magical than General Intelligence - a human could think that way in a similar situation. What Friendly AI is, or is supposed to be, is a way of giving the AI everything we have in the way of altruistic moral philosophy. If the human architecture of altruism is sufficient at transhuman levels, allowing for the human/transhuman's ability to consciously make improvements, then it should be sufficient for a Friendly AI, if the Friendliness architecture works. You sorta have to separate out the question of "what is 'sufficient'" from "is this specific architecture 'sufficient'."

How big do you presently believe that "if" is?

The probability that the AI's Friendliness architecture works, just before the Singularity, depends a lot on what we see of the AI in the meantime. I would like to try for 90%. I don't think higher than that is theoretically possible for humans or AIs because of the different cognitive architectures. I don't think it would be possible to have more than a 90% knowable confidence of a human growing into an altruistic superintelligence, even given the knowledge that it is possible for Minds-In-General. The same holds true of a Friendly AI.


Eugene Leitl has repeatedly expressed serious concern and opposition to SIAI's proposed Friendliness architecture. Please summarize or reference his arguments and your responses.

Eugene Leitl believes that altruism is impossible, period, for a superintelligence: any superintelligence, whether derived from humans or AIs. The last time we argued this was long ago, and he may have changed his opinions in the meantime, but I recall that he was arguing for this impossibility on the basis of "all minds necessarily want to survive as a subgoal, therefore this subgoal can stomp on a supergoal" plus "in a Darwinian scenario, any mind that does not want to survive, dies, therefore all minds will evolve independent drives toward survival." I consider the former to be flawed on grounds of Cognitive Science, and the latter to be flawed on the grounds that, post-Singularity, conscious redesign outweighs the Design Pressures evolution can exert. Moreover, there are scenarios in which the original Friendly seed AI need not reproduce. Eugene believes that evolutionary design is the strongest form of design, much like John Smart, although possibly for different reasons, and hence discounts intelligence as a steering factor in the distribution of future minds. I do wish to note that I may be misrepresenting Eugene here. Anyway, what I have discussed with Eugene recently is his plan for a Singularity without AI, which, as I recall, requires uploading a substantial fraction of the entire human race, possibly without their consent, and spreading the uploads all over the Solar System before any of them is run, except for a small steering committee, which is supposed to abstain from all intelligence enhancement because Eugene doesn't trust uploads either. I would rate the pragmatic achievability of this scenario as zero, and the scenario as possibly undesirable to boot, a question Nick Bostrom and Eugene have recently been arguing about on wta-talk.


In "CFAI: 1: Challenges of Friendly AI", you wrote, "Failure in Friendly AI has negative consequences that are also arbitrarily large." Please provide your reasoning behind that statement, as well as what specific or non-specific consequences may occur if Friendly AI is not successful.

A seed AI can become arbitrarily smart. Arbitrarily great intelligence means an arbitrarily great ability to create change; therefore success in Friendly AI is an arbitrarily large benefit, possibly up to and including the eternal elimination of all Existential Risks and the guaranteed continued existence of humanity in a future we enjoy. Failure has arbitrarily large consequences, up to and including the absolute extinction of humanity. This is the quintessential challenge of the Singularity, expressed in its quintessential form in Friendly seed AI: a challenge that may not be permanently avoidable even in theory, and that pragmatically cannot be avoided even temporarily except by increasing the risk.

[1/11/03: Some people argue that "the absolute extinction of humanity" is not the worst thing that could possibly happen as a result of Friendly AI failure. See HyperExistential Risk.]

(copied to Wiki Commentary On CFAI)

Why do you believe things are so simple as the dichotomy you've expressed?

Not all things are complicated. Sometimes there are forces driving to extremes, rather than to the middle. Our contemporary civilization is a system that possesses two attractors: two stable states of the system. One attractor is transhuman intelligence creating still-smarter intelligence, the superintelligence attractor. The other attractor is the extinction of intelligent life, which is also a stable state. If the system continues for long enough, it will fall into one of the two attractors.
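As a rough illustration of the two-attractor claim, here is a minimal Python sketch of a hypothetical three-state model: one unstable middle state (standing in for our contemporary civilization) and two absorbing states standing in for the superintelligence attractor and the extinction attractor. The function name and the per-step transition probabilities are assumptions made up for illustration, not anything from the interview. The sketch only shows the structural point that, if each step carries any nonzero chance of entering one of the two stable states, the probability of remaining in the middle decays geometrically, as (1 - p_super - p_extinct)^n; it says nothing about which attractor is more likely.

```python
def absorption_after(steps: int, p_super: float, p_extinct: float) -> tuple[float, float, float]:
    """Hypothetical three-state model: one transient state, two absorbing attractors.

    Returns (P(still in the middle), P(superintelligence), P(extinction))
    after `steps` discrete time steps, starting from the middle state.
    """
    if p_super < 0 or p_extinct < 0 or p_super + p_extinct > 1:
        raise ValueError("transition probabilities must be non-negative and sum to at most 1")
    middle, superint, extinct = 1.0, 0.0, 0.0
    for _ in range(steps):
        # Probability mass leaving the middle state this step flows into one attractor
        # or the other; once there it never leaves, because both attractors are absorbing.
        superint += middle * p_super
        extinct += middle * p_extinct
        middle *= 1.0 - p_super - p_extinct
    return middle, superint, extinct


if __name__ == "__main__":
    # Illustrative, made-up rates: 1% and 2% chance per step of falling into each attractor.
    for n in (10, 100, 1000):
        m, s, e = absorption_after(n, 0.01, 0.02)
        print(f"after {n:4d} steps: middle={m:.3g}  superintelligence={s:.3f}  extinction={e:.3f}")
```

With these made-up rates, the middle-state probability shrinks toward zero as the number of steps grows, while the split between the two attractors stays fixed at p_super : p_extinct.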

(copied to Wiki Commentary On CFAI)



Near the end of "CFAI: 3.4.3.6: Objective morality, moral relativism, and renormalization", you wrote, "A Friendly AI with causal validity semantics and a surface-level decision to renormalize verself has all the structure of a human philosopher. With sufficient Friendliness content plugged into that structure, ve can (correctly!) handle any moral or philosophical problem that could be handled by any human being." Does this statement still stand as of 2003?

It does. After a year or so, I still have not been able to conceive of any moral statement that stands outside of causal validity semantics and which does not also stand outside of a human. As far as I can tell this structure is complete.

(copied to Wiki Commentary On CFAI)


In your response to Bill Joy's essay, "Why the Future Doesn't Need Us", you wrote, "That issue [Friendly autonomous intelligence] is amazingly amazingly complicated. Maybe any sufficiently advanced mind converges to a single purpose. Maybe it's all arbitrary and the goals of a superintelligence are determined simply by momentum. Maybe the convergent purpose will actually prove beneficial to humanity, or maybe the momentum-goals go through inscrutable changes. My estimated probability of friendly superminds changes every couple of months, but it's never been more than 70% or less than 10%. Currently, I'd call it a coinflip." That was written in 2000. What is your present probability of superintelligence being friendly, and why?

It's edging up toward 75%, i.e., better than it's ever been. Bear in mind that the 90% figure I gave earlier is the probability I'd like to be able to assign to a Friendly AI, given that friendly superminds are theoretically possible and given the chance to watch the Friendly AI grow up; bear in mind also that under similar circumstances 90% would be around the upper limit for a human too. So in this case we're talking about a much more basic issue: whether friendly superminds in any form are theoretically possible, whether the future can bear any relation at all to present-day humanity, and the probability that, if there's a future for us, we can get there, but not the probability that we are extinguished first. Currently that probability is around 75%. I mention all these complexities because, like I said, this particular issue is amazingly amazingly complicated, and there are some tempting errors to make in discussing it; Bill Joy's arguments provide an example of some of these errors. The probability is edging upward toward 75% because I see more of the human complexity for reasoning about morality. Structural completeness is not the same as content completeness. The structure of Friendliness still looks complete, but today I know more about the content, and the content, of a Friendly AI or a human altruist, looks more stable than it did when I knew less about it. To be even more precise, I know about more of the system, and hence it seems intuitively less likely that something inside the system that I don't know about automatically overturns the applecart of altruism, in a Friendly AI or a human. There's still a basic inscrutability about the Singularity, but that inscrutability isn't necessarily bad, although it isn't necessarily good either, and I would sum up this whole amazingly complicated picture by saying "75%."

(copied to Wiki Commentary On Why The Future Does Not Need Us)


Please elaborate on what you mean by the "inscrutability about the Singularity" not being necessarily good or bad.

Well, in the terms I previously introduced, I mean that the Singularity superunknown, the effect of increased intelligence, can supervene on the altruism equation in ways that we would regard as positive or negative. The reason I originally wasn't thinking in terms of Friendly AI (back in 1996-1999) is that my own experience tended to show above-average intelligence supervening on the equation to produce an increase in altruism (-- coughs modestly --) and, extrapolating forward, I figured that the best thing you could do would be to get out of the way. Today I model that effect as occurring relative to an existing philosophical base. The point is that the Singularity unknown is not necessarily a bad thing. Whatever small fraction of that I've gotten by virtue of being a little brighter by human standards has, in my experience, tended to produce completely positive effects, and this is not something to overlook, because I think it's a real effect that occurs for deep reasons. To be afraid of the unknown can be a flaw. To rely on the unknown can also be a flaw, but in human experience, being afraid is actually worse. Technology has moved us forward. Intelligence is a good thing, and fear of the unknown seems to translate into strategic stupidity and blind panic more often than not.
