Friendly AI

From The Transhumanist Wiki


From the SL4 Lexicon:

Friendly AI (FAI):

The field of study concerned with the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals. The term "Friendly AI" was chosen not to imply a particular internal solution, such as duplicating the human friendship instincts, but rather to embrace any set of external behaviors that a human would call "friendly". In this sense, "Friendly AI" can be used as an umbrella term for multiple design methodologies Yudkowsky01. Also used to refer to a completed Friendly AI itself. An AI that is Friendly not because it is being forced, but because that's what it wants to do. See also 24 Definitions of Friendly AI, Creating Friendly AI, seed AI, Singularity, Friendly AI.




AI based upon Friendliness Theory; AI built with FriendlinessArchitecture and trained with FriendlinessContent.


Friendly AI will be developed to occupy the normative moral frame of reference of humanity, which is defined by human altruism, human altruistic philosophies, and the complex adaptations that underlie the metamorality and morality that we all share. Friendly AI will be developed to learn and improve the altruistic, philosophical content that it receives from programmers, and to remove any selfish content that is accidentally conveyed.

Friendly AI with Human-Similar General Intelligence will be able to self-improve its altruism, morals, philosophy, and wisdom, with the assistance of programmers, to continually bring itself closer to Friendly behavior. ("Technically, a Friendly AI doesn't have supergoal 'content'; the Friendly AI has an external referent labeled 'Friendliness,' and a series of probabilistic assertions about that external referent, derived from sensory information such as programmer affirmations" (Eliezer Yudkowsky).) Friendly AI will have all possible Friendliness content that any individual or group of programmers could convey to it. Friendly AI will be able to improve that content to develop a human-surpassing degree of philosophical and moral integrity.
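
The idea of "probabilistic assertions about an external referent, derived from programmer affirmations" can be made a little more concrete with a toy sketch. This is only an illustration, not the mechanism described in Creating Friendly AI: the `Assertion` class, the `update` function, and the `reliability` parameter are all hypothetical names, and the sketch assumes each programmer affirmation is an independent, equally reliable signal, combined by a plain Bayes update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """A probabilistic claim about the external referent (hypothetical structure)."""
    statement: str
    probability: float  # current credence that the statement is true


def update(assertion: Assertion, affirmed: bool, reliability: float = 0.9) -> Assertion:
    """Return a new Assertion after one programmer affirmation or denial.

    Assumption: `reliability` is both P(programmer affirms | statement true)
    and P(programmer denies | statement false).
    """
    prior = assertion.probability
    if affirmed:
        like_true, like_false = reliability, 1.0 - reliability
    else:
        like_true, like_false = 1.0 - reliability, reliability
    # Bayes' rule: P(true | evidence) = P(evidence | true) * P(true) / P(evidence)
    evidence = like_true * prior + like_false * (1.0 - prior)
    return Assertion(assertion.statement, like_true * prior / evidence)


# Start undecided, then observe two affirmations followed by one denial.
belief = Assertion("Action X is consistent with Friendliness", 0.5)
for signal in (True, True, False):
    belief = update(belief, signal)
    print(f"{belief.probability:.3f}")
```

The point of the sketch is only that the system never stores a fixed answer: it holds a revisable credence, and programmer statements count as evidence about the referent rather than as the content itself.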

For more about the "normative moral frame of reference of humanity," see Wiki Interview With Eliezer/General Information.

24 Definitions of Friendly AI

see also SIAI's page about Friendly AI

-- Anand


This quote could have been describing Friendliness research:

'How do you cope with that kind of ignorance? Dwelling on it was enough to make
 his faithfully simulated body sick to the stomach. Part of him screamed that
 the only thing to do in the face of such barely comprehensible stakes was to
 bow out, to withdraw from any possibility of intervention - as if showing the
 appropriate humility was more important than the outcome.
 But Mariama refused to be cowed by the gravity of the situation. "We keep
 exploring," she insisted. "We keep narrowing the gap between what we know and
 what we need to know." "What we need to know is when we have no choice but to
 stop gathering information and make a stand."' - Greg Egan, 'Schild's Ladder'

-- Starglider

At the end of Manna, the main character is asking why someone can't subvert the AI system that people have merged with. The other character explains, "For the same reason someone can't get you to cut off your arm." The super-AI system receives information, and considers it. If it looks obviously bad, it won't do it.

It seems plausible to me that there might be some sort of "Con Artist" capable of taking on such an AI system. It might not be possible for it to be a human, because the AI would be too smart for that. But a rival AI system? Who knows.

I can imagine that if there were two super-AI systems with different "Friendly AI" systems inside them, each might try to subvert the other, for the sake of "Friendliness"!

-- Lion Kimbro

It strikes me that Friendly AI would likely come out of initial rule maintenance systems.

As we build up Social Software, we seem to be headed towards making it easy to codify social rules, and teach them to the system. (You may be interested in a referring thread on CommunityWiki:HowWikiWorks.)

This itself isn't terribly interesting. What is interesting is that it is conceivable that we will teach computers to "watch out" for us. That is, if the system is being gamed in a way that leads towards someone having undue or overwhelming power, the system may automatically inform everyone of the dangerous situation. People can then respond.

But we can imagine that people who have been duped a few times will put safeguards and greater intelligence into the mechanical system.

Then the system may turn into a Friendly AI.

Of course, it could also not. It could be terrible. But let's aim for not terrible.

-- Lion Kimbro

That probably wouldn't work because the rules wouldn't be predesigned to stay stable under self-enhancement, and the system would not have the inbuilt capacity to correctly infer the reasoning behind the rules and generalise and extend them correctly. Once the Bayesian Boundary is passed and the Goal System becomes capable of both self-modification and self-protection, we'd probably end up with either planetkill or Prime Intellect style stagnation. We might get lucky if someone followed the 'In Case Of Singularity Break Glass' design rules from CFAI (a bucketload of moral complexity with the pointer 'look here for answers'), but frankly I doubt that would work if the system isn't structurally correct to start with.

-- Starglider

So how do you stop it from getting religion?

-- Mykael

Religion is inherently irrational, at least under any sane base prior. The delusional behaviour arises from specific flaws in the human mental architecture that will not exist in an AI unless you are silly enough to put them there. Exactly replicating human failings is actually no mean feat; humans are a tiny, tiny subset of the space of possible mental architectures. That said, while any arbitrary AI is not likely to exhibit humanlike flaws, fail your AGI Design skill roll and it may well suffer from entirely new and exciting forms of mental disability. More likely than not irrationality will get corrected early in Recursive Self-Enhancement, but quite likely not before completely mangling your Goal System.

Proving that an AGI design will not 'get religion' should actually be an easy warmup exercise for an FAI designer. Even I can manage this one for APEX, just by looking at the structure of the self-environment embedding model and related causal axioms that underpin the derivation of probabilistic reasoning, though I doubt I could do it with enough rigour to convince a mathematician.

-- Starglider


  • This project or action is currently active - there is currently at least one person working on it.

