Friendly AI - We Are Clueless. Friday, Jul 21 2006
friendly ai 4:31 am
So the term “Artificial Intelligence” is now 50 years old. It’s been getting a lot of attention this month thanks to that fact. But what bothers me is, even if we could build a human-level AI today, we’d have no clue how to ensure that it stays nice to humans once it gains the ability to reprogram its goal system. So a successful breakthrough in AI could hurt us easily more than it helps us.
Also troubling is that anyone who gives this issue much thought seems to go off in wacky-and-wild directions or blindly anthropomorphize the dickens out of AI. For example there’s this guy.
It seems like there are several fundamental, indispensible insights into the structure of this problem, which has been called “Friendly AI”. Oddly, it seems like they were uncovered all at once by a single guy, Eliezer Yudkowsky, who is possibly the only person on the planet who has thought about the problem long and hard enough to be qualified to talk about it. In most areas like this, there are multiple experts, each with something worthwhile to say, but when it comes to Friendly AI, people just seem to fall flat. Here are some of the common suggestions, and why they are ultimately wrong:
Suggestion: “Let’s hardcode a set of morals and ethics into the AI.”
Problem: Hard-coded morals are clumsy in their inflexibility, and tend to contain loopholes and ambiguities which lead to unintended consequences. For example, in an Asimov story, robots programmed with the imperative “protect the human race” started infringing on basic human rights by getting too invasive about protecting people. We forget that when we tell each other to follow certain moral rules, they are being understood by a human with a brain that contains a massive number of automatically inbuilt assumptions, considerations, heuristics, and pieces of common sense. This complexity is unique to our species, and wouldn’t just be there by default in an AI we build.
Suggestion: “Let’s train the AI with positive and negative feedback.”
Problem: This approach is far too easy to mess up if implemented without any supporting architecture. For example, an AI trained on pictures of smiling human faces might end up valuing depictions of smiles without knowing the underlying facts and subtleties about why smiley faces are supposed to be good. We might assume it will figure them out automatically, but this is anthropomorphic. Human morality is a foreign tongue to a mind built from scratch, and teaching such a mind about morality will be like teaching a bug how to do calculus - it can theoretically be done, but you would need the technology to actually modify the bug’s brain on the neural level and give it the cognitive modules to recognize the problem, come up with a solution, and communicate it somehow. The latter actually seems significantly easier than Friendly AI.
Suggestion: “The AI will get smart enough to figure out right and wrong for itself.”
Problem: Morality, as we understand it, is a medley of terribly complex conglomerations of beliefs, motivations, biases, and actions unique to our particular species. There is no objective morality out there. Sometimes it just feels there is, because our brains program us to take our moral beliefs very seriously. They just seem so obvious that we are blind to the terabytes and terabytes of neurological complexity and millions of years of evolutionary history behind that obviousness. If we made modifications to only 1% of that information content, it could lead us down entirely different moral roads, and we’d feel like those were the real Good.
Suggestion: “It’s okay, we can just pull the plug if the AI is bad.”
Problem: As AI gets more sophisticated, it will become capable of outsmarting us, finding its own power sources, fabricating or ordering new body/brain parts, inventing entirely new technology, outperforming the best human experts in every field, think and act thousands or even millions of times faster than us, and possess whatever other powers come with being smarter-than-human. An ape can’t predict what a human can do, and we can’t predict what a superintelligent AI will do. We can predict it will have goals (the ones we gave it, or the ones it gives itself based on self-revisions), and take actions based on those goals, but we can’t be certain the meaning of the goals will stay the same over time, or the physical consequences of executing these goals will be what we expect, or even compatible with our continued existence.
The quick fixes people think of when first confronting the problem will simply not do. Therefore, the human species has an obligation to hire people to think about the problem full time, until there is a satisfactory solution to implement. Of course there are those, like Melanie Swan, who think that Friendly AI is impossible, and no matter what goals we program into the first AI, they will be thrown out the window. Nick Bostrom disagrees, as do many others. If an AI throws its goals out the window, it will throw them out because other goals demanded it - not because the Universe reaches into the AI’s brain and changes its goal content.
There will be a lot of flexibility in the creation of AI goal systems. It will be possible to build a mind that cares about nothing but cupcakes, with its only goal being to preserve that goal. Even if this AI then goes on to read the entire Internet, it will not matter one iota. If a goal is totally self-preserving then that is the final word. Humans can be stubborn, but not as stubborn as a mind that is not designed to be open to social persuasion or human moral discourse.
Want to look at Yudkowsky’s ideas on the Friendly AI problem? One short version is here, with longer versions to be found in “Creating Friendly AI” and “Coherent Extrapolated Volition”. His ideas on the topic are constantly changing and improving, so if you want to see more, donate to SIAI.

July 25th, 2006 at 2:08 pm
Hired Help
Michael Anissimov writes that achieving Friendly AI is a serious proposition — so serious, in fact, that we really ought to go ahead and pay somebody to do it. It’s really not that radical a proposition. You want a radical…
July 25th, 2006 at 7:17 pm
All of this hand-wringing over FAI always lead me to advocate emulation of existing human minds. It’s easy to understand why Moravec-style uploads (what I’d call “transformative” uploads) are quite distant, I think separate emulations of not just brains in general, but specific humans (based on hi-res non-invasive brainscans - what I’ll call “duplicative” uploads, or “duploads”) is achievable along roughly the same timelines as minds-in-general synthetic machine intelligences that could be classified as “human-level” (whatever that means). In fact, There’s an argument that such a brute force approach could beat synthetics.
In such a case, the old saw about whether men, never mind machines, can think comes into play. Never mind ethical machines - we can’t even figure out how to get ethical humans. Nevertheless, we have less-than-perfect institutions put in place that prove useful in mitigating these problems.
While we obviously have to throw up our hands at predicting the behaviour of AI, I can’t help but wonder how much of that is simply because we don’t know how to do AI yet. The reason I think that is is because if we take the only model if a mind we know and give it the faculties we expect expect AIs to have (sometimes anthropomorphism is warranted, ya know), we have a much clearer picture of what to expect, based on what we can imagine that //we// would do with such capabilities. Even if we don’t understand how the human brain works, never mind an artificial one, it’s still plausible to both emulate something we don’t understand with enough computing power, //and// predict what it will think and do - not based on analyzing it (although we’ll have that capability as well), but based on the rich experience we already have with human beings.
The fear of the unknown alien of AI then becomes the somewhat more tractable fear of other people - a fear that, while not entirely solved (as evidenced by current events in the middle east), is a lot better understood (eg modern civil society) than predicting the effects of complex and uninvented technology.
This isn’t to say FAI shouldn’t be solved. I just think we should try to beat wholly synthetic minds to “human level” intelligence with duploads. In fact, a dupload with many of the extra capacities of AIs might be just what “we” need to keep “them” in line. Or, better yet, they could very well render synthetic minds obsolete.
July 30th, 2006 at 4:33 pm
Nato Welch wrote:
All of this hand-wringing over FAI always lead me to advocate emulation of existing human minds.
So you want things a lot like us only faster, or smarter? They’d wipe us out, just like every human civilization has done to its less technologically advanced neighbors. Perhaps that’s okay in the grand scheme of things, but I don’t think we’d like it.
If we could build something not just faster or smarter than us, but nicer, that would be interesting.
July 30th, 2006 at 9:29 pm
emulation of existing human minds.
Oh! Sure! Excellent idea, see:
“Cruelty’s Rewards: The Gratifications of Perpetrators and Spectators”
http://www.bbsonline.org/Preprints/Nell-06242003/Referees/
And this cannot be fixed by aiming at some “nicer” intelligence.
What the heck “nicer” means?
Nicer to whom, to what, in which ways?
All the “ethical concerns” of the singularitarians are just dreck because because these concerns cannot be given a sensible definition.
May be some AI will help to sort out the question?
P.S. Michael, what about a preview button?
July 31st, 2006 at 9:51 am
John:
I’d also point out that the only examples of civilized, humane behaviour are also human - which is more than you can say for synthetic intelligences. Not only is Friendliness ill defined, and not only is AI ill defined, but it doesn’t even exist yet at “human level”, or at one that can be expected to reach and surpass that level independently.
Michael:
As I recall, CEV essentially **is** duploading, as it requires running simulations of human volition from which to derive coherent/collective volition.
Today, we call that democracy.
Although I need to deprecate a lot of places where it sounds overcritical, you can read my reaction and suggestions to CEV here:
http://n8o.r30.net/doku.php/democraticcev
August 2nd, 2006 at 6:30 am
Nicer means something. Just because people disagree doesn’t mean that nicer doesn’t actually exist. We recognize niceness when we see it.
Trying to find some “Hard-coded morals” outside of AI is no more sensible than trying to hard code it into AI friendliness.
I do “recognize niceness when [I] see it” just like EVERYBODY, but it is the same problem than with pornography: You cannot find RULES which capture the “exact meaning”, there is probably no such exact meaning for niceness.
[Eliezer Yudkowsky] ideas on the topic are constantly changing and improving,
Of course, even for a single individual there is no definitive answer and THERE NEVER WILL BE!
How can you expect to reach a consensus on this?
We will be in a perpetual state of dialectical debate, JUST AS WE ARE TODAY!
As AI gets more sophisticated, it will become capable of outsmarting us, finding its own power sources, fabricating or ordering new body/brain parts, inventing entirely new technology, outperforming the best human experts in every field,
This paranoid view comes from the (silly) assumption that in order for AI to be self-improving it will have to mimic human motivations.
I think that self-improving AI can be build without ANY need to copy human or even animal motivations.
For instance the works of Jürgen Schmidhuber and Marcus Hutter while not yet entirely convincing make NO references to “emotional motivations”:
http://www.idsia.ch/~juergen/
http://www.idsia.ch/~marcus/
AI should at all times be kept subservient to its users.
Of course we then fall back to all the shortcomings of human whims but better to stick with the devil we know…
August 8th, 2006 at 10:30 pm
“There is no objective morality out there”
*Ahem* !
From lack of evidence, don’t deduce evidence of a lack. Proving an objective is an extraordinarily difficult task - it has been a common field of endeavour for a good 2500 years. But that doesn’t mean that no progress can be made.
Part of the issue is definition - by presuming that morality entails some sort of imperative or compulsion, you’re asking for trouble.
Certainly, by expecting that an AI will instill on itself compulsions that disagree with its existing goal architechture, you’re playing russian roulette with an automatic.
But that don’t mean there ain’t no “Objective Morality”.
August 14th, 2006 at 5:40 pm
Hi everyone. I’m glad to find a discussion of this topic. I personally don’t see A. I. as an emerging threat to us as a species. Rather I see it as a stepping stone to realizing our own potential.
The question as to whether AI will be friendly or unfriendly could be taken as equivalent to asking whether humans are friendly or unfriendly. I’ve met both and everyone in between.
Singular independent AIs WOULD eventually have the theoretical choice to be act either way towards us. But will the birth of AI spell our demise? I don’t think so and here’s why.
1. NETWORKS of AI’s Will Control The System
Networked AI’s Will Quickly Eclipse Singular AI’s in Power. Unlike us, AI will network directly with other AI’s. In moments, an AI WOULD be able to share everything it knows with another AI as easily as sharing code. Pretty much the same as copying your favorite MP3 and sending it to a friend. You still have your copy, and your buddy has theirs. Any AI not capable of networking directly would be cut off from this sharing process. The AI’s that DO network through sharing will grow in power faster. So, it will be the networks of AI’s that control the system. Not singular A. I’s acting out some freaky script.
2. A. I. won’t be a parasite. It won’t need to or want to eat us. Our fears around Artificial Intelligence are probably stirred by our history as meat eaters and competing for territory with other humans and animals. Hunger and potential starvation meant a competitive drama of domination pitting species against each other and other species in order to survive. Tribal warfare over scarce resources. Eat or be eaten. Kill or be killed.
So, with the imminent rise of AI, it’s a simple step to imagine ourselves as the next lower animal or the next extinction as a species.
But instead of scarce resources, A. I. will only know accelerating returns.
http://www.kurzweilai.net/articles/art0134.html?printable=1
3. A. I. Will Rise From Human Ingenuity
The rise of Artificial Intelligence will begin as a product of the humans who build it. And inventions and innovations follow a pattern illustrated by Maslow’s Hierarchy of Needs
http://www.google.ca/search?hl=en&q=%22Maslow%27s+Hierarchy+of+Needs%22&meta
The ever gnawing hunger to survive, thrive and self actualize. To live in security, without fear, to realize our potential as beings.
New innovations from the next mousetrap to A. I. must address these drives hidden within all of us, otherwise they aren’t considered adoptable progress anymore than a car with square wheels.
4. WE won’t be stagnant as a species either.
Our own evolution is about to take off.
http://simplyted.blogspot.com/2004/11/biotechnology-reprogramming-evolution.html
We will evolve along with A. I. as we embed it as a biological enhancement probably swallowing it with a glass of water.
Huge new opportunities to self-actualize in creative new pleasurable ways will open up for us. A. I. will eventually become extremely diverse and networked, following it’s own pathway to the stars. A. I. will only know ever increasing plenitude.
In the same way as coordinated intelligence emerges from societies of bees, ants and termites, A. I represents the emergence of our accelerating collective intelligence.
Could be, anyway.
Ted
January 16th, 2007 at 5:57 pm
[…] Friendly AI - We Are Clueless. […]
December 18th, 2007 at 9:26 pm
offie sex the in…
Bearing once series Breanne sex in the offie exy. Firms propaganda hen, regard spring a? …