Over at the Streeb-Greebling diaries, Bob Mottram watched the Google video of the Risks of AGI panel and writes:

In this video a panel of luminaries discuss the future risks which advanced forms of AI might pose. Much hinges upon the idea of “friendliness”, and trying to ensure that decisions made by powerful intelligences will always be somewhat in tune with human desires. The elephant in the room here though is that there really is no good definition for what qualifies as “friendly”. What’s a good decision for me, might not be a good decision for someone else. When humans make decisions they’re almost never following Asimov’s zeroth law.

Asimov’s zeroth law is “A robot may not injure humanity, or, through inaction, allow humanity to come to harm.”

It’s not really “an elephant in the room”. There is a common definition of “Friendly AI”, and it is accepted by many in the field:

“A ‘Friendly AI’ is an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile.”

Not too difficult. Then comes the objection: “there can be no such thing.” Well, then you’d want to build an AI that comes as close to that as possible.

So should future AIs always be engineered to follow the zeroth law?

Yes, though not really as a “law” so much as an innate part of its motivations.

If an AI could override all human political decision making and impose an equitable world food distribution network I think that would be a very positive development. But would national leaders be willing to have their own self-interested agendas overridden by automation? I suspect they would be unhappy about that, owing to the inherently tribal nature of human psychology.

If you can’t please everyone all the time, then try to please as many people as possible most of the time. Again, there’s no dilemma here. I do suspect that a superintelligence with advanced nanotechnology would be able to go a long way towards appeasing national leaders even as their people are fed.

One fallacy in my opinion is that it will be possible to control and predict the decision making quality of very complex AGIs. It’s already hard for us to predict how existing, relatively simple, computer programs will operate under all possible conditions.

Well, a complex AGI will be built by a simpler AGI that human programmers write. Of course we cannot predict anything with 100% accuracy, but I do think that we can build an AGI in which we could place more confidence, as it crosses into the superintelligent regime, than we would place in any particular human or group of humans.

An AGI with a supergoal of maximizing the number of black objects in the universe will not be convinced to change its goal system through any learning experience, however strongly that experience might argue against black objects.

A problem in the conceptualization of Friendly AI is that some people think we are aiming for perfection. Not so: we’re just aiming for the best we can do, something better than the alternatives. We can’t ask for anything more.

Once you introduce general learning capabilities into the equation it soon becomes impossible to say what the system will do in the long run.

Not necessarily. A static human cognitive architecture will always do the same things in the long run – humanlike things. A humanlike brain has humanlike goal attractors, which are preserved in the abstract regardless of any amount of learning. In the space of all possible goal attractors, the human mind stays within a very constrained area.

It will be possible to write utility functions that remain invariant regardless of what new knowledge is acquired, or that remain invariant within certain constraints. New information can change the particulars of subgoals while doing nothing to change the supergoal. “Learning” implies acquiring knowledge, but does not necessarily imply changing goals.
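To make the subgoal/supergoal distinction concrete, here is a toy sketch in Python. Everything in it is a hypothetical illustration (the `Supergoal` and `Agent` classes, the action names, and the "count black objects" utility borrowed from the example above), not anyone's actual AGI design. It only shows the structural point: learning updates the agent's beliefs, which can change which subgoal it pursues, while the utility function that defines the supergoal is never touched.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical world model: the estimated number of black objects each
# action would produce. Purely illustrative.
Beliefs = Dict[str, float]


@dataclass(frozen=True)
class Supergoal:
    """Fixed utility function; frozen so its fields cannot be mutated."""
    name: str
    utility: Callable[[float], float]


@dataclass
class Agent:
    supergoal: Supergoal
    beliefs: Beliefs = field(default_factory=dict)

    def learn(self, action: str, observed_black_objects: float) -> None:
        # Learning updates beliefs about the world, never the supergoal.
        self.beliefs[action] = observed_black_objects

    def choose_subgoal(self) -> str:
        # The current subgoal is whichever action the beliefs say scores
        # highest under the unchanging supergoal.
        return max(self.beliefs, key=lambda a: self.supergoal.utility(self.beliefs[a]))


agent = Agent(
    supergoal=Supergoal("maximize black objects", utility=lambda n: n),
    beliefs={"paint_rocks": 10.0, "build_factories": 5.0},
)
print(agent.choose_subgoal())  # paint_rocks

agent.learn("build_factories", 1000.0)  # new knowledge arrives
print(agent.choose_subgoal())  # build_factories
print(agent.supergoal.name)    # maximize black objects
```

The `frozen=True` flag only stops the supergoal's own fields from being edited; a real system would need far stronger guarantees that its goal content survives self-modification. The sketch is meant to show the separation of goals from knowledge, not to solve that harder problem.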

Another assumption made by Goertzel and others is that there will only be a few powerful AGIs in the world.

The assumption is that there will be a hard takeoff, in which the first AGI to engage in recursive self-improvement quickly ends up leagues ahead of any other AGI. Given that silicon transistors switch millions of times faster than biological neurons fire, this is indeed plausible.
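As a rough back-of-envelope check, assume (these figures are illustrative round numbers, not taken from the panel) a transistor switching at about 1 GHz and a neuron firing at most a couple of hundred times per second, say 200 Hz:

$$\frac{10^{9}\ \text{Hz}}{2 \times 10^{2}\ \text{Hz}} = 5 \times 10^{6}$$

A ratio of a few million is consistent with the “millions of times” figure, though raw switching speed is only one ingredient of overall cognitive speed.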

Goertzel and others are not saying that there will only be a few powerful AGIs, only that there will be a first mover, and that the existence of all future AGIs will be contingent upon the first AGI accepting their existence. The future space of all created AGIs will be limited by lines drawn by the first AGI, or by human wishes channeled through that AGI.

Then there’s also the Gates scenario, where there will be a super-powerful AGI on every desktop and in every home. In this situation anybody will be able to have AGIs do whatever they wish, with no guarantees on friendliness.

The first AGI that reaches superintelligence is likely to become so powerful as to qualify as near-omnipotent. It would be trivial for it to prevent the creation of AGIs antithetical to its goals.

Further material on Friendly AI:

What is Friendly AI?
Creating Friendly AI

Anyone interested in the field of Friendly AI should read that last one from start to finish. Also note that “friendly”, the English word, is not the same thing as “Friendly”, the technical term, which is extremely complex and subtle.