Can FAI Beat AGI
Is it even possible for the Institute to build an artificial intelligence that is general *and* Friendly, before anyone builds an artificial intelligence that is merely general?
A question asked frequently enough that I would like a referenceable Wiki page.
By Eliezer Yudkowsky. Comments to /Comments.
Joshua Fox wrote:
- Yes, I know that they are working on _Friendly_ AGI. But my question is:
- What reason is there to think that the Institute has any real chance of
- winning the race to General Artificial Intelligence of any sort, beating
- out those thousands of very smart GAI researchers?
- Though it might be a very bad thing for nonFriendly AGI to emerge first,
- it seems to me by far more likely for someone else -- there are a lot of
- smart people out there -- to beat the Institute to the goal of AGI.
Through no fault on the part of the poster, who has asked a question that seems ordinary enough from his perspective, this is a "wrong question" from the perspective of anyone trying to build an AI - or do anything difficult as a scientist or engineer. You don't want to come up with convincing reasons why you can solve the problem. You just want to solve the problem. Any attention you devote to comparing yourself to other people is wasted neural firings.
As HC also pointed out, you play the cards you're dealt. If you've got to beat a thousand other researchers to the punch to prevent the world from blowing up, then that's what you gotta do. You should not mistake the Singularity Institute for futurists. We are not here to prophesy that AI *will be* Friendly. It's an easy point to become confused over, because most people talking in this mindspace are futurists; they want to convince you that things *will* turn out nicely. We make no such claim.
I will try to answer anyway, as best I can, the question as you put it. If I thought the probability of winning was negligible, I'd look for other cards to play.
Suppose I walk into a ballroom full of PhD physicists. Can I, a nonphysicist, tell who in that room has the best likelihood of making significant advances in physics?
I can try to sort the physicists into a stratum of relatively uninspired people who picked up a PhD in college, and a stratum of dedicated geniuses. This sorting will not be perfectly reliable but it may be discriminating enough to make it worthwhile.
Competence is made up of fluid intelligence, crystallized skill, and background knowledge. I can't detect crystallized skill or background knowledge in the domain of physics without possessing it myself. I can try to detect fluid intelligence. But short of becoming a physicist myself, I may have no luck at all in discriminating among the people who strike me as smart, unless they possess crystallized skill or background knowledge which I happen to share. If a physicist launches into a lecture on Cognitive Science, I can label him as "+1 Polymath", or detect a mistake if he makes one. Similarly for people who start talking about Bayes, biases, or the philosophy of science, and get "+1 Rational" bonuses.
I've run into people whom others described as "very smart", who not only struck me as not "very smart", but as quite noticeably less smart than other people I know. I strongly suspect that everyone significantly smarter than a given perceiver tends to be perceived as just "very smart". The hypothesis here is that if you've got IQ 130, you can distinguish grades of intelligence up to IQ 140, and everyone smarter than that is just "very smart". I don't think this is actually true, but I think there's a grain of truth in it, meaning that your ability to detect differences in grades of intelligence decreases as the distance above you increases. Someone once asked me if I considered myself a genius. I immediately inquired how rare a level of intelligence is required to qualify as "genius". The person thought for a moment and replied, "1 in 300". I laughed involuntarily and assured him that, yes, I was a "genius".
There are not thousands of AGI researchers in the world. I doubt there are as many as a hundred. And they are "very smart" to widely different degrees.
Observed behavior can set upper bounds on competence. When you make a specific observation of this type, you automatically offend people who are underneath the upper bound. My father, a Ph.D. physicist and believing Orthodox Jew, would not agree that his acceptance of the Old Testament as the factual word of God sets an upper bound on his rationality skills. But we are not talking about a subtle mistake.
My father is more rational than the average human; for example, he taught me some simple magic tricks to make sure I wasn't taken in by supposed psychics. My father looks with contempt upon Jewish sects which practice what he regards as superstition - Kabbalah and so on. But my father cannot possibly be a *world-class* rationalist. That's beyond the bounds of possibility given his observed behavior.
Many atheists, maybe even most atheists, would be reluctant to say that there is a limit to how smart you can be and still be religious. Wasn't Isaac Newton religious? It is a historical fact that Newton wasted most of his life on Christian mysticism. The notion of observed behavior setting an upper bound on competence should be understood as a 3D surface over fluid intelligence, relevant crystallized skill, and relevant factual knowledge. Newton lived before Darwin, in an era when humanity's grasp of science and scientific procedure was much weaker. If Newton lived today and was still a Christian, I'd penalize him a lot more points for the mistake. Also, physics is not as directly relevant to religion or rationality as other disciplines. Darwin's observations forcibly stripped him of his belief in a personal loving God, but he remained a deist. Laplace, the inventor of what is now known as Bayesian probability theory, was questioned by Napoleon as to whether his astronomy book made mention of God. Laplace famously replied, "Your Highness, I have no need of that hypothesis."
Many atheists, probably a majority, arrived at that conclusion through some degree of luck; not because their rationality skills lay above the upper bound that *forces* someone to become an atheist. There are atheists who profess themselves as having unsupported faith in a proposition, the nonexistence of God, which strikes them as more pleasant than its negation; atheists who try to keep an open mind about the healing powers of crystals; and atheists who are atheists because their parents raised them as atheists.
People who are not sufficiently competent themselves, may be very skeptical about the idea of competence *forcing* you to a particular position. People who know the probabilities and still buy lottery tickets set upper bounds on how well they can have internalized the concept of a hundred-million-to-one probability; good luck explaining that to them. People whose mix of fluid intelligence, relevant crystallized skill, and relevant knowledge, does not *force* them to believe in evolution, have a hard time understanding that evolution is not "just a theory". And of course you can't convince them that their "openmindedness" is the result of insufficient competence, or that their "openmindedness" sets a hard upper bound on how competent they could be. They are not willing to believe - to really, emotionally believe, as opposed to claiming to entertain the theoretical possibility - that someone else could have a mind stronger than their own, which is inevitably forced to a single verdict favoring evolution.
Now let's consider an AGI researcher working on a "human-level" AGI project in which Friendliness is not a first-class technical requirement actively shaping the AI. Unless the researchers are setting out with deliberate intent to destroy the world, there is an upper bound on how competent they can be *at AGI*. That is, there is a 2D surface which bounds the combination of crystallized skill at AGI research, and knowledge of related sciences, which they can possibly be using to challenge the problem. (Unfortunately, in this domain, I don't think there's any associated bound on raw g-factor. Lacking knowledge and skill and rationality, you can be at the human limits of fluid intelligence and still get it wrong, a la Newton on religion.)
Trying to explain exactly which AGI skills they can't possibly have, stumbles over the problem of the skills themselves being harder to explain than any one issue that rests on them. If you look at a dynamic computational process, and you expect it to be Friendly for no good reason, then that bounds your skill at noticing what a piece of code really does, and the rigor of the standards to which you hold yourself in predicting that a piece of code does pleasant things. If you were sufficiently skilled at AGI thought, you'd write a walkthrough showing exactly how the nice thing happened, or else you wouldn't expect it to happen. This, whether the nice thing you wanted consisted of something "Friendly" or something "intelligent".
Trying to explain this in words, I see that it sounds very vague - not more vague than most AI discussion, perhaps, but much too vague for an FAI researcher to accept. Some of these concepts are explained more clearly in "A Technical Explanation of Technical Explanation". If you've read that, you remember that people will invent magical explanations like "phlogiston" or "elan vital" or "emergence", and not notice that they are magical; it is not an error that humans notice by instinct, which is why it is so common in history. If you've read _Technical Explanation_ plus Judea Pearl, then you will understand when I say that bad explanations for intelligence consist of causal graphs labeled with portentous words: leaf nodes for desired outcomes such as "intelligence" or "benevolence", and parent nodes for causes such as "emergence" or "spontaneous order", with arcs reinforced by perceived correlation (the one says, humans are "emergent", humans are "intelligent", from this correlation I infer necessary and sufficient causation). If you come up with a bad explanation for intelligence, and you are sufficiently enthusiastic about it, you can declare yourself an AGI researcher. That's most of the AGI researchers out there, at least right now. People who can't give you a walkthrough of how their program will behave intelligently (let alone nicely), but they have a bright idea about intelligence, and they want to test it experimentally. That's what their teachers told them science was all about.
There's a good amount of knowledge you can acquire, such as Evolutionary Biology, heuristics and biases, experimental study of anthropomorphism, Evolutionary Psychology, etc. etc., which will make it *more difficult* to stare at a computer program and think that it will magically do nice things. Unfortunately, the possible lack of such knowledge in AGI researchers doesn't give FAI researchers any significant advantage, since Evolutionary Biology is not directly relevant to constructing an AGI. Worse, you can know quite a few individual disciplines before they *combine* to *force* a correct answer, since it only takes a *single* mistake not to get there.
In this domain, I doubt there is any humanly possible level of raw fluid intelligence, which would *force* you to get the answer right in the absence of skill and knowledge. For example, Newton was extraordinarily intelligent but still failed on easy tests of rationality because he lacked knowledge we take for granted. Relative to the background of modern science, AGI and FAI are hard enough as problems that no humanly possible level of g-factor alone will force you to get it right. This is bad because it means you can get incredibly talented mathematicians trying to build an AGI, without them even realizing that FAI is a problem. But they are still limited in how deeply they can understand intelligence; they can grasp facets and combine powerful tools and that's it.
What an FAI researcher can theoretically do, which would require competence above the bound implied by trying to write an AGI *without* FAI, is write an AI based on a complete understanding of intelligence. An FAI researcher knows they are forbidden to invoke and use concepts that they don't fully and nonmagically understand (again, see TechExp to gain a clearer grasp on what this means). When you're staring at a blank sheet of paper, trying to reason out how an aspect of cognition works, in advance of designing your FAI, then your thoughts may bounce off the rubber walls of magical things. But you will be aware of your own lack of understanding, and you will be aware that you are prohibited from making use of the magic until it has ceased to be magical to you. And that's not just an FAI skill, it's an AGI skill - although realizing that you need to do FAI causes you to elevate this skill to a much higher level of importance, because you are no longer *allowed* to just try stuff and see if something unexpectedly works.
If an FAI project comes first, it will be because the researchers of that project had a much deeper understanding.
Again, I am not saying that you *can't* build an AGI without being sufficiently competent that your theory grabs you by the throat and forces you to elevate FAI to a first-class technical requirement. Natural selection produced humans without exercising any design intelligence whatsoever. But there's a limit to how well you can visualize and understand your AI, and yet make elementary, gaping, obvious mistakes about whether the AI will be nice. (Unfortunately, I suspect that you can understand your AI fully, and still make more subtle mistakes, more dignified forms of failure that are just as lethal.)
If you're a researcher building an F-less AGI and you're not deliberately out to destroy the world, there are things you can't know, skills you can't have, and a limit on how well you can understand the AI you're trying to build. You can be a tremendous genius, possibly at the limits of human intelligence, but if so that sets an even stricter upper bound on your crystallized skill and relevant knowledge. Most such folks *won't* be tremendous geniuses. World-class geniuses will be rare among AGI researchers, simply because world-class geniuses are rare in general. There is no physical law that prohibits a non-world-class genius from declaring themselves an AGI researcher; even the Mentifexes of the world do it.
So it is not that FAI and AGI projects are racing at the same speed, toward a goal which is miles more distant for FAI projects because FAI projects have additional requirements. The AGI projects are bounded in their competence, or they will turn into FAI projects; if their vision grows clear enough it will *force* the realization that to develop a nonlethal design they must junk their theories and start over with a higher standard of understanding. FAI projects can continue on past that point of realization, and develop the skills which come afterward in the order of learning. The advantage is not entirely to the ignorant, nor to the careless.
I am sure that it is possible to spend years thinking about FAI, hold yourself to the standard of understanding every concept which you invoke and being able to walk through every nice behavior you expect, and yet make a nonobvious lethal mistake, and so fail. But the projects whose AGIs would *automatically* kill off humanity, the projects that must fail at FAI by *default* - are, yes, genuinely limited in their competence. To reduce it to a slogan that fits on a T-Shirt:
There is a limit to how competent you can be, and still be that stupid.
It's a *very high* limit. There's more to it than raw g-factor. People can be *that stupid*, and still look "very smart". I can even conceive that they might *genuinely* be very smart, though I've yet to encounter a failing-by-default AGI researcher who strikes me as being on a level with, say, Judea Pearl.
So the life-or-death problem reduces to whether people permissibly smarter than an upper bound can accomplish a colossally difficult task by means of exactly understanding it; before at least one member of a much larger pool of people, a few of whom are "very smart", but none of them more competent than the upper bound, can accomplish a task, whose difficulty is merely huge, through work that includes a substantial component of guessing, vagueness, and luck.
And remember also this. This is the pass-fail test for the human species itself. The other AGI projects are, most of them, not fighting that battle. Sadly there is no monopoly on enthusiasm; perhaps some of the other AGI projects will work 20 hours per day on our doom. I wish I could say that any sufficiently intelligent human being *must* see the pass-fail test of our survival, and devote all their resources to passing it. Unfortunately this again requires crystallized skill and background knowledge, not just g-factor. When I look over my own past history, I find that there was, dismayingly, a significant component of luck to noticing the pass-fail test. That is not something I am comfortable with, and I will try to do better in the future, win on purpose instead of by accident.
It still strikes me as endlessly strange that so few people should care about this matter, the hinge of time around which all else turns. Yet I cannot boast that my course here was inevitable - I wonder sometimes if I *could* have made it without a component of luck, but that is not what actually happened. But if I had been smart enough to bulldoze through life on sheer raw intelligence, as opposed to my grand-uncle loaning me a copy of "Great Mambo Chicken and the Transhuman Condition" when I was 11, then there would be many others of slightly lesser intelligence, some of whom would succeed through an event path that included a component of luck. And there would be many FAI researchers; the battlefield would not be so deserted.
It would seem that sufficient fluid intelligence, crystallized skill at rationality, and background knowledge of science, is the price only of it being *possible* to find the hinge of time and move to defend it, if you are also lucky. Yet even if there is a component of luck, we may hope that it will be some of the very best of the human species who move to prevent the world from destruction. That it will *not* be an equal dispute between bands of people who happened to major in computer science, or even an equal dispute between scientific geniuses who fell into the particular field of AI. The FAI project may draw on resources and brainpower allocated to the defense of the human species. Which, in this time, is practically nothing, because the understanding is so rare. But among *very very* smart individuals, if the FAI project has need of them, that rationality may not be so rare.
And it may be that there will not be enough smart people among the human species who have also the luck to discover the one important problem; or the smart people will not be able to raise enough of an advantage in competence to overcome the greater intrinsic difficulty of their problem; or the human species will not have sufficient wisdom among its six billions to allocate even the tiny proportion of its resources that it would need to defend itself. And that will be the end of the human story.
But it is not a foregone conclusion. It is worth putting up a fight.
"If you were sufficiently skilled at AGI thought, you'd write a walkthrough showing exactly how the nice thing happened, or else you wouldn't expect it to happen."
Hmmm. I work with an emergent system with some simple AI rules embedded within it. While the system is many orders of magnitude simpler than any AI would be, it's got to the stage where making changes to its behavior is nearly experimental. Make the change you want, then test to see if it works and what the side effects are. Execution path lengths are in the billions of instructions. My point is that it, like many other computer systems, has gone beyond the point of human comprehension, even by a genius. Although we will be able to understand the subsystems we build the AI up out of when we look at them in isolation, I seriously doubt anyone is going to be able to understand the massively complex interactions between them that will result - either to correctly predict everything it will do or to embed a set of useful rules or beliefs within it. Add in the requirement that the system becomes self-modifying and all bets are off.
I'm curious as to the difference between a friendly AI and a general AI that we've been friendly to and have treated with respect. Surely any successful AI will, essentially, be a 'person' and should be treated like one.
What I'd be more worried about is, instead of an artificial intelligence, an artificial moron. We will be making these before we make any AIs, as they'll be AIs that almost, but not quite, work - but which we're ethically obliged not to power off. Those, I feel, are far more likely to make the sort of logical mistakes that Sci-Fi authors love to write about in their AI-that-killed-the-world stories.
-- Mykael