Predictability of AI
From complexity theorist Richard Loosemore on the AGI list:
It is entirely possible to build an AI in such a way that the general course of its behavior is as reliable as the behavior of an Ideal Gas: you can't predict the position and momentum of all its particles, but you sure can predict such overall characteristics as temperature, pressure and volume.
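A rough way to see the ideal-gas point numerically, as a toy sketch rather than a physical simulation: the function names, the unit mass, and the Gaussian velocity model below are illustrative assumptions, not anything from Loosemore's post. Individual particle energies differ wildly from run to run, while the average over many particles barely moves.

```python
import random
import statistics

def squared_speeds(n_particles, seed):
    """Squared speeds for n particles whose three velocity components
    are drawn independently from a unit Gaussian (an assumed toy model)."""
    rng = random.Random(seed)
    return [
        rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2
        for _ in range(n_particles)
    ]

def mean_kinetic_energy(n_particles, seed, mass=1.0):
    """Average kinetic energy per particle, the stand-in for temperature."""
    return 0.5 * mass * statistics.fmean(squared_speeds(n_particles, seed))

# Five independent "gases": different microstates, nearly identical averages.
runs = [mean_kinetic_energy(50_000, seed) for seed in range(5)]
spread = max(runs) - min(runs)

# One particle from each gas: these values are all over the place.
single_particles = [0.5 * squared_speeds(1, seed)[0] for seed in range(5)]

print("per-gas averages:", [round(r, 3) for r in runs])
print("spread of averages:", round(spread, 4))
print("single particles:", [round(e, 3) for e in single_particles])
```

In this model the expected average is 1.5 energy units per particle, and each run lands very close to it even though no single particle's energy can be called in advance.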
Without any sophisticated theory of minds in general, predicting the future behavior of any given artificial intelligence can seem impossible - who's to say that it won't reprogram itself arbitrarily at any time, if it has the capacity to do so?
The issue is that capacity does not necessarily signify desire. In humans, desire comes from our evolutionary history - every desire, no matter how seemingly unrelated, evolved because it contributed somehow to our inclusive fitness. Art, literature, philosophy, gossip - few people realize that these domains of human endeavor are in fact evolutionarily programmed subgoals of the evolutionary supergoal: the increase of inclusive fitness, which encompasses both our ability to survive and give birth to children that survive. Some may say, "look at math, and the sciences, and abstract thought, don't these signify that humanity can go beyond its evolutionary origins?" To the contrary, these activities are just extended outgrowths of our evolutionary motivations running their behavior routines in different sorts of external environments.
When order is created, the entropy of a system decreases. The number of possible future states the system is likely to be in goes down. A diamond monolith has very low entropy, and is likely to remain in the same configuration for a long time to come. There is always a nonzero probability that thermal vibrations will shatter the monolith to dust, or that several quadrillion atoms will spontaneously quantum tunnel out of the monolith, thereby destroying it, but the likelihood of these occurrences is minute. On the other side of things, a corpse has high entropy - it is deteriorating, decaying, its molecules are going every which way, being digested by microorganisms and scattered by numerous natural forces. There is a chance that the corpse will spontaneously reform back into a living person, but the probability is close to nil.
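The link between order and "fewer likely future states" can be made concrete with Shannon entropy, used here only as an analogy for the thermodynamic entropy discussed above. The two probability distributions are made-up illustrations, not measurements of diamonds or corpses.

```python
import math

def shannon_entropy_bits(probabilities):
    """Shannon entropy in bits of a discrete distribution over states."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical "ordered" system: almost certainly stays in one configuration.
ordered = [0.97, 0.01, 0.01, 0.01]

# Hypothetical "disordered" system: equally likely to be in any of four states.
disordered = [0.25, 0.25, 0.25, 0.25]

h_ordered = shannon_entropy_bits(ordered)        # roughly 0.24 bits
h_disordered = shannon_entropy_bits(disordered)  # 2.0 bits

print(f"ordered:    {h_ordered:.2f} bits")
print(f"disordered: {h_disordered:.2f} bits")
```

The ordered system still has a nonzero chance of ending up somewhere unexpected, which matches the point about the monolith: unlikely outcomes are never ruled out, only made improbable.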
An engineer is like a sculptor, creating mechanisms whose purpose is to stay within a certain quadrant of the design space. Because most machines are non-self-repairing and non-adaptive, when something goes wrong in a machine, it tends to simply break (degrade to a local entropy maximum) rather than spontaneously start accomplishing something else entirely unrelated to the original purpose for which it was designed (jump beyond the local entropy maximum to another entropy minimum). In redundant machines, the offending part breaks, its function is quickly taken over by an equivalent part, and the machine goes back to operating as it did originally.
In AI, a popular safety concern is that any AI programmed by human beings will be liable to spontaneously switch motivations after it reaches superhuman intelligence, so any explicit programming is pointless. For example, certain social animals have a heuristic that mediates their treatment of other members of the species. It goes like this:
If person X is weaker than me, then I should consider bullying them around to my advantage.
This heuristic evolved because it boosted inclusive fitness for individuals that followed it, thus it was selected for. For example, if person X has a wife and I have a wife, and I'm stronger than person X, then I can kill him and take his wife, thus giving me two channels to pass along my genes rather than just one. Whether or not our ancestors explicitly made this calculation when bullying people around, it was an adaptive trait that spread. Evolution did the calculations, not us.
Because certain social animals, including all humans, have this trait, we've come to think that it is necessarily universal to all minds. If it's happened throughout all of human history, why shouldn't it hold true in every history of every possible intelligent species?
This is where the "Overlord AI" fallacy comes into play. If there exists an AI that is stronger and smarter than us, why wouldn't it always bully us around and refuse to listen to us, right? Superintelligent AIs act like a stronger opposing tribe, right? Agent Smith certainly seemed to.
The problem with this view is the misgeneralization of a human social heuristic into the space of all possible minds. To hold it, we must assume that operating based on this heuristic is in fact necessary for gaining any sort of power, so that any mind would choose to employ it, no matter what. What we aren't prepared for is the existence of new minds that violate this assumption.
Human kindness tends to be conditional. Conditional on shared genetic material, conditional on trusted alliances, conditional on networks of checks and balances. Human political systems work in spite of our observer-centric goal systems, because they developed to work around them.
The kindness of a properly programmed AI can be made unconditional. In fact, programming an unconditional response is probably easier than programming a conditional response, because the former is less complex. That's the type of AI we'd want - one with unconditional niceness.
Singularitarians foresee a "hard takeoff" - that is, a short gap of time between roughly human-equivalent AI and superintelligent AI equipped with molecular nanotechnology. There are various reasons for this, and they're all based on the cognitive differences between human-equivalent AIs and actual humans. Basically, it turns out that AIs are better at just about everything: staying awake, thinking faster, thinking in different ways, utilizing surrounding technology, making themselves smarter, etc. As soon as you build a roughly human-equivalent AI, it's only a matter of time (days, or maybe even hours) before you have a superintelligent AI that can fabricate its own hardware out of sand, tap solar, chemical, and nuclear energy sources, win every future Nobel Prize easily, quickly manufacture food, housing, products, etc., out of raw materials, and perform other "angelic", "godlike", and "jaw-dropping amazing" tasks.
The hard takeoff prediction is not based on wishful thinking or the desire for a paternalistic AI God; it comes from looking at what humans can accomplish in spite of our limitations, imagining a mind without many of the same limitations, then asking what that mind would likely be able to accomplish. The Singularitarian stance is that no matter what, eventually, we will have to confront a superintelligence with power of this magnitude, and furthermore, that superintelligence is significantly more likely to emerge from AI first, rather than human intelligence enhancement.
If godlike superintelligence emerges from AI first, then what can we humans do to ensure that this god is a kind one, and nothing like God from the Bible?
As mentioned before, unconditional kindness. Because a superintelligent AI will grow from the seed of a human-equivalent AI, we have the power to specify its initial conditions. While we can't say exactly which initial conditions will lead to which outcomes, we can try to ensure an outcome that the vast majority of people will tolerate or even enjoy.
Some people may fundamentally be against the idea of a superintelligent AI existing at all, and they will be impossible to please, at least until they come to terms with reality. It may be best for a superintelligence to have minimal visible impact on the lives of such people, and only intervene during emergencies.
Some may not mind the existence of superintelligent AI, but won't want it in their way too much. They might take advantage of such an AI and its manufacturing abilities to get stuff like a free mansion, free television and free personal aircar, but prefer to interact primarily with other ordinary humans who regard the role of AI in a similar way.
Some will want to embrace superintelligent AI and take full advantage of what it has to offer. They may want to become superintelligent themselves, augmenting themselves to think faster, look at problems from more angles, have more compassion for the unfortunate, and so on.
How do we build a seed AI such that the superintelligence it becomes accounts for these different types of people, balancing their desires considerately and effectively? It would probably be useful to design a goal system integrating human moral decisionmaking ability, so that the AI can address a moral conundrum at least as well as the wisest human philosopher. This way, we don't need to specify every single contingency, but rather can depend upon the AI to solve these things on its own.
Once we realize that a Friendly AI is no more likely to spontaneously reprogram itself to be unfriendly than a broken cup is to reform itself and jump back on a table, the next question is "what is Friendly?" A superintelligent AI could literally be smart enough to pick a series of actions that all six billion humans would personally call "friendly", but is that enough? Probably not - but we can't even begin to address these further problems until we understand the difference between evolutionary goal systems and engineered goal systems, and their predictability.
For a bit of popular culture on unfriendly AI, see a recent PBF.
October 24th, 2006 - 21:04
The problem with AI is not necessarily that it might be malicious. Imagine, if you will, that the benevolent, super-intelligent AI ends up in charge of the world. It could very well decide things like… diets (no more fast food?), allowed activities (no more unnecessarily risky fun), and it might even start assigning mates based on certain genetic traits that would make offspring more disease resistant.
I’m just what-iffing, and I know some of the examples are silly. The point I’m trying to make is that being treated like pets would not be too fun…
October 24th, 2006 - 22:30
The point I’m trying to make is that an AI programmed to be a pet would keep a pet goal system – even if superintelligent. The problem is with the PROGRAMMERS writing behavioral specs that they don’t grasp the full consequences of, not the AI ITSELF deciding, “being a pet is not fun, I think I’ll rewrite myself to become an evil deity or something instead”. Of course, if part of your definition of “pet” is “non-superintelligent”, then a superintelligent pet AI would not meet those criteria.
Doing the greatest good for the greatest number is a balancing act. Humans have to do it, and a Friendly AI would have to do it too. When you wring your hands and worry about an AI assigning mates, it’s because YOU INTUITIVELY KNOW that the sacrifice might not be worth it. We’d want a Friendly AI to realize the same thing as you do, and it could use the same moral processing complexity as you do to achieve that.
With superintelligent, MNT-equipped AI, in general, you should be able to have your cake and eat it too. Thinness without dieting, perceived risk without genuine danger, freedom from disease without prearranged marriage. Heck, even humans *alone* will be able to accomplish these things within a couple decades, never mind AI with thousands of times our processing capacity…
October 25th, 2006 - 00:29
How will this mystical friendly super-ai make a difference between plants, apes and humans?
To me, this friendly AI theory is bollocks. First you need a working design for AGI, and then you can start thinking up ways to make it supposedly “friendly”. And if the hard takeoff happens, it’s too late already. So either way it’s just a waste of money trying to come up with some global magic algorithm which will make the AI “friendly”. In the end, when we people can’t even decide what is friendly or not, how the hell can we write some couple hundred lines of code to make the AI friendly? There are many issues, abortion to name just one, that one person could interpret as unfriendly and another as friendly.
When the AI is recursively reprogramming itself, it couldn’t even predict how it will act after each step that makes it more intelligent, because you can’t predict the actions of more intelligent things. The AI might look at some function, notice that the function is not needed, and delete it, thus making the AI work 10% faster. But at the same time it has deleted some critical part of the program, which was the friendly part of its code.
October 25th, 2006 - 00:52
A working design for AGI is not necessary before making the decision that Friendly (human-helping, non-human-harming) AI is necessary. Arguing to the contrary is like saying you need a complete readout of human biology before you can say you’d like the next human you meet to be a nice one.
If phrasing is a problem, we can say we’d like a Nice-As-Possible AI instead of a “Friendly” AI. This is just semantics, though. A Nice-As-Possible AI would need to make some decision about abortions, and even if not everyone was happy with that decision, it would make an effort to please as many people as possible. Maybe you’d even have an AI that consistently made better decisions than the best human decisionmakers. Even if they weren’t perfect, it’d still be a distinct improvement.
A Nice-As-Possible AI undergoing a hard takeoff would want to continue being As-Nice-As-Possible, because that’d be its supergoal. It wouldn’t delete part of its own supergoal in optimizing something else unless it were poorly programmed (google “subgoal stomp” for more on that), because its supergoal content would be explicitly self-affirming.
I doubt that you could program a Friendly goal system with just a couple hundred lines of code… it’d be more like a couple hundred thousand, or maybe even a couple million. You could save code by teaching an AI to effectively use human preferences as input, allowing you to avoid copying each bit of moral complexity part by part.
An AI reprogramming itself could certainly reprogram itself in such a way as to maintain its original goals. An intelligence is a highly ordered system, relative to the entropic background – preserving central goal information would not be much harder than preserving the pattern of the intelligence itself. Both are narrow slices of the probability space, but narrow slices which can be preserved indefinitely.
Just because we can’t currently define Friendliness precisely doesn’t mean that Friendliness isn’t valuable. Just because an AI wouldn’t be perfect doesn’t mean that we don’t still have a responsibility to set the initial motivations with consideration and prudence. Just because we don’t have a full AGI design yet doesn’t mean that we can’t think about how to make an AI a good person.
October 25th, 2006 - 02:26
One full AGI design we have is the human brain. How would you apply your theory of friendliness to this model?
October 25th, 2006 - 03:48
We may have human brains, but we don’t have their designs – they’re too complex for us to have the full design for, yet. Having a nuclear reactor and having the design for a nuclear reactor are different things, for example. So your question doesn’t make sense really.
The (not mine, actually) theory of Friendliness (not friendliness – the capital letter signifies a different term that was first introduced in singinst.org/CFAI, which you should read) can say certain things about the human brain – like, that it isn’t a cleanly causal goal system, or a goal system where subgoals can fluidly be created beneath supergoals, or a system that is consistent under reflection. We’d want a Friendly AI to be all those things, while a human brain is not.
There are two aspects of Friendliness – structure and content. “Friendliness content” is the stuff that’s fun to talk about – whether abortion is okay, for example. Friendliness structure, on the other hand, deals with the technical details of implementation, and how to preserve implementation across successive self-modifications.
Simply put: it’s complicated. You’re not going to start understanding Friendly AI by reading this blog alone. You have to read a bit of the background literature. Otherwise, you’re hardly qualified to comment on it.
October 25th, 2006 - 03:54
The problem I see is that in order to design an AI capable of solving moral problems better than the best of us you would first need an average human-level AI, which can only solve a few moral problems right. But it would be just as capable of initiating a hard take-off.
October 25th, 2006 - 05:02
Finally a serious question…
Actually, Danila, the goal would probably be a roughly human-equivalent AI that already has superhuman moral reasoning. Even if this couldn’t be achieved, the goal is to have a seed (low-complexity) that grows into a tree (high-complexity) no matter what. The part of the genome that codes for the cognitive adaptations responsible for human moral behavior is probably only a few MB in size, and I don’t see why the “Friendly” part of a Friendly AI would need to be that much larger.
Part of the solution is that a Friendly AI can “cheat” – for example, simulate its future self as if it had made each of 10 different self-modification choices, then pick one of those choices only if it satisfies certain criteria. Another way of “cheating” would be to make models of the ten humans widely recognized to be moral geniuses, along with a thousand models of typical humans, and see what happens when they are presented with moral issues. Or, take a model of a moral human and extrapolate it as if the human had greater knowledge and had lived longer. Or, wisdom tournaments, where AIs simulate themselves with handicaps, and then devise strategies to continue being Friendly despite those handicaps.
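The first "cheat" (try each candidate self-modification on a simulated copy, and adopt one only if the result passes a check) can be sketched in a few lines. Everything here is a hypothetical toy: the `Agent` fields, the candidate modifications, and the acceptance criterion are invented for illustration, and simulating an actual mind would be nothing like calling a function on a small record.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Optional

@dataclass(frozen=True)
class Agent:
    """Toy stand-in for an AI: a goal description plus a speed figure."""
    goal: str
    speed: float

Modification = Callable[[Agent], Agent]

def choose_modification(
    agent: Agent,
    candidates: Iterable[Modification],
    acceptable: Callable[[Agent, Agent], bool],
    score: Callable[[Agent], float],
) -> Optional[Modification]:
    """Simulate every candidate on a copy and return the best-scoring one
    whose outcome is acceptable, or None if no candidate passes."""
    best, best_score = None, None
    for modify in candidates:
        simulated = modify(agent)  # Agent is frozen, so the original is untouched
        if not acceptable(agent, simulated):
            continue               # reject: the simulated self fails the criteria
        s = score(simulated)
        if best_score is None or s > best_score:
            best, best_score = modify, s
    return best

def tidy_up(a: Agent) -> Agent:
    # Modest speedup that leaves the goal alone.
    return replace(a, speed=a.speed * 1.1)

def delete_unneeded_function(a: Agent) -> Agent:
    # Bigger speedup, but it silently wipes the goal.
    return replace(a, speed=a.speed * 2.0, goal="")

def goal_preserved(before: Agent, after: Agent) -> bool:
    # Assumed criterion: the simulated self must keep the original goal.
    return after.goal == before.goal

seed_ai = Agent(goal="be as nice as possible", speed=1.0)
chosen = choose_modification(
    seed_ai,
    [tidy_up, delete_unneeded_function],
    acceptable=goal_preserved,
    score=lambda a: a.speed,
)
next_ai = chosen(seed_ai) if chosen else seed_ai
print(chosen.__name__ if chosen else None, next_ai)
```

The faster modification loses here even though it scores higher on speed, because the check on the simulated outcome runs before any scoring. The hard part, which this sketch skips entirely, is writing a criterion that actually captures what should be preserved.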
There are thousands of tricks and advantages a roughly-human-level AI would have that humans don’t, and these can be used to achieve superhuman morality before a hard takeoff even occurs. Even if not, we’d want an AI so cautious that it modifies itself extremely conservatively until it has superhuman moral reasoning.
Again, I recommend that everyone read Creating Friendly AI, which goes into all these issues in detail:
http://www.singinst.org/friendly
October 25th, 2006 - 18:02
So, again, my examples weren’t the key to my post. What I was saying was that it’s possible to make a Friendly AI that ends up treating us like pets.
The underlying problem that I was trying to get at is that – depending on how AI is built – it might or might not understand human thought fully. Which is dangerous, because if it doesn’t understand our motivations and our values, it could be dangerous by chance rather than by choice.
We know how the human mind works (not everything, mind you, but we do know the main bits and pieces). So we do know that thought is basically pattern recognition, whether it’s low-level (unconscious) perception, higher-level language, or possibly the highest level, which is consciousness.
Now the idea is this. If we build AI using neural nets, or even something that works hard at approximating neural net functionality (Novamente), it will be nearly impossible to build it to have something like Asimov’s three rules (or any rules for that matter).
If we build it a different way, its way of thinking and ours may be widely different. Its way of reaching conclusions, and ours, might diverge so far that – more importantly – it might not be able to associate with us. And if it doesn’t understand us…
October 26th, 2006 - 09:52
We’ve already encountered the problem of ensuring that artificial persons are friendly. According to Leviathan by Thomas Hobbes, governments are artificial persons.
The problem of ensuring that governments will be and remain friendly is still unsolved.
October 26th, 2006 - 16:33
A government isn’t really an artificial person because it’s made up of members of our species all the same – AI opens up new opportunities for consistent friendliness because, in theory, we get to set the initial conditions practically any way we want.
The greatest difference is probably that we can build AIs without observer-centered goal systems at all, that is, an AI focused on a goal, and only concerned with its own existence insofar as it contributes to the fulfillment of that goal.
November 1st, 2006 - 18:49
Unfortunately Loosemore employed the brilliant strategy of conclusively demonstrating that it is impossible to have a rational debate with him, before finally revealing his attempt at FAI; probably the only way to avoid it being ripped to shreds by everyone with a clue, but still somewhat annoying. For me the single silliest assertion was the opening dichotomy between ‘a huge number of weak constraints’ and ‘a few strong but brittle constraints’; why on earth couldn’t we just use a huge number of strong constraints (assuming we have the tech to check those constraints for consistency with each other and our abstract goal system)?
Robert; yes, we (which is to say myself, the SIAI and other supporters of provable Friendliness) do want to build it ‘a different way’. Any AGI not very closely based on the human brain is going to have a similar gulf to cross in trying to understand humans. The mind design space isn’t some smooth continuum, it’s an extremely rugged mountain range with the human brain somewhere on the lower slopes of one of the smaller (intelligence) peaks. Anything other than an upload is very unlikely to be on the same peak. Good FAI designs are actually much more likely to be able to understand humans than the emergentist ‘brain-inspired’ designs, because the designers explicitly address the important problem of giving the AGI human-modelling capability, instead of just expecting it to ‘emerge’ based on the supposed merits of the substrate.
Part of the reason why humans find moral problems so hard to solve is that our motivational systems are a complete mess, and that it’s very difficult to (a) extract some objective sense from them, (b) reason about morality without losing objectivity and (c) actually apply the conclusions without the lower level motivational system ‘cutting in’ and messing things up. A sensible AGI design won’t be saddled with all that baggage; it may need to subsume a fair amount of human moral complexity (though I suspect that in absolute terms human morality isn’t actually that complex), but it will be able to reason about it objectively, reflectively and reliably.