From complexity theorist Richard Loosemore on the AGI list:

It is entirely possible to build an AI in such a way that the general course of its behavior is as reliable as the behavior of an Ideal Gas: you can’t predict the position and momentum of all its particles, but you sure can predict such overall characteristics as temperature, pressure and volume.
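The ideal-gas comparison can be made concrete with a toy simulation. The sketch below is an illustration added here, not anything from Loosemore's post. It assumes unit-mass particles whose velocity components are drawn from a standard Gaussian, and the function names are hypothetical. No individual velocity is predictable, yet the average kinetic energy per particle lands near the same value (1.5 in these units) on every run, and it does so more tightly as the particle count grows.

```python
import random
import statistics


def mean_kinetic_energy(n_particles, seed):
    """Average kinetic energy per particle for one randomly drawn 'gas'.

    Assumption for illustration: unit mass, and each velocity component
    is an independent standard Gaussian.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_particles):
        vx = rng.gauss(0.0, 1.0)
        vy = rng.gauss(0.0, 1.0)
        vz = rng.gauss(0.0, 1.0)
        total += 0.5 * (vx * vx + vy * vy + vz * vz)
    return total / n_particles


def spread_across_runs(n_particles, n_runs=20):
    """Standard deviation of the macroscopic average over independent runs."""
    values = [mean_kinetic_energy(n_particles, seed) for seed in range(n_runs)]
    return statistics.pstdev(values)


if __name__ == "__main__":
    for n in (100, 10_000):
        print(n, mean_kinetic_energy(n, seed=0), spread_across_runs(n))
```

The run-to-run spread shrinks roughly as one over the square root of the number of particles, which is why the bulk properties of a real gas, with vastly more particles, are so dependable.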

Without any sophisticated theory of minds in general, predicting the future behavior of any given artificial intelligence can seem impossible - who’s to say that it won’t reprogram itself arbitrarily at any time, if it has the capacity to do so?

The issue is that capacity does not necessarily signify desire. In humans, desire comes from our evolutionary history - every desire, no matter how seemingly unrelated, evolved because it contributed somehow to our inclusive fitness. Art, literature, philosophy, gossip - few people realize that these domains of human endeavor are in fact evolutionarily programmed subgoals of the evolutionary supergoal: the increase of inclusive fitness, which encompasses both our ability to survive and give birth to children that survive. Some may say, “look at math, and the sciences, and abstract thought, don’t these signify that humanity can go beyond its evolutionary origins?” To the contrary, these activities are just extended outgrowths of our evolutionary motivations running their behavior routines in different sorts of external environments.

When order is created, the entropy of a system decreases. The number of possible future states the system is likely to be in goes down. A diamond monolith has very low entropy, and is likely to remain in the same configuration for a long time to come. There is always a nonzero probability that thermal vibrations will shatter the monolith to dust, or that several quadrillion atoms will spontaneously quantum tunnel out of the monolith, thereby destroying it, but the likelihood of these occurrences is minute. At the other extreme, a corpse has high entropy - it is deteriorating, decaying, its molecules are going every which way, being digested by microorganisms and scattered by numerous natural forces. There is a chance that the corpse will spontaneously re-form into a living person, but the probability is close to nil.

An engineer is like a sculptor, creating mechanisms whose purpose is to stay within a certain region of the design space. Because most machines are non-self-repairing and non-adaptive, when something goes wrong in a machine, it tends to simply break (degrade to a local entropy maximum) rather than spontaneously start accomplishing something else entirely unrelated to the original purpose for which it was designed (jump beyond the local entropy maximum to another entropy minimum). In redundant machines, the failed part's function is quickly taken over by an equivalent part, and the machine goes back to operating as it did originally.

In AI, a popular safety concern is that any AI programmed by human beings will be liable to spontaneously switch motivations after it reaches superhuman intelligence, so any explicit programming is pointless. The concern draws much of its force from our habit of projecting evolved motivations onto other minds. For example, certain social animals have a heuristic that mediates their treatment of other members of the species. It goes like this:

If person X is weaker than me, then I should consider bullying them around to my advantage.

This heuristic evolved because it boosted inclusive fitness for individuals that followed it, so it was selected for. For example, if person X has a wife and I have a wife, and I’m stronger than person X, then I can kill him and take his wife, thus giving me two channels to pass along my genes rather than just one. Whether or not our ancestors explicitly made this calculation when bullying people around, it was an adaptive trait that spread. Evolution did the calculations, not us.

Because certain social animals, humans included, have this trait, we’ve come to think that it is necessarily universal to all minds. If it’s happened throughout all of human history, why shouldn’t it hold true in every history of every possible intelligent species?

This is where the “Overlord AI” fallacy comes into play. If there exists an AI that is stronger and smarter than us, why wouldn’t it always bully us around and refuse to listen to us, right? Superintelligent AIs act like a stronger opposing tribe, right? Agent Smith certainly seemed to.

The problem with this view is the misgeneralization of a human social heuristic into the space of all possible minds. We tacitly assume that operating on this heuristic is necessary for gaining any sort of power, so that any mind would choose to employ it, no matter what. What we aren’t prepared for is the existence of new minds that violate these assumptions.

Human kindness tends to be conditional. Conditional on shared genetic material, conditional on trusted alliances, conditional on networks of checks and balances. Human political systems work in spite of our observer-centric goal systems, because they developed to work around them.

The kindness of a properly programmed AI can be made unconditional. In fact, programming an unconditional response is probably easier than programming a conditional response, because the former is less complex. That’s the type of AI we’d want - one with unconditional niceness.

Singularitarians foresee a “hard takeoff” - that is, a short gap of time between roughly human-equivalent AI and superintelligent AI equipped with molecular nanotechnology. There are various reasons for this, and they’re all based on the cognitive differences between human-equivalent AIs and actual humans. Basically, it turns out that AIs are better at just about everything: staying awake, thinking faster, thinking in different ways, utilizing surrounding technology, making themselves smarter, etc. As soon as you build a roughly human-equivalent AI, it’s only a matter of time (days, or maybe even hours) before you have a superintelligent AI that can fabricate its own hardware out of sand, tap solar, chemical, and nuclear energy sources, win every future Nobel Prize easily, quickly manufacture food, housing, products, etc., out of raw materials, and perform other “angelic”, “godlike”, and “jaw-dropping amazing” tasks.

The hard takeoff prediction is not based on wishful thinking or the desire for a paternalistic AI God; it comes from looking at what humans can accomplish in spite of our limitations, imagining a mind without many of the same limitations, then asking what that mind would likely be able to accomplish. The Singularitarian stance is that no matter what, eventually, we will have to confront a superintelligence with power of this magnitude, and furthermore, that superintelligence is significantly more likely to emerge from AI first, rather than human intelligence enhancement.

If godlike superintelligence emerges from AI first, then what can we humans do to ensure that this god is a kind one, and nothing like God from the Bible?

As mentioned before, unconditional kindness. Because a superintelligent AI will grow from the seed of a human-equivalent AI, we have the power to specify its initial conditions. While we can’t say exactly which initial conditions will lead to which outcomes, we can try to ensure an outcome that the vast majority of people will tolerate or even enjoy.

Some people may fundamentally be against the idea of a superintelligent AI existing at all, and they will be impossible to please, at least until they come to terms with reality. It may be best for a superintelligence to have minimal visible impact on the lives of such people, and only intervene during emergencies.

Some may not mind the existence of superintelligent AI, but won’t want it in their way too much. They might take advantage of such an AI and its manufacturing abilities to get stuff like a free mansion, free television and free personal aircar, but prefer to interact primarily with other ordinary humans who regard the role of AI in a similar way.

Some will want to embrace superintelligent AI and take full advantage of what it has to offer. They may want to become superintelligent themselves, augmenting themselves to think faster, look at problems from more angles, have more compassion for the unfortunate, and so on.

How do we build a seed AI such that the superintelligence it becomes accounts for these different types of people, balancing their desires considerately and effectively? It would probably be useful to design a goal system integrating human moral decision-making ability, so that the AI can address a moral conundrum at least as well as the wisest human philosopher. This way, we don’t need to specify every single contingency, but rather can depend upon the AI to solve these things on its own.

Once we realize that a Friendly AI is no more likely to spontaneously reprogram itself to be unfriendly than a broken cup is to re-form itself and jump back on a table, the next question is “what is Friendly?” A superintelligent AI could literally be smart enough to pick a series of actions that all six billion humans would personally call “friendly”, but is that enough? Probably not - but we can’t even begin to address these further problems until we understand the difference between evolutionary goal systems and engineered goal systems, and their predictability.

For a bit of popular culture on unfriendly AI, see a recent PBF (Perry Bible Fellowship) comic.