The Benefits of a Successful Singularity Monday, Dec 14 2009
friendly ai and singularity 3:41 pm

Check out my new article at Good.is, the latest installment in a “Singularity 101” series I’m writing with Roko. I would be most pleased if you registered for the site and voted my article as “good”.




Within this series, I liked this installment least. You are introducing oversimplifications that lower the standard of what you could otherwise say.
“The question is whether our wisdom will keep pace with these material gains, and whether we will be able to avoid conflict in a new world of abundance. A civilization with immense power and knowledge still has to decide which goals to direct that power and knowledge towards—and these are not easy questions. Maybe some combination of human and machine intelligence will bring us closer to the answers.”
“Combination of human and machine intelligence”? What’s that supposed to mean?
Vladimir, given the constraints on space I had to simplify. For me, the most important insights that convinced me that AI could have a major impact were that AI could 1) be smarter than human, 2) be self-improving, and 3) think faster than human, with a special wow factor for the third because the potential degree of improvement is easy to understand. Because 1 and 2 were already covered quite well in other articles, I wanted to focus on 3.
“Combination of human and machine intelligence” is my way of referencing the fact that AI cannot create good values for us de novo — everything we regard as having value has value because humans believe it does. My point there is to say that AI might amplify our technological powers by a huge degree, but we will still need to query humanity deeply and often to direct those fantastic powers towards worthwhile goals. It is a nod to the profoundly necessary “human” side of the Singularity — the necessity for us to provide information to come up with a “Friendly” utility function.
I see a CEV as a combination of human and machine intelligence. By expanding human intelligence by consulting and interfacing with AIs that help us analyze our every desire, we might reach more clarity about what it is we actually want, and make it consistent under reflection. Those are the sort of answers I am referencing in this piece.
If you wrote the article, would you have completely changed the focus? Which simplifications bothered you in particular?
re “The entire planet could be raised to the standard of living of the West or better, all on clean power plants built and maintained by automatic systems.”
I can’t blame you for leaving out the extreme outcomes—they are, after all, hard to explain, highly speculative, and, above all, incredibly silly. But I do have to say that it’s awfully … strange … to read an article about the benefits of a positive Singularity that waxes poetic about clean power. What about, you know, the stuff we actually believe? Uploading, immortality, lightspeed expansion, exploring the space of possible minds, apotheosis and eudaimonia forever and ever or at least until the heat death of the universe?
Still, I’ll take the clean power. Much better than being paperclips.
The more I read about CEV, the more I like it. Is it a component of Friendliness? Or is Friendliness a component of CEV? I’m still curious how CEV would react to a desire by a majority of humans to inflict suffering or outright extermination on a minority.
Zack, I know, it’s quite a conundrum, isn’t it? Dale Carrico calls this “sanewashing” our beliefs. Thankfully, Kurzweil’s widely available book and many others discuss the SL4 stuff, and the ideas are pretty widely known to anyone who looks closely (or not) at this stuff.
There can exist a variety of “minimalist singularitarianism” where the primary goal is just to make sure that humanity survives the Singularity and continues to progress happily, and where all that SL4 stuff is just incidental. According to this interpretation, even the Amish would have a great incentive to push towards a safe Singularity. But, bring in the SL4 stuff, and it suddenly sounds like we’re trying to transform the world in odd ways, when what many of us really expect is a behind-the-scenes superintelligent OS.
Brad, read this for the answer. CEV is a strategy for Friendliness that appears most likely to work and be workable.
“My point there is to say that AI might amplify our technological powers by a huge degree, but we will still need to query humanity deeply and often to direct those fantastic powers towards worthwhile goals.”
I’m pretty sure that’s not how it works. The AI becomes the new laws of Nature; it doesn’t “query” the stuff it implements (i.e. humanity, or whatever it is that turned out to be a worthwhile thing to support). The absorption of values refers to humanity-before, not humanity-after.
“If you wrote the article, would you have completely changed the focus? Which simplifications bothered you in particular?”
As an awful writer, I can only complain about what I read, not suggest how to write. I agree with further points made by Zack.
But I don’t support this “sanewashing” strategy: if you can’t say some things and be understood, focus on other ideas. But don’t tell lies, things you believe to be false.
Sure it could. In a Singularity sparked by the CEV model, there would be two things: a CEV and the AI it replaces itself with. The CEV would absorb human preferences, then replace itself with an AI based on the extrapolation of those preferences. I think it’s safe to assume that that AI will continue to query human preferences in deciding what to do. It seems unlikely that humanity’s coherent volition would produce an AI that completely ignores us.
An AI “becoming the new laws of nature” is just a metaphor, and not a very good one in my opinion. How much of an AI’s structure consists of abstract invariants, and how much consists of parts being continuously updated based on human preferences, is an open question, but it seems unlikely it will consist exclusively of the former.
It’s not a strategy — when I quote Dale Carrico, you can rest assured that I am being thoroughly tongue-in-cheek. I am comfortable with the way I presented the ideas, and believe they are generally understandable. I don’t think that portraying post-Singularity Friendly AI as something that permanently decouples from human preferences after the CEV finishes running is either plausible or helpful. The immense optimization power of Artificial Intelligence depends on the rich moral complexity of Homo sapiens as a starting point. Without a set of values, AI is just a tool. With a set of values from extrapolated humanity, AI is an advanced AI-human symbiosis, which is what we want. It’s not a Kurzweilian symbiosis, because it doesn’t depend on physical coupling, but moral information content coupling, which is more important anyway. My view of Friendly AI in this way is neither false nor a lie.
To clarify what I meant and understood of your position: (1) By “lie” I referred to the general impression (of being misleading) left by your article, not specifically to mixing-of-values, but that also; (2) I assumed that “Combination of human and machine intelligence” resulted from simplification of presentation, but now I see it’s actually your position. (3) I’m pretty sure it’s a wrong view, or at least a misleading way of presenting it (that is, I may have misunderstood again).
Once AI is in power, it doesn’t “ignore” us; on the contrary, it implements us (hence “laws of Nature”, which is pretty literal, not so much a metaphor). Once it obtains the definition of values, it doesn’t need to “explicitly” look for more info, but it is aware of how we actually develop, because we develop according to its (dynamic) plan for our development, constructed based on our (initial) values. Values don’t change, ever (but they unfold, in a way that even FAI can’t foresee in detail). It may sound overly invasive, but it is what it is: there is no way for AI to not determine everything (related to: putting AI in a box), and it’s undesirable not to determine everything (OB post: Not taking over the world). Free choice is but a part of the way the world gets determined; AI’s plan depends on humanity’s choices in a Newcomblike way, because some things it has to know beforehand to make the actuality better.
“I don’t think that portraying post-Singularity Friendly AI as something that permanently decouples from human preferences after the CEV finishes running is either plausible or helpful.”
Explicitly on this point: of course FAI doesn’t “decouple” from human preferences, it makes sure that human preferences follow the path determined by preferences that initialized the FAI.
Well, I am going to write the next article to cover a different set of angles, including
* David Pearce, Nick Bostrom and Julian Savulescu’s ideas about peak experience and pushing good human experience to the upper limits of what is possible;
* Eliezer Yudkowsky’s ideas about fun theory; the difference between low-grade and high-grade utopia similar to the above; some examples of low-grade utopia; Yudkowsky’s idea of the optimizer: what would the effect of a your-values optimizer be?
* Research on happiness and human psychology showing that human beings have happiness set-points, and that we are really bad at predicting what will make us happy and unhappy.
Vlad,
What did you read that gave you the impression that a post-Singularity AI would implement everything on the planet rather than just being there alongside everything? If my brain still consists of proteins being trimmed and regenerated via molecules from my food post-Singularity, then it doesn’t really make sense to say that I’m being “implemented” by the AI. It also brings up other weird questions, like is the AI “implementing” plants and bugs too? How do you personally know in advance what a CEV output will do, when no one really knows for sure? It’s possible (but unlikely) that a CEV outputs something like a roller coaster or ice cream cone and then spontaneously shuts down.
It seems to me that the notion of an AI becoming “the new laws of physics” is an overextension of the Sysop idea. A Sysop can exist and infuse practically everything without literally being a substrate on which all minds are being run. It could also potentially exist and manifest at a relatively low density if that were all that is necessary.
I believe that continuous querying of preferences will be a hallmark of the “definition of values”. Thus, the querying would never end. I could be wrong, and you could be right, but I don’t think that this is a settled issue in the way that you appear to be making it out.
An AI could be capable of controlling everything without actually needing to do so to fulfill its directives as written by the CEV. Just because an AI is difficult to keep in the box, it does not follow that it will infuse its agents into every cubic nanometer on the planet, for instance. Perhaps it might choose every cubic centimeter, or even every cubic meter. It does not follow from AI box experiments that “there is no way for AI to not determine everything”. Humans are extremely powerful entities relative to bacteria, but we do not control everything about them, nor implement them. The notion that AIs will “control everything” or “implement us” is based on speculative determinations of CEV output.
The “Not taking over the world” post mentions that it takes effort not to take over the world with AI, which is presumably what we’re going with here. Taking over the world, implementing us, etc., could be viewed by the CEV as negative things, and it might be that the CEV would never write an AI that takes those actions in a million years. Though it might.
What if I prefer to kill myself? Will an AI “make sure” my preferences “follow the path determined by preferences that initialized the FAI”? Would suicide be one of those preferences? What would it “make sure”, relative to the default state today where no higher being is making sure of anything? And what did you read that gave you the above understanding?
I think the idea is that the action/inaction distinction doesn’t really exist in decision theory and begins to break down as agents become better predictors of their environment. When you read the words I write, I’m affecting you, but we wouldn’t say that I’m “controlling” you because I don’t know how to target my words to cause you to do what I want. But if I were to be magically granted complete apprehension of a lookup table relating all possible blog comments (including “the null comment”) to what you would think and do in response to each of them, in what sense could I avoid controlling you? By selecting my comment, I would select your future. To say that the AI only “controls” the areas where it has physical agents is to ignore why we expect this optimization-thingy to be powerful in the first place; property rights are a nice bookkeeping device for agents of the same level, but in the real world there is only cause and effect.
Eliezer has specifically said that part of Friendliness content might be to not optimize individual lives too hard. This was ostensibly addressed in “Free to Optimize” and “Amputation of Destiny,” but I found the explanation in a 2004 SL4 comment much more helpful (start reading from “Okay, here’s one”).
Zack & Mike,
The reading of “free to optimize” as being related to the “strength of optimizer” seems to be a free will confusion. In a completely deterministic world, you are still free to optimize. “Strength of optimizer” merely shows how well it’s able to achieve its stated goals, but even within the exactly most optimal solution there is still as much freedom as anywhere. If you don’t like finding $1M in all boxes you open, no matter how badly you’ve chosen, then it sets the goal so that finding $1M in all boxes is not a property of the optimal solution.
Mike,
I think we agree in principle; I just choose to call my uncertainty about what happens “laws of Nature”, and you choose to say “I don’t know”. I see three reasons for this: first, as per Zack’s comment, I don’t distinguish “inaction” as a special kind of action: whatever happens, it’s a very specific choice made by AI, about the whole world and the whole future; second, I find the “minimal intervention” scenario implausible; and third, I’d call the “minimal intervention” scenario “laws of Nature” just the same, as the choice of initial conditions (or “indirect” influence) determines the way things unfold, and there is a specific “law” to the way the future gets directed, which is being optimal according to initial preference/values. I expect this law to be rather visible.
One issue is that you distinguish AI-stuff and non-AI-stuff, which seems conceptually wrong. In the long term (e.g. a few days/months after takeoff), there is no AI. There is just stuff that has the property of being human-values-optimised. This stuff may have “humans” or “posthumans” in it at some point, maybe “supporting infrastructure” as spatially distinct stuff, maybe “substrate-style infrastructure” that isn’t physics-level distinct from the rest, maybe magically-computed infrastructure in the patterns of thermal noise. More plausibly, highly customised solutions with little uniformity and chance to be expected.
Pointing to a specific pattern in this future as “the FAI” that also “continuously updates its preferences” by looking at “humanity” seems a much more detailed assertion about how things are going to turn out than anything I’ve said. Singling out “initial” preference makes sense because it’s a concept in the design, something we do on our own terms, and to boot enactment of this preference is the stated goal of the project: whatever comes next is bound to follow it simply by definition of the successful run of FAI.
When constructing something highly optimised, under a new criterion that wasn’t applied before, on a completely new level of control, it seems unlikely that the result will be anything like the familiar status quo. Initial conditions may lead to preservation of existing patterns, such as individual persons, but little can be said about e.g. trees or proteins. Is the status quo about trees the best possible thing to do? If something has gone a million years without being designed by us, it’s probably very far from the way it should be.
As you’ll agree, the purpose of the enterprise is not so much to create a “helper” genie, but to ensure that we travel a good road to the future. This is a difference between the current unfair world, where a person may perish just because they were born physically unable to go further, and a world where they are given potential.