Response to “Thoughts on Friendly AI” Saturday, Mar 22 2008
friendly ai 8:57 pm
I was reading “Thoughts on Friendly AI” at utilitarian-essays.com, a site with short papers on various utilitarian issues, including AI friendliness. (An unfortunate aspect of incorrectly programmed strong AI is that it poses a huge risk to humanity.) I wanted to point out several interesting positions in the essay, as well as respond to a few open questions. From here on out, I will refer to the author as “Utilitarian”.
Utilitarian writes:
I think the probability that humans will create an AGI is not trivially small; I wouldn’t put the figure below 0.01, and personally I would consider 0.15 or so to be a more reasonable Bayesian best-guess estimate. Thus, if the stakes are sufficiently high, work related to friendly AI may have enormous expected value.
Here, Utilitarian admits having a low estimate of the probability that humans will create strong AI: around 15%, but at least 1%. In spite of this, the author concludes that friendly AI work may have enormous expected value. This means that you don’t have to believe it’s particularly likely that AI friendliness is a big deal for AI friendliness to be a big deal anyway. This is because of the degree of power that a strong AI would have if it is indeed technologically possible.
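To make the arithmetic concrete, here is a rough sketch in Python. The two probabilities are the figures Utilitarian gives (a floor of 0.01 and a best guess of 0.15). The stakes value and the function name are placeholders I made up purely for illustration; the essay does not put a number on the stakes.

```python
def expected_value(probability: float, payoff: float) -> float:
    """Expected value of an outcome worth `payoff` that occurs with `probability`."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * payoff


# Hypothetical stakes in arbitrary units. This figure is an assumption
# for illustration only and does not come from the essay.
HYPOTHETICAL_STAKES = 1_000_000_000

# Probabilities quoted by Utilitarian: the 0.01 floor and the 0.15 best guess.
for p in (0.01, 0.15):
    value = expected_value(p, HYPOTHETICAL_STAKES)
    print(f"p = {p:.2f}: expected value = {value:,.0f} units")
```

The point of the sketch is only that a small probability multiplied by a large enough payoff still yields a large product, which is the shape of Utilitarian’s argument.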
Next, Utilitarian launches into a look at the Coherent Extrapolated Volition (CEV) programmatic concept for AI friendliness. It is pointed out that a CEV-based AI could lead to beneficial outcomes for non-human animals if humanity’s volition decides to include theirs:
CEV would be designed as a dynamic process in which the FAI would extrapolate humanity’s volitions slowly at first and then build upon those volitions in order to rewrite its code and improve the extrapolation process in subsequent iterations. So, for instance, if in the first round, humans decided that chimpanzee volitions should be counted (to the extent this is possible), then chimpanzees would be included in the second round.
My general comment: it might seem unfair to program a strong AI to care only about the opinions of humans for the first round (this is the current plan), but unfortunately, anything else is too risky. What if we decide to program the AI to average our volition with that of our cats, and the cats end up outnumbering the humans, and it turns out that cats don’t like us all that much? Do we want a strong AI on the cats’ side? Probably not. On the same note, we should program the first strong AI in such a way that it doesn’t unfairly favor a small subset of humans.
Utilitarian then writes, provocatively:
However, the starting point–i.e., who will be extrapolated in the first round–is arbitrary, because we can’t rely on the CEV process to decide that for us. The current plan is to extrapolate only humans and allow them to decide whether to include non-human animals in subsequent rounds. But why stop there? Why not only extrapolate humans born in January and allow them to decide whether to include humans born in other months?
We might hope that all roads will lead to Rome and that all initial choices of the set of volitions to extrapolate will lead to the same result, but this is far from obvious. Thus, the choice of whether and to what extent to include non-human animal volitions in CEV is an important open question–one with which animal-welfare organizations might consider getting involved.
The designers of CEV assume that the category “all humans” is ideal for the initial input. It must be made openly clear that this choice is entirely arbitrary. For reasons I will describe in another post, I actually suspect it might be safer to use only one human being as the initial input. In any case, the author points out here that animal welfare organizations might be interested in lobbying for a place for certain animals, for instance the great apes, in the first CEV input stage.
Personally, I object to the killing of higher vertebrates if at all possible. I suspect that once in-vitro meat becomes available, many people will “spontaneously” begin realizing that destroying animals for food was an ethical sacrifice all along, and the practice will fall out of vogue, like slavery. Would this be recognized in the first round of CEV? I’m counting on it, but who knows? If the answer is no, do I have a moral obligation to lobby that higher vertebrates be included in the initial CEV input? I don’t think so, because doing so might make the fundamental building block more complex, less stable, and more unpredictable.
We have an obligation to maximize the stability and predictability of the outcome of strong AI, because anything else is unfair to the people that have to live with it. It might not be possible to put the genie back into the bottle. An obviously flawed AI might be capable of self-perpetuating its influence despite any attempts to stop it, leading to an unpleasant period between its creation and Heat Death. This could be about 10^40 years, a long time by any measure.
Do all roads lead to Rome? I hope so, but there is little way of telling in advance. One crutch used in the CEV approach is to have a way of peeking at the final outcome and vetoing it if it is obviously a failure. If a single person or exclusive group has the right to do this, one might ask, “how is this different than using just their volitions as input to begin with?” I see a difference, but it’s admittedly subtle. The elite group would have veto power, but it wouldn’t be micro-managing the outcome.
Utilitarian then writes:
It may be the case that animals don’t have an abstract enough sense of their volitions for CEV to work with them. If this is true, the same could be said of human infants. It’s not obvious to me that human infants deserve more direct influence over CEV than, say, pigs. If one makes the argument that human infants have the potential to develop into adults with a better sense of their true volitions, then replace “human infants” by “human adults with significant intellectual disabilities.”
It may be a good idea to exclude human adults with significant intellectual disabilities and human infants. Neurologically, all humans above a certain age are basically similar, but infants and adults with significant intellectual disabilities will be distinctly different. Where do we draw the line? I don’t know, but it seems counter-productive to include the input of minds that lack an abstract sense of morality. Most would at least agree that people in a coma cannot make moral choices in their current state.
Utilitarian writes:
It’s plausible that the lives of most wild animals involve more suffering than happiness; this is especially likely if insects are sentient. On the other hand, most humans value nature highly and would prefer for wildlife to exist. I’m afraid that the CEV of humanity wouldn’t give enough consideration to the suffering of wild animals and, even worse, might create vastly more through terraforming, directed panspermia, or sentient computer simulations of nature.
My hope is that this concern would be addressed by the “if we knew more” part of CEV. If humans were more cognizant of wild-animal suffering and were able to more deeply imagine how horrible it is for, say, a frog to be swallowed alive by a snake, then perhaps they would be more reluctant to value “pristine natural environments.” And if their opinions were still unmoved, then maybe the impulse to preserve nature would be so strong that it would indeed have some merit.
If insects are sentient, we will figure it out soon enough. We should have faith in humanity’s ability to identify failings in our own morality and collectively improve, as has occurred since at least the Middle Ages. In any case, specially intervening to remove the possibility entirely would be a breach of ethics and interference from an elite group.
A similar concern relates to lab universes. If anyone were going to create infinitely many new universes in a laboratory, it would probably be an AGI. I’m concerned that humans would find the creation of new universes so exciting, cool, or unusual that they would ignore the fact that they would create an infinite amount of suffering in the process–and probably far more suffering than happiness
If lab universes are possible, hopefully we’ll come to a democratic conclusion that they shouldn’t contain suffering sentients. I don’t see why we wouldn’t.
In favor of SIAI, Utilitarian writes:
Of course, these scenarios assume that the friendly AI would be built correctly and humanely, but this is an argument in favor of SIAI’s work, rather than against it. Better to have a friendly AI determine the future of our part of the universe than a careless (or even malevolent) AI built by less circumspect programmers.
I will address the third part, “Religion”, in another post.




I remain unconvinced that Heat Death is inevitable. Yes, I am aware that what I am saying is a defiance of the laws of Thermodynamics. However, if M-Theory’s indications about alternate rules governing alternate membranes hold up, it may just be possible that interaction between multiple branes could permit the localized absolute reversal of entropy.
“Yes, I am aware that what I am saying is a defiance of the laws of Thermodynamics.”
– I don’t think that you are. I took a thermodynamics course last year, and in order for the second law to be provably true about a physical system, that system has to satisfy the ergodic principle. As far as I know, there are certain systems that have been shown to satisfy this principle, but the laws of ordinary quantum mechanics are not amongst them. Therefore we cannot be sure that QM satisfies the second law. Of course, no-one has managed to do an experiment that violates ergodicity, again as far as I know, so one would have to conclude that it is likely that the universe will die a heat death, although not too unlikely that there’s a way out.
Is there someone here who is an expert on ergodic theory and can clarify this?
“For reasons I will describe in another post, I actually suspect it might be safer to use only one human being as the initial input.”
- yes, I would have to say I agree with this. I think that having people with quite badly differing opinions, especially about non-rationally-settlable things like politics or religion, would introduce a dangerous conflict into the process. There’s nothing like a fight to make reasonable people become more and more extreme.
The main difference between a veto which shuts down the FAI and “fine tuning” its output is that in the first case we are no worse off than before. We can always choose to do nothing, and decide not to create an AGI, if we don’t know how to make things better.
Roko: it’s a good point that disputes can bring out the worst in people. But why, then, do you suppose we would want to extrapolate people fighting, then base the AI’s future actions off the mess that ensues? The point you give is reason not to. What are you visualising here?
Going slightly off topic to comment on in-vitro meat:
Which non-human species are big evolutionary winners from human hegemony? Well, there’s wheat, rice, cows… There used to be one other big winner: horses. But then horses stopped being useful.
Where are the herds of free-roaming feral horses? An animal rights activist would be well advised to draw inference from their fate: they went into cans, starting the canned dog-food industry.
IF in-vitro meat, THEN cows only in petting zoos. But what creates the most cow-utility, many cows living in farmers’ herds (with the inevitable sticky end), or a pitiful few pampered cows in zoos? Can the sum of their lives be compared against the sum of their deaths?
@Nick: I’m arguing in favor of basing some recursive friendliness algorithm on the beliefs of one particular person. I think that this might be a good idea, because if there is some kind of ethical convergence going on, then one person’s beliefs will probably converge to the same thing as any other person’s beliefs.
If, on the other hand, a recursive friendliness algorithm operates by trying to somehow synthesize the ethical beliefs of every person on the planet, there will inevitably be people whose main desire is that the algorithm dismisses other people’s desires. An example of this would be a religious person whose main desire is that the algorithm ignores atheism, or an atheist whose main desire is that the algorithm doesn’t end up realizing a particular deity. This is one of the main problems I have with CEV. In recursive friendliness [especially with the human desire to form little political groupings and our bias towards arguing against what the other person is saying], it may be the case that too many cooks spoil the broth.
Thanks for the detailed response! I’m glad to have people talking about these issues.
Michael: “If insects are sentient, we will figure it out soon enough.”
I hope so, though even if the evidence points strongly away from insect sentience, there might remain the lingering concern that they somehow feel pain in a way we can’t imagine. Given their vast numbers, insects remain highly significant even at sentience probabilities of 0.001 or 0.0001. But I agree that more knowledge will help, and any good AI should appropriately compute these expected values.
Michael: “If lab universes are possible, hopefully we’ll come to a democratic conclusion that they shouldn’t contain suffering sentients. I don’t see why we wouldn’t.”
Are you thinking we would engineer lab universes not to contain suffering sentients? (As I understand things, that probably wouldn’t be possible, since baby universes would become causally disconnected from our own. It may, however, be possible to influence the physical constants of the new universes.) Or did you mean “a democratic conclusion that we shouldn’t create lab universes because they will contain (infinitely many) suffering sentients”?
Michael: “I will address the third part, “Religion”, in another post.”
Great, I look forward to it.
Julian: “IF in-vitro meat, THEN cows only in petting zoos. But what creates the most cow-utility, many cows living in farmers’ herds (with the inevitable sticky end), or a pitiful few pampered cows in zoos?”
On this point, I recommend an excellent paper by Gaverick Matheny and Kai M. A. Chan: “Human Diets and Animal Welfare: The Illogic of the Larder,” http://www.qalys.org/animal-welfare.pdf
I hope so, though even if the evidence points strongly away from insect sentience, there might remain the lingering concern that they somehow feel pain in a way we can’t imagine.
The safest path is not to even risk it, but of course this would require a complete restructuring of the ecosystem.
Are you thinking we would engineer lab universes not to contain suffering sentients?
For some reason when you mentioned lab universes I was thinking about uploaded worlds in compact packages. Yeah, hopefully we will come to the conclusion that creating lab universes is a bad idea.
Julian: it’s not just suffering at slaughter, but suffering during animals’ lives. You might have a case with cows (or farmed fish), but the life of a factory-farmed pig or chicken is pretty clearly not worth living.
Michael: “Yeah, hopefully we will come to the conclusion that creating lab universes is a bad idea.”
Wow, I’m glad you agree. There are plenty of others who don’t. From http://www.npr.org/templates/story/story.php?storyId=6545246:
—–
Greene says given the chance to make a universe of his own, “I might have a little trouble resisting this possibility. Just because it’s so curious, this idea that because of your volitional act, you are creating a universe that could give rise, perhaps, to things we see around us.”
Linde seconded that in his New Scientist interview.
“Just imagine if it’s true and there’s even a small chance it really could work,” he said. “In this perspective, each of us can become a god.”
—–
See also
http://joostmariavandeputte.blogspot.com/2008/03/st-andrews-physicists-create-artificial.html
and the comment at the end of
http://space.newscientist.com/article/mg19125591.500
Nick: “You might have a case with cows (or farmed fish)”
I’m no expert, but I’ve heard that farmed fish live in pretty bad conditions (presumably worse than cows during their grazing phase):
http://www.fishinghurts.com/fishFarms1.asp
However, fish are less likely to be sentient.
The overall direct expected suffering caused by eating farmed fish vastly exceeds that for cows (and arguably other land animals) because large numbers of fish are needed to produce a given quantity of meat:
http://www.utilitarian-essays.com/suffering-per-kg.html
The NPR link above is
http://www.npr.org/templates/story/story.php?storyId=6545246
Nick Tarleton: surely that’s a separate issue, addressed by the trend to free-range?
Utilitarian: Just ripping into the abstract…
“many farm animals have lives that are probably not worth living” – is a separate issue, addressed by free range farming and animal welfare.
“others prevent a significant number of wild animals from existing” – mistakenly assumes that animals would fill the vacancy. Empty stables weren’t left to the bears and squirrels; they were knocked down and replaced with offices.
“the purchase of animal products uses resources that could otherwise be used to bring a much greater number of animals into existence” – mistakenly assumes they would, or even should be used for that.
There are other equally bad bits of reasoning in the text. “There are more cost effective uses of our money than meat or egg purchases to increase the total number of happy animals in the world.” No, we are buying food. Happy animals are a positive externality.
Julian: “mistakenly assumes that animals would fill the vacancy. Empty stables weren’t left to the bears and squirrels; they were knocked down and replaced with offices.”
Industrial meat production prevents wild animals from existing principally because it requires far more crop land (growing food for farm animals to consume) than would producing the corresponding amount of calories / protein directly from plants.