As I’m working my way through the Greene dissertation (currently on page 163, thanks to Roko for pointing this out to me originally via his blog), I feel myself getting more optimistic about a certain issue: that is, that any sufficiently intelligent human will figure out that morality doesn’t generate itself automatically in the absence of very sophisticated and specific cognitive hardware. This is wonderful, because it lessens the probability of unFriendly AI, that perennial Singularitarian bugaboo.

In the dissertation, Greene points to three things: psychopaths, Phineas Gage, and similar patients with damage to their ventromedial frontal lobes, which are apparently indispensable to generating moral evaluations and feelings. They all display significant intelligence but lack the ability to have strong feelings about right and wrong, or to distinguish between moral rules and conventional rules. This is strong evidence that intelligence is independent of morality. These findings were buttressed with fMRI studies Greene did on criminal psychopaths.

Anyone who philosophizes enough will figure out that morality is in the human mind, not out in the world. Anyone who does fMRI brain scans on psychopaths and normal people, or is aware of such work, will realize the same thing. I am also hopeful that anyone who produces a sufficiently well-developed AI design will realize it too. In the best-case scenario, unFriendly AI simply doesn’t happen because people realize what they’re doing before they finish the task. Unfortunately, this is wildly overoptimistic, because just recognizing the need for a complex and subtle morality is not enough to build one.

Continuing on my optimistic tangent, I am hopeful that anyone building human-equivalent artificial intelligence will at least make a token effort to instill in it some sort of top-level morality. There is little danger of this being Asimov’s laws, because Asimov’s laws are not descriptive enough to really set an AI or robot in motion. Any attempt to build a goal system based on Asimov’s laws will have to introduce a heck of a lot more complexity to yield a product that has any use. If this morality is then applied to test cases, as it hopefully would be if the AI in question has any significant responsibility, then it will quickly become apparent that morality is not a free lunch and that the AI fails miserably on many moral tests. If pre-human-equivalent AI does significant damage due merely to accident (an issue I am exploring at length by reading Moral Machines by Wendell Wallach and Colin Allen), then a hands-off approach to AI morality will get a bad reputation, and there will be a strong profit motive to produce genuinely Friendly AI. An obvious pitfall remains: the programmers might craft a morality that is sufficient for a non-self-improving human-level AI but that fails catastrophically and fatally (to us) when the AI improves itself. But at least it would fail for that reason rather than because the programmers made an even dumber error.

Basically, this post is an exercise in optimism — I spend much time on this blog wringing my hands so vigorously that one might think the skin will be sloughed right off. But it’s Sunday, so here I am saying, “what if… [optimistic scenario]?”

Going even further back than Greene, who just published his dissertation in 2002, we have Hume, who said, “’Tis not contrary to reason to prefer the destruction of the whole world to the scratching of my finger”. This shows he was completely aware of the reason/morality distinction, and in fact he is the premier historical philosopher in favor of this view. It seems to be in the interest of those concerned about unFriendly AI to spread the moral philosophy of Hume as much as possible. Ditto with moral sense theory, a non-cognitivist (non-realist) theory that morality is grounded in complex sentiments and emotions. The seeds of the idea that Friendliness theory (the challenging study of how to create a Friendly AI) is necessary are planted right there.