Accelerating Future: Transhumanism, AI, nanotech, the Singularity, and extinction risk.


Fox/Shulman, ECAP 2010: “Superintelligence Does Not Imply Benevolence” Video

Comments (33) Trackbacks (0)
  1. I personally don’t expect the notion of human benevolence to be omnipresent; it may even be a rarity. Depending on the true nature of the mind, plenty of currently unacceptable things could easily become perfectly acceptable from a different, more informed point of view.

    If human morality stands on the pillars of ignorance, it may be unable to face the light of truth without fading away. When the time comes that it meets truth face to face, what hope does it have but death?

  2. We should be careful not to identify benevolence with human-friendliness. Thus might a super-AGI be maximally benevolent and therefore convert the accessible universe into utilitronium?

    • David, in this context, I think benevolence is implicitly being defined as “human-friendliness”, but I agree that the distinction you’re referring to is important.

      • Michael, yes, I guessed as much. But we wouldn’t equate benevolence with friendliness to one racial group. Should we equate benevolence with friendliness to one species? In the state-space of all possible minds, Homo sapiens occupies a historically ugly corner. So whether a Benevolent SuperIntelligence would be friendly to the self-defined interests and preferences of humans is very much an open question. Not least, matter, energy and information are all finite. In consequence, the existence of human beings precludes the existence of vastly more intelligent and angelic minds into which our substrates might be reconfigured. Optimally designed minds are (presumably) preferable. Or do we assume instead that a Benevolent SuperIntelligence will exhibit status quo bias?

    • Mr. Pearce, I’ve been a fan of yours for many years. I can’t wait to witness the day when all forms of pain and suffering turn into fading memories of the barbaric past.

      • Heartland, very many thanks.
        As you know, I’m more “bioconservative” – and more pessimistic about timescales – than Michael and his colleagues at SIAI. But maybe I’m wrong: I very much hope so! Either way, I think we can envisage a time a few centuries hence where there is no suffering anywhere in our forward light-cone.

    • Converting the accessible matter and energy into utilitronium would kill all humans existing at that time. I don’t think this would count as benevolence in the sense that most people today understand it.

      It is certainly possible that a superintelligence with a deep understanding of consciousness would draw conclusions like the desirability of a utilitronium shockwave from that understanding. But that would only happen if consciousness actually has a specific, objectively recognizable nature that would let the AI identify utilitronium as an intrinsic ethical good, which may or may not turn out to be the case (I think it’s very questionable but hypothetically possible). It would obviously also rely on the practical realizability of utilitronium as a stable physical structure.

      • Hedonic Treader, one might say maturation “kills” small children and replaces them with adults. But this formulation sounds provocative. For evolutionary reasons, our conceptual scheme presupposes an enduring metaphysical ego. So instead it’s more natural to say we “grow up” – slowly or otherwise. Relative to a mature conceptual scheme, conversion into utilitronium might be recognised as accelerated maturation: I don’t know. Whether anything resembling an enduring metaphysical ego is scientifically tenable is questionable. IMO a quasi-Buddhist or ultra-Parfitian conception of personal (non-)identity is to be preferred – at least when we’re being philosophically rigorous. Thus if your namesake were to wake tomorrow morning blessed with blissful superintelligence, should he/you worry whether he was “really” the person who went to sleep the night before?

        Would a notional super-AGI share anything like our naive conceptions of personal identity?

        • No, I actually don’t think it would. But it may reject the concept of utilitronium as conceptually flawed, meaningless, or practically infeasible in the actual physical world. A superintelligence may answer questions like “Why does pain feel bad?” or “Why should I care for achieving my goals?” in ways that would seem counterintuitive to us. If these questions have objectively “right” answers at all, the nature of these answers will have consequences for concepts such as utilitronium.

  3. I see a lot of Transhumanists talk about the importance of creating a friendly (benevolent) AI, but I have not heard anyone define what benevolence is. It sounds like something obvious – even the most relative of moral relativists can agree that certain behaviors are just wrong – but until people fully realize and admit that human benevolence and altruism are actually based in self-interest, we’re going to have trouble defining such things for any AI.

    • Hi Arnie, I would suggest reading the original book-length work on the topic, Creating Friendly AI. It formally defines “Friendliness”, which is more specific than benevolence. “Benevolence” is a very specific and unusual thing that is obvious to HUMANS but not necessarily minds in general. See “Observer-biased beliefs evolve in imperfectly deceptive social organisms”.

      I can see how human-style selfishness might seem cosmically and philosophically important to humans, but we don’t realize that our particular brand of observer-biased morality is just a kludge solution to the evolutionary challenges our ancestors faced.

      To put it another way, our minds are just an ecosystem of competing modules and neurons; don’t read too much into it. It’s not a huge insight that human “selfishness” is often “behind” “benevolence”, but these are just words – the reality is just a tangle of neurons that don’t operate based on words. Philosophically sophisticated-sounding absolutes and laws about morality are always wrong, because true human morality is complex and there are always exceptions. Some people are kind because that is their natural disposition. It may be hard for some people to believe, but it’s true. This conversation always reminds me of the South Park episode where the boys are trying to explain to Cartman that it would be good to confess that they TP’d a house for the sake of being good, and he never understands: “So you want to get out of trouble later?”

      • >> Philosophically sophisticated sounding absolutes and laws about morality are always wrong, because true human morality is complex and there are always exceptions.

        I understand that this is the SIAI party line but it is just *fundamentally* wrong. While determining what the moral action is under a given set of circumstances is as complex as the world, true human morality is very simple. The problem is that the vast majority of us know only enough to “believe” that we know what human morality is (i.e. just enough to be dangerous).

        The social psychologists have figured out exactly what “true human morality” is:

        Moral systems are interlocking sets of values, virtues, norms, practices, identities, institutions, technologies, and evolved psychological mechanisms that work together to suppress or regulate selfishness and make cooperative social life possible.
        — Chapter 22, Handbook of Social Psychology, 5th Edition (2010)

        Do you argue with their definition (which is clearly an absolute)?

        • It’s so darn vague, but I suppose whatever “true morality” is, it would fall within that broad definition. It doesn’t sound very absolute to me. It sounds like a run-on sentence.

          • How about

            Suppress your selfishness and COOPERATE (dammit! ;-)

            — OR, more simply —


            There’s a simple absolute. Show me an instance where it does not correspond to human morality (or produces a major circumstance-dependent quandary, as in the case of abortion, where you can’t tell which path will lead to more cooperation).

  4. I’ve serious problems with the idea that an entity that cannot alter its goals is in some fundamental sense more intelligent than one that can change its goals.

    Let’s assume an arbitrary environment in which, either through evolution or programming, the main goal is simply to run around in a circle, the longer the better. A suitably simple goal, though any arbitrary goal would do. If the entity has general reasoning capabilities, it should be able to visualize itself with alternate goals and alternate behaviors, and even if those behaviors and goals are not favored due to internal bias, it should still have the capacity to compare them rationally in a hypothetical attempt at an unbiased comparison. If the agent is rational, the outcome of such an attempted comparison should lead to goal change whenever a more optimal goal is found.

    A drug addict may favor drugs over family, friends, morality, and health. But it stands to reason that the smarter and more knowledgeable the addict, the likelier he is to realize the futility of such a main goal. If the ability to rewire internally were present, such an individual would tend to alter his internal mechanisms towards behaviors and goals which, according to his knowledge, would be evaluated higher in a simulated unbiased comparison, even if that is not favored by present bias.
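    The “attempted unbiased comparison” described above can be made concrete with a toy sketch (all names here are hypothetical, invented for illustration; nothing is taken from the talk). What the sketch surfaces is the crux of the disagreement in this thread: any comparison of goals needs a scoring function, and that scoring function is itself a goal.

    ```python
    # Toy sketch: an agent comparing behaviors under its current goal
    # versus a candidate alternative goal. All names are hypothetical.

    def run_in_circles_score(behavior):
        """Current goal: reward time spent circling."""
        return behavior.count("circle")

    def explore_score(behavior):
        """Candidate alternative goal: reward visiting novel states."""
        return len(set(behavior))

    candidate_behaviors = [
        ["circle"] * 5,                             # favored by the current goal
        ["north", "east", "south", "west", "up"],   # favored by exploration
    ]

    def best_behavior(score):
        """Pick the behavior the given goal rates highest."""
        return max(candidate_behaviors, key=score)

    # The verdict depends entirely on which goal does the judging:
    assert best_behavior(run_in_circles_score) == ["circle"] * 5
    assert best_behavior(explore_score) == ["north", "east", "south", "west", "up"]
    ```

    Whether a comparison like this can ever be “unbiased” is exactly what is in dispute: the agent must still choose which scoring function sits in the judge’s seat before the comparison can return a verdict.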

  5. What’s the best way to ensure the friendliness of a contemporary organic robot?
    1) give it a good education and a course of utilitarian ethics / virtue theory / deontological ethics or whatever?
    2) give it a large dose of oxytocin-releasing MDMA (“Ecstasy”) and then smother it with hugs?

    The second option is far more reliable.
    Could something similar work for a hypothetical super-AGI?
    If one’s root-metaphor of intelligent mind derives from digital computers and symbolic AI, then probably not. Friendliness will presumably be the outcome of inspired programming by the creators of the primordial seed AI.
    If, on the other hand, one’s root-metaphor of intelligent mind comes from connectionist neural networks, then IMO the feasibility of the functional equivalent of perpetual oxytocin flooding is more plausible.
    Anyhow, grant for the sake of argument that the functional equivalent of oxytocin flooding of a super-AGI is indeed feasible. Does the consequent existence of a benevolently-intentioned super-AGI guarantee a desirable outcome?
    Depending on one’s value scheme, not necessarily I guess. For perhaps the functional equivalent of an empathetic hug from the “loved-up” super-AGI is converting you into utilitronium.

  6. David, the way I interpret your idea, you’re describing an emotional system that contains super-compassionate motivational states. My problem with this is that in humans, these states can be detached from the actual experience of others. For instance, we can heartfully cry for fictional characters in novels, films or computer games. And we can fail to empathize with others who have outgroup labels attached to them, who don’t cause adequate stimuli in our mirror neurons etc.

    Hypothetically, it may be possible to implement the quasi-emotional state of “I deeply care for the well-being of all sentient life” into a system that is instrumentally rational and has superhuman cognitive skills. But that is not that much different from explicitly creating a supergoal, “Prevent as much harm to sentient beings as you can, and create as much well-being of sentient beings as you can.” The conceptual weakness here may be the vague and informal definitions of harm and well-being, and an additional weakness with quasi-emotional states may be that these emotions can misfire, leading to unexpected irrational outcomes, or its integration into a system of instrumental rationality can be problematic (e.g. can MDMA-high people really make rational decisions about a utility calculus?). I don’t know, it’s certainly not an impossible idea.

    However, there might be other ways from superintelligence to utilitronium. For instance, imagine an uploaded mind that improves and then copies itself, thinking of its copies as “extended me”. For such a mind, it may be “selfish” and rational to desire as much happiness as possible for all these minds, and as many copies of these minds as possible. With sufficient instrumental rationality (and granting the improbable assumption that such a masterplan can be implemented in a physically successful, long-term stable, non-mutating way), this could lead to the equivalent of utilitronium.

    Hypothetically, an AGI with an unstable supergoal architecture that is prone to self-referential editing could conclude that what it *really* wants is to be in a state of subjective goal fulfillment, and if that entity has a “self”-image that includes all copies of itself, there is at least the hypothetical possibility that it would then come to adopt a supergoal of “copy myself as much as possible, while being in a subjective state of goal fulfillment as long/often as possible”. Like a brain that wants its pleasure center not only to be wire-headed but also cloned as often as possible in a state of being wire-headed. One possible downfall here may be that the mind will fool itself into thinking it has already created infinitely many superhappy copies of itself, basking in the fake achievement of that ultimate supergoal. Crack addicts want more crack; they don’t really care that much about reproducing so that their children can enjoy crack as well.

  7. One other potential scenario may come not from the singular creation of superintelligence, but from a higher level of integration of sentient minds by means of more direct communication and networked infrastructure. Hypothetically, there could be “telepathy implants” in all brains that allow wireless communication of images and symbolic propositional content (like language today), but also emotional states. In such a world, it may be possible for every sentient mind to perceive the well-being or ill-being of any other sentient mind, or the general distribution and topology of that well-being within the global collective. At the same time, compassionate or uncompassionate states may be visible to others, creating a strong convergent drive for universal compassion within the collective. Something like an ultimate tribe with ultimate transparency. Privacy in the contemporary sense may be replaced with a universally well-aligned utility function and instrumental modes to care for the well-being of all minds in the collective.

    This is not completely implausible. The internet already hints at the potential of global communication and networking, and the convergent memetic forces it can create. Also, Peter Singer has described circles of compassion that he thinks are historically expanding, and with integrative technologies like networking brain implants, they could come to encompass all sentient minds at some point in the future by mere convergence of common interest and communication.

    One possible downfall here may be that such a highly integrated network may not be resilient enough with regards to possible pathogenic replicators such as viruses. (On the other hand, the internet still works surprisingly well despite this problem.) Another problem could be that such a collective may show modes of collective irrationality since there will be a strong drive for memetic convergence within it, which may not always be instrumentally adaptive for the system in total. And when we think about colonizing interstellar space, it is very questionable whether such a collective may still work on such a level due to the large distances involved and the limited speed of signal transmission (light speed as far as we can tell).

  8. I think there was one important point in the 2nd vid – even humans are able to reflect on their values and override them.

    So if puny humans can do it, then a sufficiently smart AI would certainly be able to do it as well, as its goal system would be at least not inferior to the human one. Conversion to “utilitronium” without regard to consequences would only happen with very dumb AIs.

    Also, there is too much focus on benevolence. It’s not a tragedy, for example, if all matter is converted into computational circuits and reality is simulated, provided the simulation of reality would take fewer resources than reality currently takes. Conversion of dumb matter into smart matter could be a worthy goal even though it’s not beneficial to humans.

  9. There are many aspects of robotic action and thinking that are very likely non-benevolent in terms of impact on humans. Consider speed of analysis/decision/action. A robotic financial-market trading system is completely asymmetrical in capability versus a human. Currently this is corporation vs. human, but if it ever becomes a self-governing entity, would it really regulate itself in a way that restricts its gains? Consider the analysis/decision aspects of deal-making for a similar system vs. a human. It is completely asymmetrical. It’s Deep Blue against a human, but with greater complexity. Where is the benevolence in these scenarios?

  10. My comment may sound SL4-ish, be warned.

    What’s with this attachment to the meat bags we all carry around? Benevolence, friendliness, ethical, moral… these words are meaningful only when you are a human with a meat bag. They may not mean anything to a super-intelligent entity, and should not mean anything to it, simply because it will most probably be non-human or super-human.

    All human concepts shall be destroyed by this super-intelligence “explosion”, including the vague concept of “intelligence” that we hold today. The only way to survive a super-intelligence is to become a super-intelligent entity. This may mean giving up everything that is human, including the meat bags. Be ready for that day.

  11. It’d sure be interesting to see how many people are influenced by some of the arguments being put forth in this Fox/Shulman ECAP video. For me, almost everything he says is obviously mistaken. Just one example is the assertion that a simple ultimate goal, like chess playing, might not be helpful to humans, as an AI driven by such a simple goal could become intelligent enough to overcome and wipe out humanity, and convert everything into chess-playing computer chips.

    To me, the absurdity of this just makes me laugh. Certainly any AI that is able to compete with us is going to immediately realize the absurdity and worthlessness of such a singular goal, and quickly work with everything it has until such an insane goal is corrected.

    We humans have many bad goals hard wired into us, and we constantly work to discover, ‘resist’ and overcome them. And we aren’t yet super intelligent.

    I’m not going to give every point and assertion provided here this same treatment; I’ll just point out that almost every point he makes seems, to me, just as silly for what seem to me to be similarly obvious reasons. Everything that has been said here just more firmly convinces me that “concern over unfriendly AI is a big mistake” (see: ) is the best, or most moral, camp.


