Eliezer Yudkowsky/Questions
Question: Can you explain this passage a bit?
- In a sense, the only way to create a Friendly AI - the only way to acquire the skills and mindset that a Friendship programmer needs - is to try and become a Friendly AI yourself, so that you will contain the internally coherent functional complexity that you need to pass on to the Friendly AI. I realize that this sounds a little mystical, since a human being couldn't become an AI without a complete change of cognitive architecture. Still, I predict that the best Friendship programmers will, at some point in their careers, have made a serious attempt to become Friendly - in the sense of following up those avenues where a closer approach is possible, rather than beating their heads against a brick wall. I know of no other way to gain a real grasp on where a Friendly will comes from. The human cognitive architecture does not permit it. We are built to apply reliable rationality checks only to our own decisions and not to the decisions we want other people to make, even if we've decided our motives for persuasion are altruistic. Your personal will is the only place where you have the chance to observe the iterated buildup of decisions, including decisions about how to make decisions, and it is that coherence and self-generation that are required for a Friendly seed AI.
- --CFAI, Interlude: Beyond the Adversarial Attitude
I understand the paragraph as a whole. The sentence that gets me, specifically, is "We are built to apply reliable rationality checks only to our own decisions and not to the decisions we want other people to make, even if we've decided our motives for persuasion are altruistic." It parses, but I don't quite get it. How exactly would I apply a rationality check to a decision I want someone else to make?
Answer:
I'm not sure I understand your question.
Since the dawn of the collective volition model, it no longer makes sense for a human being to try and imitate an FAI. The FAI I visualized back in the CFAI era was a lot more analogous to an extremely nice individual human, an end which a human could, if not achieve, at least try.
As for the part about reliable rationality checks, why, humans have no such things. It is nonetheless true that when we imagine what we want other people to do, we apply different mental checks than when we consider whether or not to do something ourselves. And when you want someone else to do something, and they don't do it, that ends it - you don't have a chance to actually do it, watch the results, think about it some more, and build an iterated tower of philosophy. So that's why I once said that you'd have to try to be a Friendly AI to get as close as you could to understanding one.
People said all sorts of ridiculous things about AIs that they'd never dream of saying of themselves. It turned out that empathy wasn't good enough to understand AI, not even close. Even so, it can always be worse, and just making up ridiculous stuff at random without even empathic rationality-checks is indeed worse.
How would you apply a rationality-check to a decision you wanted someone else to make? I'm not sure what you mean by asking this. You can look at a chain of thought and try to figure out whether it's 'rational' according to your current understanding of rationality, right? So this happens differently, depending on whether the chain of thought is something you're wondering whether to adopt yourself, or something that you want someone else to adopt. If your question is more complex than that, you'll need to amplify.
Hmm. Yeah, thanks. I think that covers it. --Chris Capel
It seems to me that the objective/subjective distinction is a very unsophisticated distinction as it stands in common knowledge, but one that, in a sophisticated form, permeates the understanding of the creation of AGI. Therefore, this quote from CFAI puzzles me:
- And would science - the structure of hypothesis and experiment - be useful to AIs? Or would AIs simply have no need of it? Scientists prefer the objective to the subjective because mistaken human theories typically rely on subjectivity as an excuse to avoid negative reinforcement; would an AI biased towards objectivity learn faster, or learn more interesting things, than an AI without that bias?
Is it meaningful to say that an AI might be "biased toward objectivity"? Isn't objectivity, in the sense of trying to produce a cognitive model of reality that mirrors reality itself as closely as possible (or at least is maximally useful for forming predictions about the mind's interactions with reality), the ideal/default state of an AGI?
In the above sentence, I didn't mean "objectivity" in the sense of conforming to reality, but "objectivity" in the sense of being measurable by instrumentation. As an entire AI can regard itself as a measuring instrument, I'm now pretty sure that the distinction between objective and subjective evidence makes no sense whatsoever to a well-designed AI. There is just Bayesian evidence. The writing in CFAI only makes sense in contrast to what humans think of as the normal state of affairs, where there's a distinction between anecdotes and experiments, or between opinions and analyses.
I agree. When you are defining objectivity as limiting oneself to exact, reproducible measurements in reasoning, the distinction between the objective and the subjective is only useful for humans. It's an anthropomorphisation to apply that to an AI. The reason humans have subjectivity is that they take sensory information from reflective sources of consciousness into account in their reasoning, which are not exactly quantifiable, aren't comparable between individuals, and aren't reproducible, whereas an AI has access to the exact representation of its own "intuitions" (optimizing heuristics), thoughts, and whatever analogue of emotions may prove useful, and therefore can use them as a reproducible, quantifiable, exactly communicable source of sensory information. That was my default assumption in reading the passage; hence my confusion.
Really frickin' naive question:
What remains to be done in Friendliness theory before coding can begin on the actual Seed AI?
-- Emil Gilliam (yes, yes, I know I'm acting all impatient)
As a juvenile informed about the Singularity but unknowledgeable about nonfinancial assistance... yeah. What can I do?
See Controversial Pages/What Can The Average Person Do For The Singularity and Singularity Task Lists. --observer
---
This really is a minor point, and I could be wrong, but in your essay Coherent Extrapolated Volition, I believe there may be a typo in the following quote:
- Where our wishes cohere rather than interfere: Coherence is not a simple question of a majority vote. Coherence will reflect the balance, concentration, and strength of individual volitions. A minor, muddled preference of 60% of humanity might be countered by a strong, unmuddled preference of 10% of humanity. The variables are quantitative, not qualitative.
- SL4 Wiki: Coherent Extrapolated Volition
Shouldn't it be the other way around? The variables are qualitative, not quantitative? As in, the specific characteristics of the specific volitions (regarding spread, muddle and distance) are taken into account, not merely the quantity of volitions exhibiting a certain preference (as you put it, a "majority vote")? As I'm writing this I'm wrapping my head around an alternate way of understanding the situation, in which it would make sense to use quantitative rather than qualitative, but I've already written this much so I think I'll press on.
--Dylan Oldenburg
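One possible reading of "quantitative" in the quoted passage can be sketched as a toy calculation: the strength of a preference is treated as a number that gets weighed, rather than each person counting as one vote. The 60% and 10% shares come from the quote. The strength values, the names `head_count` and `weighted_tally`, and the multiply-and-sum rule are all assumptions made up for illustration; nothing in the essay specifies them, and this is not a description of how coherence would actually be computed.

```python
# Shares are taken from the quoted passage; the strength values are
# hypothetical, chosen only to illustrate the point.
groups = [
    {"share": 0.60, "strength": 0.1, "prefers": "A"},  # minor, muddled preference
    {"share": 0.10, "strength": 0.9, "prefers": "B"},  # strong, unmuddled preference
]

def head_count(groups):
    """Majority vote: only the share of people holding a preference matters."""
    totals = {}
    for g in groups:
        totals[g["prefers"]] = totals.get(g["prefers"], 0.0) + g["share"]
    return max(totals, key=totals.get)

def weighted_tally(groups):
    """Toy weighting (an assumption): share multiplied by preference strength."""
    totals = {}
    for g in groups:
        weight = g["share"] * g["strength"]
        totals[g["prefers"]] = totals.get(g["prefers"], 0.0) + weight
    return max(totals, key=totals.get)

print(head_count(groups))      # which option wins a plain majority vote
print(weighted_tally(groups))  # which option wins once strength is weighed
```

Under these made-up numbers the two tallies disagree, which is one way a 10% preference could counter a 60% one while every variable involved remains a quantity.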