(This post will be a more in-depth explanation of something I was trying to get across in much of the Rapture of the Nerds essay.)
Tim is a famous geologist. Tom is a famous clown. Tim gives us a theory about rocks. We judge it to be 90% probable. In a parallel universe, Tom gives us the same theory about rocks. We judge it to be 10% probable.
Jim gives us a theory about fish and presents a full technical case that is good — the facts all fit. In a parallel universe, Jom gives us a theory about fish and presents a full technical case that is bad — it needs coincidences or leaps of logic. We judge Jim’s theory to be 90% probable. We judge Jom’s theory to be 10% probable.
These two situations might seem the same. In the first case, we used only indirect evidence — the theorist’s credentials — to assess probabilities. In the second case, we used only direct evidence — the known facts of the matter — to assess probabilities. Both are useful kinds of evidence. But there is an important difference.
Suppose we ask Tim and Tom to make a full technical case. Tim the geologist gives us a full technical case that is, as expected, quite good. Tom the clown, in his own parallel universe, gives us the same full technical case — one much better than we expected from a clown. Since a full technical case relies in no way on authority, we put the same probabilities on Tim’s claim and Tom’s claim. Anything else would be unreasonable.
Suppose we ask Jim and Jom about all of their credentials. It turns out their credentials are exactly the same. Maybe they’re both equally famous clowns, who both took a course in marine biology once — surprising in Jim’s case, given that his arguments are so good. Or maybe they’re both famous marine biologists of exactly equal fame and competence — surprising in Jom’s case, given that his arguments are so bad. None of this matters for our probabilities. Again, we already have a full technical case, and a full technical case relies in no way on authority. Jim’s theory is still 90% probable, Jom’s theory still 10% probable.
So once we knew Tim and Tom’s full technical arguments, their credentials no longer mattered. But once we knew Jim and Jom’s full credentials, their technical arguments still mattered. Technical arguments and credentials are each useful on their own, but when both are available, the technical argument trumps the credentials.
If I’m not mistaken (but I need to read up on this!), what I’ve been doing here is just repeating the definition of “screening off” from the theory of causal diagrams. If we have three variables (A, B, C), and A and C are independent conditional on the value of B, then B screens off A from C: once we know B, learning A tells us nothing more about C. In a causal diagram this happens when A and C are connected only through B, either in a chain (A -> B -> C) or with B as a common cause of both, so neither A nor C directly causes the other. In the authority example of this post, you could see the causality running as follows. If a theory is true, that causes the technical case for it to be good. If people have good credentials, that causes them to adopt theories for which the technical cases are good. But causality does not run directly from truth to adoption by people with good credentials, or from adoption by people with good credentials to truth.
Maybe this all sounds like a complicated way to make a simple point, but it matters, because people’s intuitions sometimes get it all wrong. If an idea is adopted by silly people, or is not adopted by competent people, that is seen as a “bad point” that is weighed against the “good point” of solid technical argumentation. But this weighing makes no sense — to a rational thinker, the “bad point” counts until the “good point” arrives, and is then annihilated. In real life, everything interesting is a mix of things you’ll always have to take on authority and things you can check for yourself, but you can still apply this insight.
Update: see the comments for some corrections and clarifications.
Update to the Update: Here’s Eliezer Yudkowsky’s reduxification at Overcoming Bias.
This just isn’t how Bayes’ law should be used in this case.
Your judgments of the evidential value of the credentials and of the argument are both noisy, and apparent conflicts between the conclusions reached by consulting each piece of evidence count against the certainty of both judgments.
In other words, seemingly ridiculous arguments from well credentialed people are more credible than such arguments from badly credentialed people because in the former case it is more likely that your judgment that the argument is ridiculous is in error.
I don’t think we really disagree. I guess when I say “we have a full technical case” I also mean we fully understand it. In that case authority really shouldn’t matter. In practice, as I was getting at in the end, we have only partial information about the quality of the full technical case. That means the screening off also destroys only part of the correlation between authority and truth. But the effect is still there. If a professor and a clown both present a case that we don’t fully understand but that seems clearly ridiculous, the difference between our probabilities that each is correct should be larger than zero, but smaller than it was before either presented a case, and that’s the effect I was trying to explain.
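A minimal sketch of this partial screening off, assuming made-up probabilities and the chain Truth -> Good case -> Credentialed presenter, with one extra node for how the case looks to us. The helpers `bern` and `p_true`, the parameter `accuracy`, and all the numbers are hypothetical; the posterior is computed by brute-force enumeration over the hidden variables.

```python
from itertools import product

# Hypothetical chain: Truth -> Good case -> Credentialed presenter,
# plus Good case -> How the case looks to us (our noisy reading of it).
# All numbers are made up for illustration.
P_TRUE = 0.5
P_GOOD_GIVEN_TRUE = {1: 0.9, 0: 0.2}  # P(case is good | theory true=1 / false=0)
P_CRED_GIVEN_GOOD = {1: 0.6, 0: 0.2}  # P(presenter credentialed | case good=1 / bad=0)


def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x (0 or 1)."""
    return p if x == 1 else 1 - p


def p_true(cred, seen=None, accuracy=0.9):
    """P(theory true | presenter's credentials [, how good the case looks to us]).

    `accuracy` is the chance that our reading of the case matches its real quality;
    accuracy=1.0 means we fully understand the case.
    """
    num = den = 0.0
    for t, g in product((0, 1), repeat=2):
        p = bern(P_TRUE, t) * bern(P_GOOD_GIVEN_TRUE[t], g) * bern(P_CRED_GIVEN_GOOD[g], cred)
        if seen is not None:
            p *= accuracy if seen == g else 1 - accuracy
        den += p
        if t == 1:
            num += p
    return num / den


for label, kwargs in [
    ("before hearing the case", {}),
    ("case looks ridiculous, noisy reading", {"seen": 0, "accuracy": 0.9}),
    ("case looks ridiculous, fully understood", {"seen": 0, "accuracy": 1.0}),
]:
    professor, clown = p_true(1, **kwargs), p_true(0, **kwargs)
    print(f"{label}: professor {professor:.3f}, clown {clown:.3f}")
```

With these numbers the professor and the clown start at about 0.667 and 0.379; after both present a case that looks ridiculous to a reader who is right 90% of the time, they sit at about 0.316 and 0.156, a smaller gap; with `accuracy=1.0` both come out at 0.111, so the credentials are fully screened off.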
Agreed then.
I think you’re misunderstanding Bayesian probability theory here. The piece of evidence against Tom’s theory (is this Tom based on me?), namely that he has no credentials, is still there regardless of how nice an explanation he gives. The only way for Tim and Tom’s probabilities to be equal is for there to be evidence which weighs more in Tom’s favor than in Tim’s favor; e.g., Tom’s technical explanation is more evidence than Tim’s because Tom is less able to make up fake evidence using fancy-sounding words. What happens here is that the more evidence accumulates, the less a given amount of new evidence moves the probability: 1 dB of evidence will only shift a 99% probability to a 99.2% probability, but it will shift a 50% probability to a 55.7% probability. The reason technical explanations dominate credentials is that technical explanations are much *stronger* evidence: a technical explanation may be composed of a hundred different points, all of which can be checked for accuracy (providing evidence each time), while credentials can only be used once. Thus, it is possible for a technical explanation to make credentials irrelevant, but not the converse.
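Tom’s figures can be checked in a few lines. This sketch assumes the usual convention that evidence measured in decibels is 10·log10 of the likelihood ratio, so the log-odds of a probability p in decibels is 10·log10(p / (1 - p)); the function names are hypothetical.

```python
import math


def decibels(p):
    """Log-odds of probability p, in decibels: 10 * log10(p / (1 - p))."""
    return 10 * math.log10(p / (1 - p))


def shift(p, db):
    """Update probability p with `db` decibels of evidence (a likelihood ratio of 10**(db/10))."""
    odds = p / (1 - p) * 10 ** (db / 10)
    return odds / (1 + odds)


print(f"{shift(0.99, 1):.3f}")  # 0.992
print(f"{shift(0.50, 1):.3f}")  # 0.557
print(f"{decibels(0.9):.1f}")   # 9.5: a 90% judgment is about 9.5 dB of log-odds
```

The same 1 dB always moves the log-odds by exactly 1 dB; only the resulting change in probability shrinks near 0 and 1.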
No, completely different Tom.
It’s true that technical explanations are often much stronger evidence, but I’m not necessarily assuming that a full technical case will bring the probability to either 0 or 1. Sometimes all human reasoning is incomplete like that; fundamental indeterminism would be one example, but in practice it’s probably like this at other levels of modeling, too. (I guess any problem would count as long as it made expertise useless.)
I still disagree with this: “The piece of evidence against Tom’s theory (is this Tom based on me?)- he has no credentials- is still there regardless of how nice an explanation he gives.” A full technical case can make credentials irrelevant *even when (for reasons like fundamental randomness) it doesn’t bring probabilities to 1 or 0*.
This probably isn’t too clear. I should think about it more.
Other than fundamental indeterminism, it could also just be lack of information; maybe we just didn’t perform the right experiments yet. Like, a clown saying some stuff we don’t understand about space and time being relative should be taken less seriously than Einstein saying the same thing. But if he fully spelled out general relativity and we fully understood it, we should no longer care whether he’s a clown or Einstein, and we still wouldn’t know whether he was right until we checked Mercury’s orbit or measured the sun bending starlight, or whatever it was they did.
Oops, looks like I inadvertently created an HTML comment. Please delete the previous post.
One tiny little quibble. “If a theory is true, that causes the technical case for it to be good. If people have good credentials, that causes them to adopt theories for which the technical cases are good.” You don’t want to set up your DAG (directed acyclic graph; what causal models use) like this:
Truth -> Good Case <- Good Credentials
This would suggest such odd things as that if you have good credentials, we have explained away the goodness of your case; or that if you don’t know whether the case is good, the probability of truth and good credentials should be independent.
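To see the two odd consequences described above, here is a minimal sketch of the collider Truth -> Good Case <- Good Credentials, with invented conditional probabilities; `P_GOOD`, `p_truth` and the other names are hypothetical. Probabilities are computed by enumerating the joint distribution.

```python
from itertools import product

# Hypothetical collider: Truth -> Good case <- Good credentials.
P_TRUTH = 0.5
P_CRED = 0.5
P_GOOD = {  # P(case is good | truth, credentials)
    (1, 1): 0.95, (1, 0): 0.8,
    (0, 1): 0.5, (0, 0): 0.1,
}


def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x (0 or 1)."""
    return p if x == 1 else 1 - p


def p_truth(**given):
    """P(truth = 1 | given), where `given` may fix `cred` and/or `good` to 0 or 1."""
    num = den = 0.0
    for t, c, g in product((0, 1), repeat=3):
        values = {"truth": t, "cred": c, "good": g}
        if any(values[name] != v for name, v in given.items()):
            continue  # inconsistent with what we observed
        p = bern(P_TRUTH, t) * bern(P_CRED, c) * bern(P_GOOD[(t, c)], g)
        den += p
        if t == 1:
            num += p
    return num / den


# Credentials alone say nothing about truth in this graph ...
print(f"{p_truth(cred=1):.3f} {p_truth(cred=0):.3f}")
# ... but given a good case, good credentials *lower* our belief ("explaining away").
print(f"{p_truth(good=1, cred=1):.3f} {p_truth(good=1, cred=0):.3f}")
```

In this sketch credentials alone leave P(truth) at the 0.5 prior, and once the case is known to be good, a credentialed author drops P(truth) from about 0.889 to about 0.655, because the credentials partly explain the good case away.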
You want:
Truth -> Good Case -> Credentialed Speaker
Truth causes a case to be good, which causes a credentialed speaker to adopt the idea. So we can infer truth by seeing a credentialed speaker, but only if we don’t have full information about how good the case is.
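A minimal sketch of this chain with invented numbers (all names hypothetical), again computed by enumeration. The numbers are chosen so that a good case only raises P(truth) to 0.6, to show that screening off does not require the case to settle the question.

```python
from itertools import product

# Hypothetical chain: Truth -> Good case -> Credentialed speaker adopts the idea.
P_TRUTH = 0.5
P_GOOD_GIVEN_TRUTH = {1: 0.6, 0: 0.4}  # a good case is only weak evidence here
P_CRED_GIVEN_GOOD = {1: 0.7, 0: 0.3}   # credentialed speakers favor good cases


def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x (0 or 1)."""
    return p if x == 1 else 1 - p


def p_truth(**given):
    """P(truth = 1 | given), where `given` may fix `good` and/or `cred` to 0 or 1."""
    num = den = 0.0
    for t, g, c in product((0, 1), repeat=3):
        values = {"truth": t, "good": g, "cred": c}
        if any(values[name] != v for name, v in given.items()):
            continue  # inconsistent with what we observed
        p = bern(P_TRUTH, t) * bern(P_GOOD_GIVEN_TRUTH[t], g) * bern(P_CRED_GIVEN_GOOD[g], c)
        den += p
        if t == 1:
            num += p
    return num / den


# Without knowing the case, a credentialed speaker is evidence for truth ...
print(f"{p_truth(cred=1):.2f} {p_truth(cred=0):.2f}")  # 0.54 0.46
# ... but once we know the case is good, credentials change nothing.
print(f"{p_truth(good=1, cred=1):.2f} {p_truth(good=1, cred=0):.2f}")  # 0.60 0.60
```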
McCabe, see http://bayes.cs.ucla.edu/home.htm for what Steven is getting at.
Eliezer: The DAG I was using was:
Good Case <- Truth -> Credentialed Speaker
Seen in this way, both a good case and a credentialed speaker are evidence for truth; if there’s a really good case, the “credentialed speaker” evidence doesn’t go away, but it becomes less relevant as we already have all the evidence we need. Your model sounds more like:
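A minimal sketch of this common-cause model, assuming (as the diagram suggests) that the case and the credentials are independent given the truth, so their likelihood ratios simply multiply. The names and numbers are hypothetical.

```python
import math

# Hypothetical numbers for Good case <- Truth -> Credentialed speaker, with the two
# observations assumed independent given the truth.
PRIOR = 0.5
LIKELIHOODS = {  # (P(observed | true), P(observed | false))
    "good_case": (0.9, 0.1),
    "credentialed": (0.6, 0.4),
}


def decibels(p):
    """Log-odds of probability p, in decibels."""
    return 10 * math.log10(p / (1 - p))


def posterior(*observed):
    """P(truth | the named observations): multiply prior odds by each likelihood ratio."""
    odds = PRIOR / (1 - PRIOR)
    for name in observed:
        if_true, if_false = LIKELIHOODS[name]
        odds *= if_true / if_false
    return odds / (1 + odds)


for background in ((), ("good_case",)):
    before = posterior(*background)
    after = posterior(*background, "credentialed")
    print(f"credentials move P from {before:.3f} to {after:.3f}, "
          f"a shift of {decibels(after) - decibels(before):.2f} dB")
```

With these numbers, credentials move P from 0.500 to 0.600 on their own and from 0.900 to 0.931 on top of a good case, a shift of about 1.76 dB both times.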
Truth -> Good Case -> Sounds like a good case
Truth -> Credentialed speaker
Even if something is a good case, we may be fooled and decide it is not; conversely, something may be crackpottery and may fool even trained scientists (e.g., Piltdown Man). If something sounds like a really good case, even to really smart people who have studied the issue, and it makes technical predictions and what not, this does not *eliminate* the evidence of “having a credentialed speaker”; it just makes it much less relevant, because not many people care about the difference between a 99.99% probability and a 99.999% probability.
“A full technical case can make credentials irrelevant *even when (for reasons like fundamental randomness) it doesn’t bring probabilities to 1 or 0*.”
Being irrelevant isn’t the same thing as having zero effect; if there’s a random tornado in Kansas in 1934, that knowledge may have some finite effect on our probabilities (e.g., the scientist’s grandparents may have seen the tornado and been inspired to read books on rationality and teach their descendants about truth-finding), but that doesn’t mean the effect has to be large; it could be 10^-20 dB of evidence, or some other equally ridiculous figure.
Oops, the formatting for the DAGs was meant to be:
[Truth (implies) Good Case (and) Credentialed Speaker]
and
[Truth (implies) Good Case (implies) Sounds Like A Good Case (and) Credentialed Speaker]
respectively.
Eliezer, you’re right about the graph being wrong. I considered a few different versions and concluded for some reason that it didn’t matter as long as truth and credentials weren’t directly connected.
Tom, are you saying the evidence becomes small in terms of decibels or just small in terms of differences in P?
I agree; see my earlier response to Michael Vassar. But I don’t agree that this is *just* because decibel differences translate to smaller P differences near 0 and 1. It seems to me that common sense says you can be only 50% certain after hearing a technical case, and still care less about credentials than before you heard the case.
“It seems to me that common sense says”
Here is a list of all the ways in which common sense will betray you:
http://en.wikipedia.org/wiki/List_of_cognitive_biases
“Tom, are you saying the evidence becomes small in terms of decibels or just small in terms of differences in P?”
Some evidence is small in terms of decibels; some evidence is not. Any given piece of evidence will become less significant (in terms of P) as probabilities move towards 0 or 1.
EDIT: I did not mean to suggest Wikipedia’s list was exhaustive.
Common sense isn’t perfect, but often it points to a failure of modeling. Do you agree with me that different pieces of evidence (in this case, argument quality and credentials) are not necessarily independent, so that it’s not necessarily appropriate to just add up the decibels from their respective marginal/unconditional probability distributions? If yes, then we can argue about how much that applies in practice to arguments from authority/ad hominem/etc. If no, then see all the Bayes networks stuff.
“Do you agree with me that different pieces of evidence (in this case, argument quality and credentials) are not necessarily independent”
I already pointed out one case in which they are not independent (if credentials make you much more capable of spouting important-sounding nonsense, then having good case + credentials may be less evidence than just having a good case, even though just having credentials would be better than having nothing). My point was, in the *absence* of such an effect, the piece of evidence “credentials” still shows up in the probability calculus just as strongly as before (in logarithmic terms). If there *is* such an effect, please specify what it is.
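A minimal sketch of the dependence described in parentheses, with invented numbers: credentialed speakers back true theories more often, but are also better at making false theories sound good. Names and probabilities are hypothetical.

```python
from itertools import product

# Hypothetical numbers: credentials correlate with truth, but also help
# false theories *sound* good.
PRIOR = 0.5
P_CRED = {1: 0.6, 0: 0.4}  # P(speaker credentialed | theory true=1 / false=0)
P_SOUNDS_GOOD = {  # P(case sounds good | truth, credentials)
    (1, 1): 0.9, (1, 0): 0.9,
    (0, 1): 0.5, (0, 0): 0.1,
}


def bern(p, x):
    """Probability that a Bernoulli(p) variable equals x (0 or 1)."""
    return p if x == 1 else 1 - p


def p_truth(**given):
    """P(truth = 1 | given), where `given` may fix `cred` and/or `sounds_good` to 0 or 1."""
    num = den = 0.0
    for t, c, s in product((0, 1), repeat=3):
        values = {"truth": t, "cred": c, "sounds_good": s}
        if any(values[name] != v for name, v in given.items()):
            continue  # inconsistent with what we observed
        p = bern(PRIOR, t) * bern(P_CRED[t], c) * bern(P_SOUNDS_GOOD[(t, c)], s)
        den += p
        if t == 1:
            num += p
    return num / den


print(f"{p_truth(cred=1):.3f}")                 # credentials alone
print(f"{p_truth(sounds_good=1):.3f}")          # good-sounding case alone
print(f"{p_truth(sounds_good=1, cred=1):.3f}")  # good-sounding case from a credentialed speaker
```

Here credentials alone raise P(truth) to 0.600 and a good-sounding case alone gives about 0.776, but a good-sounding case from a credentialed speaker gives only about 0.730.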
I guess the better the case made, the greater the evidence that the lack of credentials simply happened not to influence this particular judgment. Does that make sense?
The reason authority is useful information in the first place is that we expect authorities to see the arguments we missed. The more evidence that no arguments were missed, the less authority matters.
Pingback: Black Belt Bayesian » More Authority