For those of you who actually came here because of the title, this is self-recommending.

# Category Archives: Probability

# A New Challenge to 98% Confidence

According to A New Challenge to Einstein, General Relativity has been refuted at 98% confidence.

I wonder if it wouldn’t be more accurate to say that, actually, 98% confidence has been refuted at General Relativity.

# Vysochanskij-Petunin Inequality

So that sounds pretty esoteric. Apparently the probability of landing more than X standard deviations away from the mean (in either direction) is at most 4/(9X^2), as long as X is greater than √(8/3) ≈ 1.63 and the distribution is unimodal (one-humped). Good to know if you’re trying to figure out how impressed to be by, say, a four-sigma-plus event.
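To get a feel for what the bound buys you, here is a small Python sketch. It compares the bound with Chebyshev’s inequality, which assumes nothing about the distribution’s shape, and with the exact two-sided tail of a normal distribution. The function names and the sigma values compared are my own choices for illustration.

```python
import math

# The 4/(9k^2) form of the Vysochanskij-Petunin bound needs k > sqrt(8/3) ≈ 1.633.
VP_MIN_K = math.sqrt(8 / 3)

def vp_bound(k):
    """Upper bound on P(|X - mean| >= k*sd) for any unimodal X with finite variance."""
    if k <= VP_MIN_K:
        raise ValueError("the 4/(9k^2) form needs k > sqrt(8/3)")
    return 4 / (9 * k ** 2)

def chebyshev_bound(k):
    """Upper bound on the same probability for any distribution at all."""
    return min(1.0, 1 / k ** 2)

def normal_tail(k):
    """Exact two-sided tail probability P(|Z| >= k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))

for k in (2, 3, 4, 5):
    print(f"{k} sigma: normal {normal_tail(k):.2e}  "
          f"VP <= {vp_bound(k):.4f}  Chebyshev <= {chebyshev_bound(k):.4f}")
```

At four sigma the unimodal bound is 4/144 ≈ 2.8%, compared with 1/16 = 6.25% from Chebyshev. An exactly normal variable would get that far only about 0.006% of the time.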

In other news, I’ve finally scored the Aumann Game.

# Aumann And Stability

Aumann’s agreement theorem, which I’ve mentioned here a few times before and which has informed many discussions on Overcoming Bias, states:

If two (or more) Bayesian agents are uncertain about some proposition, and their probabilities are common knowledge, and they have the same priors, then their probabilities must be equal, even if they have different information.

Concretely, as agents trade probability estimates back and forth, conditioning each time on the other’s estimates, probabilities will tend to the same number.

This is a surprising result that has often been summarized as saying “rational agents cannot agree to disagree”. I think there are some problems with applying the theorem this way in practice that haven’t been highlighted enough.
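The back-and-forth process can be simulated in a few lines. The Python sketch below is a toy model, and every specific in it is an assumption of my own choosing. There are nine equally likely states (a uniform common prior), two agents whose private information is a partition of those states, and an event E = {3, 4}. On each turn the speaker announces P(E | what they know). Because the announcement is public, each agent then rules out every state in which the speaker would have said something different. The partitions and the function names are made up for illustration.

```python
from fractions import Fraction

def posterior(event, cell):
    """P(event | cell) under a uniform common prior over the states."""
    return Fraction(len(event & cell), len(cell))

def cell_of(partition, state):
    """The information cell containing `state`."""
    return next(cell for cell in partition if state in cell)

def refine(partition, announcement):
    """Split every cell by announced value: states where the speaker
    would have announced something different get separated out."""
    refined = []
    for cell in partition:
        groups = {}
        for state in cell:
            groups.setdefault(announcement[state], set()).add(state)
        refined.extend(frozenset(g) for g in groups.values())
    return refined

def aumann_dialogue(states, event, partitions, true_state, max_rounds=50):
    """Agents take turns announcing P(event | their information) at the true state.
    Stops once every agent has spoken in a row without anyone learning anything."""
    partitions = [list(p) for p in partitions]
    history, quiet_rounds = [], 0
    for t in range(max_rounds):
        speaker = t % len(partitions)
        # What the speaker WOULD announce in each state; everyone knows this map.
        announcement = {s: posterior(event, cell_of(partitions[speaker], s)) for s in states}
        history.append((speaker, announcement[true_state]))
        new = [refine(p, announcement) for p in partitions]
        changed = any(set(a) != set(b) for a, b in zip(new, partitions))
        partitions = new
        quiet_rounds = 0 if changed else quiet_rounds + 1
        if quiet_rounds >= len(partitions):
            break
    final = [posterior(event, cell_of(p, true_state)) for p in partitions]
    return history, final

states = range(1, 10)
event = frozenset({3, 4})
agent1 = [frozenset({1, 2, 3}), frozenset({4, 5, 6}), frozenset({7, 8, 9})]
agent2 = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7, 8}), frozenset({9})]
history, final = aumann_dialogue(states, event, [agent1, agent2], true_state=1)
for speaker, q in history:
    print(f"agent {speaker + 1} says {q}")
print("final:", final)
```

In this example the announcements at the true state go 1/3, 1/2, 1/3, 1/3, 1/3. The second agent ends up agreeing with the first, even though neither agent ever states their private information directly. It leaks out through the announcements.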

# Relevance Isn’t Transitive

… for the same reason that probabilistic correlation isn’t transitive.

Whether Japan would have surrendered without nuclear weapons in WW2 is relevant to how we should think about nuclear weapons and morality. How likely the Emperor’s dog was to suddenly die on Aug 10, 1945 is relevant to whether Japan would have surrendered without nuclear weapons in WW2. But how likely the Emperor’s dog was to suddenly die on Aug 10, 1945 is not relevant to how we should think about nuclear weapons and morality. Not even a little bit.

If we looked closely, what percentage of the internet would we find is devoted to debating the health of the Emperor’s dog?
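The correlation half of the opening analogy is easy to check. In this Python sketch (a simple example of my own choosing, not taken from anywhere in particular), X and Z are independent fair ±1 coin flips and Y = X + Z. Y is correlated with each of them, yet X and Z are uncorrelated. In the same way, the dog’s health bears on the surrender question without bearing on the morality question.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally weighted lists of values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# The four equally likely outcomes of two independent fair ±1 coins X and Z.
outcomes = [(x, z) for x in (1, -1) for z in (1, -1)]
X = [x for x, _ in outcomes]
Z = [z for _, z in outcomes]
Y = [x + z for x, z in outcomes]  # Y depends on both X and Z

print(pearson(X, Y), pearson(Y, Z), pearson(X, Z))
```

Both links in the chain have correlation 1/√2 ≈ 0.71, but the correlation between the endpoints is exactly zero.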

# Always 40 Years Away

An earlier post pointing to a Nick Bostrom paper led to a discussion on the observation (true or false) that the predicted date of a technological singularity has been receding by one year per year, so that it’s always the same number of years away. Rolf Nelson wrote:

[T]his is a common objection that I’ve never understood. What hypothesis is this alleged pattern Bayesian evidence for, and what hypothesis is this alleged pattern Bayesian evidence against? If there’s a hard takeoff in 2040, I guarantee you that someone in 2039 will publish a prediction that AGI is still 40 years away.

Also, some amount of “prediction creep” is rational to the extent a phenomenon resembles a one-shot Poisson process. Suppose my personal life expectancy today is that I will die, on average, at age 78.58236. If, tomorrow, I find that I haven’t died yet, I should slightly increase it to age 78.58237. I *expect* this expectation value to increase slightly on every day of my life, except for the day I die, during which the expectation value decreases drastically.

Further discussion brought up the exponential distribution and its property of “memorylessness”, meaning that the distribution conditional on failure from time 0 up to time t looks like an exact copy of the original distribution shifted t to the right. (If you have a Poisson process, i.e. one where events occur independently and are equally probable at each time, then the probability distribution for when the first event happens is an exponential distribution, so this is the same thing Rolf said.)
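Memorylessness is easy to verify numerically. In this Python sketch the 40-year mean and the particular waiting times are illustrative choices. The chance of the event not happening in the next 20 years is the same whether 0, 10, 40, or 100 years have already passed without it.

```python
import math

def survival(t, mean):
    """P(T > t) for an exponential waiting time T with the given mean."""
    return math.exp(-t / mean)

def conditional_survival(t, s, mean):
    """P(T > s + t | T > s): chance of waiting t more years, given s years without the event."""
    return survival(s + t, mean) / survival(s, mean)

MEAN = 40.0  # illustrative 40-year mean, as in the discussion
for s in (0, 10, 40, 100):
    # Both columns agree for every s: having already waited tells you nothing.
    print(s, round(conditional_survival(20, s, MEAN), 6), round(survival(20, MEAN), 6))
```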

The question now is, if according to experts (chosen in a certain defined way) AI has always been 40 years away, what does this prove? I don’t have a full answer, but I do have some comments.

- In a distribution that’s like the exponential (and with a 40-year mean) but thicker-middled and thinner-tailed (in some sense that I’d have to think about how to quantify), upon observing 40 years of failure, the conditional distribution will be concentrated in its head and will have a mean less than 40 more years away. This is the sort of distribution you get if, e.g., the amount of work that remains to be done is a known quantity, and a random amount of work is being done i.i.d. in each time period.
- If this is true it makes the experts look OK; on reflection they made no mistake in predicting a mean of 40 years.
- In a distribution that’s like the exponential (and with a 40-year mean) but thinner-middled and thicker-tailed, upon observing 40 years of failure, the conditional distribution will be concentrated in its tail and will have a mean more than 40 more years away. This is the sort of distribution you get if, e.g., you mix a couple of different exponential distributions (take a weighted average of their densities) with means spread around 40 years to represent the experts’ ignorance of what mean they really should have predicted. The increase in the predicted time until the event would represent their learning that, apparently, an exponential distribution with a mean higher than 40 years is a better model.
- If this is true it makes the experts look bad; they underestimated the difficulty of the problem.
- In reality, again if the observation is true, some combination of the above two effects is probably in play, with the effects on the mean canceling each other out. (Intuitively I’d say that if they cancel out, you need “more” of the second effect than the first effect.)
- But if neither effect is going on and we’re seeing a simple exponential distribution in action, what is that Bayesian evidence for? Well, according to this distribution, if it started 40 years ago then with probability 1-1/e AI should have happened already. According to the alternative hypothesis that 1) experts are like tape recorders programmed to say 40 years whatever the facts are and 2) there’s actually a negligible chance of it happening soon, AI should have happened with probability 0. The evidence favors the latter hypothesis over the former hypothesis with a Bayes factor of e. (Of course, you have to consider priors.) So I’m not sure that pointing at memorylessness gets you far; instead of “why haven’t predicted times decreased?”, you get the question “why hasn’t it happened yet?”.
- Of course, everything depends on how you pick what expert predictions to listen to. Just because by one method you can get suggestive evidence that the experts have been over-“optimistic”, it doesn’t follow that a more sensible method would yield more “pessimistic” predictions for the future.
- Just extrapolating to “AI will *always* be ‘40 years away’” is all kinds of naive; I think everyone here can agree on this.
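The three cases in the list above can be compared numerically using the mean residual life E[T - 40 | T > 40], which is the expected further wait after 40 years of failure. The specific distributions are my own stand-ins, not anything the experts actually used. The first is a plain exponential with mean 40. The second is a gamma distribution with shape 2 and scale 20: it has mean 40 and a thinner tail, and as the sum of two exponential stages it loosely resembles a fixed amount of work done at a random pace. The third is a 50/50 mixture of exponentials with means 20 and 60, which has mean 40 and a thicker tail. The last lines compute the Bayes factor from the tape-recorder bullet.

```python
import math

def mrl_mixture(t, means, weights):
    """E[T - t | T > t] for a mixture of exponentials (one component = plain exponential)."""
    surv = sum(w * math.exp(-t / m) for m, w in zip(means, weights))
    # The integral of the survival function from t to infinity, component by component.
    tail_area = sum(w * m * math.exp(-t / m) for m, w in zip(means, weights))
    return tail_area / surv

def mrl_gamma2(t, scale):
    """E[T - t | T > t] for a gamma distribution with shape 2 (sum of two exponentials)."""
    x = t / scale
    return scale * (2 + x) / (1 + x)

T = 40.0  # years of observed failure
print("exponential, mean 40:", mrl_mixture(T, [40.0], [1.0]))
print("gamma shape 2, mean 40:", mrl_gamma2(T, 20.0))
print("50/50 mix of means 20 and 60:", mrl_mixture(T, [20.0, 60.0], [0.5, 0.5]))

# Bayes factor for "no AI after 40 years": tape-recorder experts (probability 1)
# versus a memoryless process with a 40-year mean (probability e^-1).
bayes_factor = 1.0 / math.exp(-T / 40.0)
print("Bayes factor:", bayes_factor)
```

The results are 40 years for the exponential, about 26.7 for the gamma case, and about 51.7 for the mixture. Each matches the direction its bullet predicts. The Bayes factor comes out as e ≈ 2.72.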

# Sun Fine-Tuned?

New Scientist reports that Charles Lineweaver and others looked at 11 properties of the Sun and calculated that their combined “typicalness” relative to nearby stars is actually above average, concluding that the Sun has no properties fine-tuned for life. If so, there goes yet another potential part of an explanation for the Fermi Paradox.

The paper turns out to be on arXiv, and there are some of the usual annoying interpretation-of-statistics issues.

The Sun is heavier than 95% of other stars, and it’s been suggested this has an anthropic explanation; but the authors argue that because a joint chi-square test on all 11 independent properties comes out below average, the high mass is apparently a result of random chance.

If you ask me, that’s pure Bayesphemy.

The paper itself states:

Mass is probably the single most important characteristic of a star. For a main sequence star, mass determines luminosity, effective temperature, main sequence life-time and the dimensions, UV insolation and temporal stability of the circumstellar habitable zone (Kasting et al. 1993).

So what’s happened here is they’ve combined the data on the Sun’s atypically high mass with data (and attendant randomness) on ten other, less relevant properties. I don’t want to think about the math right now, but it seems intuitively that if you add enough properties that don’t do anything to a property that does do something in a joint chi-square test of the kind the authors used, you always have a decent shot at “showing” they don’t have a combined effect regardless of how strong the one real effect is.
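Here is a toy version of that intuition in Python. It is not a reconstruction of the paper’s actual test. For illustration, I assume that the Sun’s mass sits two standard deviations from typical, contributing 2² = 4 to a chi-square sum, and that every additional, irrelevant property contributes exactly its expected value of 1. The chi-square tail probability uses the standard closed forms for integer degrees of freedom, so only the standard library is needed.

```python
import math

def chi2_sf(x, k):
    """P(chi-square with k degrees of freedom > x), closed forms for integer k."""
    half = x / 2.0
    if k % 2 == 0:
        # Even k: e^{-x/2} * sum_{i=0}^{k/2-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd k: the 1-dof tail plus terms e^{-x/2} (x/2)^{i-1/2} / Gamma(i+1/2)
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) / math.gamma(1.5)
    for i in range(1, (k - 1) // 2 + 1):
        total += math.exp(-half) * term
        term *= half / (i + 0.5)
    return total

MASS_Z = 2.0  # assumed: mass about two standard deviations from typical
for n_extra in (0, 1, 2, 5, 10):
    stat = MASS_Z ** 2 + n_extra * 1.0  # each irrelevant property adds its expected value, 1
    print(n_extra, round(chi2_sf(stat, 1 + n_extra), 3))
```

With mass alone the tail probability is about 0.046. With ten irrelevant properties added it rises above 0.2, even though the mass deviation itself is unchanged.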

Besides, if you’re interested only in the effect of mass, then how can knowing all the other properties (which are, again, independent of mass) tell you anything relevant? There’s just no information in them. I guess there could be if for some reason your probabilities for the proposition “unusual X is required for life” were positively correlated for different X. I guess if you were relying on someone’s authority when they said “high mass is important”, that person’s authority would be undermined by the evidence that other properties are unusually typical and so they may be cherry-picking properties.

Another thing I don’t immediately see them addressing is whether some properties may have been fine-tuned to be not too far from the typical range. Maybe that’s theoretically implausible in all 11 cases.

I’m not too sure of my thinking here; expect sneaky edits.

Regardless of Bayes gripes, the paper is interesting and informative. Although Lineweaver seems to be on the wrong side of the ET debate, I recommend his other stuff.

# Aumann Forever

Many of you know the drill. Others, see previous iterations. Tell me if you know of a more inspired way to generate standard questions.

Claims after the fold.

# The Aumann Game

Aumann’s agreement theorem says that Bayesian agents cannot “agree to disagree” — their subjective probabilities must be identical if they are common knowledge. This is true regardless of differences in private knowledge. When agents take turns stating their estimates, updating each time based on the information contained in the other’s estimate, private knowledge will “leak out” and the probabilities will converge to an equilibrium.

This theorem makes some big assumptions. One is common knowledge of honesty. Another is common priors. Another is common knowledge of Bayesianity. However, Robin Hanson has shown that uncommon priors require origin disputes, and has discussed agents who are “Bayesian wannabes” but not Bayesians.

It may be interesting to see how this process plays out with real humans in a simplified test bed. Below are 25 statements.

To play, for each statement, you have to say your honest subjective probability that it’s true. Make sure to take into account the estimates of previous commenters. You are strongly encouraged to **post estimates multiple times**, showing how the estimates of others have caused yours to change. We will then see whether, as the theorem suggests, everyone’s estimates converge to the same equilibrium over time, and whether that equilibrium is any good.

I’ve divided the statements into a few categories. For the “statistics” category I used NationMaster and StateMaster. For “history” I used Wikipedia. For “future”, please answer all questions conditional on no disruptive technologies like molecular nanotechnology and artificial general intelligence being invented. This makes the questions rather vague, so I’m not really happy with this category. For “counterfactual”, please answer conditional on the many worlds interpretation of quantum mechanics being true; even if it isn’t, it’s still a well-defined model, so the question is meaningful either way. For “internet”, I always included quote marks.

The answers in the “statistics”, “history”, and “internet” categories are easy to look up, but that would defeat the point. So **no peeking allowed**. Looking up any relevant information is peeking.

Discussion of the statements other than through stating probabilities is also against the spirit of the game. Feel free to ask for clarifications, though.

To reward honest estimates, in the end I may score people on the answers, using the rule where your number of points is the logarithm of the probability you assigned to the right answer.
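The scoring rule is easy to compute. This Python sketch uses made-up answers for three statements, and the function names are my own. It also checks numerically why the rule rewards honesty. If you believe a statement with probability 0.7, your expected score is highest when you report exactly 0.7.

```python
import math

def log_score(p_true, claim_is_true):
    """Points for one statement: ln of the probability you gave to what actually happened.
    Reporting exactly 0 or 1 and being wrong means ln(0); math.log raises ValueError."""
    return math.log(p_true if claim_is_true else 1.0 - p_true)

def expected_log_score(belief, report):
    """Expected points if you believe `belief` but report `report`."""
    return belief * math.log(report) + (1 - belief) * math.log(1 - report)

# Hypothetical answers for three statements: (reported probability, actual truth).
answers = [(0.9, True), (0.3, False), (0.6, False)]
total = sum(log_score(p, truth) for p, truth in answers)
print(round(total, 3))

# Honesty pays: with a belief of 0.7, reporting 0.7 beats shading either way.
reports = [r / 100 for r in range(1, 100)]
best = max(reports, key=lambda r: expected_log_score(0.7, r))
print(best)
```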

(update: this was tried again less messily and with more suitable questions here, here, and here)

***

**Statistics**

1. Oregon has more inhabitants than Slovakia.

2. Ghana has a greater GDP (PPP) than Luxembourg.

3. In 1900, Denmark had a greater GDP per capita than Spain.

4. Ohio emits more CO2 than Poland.

5. Afghanistan has more land area than Alaska.

6. Croatia has a greater GDP per capita than Mexico.

**History**

7. George Orwell was born before 1900.

8. Vladimir Putin was born before 1955.

9. The tenth emperor of Rome wore a beard.

10. More than 5000 Americans died in the attack on Pearl Harbor.

**Future**

11. If the USA has a president in 2067, it will be a woman.

12. A 1000-qubit quantum computer will exist in 2020.

13. A nuclear (fusion or fission) weapon will be used in an attack before 2010.

14. Switzerland will join NATO before 2100.

15. Proof of life on Mars (past or present, not originating on Earth) will be found before 2050.

**Counterfactual**

16. In a randomly selected parallel Everett world splitting from ours on 1 Jan 1940, Hitler invades England before 1950.

17. IARSPEWSFOO 1 Jan 1940, Hitler invades the USA before 1950.

18. IARSPEWSFOO 1 Jan 1, a technological singularity happens before 1500.

19. IARSPEWSFOO 1 Jan 1, nuclear war kills at least ten million people in any five year period before 2000.

20. IARSPEWSFOO 1 Jan 1900, nuclear war kills at least ten million people in any five year period before 2000.

**Internet**

21. “brain” gets more Google results than “heart”.

22. “Ray Kurzweil” gets more Google results than “Sonic the Hedgehog”.

23. “John Paul II” gets more Google results than “Ron Paul”.

24. “Iraq” gets more Google results than “Italy”.

25. “death” gets more Google results than “purple”.

# First Aid for P-Values

The strength of an experimental result is commonly stated in the form of a p-value. For example, if the p-value is 0.01, that means that if there were no effect, the probability of getting a result *at least as* extreme as the one actually obtained would be 0.01. This is a useful number, but it’s easily misinterpreted. What you really want to do is compare the probability of getting the exact result you got under the null hypothesis to the probability of getting that result under the alternative hypothesis. The ratio of these two is called a Bayes factor. Only this information will allow you to rationally update your degree of belief in the null hypothesis. But the p-value alone doesn’t tell you that ratio.

I don’t know if this is widely known, but there’s a trick that you can use to translate from p-values to Bayes factors. Assume your prior distribution is symmetrical and peaked at the null hypothesis. Then, for p-values below 1/e (about 0.37), the following formula gives you a *minimum* Bayes factor (null over alternative):

$$-e\,p\ln p$$

For example, if your p-value is 0.05, your minimum Bayes factor is about 0.41. That means the odds for the null hypothesis (that is to say, P(null true)/P(null false)) are multiplied by at least 0.41. A null hypothesis that starts out as 75% probable (odds of 3:1) ends up at odds of at least about 1.2:1, that is, at least 55% probable. So a p-value of 0.05 isn’t nearly as bad as it sounds.
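The formula is easy to apply mechanically. Here is a small Python sketch. The function names are mine, and returning 1 for p ≥ 1/e is my own handling of the fact that -e p ln(p) only works as a bound below that point: it reaches 1 at p = 1/e and would wrongly fall again above it. The 75% prior matches the example in the text.

```python
import math

def min_bayes_factor(p):
    """Lower bound -e p ln(p) on the Bayes factor (null over alternative).
    Only meaningful for p < 1/e; at or above that, no evidence against the null is implied."""
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    if p >= 1 / math.e:
        return 1.0
    return -math.e * p * math.log(p)

def min_posterior_null(prior_null, p):
    """Smallest posterior probability of the null consistent with the bound."""
    prior_odds = prior_null / (1 - prior_null)
    post_odds = prior_odds * min_bayes_factor(p)
    return post_odds / (1 + post_odds)

for p in (0.1, 0.05, 0.01, 0.001):
    print(p, round(min_bayes_factor(p), 3), round(min_posterior_null(0.75, p), 3))
```

For p = 0.05 this gives a bound of about 0.41 and a posterior for the null of at least about 55%. At p = 0.001 the bound drops to about 0.019.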

This article has a handy little table listing some other possible values. It also gives a weaker formula for a minimum Bayes factor that does not make any assumptions about the prior. This pdf article explains more about this minimum and about Bayes factors in general.

If you see a p-value quoted for a result you doubt (cough parapsychology cough), and if you know the basic technical concepts, it can be quite useful to have the formula or table at hand. Coincidental results are more common than you might think.