Bayes Theorem
From the SL4 Lexicon:
Bayes' Theorem:
Fundamental probability theory. Bayesians sometimes argue that Bayes' Theorem usurps Karl Popper's falsificationism as the best strategy for truth-seeking in science (or anywhere else, for that matter). Sometimes called "Bayes' Rule" or the "Bayesian Probability Theorem". See also An Intuitive Explanation of Bayesian Reasoning, Bayes Theorem.
Bayes' Theorem is a theorem in probability theory that can be used to update subjective probabilities when new evidence is obtained. It states:
P(A|B) = P(A) * P(B|A) / P(B)
where P(X|Y) is the Conditional Probability of X given Y. P(A) is also called the Prior Probability, P(B|A) the likelihood, and P(A|B) the Posterior Probability. Since P(B) does not depend on A, it only rescales the result; thus, "posterior is proportional to prior times likelihood".
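A minimal Python sketch of the formula, assuming the three inputs are already known numbers. The function name `posterior` and the sample values are illustrative only. The function refuses to divide when P(B) = 0, since the ratio is then undefined.

```python
def posterior(prior_a: float, likelihood_b_given_a: float, prob_b: float) -> float:
    """Return P(A|B) = P(A) * P(B|A) / P(B)."""
    if prob_b <= 0:
        # The ratio has no value when the evidence B has probability zero.
        raise ValueError("P(B) must be positive")
    return prior_a * likelihood_b_given_a / prob_b


# Hypothetical numbers: P(A) = 0.3, P(B|A) = 0.5, P(B) = 0.25.
print(posterior(0.3, 0.5, 0.25))  # prints 0.6
```

Because the same P(B) divides every hypothesis being compared, it only normalizes the products prior times likelihood.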
Alternate statements of Bayes' Theorem include:
P(A | B) = P(A & B) / P(B)
P(A | B) = P(B | A) * P(A) / (P(B | A) * P(A) + P(B | ~A) * P(~A))
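The last form expands P(B) over the two cases A and ~A. Here is a short Python sketch of that expansion. The helper name `posterior_binary` is hypothetical, and the screening-test numbers are made up for illustration.

```python
def posterior_binary(prior_a: float, p_b_given_a: float, p_b_given_not_a: float) -> float:
    """P(A|B) with P(B) expanded as P(B|A)P(A) + P(B|~A)P(~A)."""
    joint_ab = p_b_given_a * prior_a                 # P(A & B)
    joint_not_a_b = p_b_given_not_a * (1 - prior_a)  # P(~A & B)
    prob_b = joint_ab + joint_not_a_b                # P(B), summed over A and ~A
    if prob_b == 0:
        raise ValueError("P(B) = 0: P(A|B) is undefined")
    # P(A & B) / P(B): the ratio form, with the expanded denominator.
    return joint_ab / prob_b


# Hypothetical test: 1% base rate, 80% true-positive rate, 9.6% false-positive rate.
print(round(posterior_binary(0.01, 0.80, 0.096), 4))  # 0.0776
```

A high true-positive rate still gives a small posterior when the prior P(A) is small.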
This is the special case of Bayes' Theorem that Rev. Thomas Bayes wrote about. There is also a Generalized Bayes Theorem.
Eliezer Yudkowsky has written an excruciatingly gentle introduction to Bayes' Theorem. A good follow-up to reading this is Curtis Brown's Bayes and the philosophy of science.
Rationality is the application of Bayes Theorem to decision theory.
There are some Bayes Theorem Calculators.
"the fundamental mathematical law governing the process of logical inference" -- bayes.com
Bayes' Theorem is the secret of the universe. I don't have the time to explain that right now, but I thought you might want to know. -- Eliezer Yudkowsky
Yep, it is, in fact, the secret of the universe. I used to be skeptical, but I've seen enough to believe that it's true. I can't explain right now, so if you don't take Eliezer's word, take mine. -- Gordon Worley
I recall from my expert system days that Dempster-Shafer Theory is used a good bit in AI-type systems to give better probabilities when working with probabilistic logic. -- Gary Miller
Believe it or not: Bayesian filk (PDF). -- Mitchell Porter
Sweet. Nice find, Mitch. -- Anand
P(A|B)=P(A&B)/P(B) is *not* Bayes' theorem (or Bayes' formula, as I like to call it); it's just the *definition* of conditional probability. From that definition it's easy to derive something called the Total Probability Theorem, from which Bayes' formula is one step away. The general Lack Of Mathematical Rigour in the statistics material on this wiki has been bothering me. Bayes' formula only makes sense for a partition (a family of pairwise disjoint events whose union is the whole sample space on which the sigma-algebra is defined), a fact that's essentially omitted in the name of keeping the maths accessible. -- Diego Navarro
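The partition form can be sketched in Python. The total probability theorem gives P(B) = sum over i of P(B|H_i) P(H_i), and Bayes' formula then divides each term by that sum. The name `posterior_over_partition` and the numbers are hypothetical. Exact fractions are used so that the "priors sum to 1" check is exact. That check is only a necessary condition: pairwise disjointness cannot be verified from the numbers alone.

```python
from fractions import Fraction


def posterior_over_partition(priors, likelihoods):
    """Return [P(H_j | B)] for a partition H_1..H_n of the sample space.

    priors[i] = P(H_i), likelihoods[i] = P(B | H_i).
    """
    if sum(priors) != 1:
        # A partition covers the whole space, so its probabilities sum to 1.
        raise ValueError("priors must sum to 1 for a partition")
    # Total probability theorem: P(B) = sum_i P(B | H_i) * P(H_i).
    prob_b = sum(p * lik for p, lik in zip(priors, likelihoods))
    if prob_b == 0:
        raise ValueError("P(B) = 0: posterior undefined")
    # Bayes' formula for each member of the partition.
    return [p * lik / prob_b for p, lik in zip(priors, likelihoods)]


priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]
print(posterior_over_partition(priors, likelihoods))
# [Fraction(1, 4), Fraction(1, 3), Fraction(5, 12)]
```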
Given the bad state of statistics here, I'm starting Kolmogorov Probability For Dummies. I hope it's taken as a positive and good-willed contribution. --Diego Navarro
A riddle that might lead to a paradox for a Bayesian definition of probability: what's P(A|B) when P(B)=0? More will be revealed once some answers from the wikizens have been posted. --Diego Navarro
Typically P(A|B) is undefined if B is considered impossible. If you have conditional probabilities as your primitives you can have P(A|B) well-defined with P(B)=0 although I'm not sure what use it'd be. -- Nick Hay
Actually, P(A|B) = P(A) for P(B)=0 in Kolmogorov (that is, measure-over-sigma-algebra, not Cox/conditional probabilities as primitives) probability theory. Any manual will tell you that. The problem is, if probability is taken to be a generalized logic, P(A|B) for P(B)=0 should be 1, due to the true-by-vacuity principle of deductive logic. That, of course, breaks all of probability theory. Take that, Jaynes! : P --Diego Navarro
Why does P(A|B) = P(A)? Could you give an explanation or a particular reference? Googling, I find either that P(A|B) is undefined if P(B) = 0, or that P(A|B) can be defined using Radon-Nikodym derivatives (I unfortunately lack the measure theory to understand this). Is there a natural definition of conditional probability which handles conditioning on non-empty sets of measure 0?
P(A|B) = 1 for all A iff B is logically false (I suspect). If P(B) = 0 yet B remains possible (e.g. a point under a uniform distribution on [0,1]), there is no logical necessity for P(A|B) = 1 (nor does P(A|B) = P(A) always make sense). If B is logically false (e.g. the empty set in a sigma-algebra), it seems conditional probabilities make no sense.
When dealing with finite sets of propositions (answering questions about infinite sets with limits) you don't seem to have this complication, allowing you to leave P(A|B) undefined if P(B)=0. --Nick Hay
A good reference on Kolmogorov-tradition probability theory is "Probability" by Albert Shiryaev, translated by Ralph Boas (who is the author of my favorite text in analysis, 'A Primer of Real Functions'). I actually have to ask one of my professors why P(A|B) is defined in the textbooks as P(A). From a conditional-probabilities-as-primitives standpoint, I'd venture a guess that the law of preservation of relative chances requires that P(A|B)/P(Z|B)=P(A)/P(Z), but I have no background in Cox probability whatsoever. -- Diego Navarro
Reading through Shiryaev's book, it seems P(A|B) = P(A) need not hold even if P(B) = 0. He gives a motivating example at the start of section 2.7: Conditional Probabilities and Conditional Expectations with Respect to a sigma-Algebra. Let X be the probability of a coin landing on heads (a random variable in [0,1]) and K the number of heads in n flips. Then P(K=k|X=x) = C^k_n x^k (1-x)^(n-k), where C^k_n is the binomial coefficient "n choose k". Conditioning on the probability of heads results in a binomial distribution (different distributions for different probabilities). This holds regardless of whether P(X=x) = 0. Apparently. I don't understand the integrals, so I don't understand the actual definition of conditional probability here.
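In this example the conditional probability is the binomial probability mass function. A small Python sketch evaluates it for a fixed x; the helper name `heads_pmf` is hypothetical. Nothing in the formula refers to P(X = x).

```python
from math import comb


def heads_pmf(k: int, n: int, x: float) -> float:
    """P(K = k | X = x): probability of k heads in n flips when heads has probability x."""
    return comb(n, k) * x**k * (1 - x) ** (n - k)


# For a fixed x this is a distribution over k = 0..n; here x = 0.5, n = 5.
print([heads_pmf(k, 5, 0.5) for k in range(6)])
# [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
```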
I still don't see any conflict with probability-as-logic. The events X=x aren't (necessarily) logically impossible even though they may have probability zero. This weird situation is a result of using infinite sets of propositions. Probability-as-logic uses finite sets of propositions, extending to the infinite via limits. In the finite case probability zero means impossible, so there's no problem with conditioning on probability 0 propositions being undefined or nonsensical.
To derive the above result about coin flips as a limit of finite situations, replace a biased coin with an N-faced shape (e.g. a generalised die) which is equally likely to land on any of its faces. This models a biased coin, with the fraction of sides marked H giving the bias. Our uncertainty about the probability of heads is then uncertainty about the number of sides marked H. By increasing the number of sides we can replicate any distribution over [0,1] arbitrarily closely. In the infinite limit some probabilities may tend to zero, but this is no particular problem.
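A Python sketch of this limiting construction. It assumes, hypothetically, a uniform prior over how many of the N faces are marked H; `prob_k_heads` is an illustrative name. Every hypothesis keeps positive probability, so nothing is conditioned on a probability-zero event. Even so, the marginal chance of k heads approaches 1/(n+1), the value for a uniform prior on [0,1], as N grows.

```python
from math import comb


def prob_k_heads(k: int, n: int, faces: int) -> float:
    """P(K = k) for a die with `faces` sides, h of them marked H, h uniform on 0..faces.

    Each hypothesis h has prior 1/(faces + 1) > 0, so every conditional
    P(K = k | h) is an ordinary ratio of positive probabilities.
    """
    prior_h = 1 / (faces + 1)
    total = 0.0
    for h in range(faces + 1):
        bias = h / faces  # fraction of sides marked H
        total += prior_h * comb(n, k) * bias**k * (1 - bias) ** (n - k)
    return total


n, k = 4, 1
for faces in (2, 10, 100, 10_000):
    print(faces, prob_k_heads(k, n, faces))
# The values approach 1 / (n + 1) = 0.2 while each prior 1/(faces + 1) tends to 0.
```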
Incidentally, P(A|B)/P(Z|B) = P(A)/P(Z) shouldn't hold in general either: only when P(B|A) = P(B|Z). -- Nick Hay
P(A|B) with P(B) = 0 is, by the traditional laws of probability where P(A|B) is defined as P(A&B)/P(B), 0/0 and therefore undefined. If you want to make a statement about a limit, then start with a well-defined finite case and prove the fraction converges. If you want a nontraditional probability theory where the conditional probabilities are primitives, better make sure it's well-defined. But first, show me a real-world case where I care. If you give me an application of this in the real world, then I can judge your math answers by how well they work to manipulate the real world. I'm not in this (just) for the fun of it.
I can think of at least one (hard-to-explain) case where I'd want to condition on a known logical impossibility, but while such cases arise in decision theory, they should never arise in pure probability theory. The case I have in mind would require that my primitive representation be a causal graph with mechanistic nodes and background unknowns that obey the causal Markov condition, and that I compute the consequences of impossible postulates using Judea Pearl's surgery over causal graphs to model counterfactuals. -- Eliezer Yudkowsky
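Eliezer's suggestion (start with a well-defined finite case and show that the fraction converges) can be tried on the coin example. Instead of conditioning on X = x0, condition on the interval [x0 - eps, x0 + eps], which has positive probability, and watch the ratio as eps shrinks. This is a Python sketch under stated assumptions: X is uniform on [0,1], the integral uses the midpoint rule, and `heads_given_interval` and the parameter values are illustrative.

```python
from math import comb


def heads_given_interval(k: int, n: int, x0: float, eps: float, steps: int = 1000) -> float:
    """P(K = k | x0 - eps <= X <= x0 + eps) for X uniform on [0, 1].

    The conditioning event has positive probability 2 * eps, so this is the
    ordinary ratio P(K = k and event) / P(event).
    """
    if eps <= 0 or x0 - eps < 0 or x0 + eps > 1:
        raise ValueError("need eps > 0 and [x0 - eps, x0 + eps] inside [0, 1]")
    width = 2 * eps
    dx = width / steps
    # Midpoint-rule estimate of P(K = k and X in the interval):
    # the binomial pmf integrated against the uniform density 1.
    joint = 0.0
    for i in range(steps):
        x = x0 - eps + (i + 0.5) * dx
        joint += comb(n, k) * x**k * (1 - x) ** (n - k) * dx
    return joint / width  # divide by P(event) = 2 * eps


n, k, x0 = 4, 1, 0.3
target = comb(n, k) * x0**k * (1 - x0) ** (n - k)  # binomial value at X = x0
for eps in (0.1, 0.01, 0.001):
    print(eps, heads_given_interval(k, n, x0, eps), target)
```

Here the ratio converges to the binomial value at x0. In general, though, the limit can depend on which family of shrinking events is chosen (the Borel-Kolmogorov paradox). This is therefore an illustration, not a definition.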
"Incidentally, P(A|B)/P(Z|B) = P(A)/P(Z) shouldn't hold in general either: only when P(B|A) = P(B|Z). --Nick Hay" Yes. I should have said P(A|B)/P(Z|B) = P(A&B)/P(Z&B). I must be losing one or two screws due to lack of sleep. (Will the singularity help me stay awake for 30 continuous hours without experiencing drowsiness? Count me in, then!) The "probability as logic only works for finite sets of propositions" explanation also holds well enough for me. I'm satisfied. --Diego Navarro
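A quick numerical check of the two statements above, using exact fractions and made-up probabilities; the helper name is hypothetical. Because P(B) cancels, the posterior ratio always equals P(A&B)/P(Z&B). It equals the prior ratio P(A)/P(Z) only when P(B|A) = P(B|Z).

```python
from fractions import Fraction as F


def posterior_ratio(p_a, p_z, p_b_given_a, p_b_given_z, p_b):
    """Return P(A|B) / P(Z|B) computed from the ratio definition."""
    p_a_given_b = p_b_given_a * p_a / p_b  # P(A & B) / P(B)
    p_z_given_b = p_b_given_z * p_z / p_b  # P(Z & B) / P(B)
    return p_a_given_b / p_z_given_b


p_a, p_z, p_b = F(1, 4), F(1, 2), F(1, 2)

# Unequal likelihoods: the ratio is P(A & B) / P(Z & B), not P(A) / P(Z).
r = posterior_ratio(p_a, p_z, F(3, 5), F(1, 5), p_b)
print(r, (F(3, 5) * p_a) / (F(1, 5) * p_z), p_a / p_z)  # 3/2 3/2 1/2

# Equal likelihoods P(B|A) = P(B|Z): the prior ratio is preserved.
print(posterior_ratio(p_a, p_z, F(2, 5), F(2, 5), p_b))  # 1/2
```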
It's probably a bit late to be mentioning this, but I happen to be an infinite set atheist. I mean, has anyone ever seen an infinite set? Also, P(A|B) is not equivalent to the probability of the material implication P(B->A) = P(~B \/ A), so the true-by-vacuity principle is irrelevant. This is just a verbal confusion of two different meanings of the term "given" or "implies". -- Eliezer Yudkowsky
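The difference between P(A|B) and P(B->A) can be made concrete. A Python sketch with hypothetical numbers; the function names are illustrative. The material implication P(~B \/ A) equals 1 - P(B & ~A). It is certain when P(B) = 0, while the conditional probability is then undefined.

```python
def conditional(p_ab: float, p_b: float) -> float:
    """P(A | B) = P(A & B) / P(B); undefined when P(B) = 0."""
    if p_b == 0:
        raise ValueError("P(A|B) is undefined when P(B) = 0")
    return p_ab / p_b


def material_implication(p_ab: float, p_b: float) -> float:
    """P(B -> A) = P(~B or A) = 1 - P(B & ~A) = 1 - (P(B) - P(A & B))."""
    return 1 - (p_b - p_ab)


# Hypothetical numbers: P(B) = 0.2, P(A & B) = 0.05.
print(conditional(0.05, 0.2))           # 0.25
print(material_implication(0.05, 0.2))  # about 0.85

# P(B) = 0: the implication holds vacuously, the conditional has no value.
print(material_implication(0.0, 0.0))   # 1.0
```

In general P(~B \/ A) - P(A|B) = (1 - P(B))(1 - P(A|B)), so the two coincide only when P(B) = 1 or P(A|B) = 1.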