Thomas Bayes was an 18th-century British statistician, philosopher and Presbyterian minister. He’s known today because he formulated Bayes’ Theorem, which has since given rise to Bayesian probability, Bayesian inference, Bayesian epistemology, Bayesian efficiency and Bayesian networks, among other things.
The reason I bring this up is that philosophers, especially the ones who concentrate on logic and the theory of knowledge, often mention something Bayesian, usually in glowing terms. It’s been a source of consternation for me. I’ve tried to understand what the big deal is, but I’ve pretty much failed. All I’ve really gotten out of these efforts is the idea that if you’re trying to figure out a probability, it helps to pay attention to new evidence. Duh.
Today, however, the (Roughly) Daily blog linked to an article by geneticist Johnjoe McFadden called “Why Simplicity Works”. In it, he offers a simple explanation of Bayes’ Theorem, which for some reason I found especially helpful. Here goes:
Just why do simpler laws work so well? The statistical approach known as Bayesian inference, after the English statistician Thomas Bayes (1702-61), can help explain simplicity’s power.
Bayesian inference allows us to update our degree of belief in an explanation, theory or model based on its ability to predict data. To grasp this, imagine you have a friend who has two dice. The first is a simple six-sided cube, and the second is more complex, with 60 sides that can throw 60 different numbers. [All things being equal, the odds that she’ll throw either one of the dice at this point are 50/50].
Suppose your friend throws one of the dice in secret and calls out a number, say 5. She asks you to guess which dice was thrown. Like astronomical data that either the geocentric or heliocentric system could account for, the number 5 could have been thrown by either dice. Are they equally likely?
Bayesian inference says no, because it weights alternative models – the six- vs the 60-sided dice – according to the likelihood that they would have generated the data. There is a one-in-six chance of a six-sided dice throwing a 5, whereas only a one-in-60 chance of the 60-sided dice throwing a 5. Comparing likelihoods, then, the six-sided dice is 10 times more likely to be the source of the data than the 60-sided dice.
Simple scientific laws are preferred, then, because, if they fit or fully explain the data, they’re more likely to be the source of it.
Hence, in this case, before your friend rolls one of the dice, there is the same probability that she’ll roll either one. With the new evidence (that she rolled a 5), the probability changes. To Professor McFadden’s point, the simplest explanation for why she rolled a 5 is that she used the dice with only 6 sides (she didn’t roll 1, 2, 3, 4 or 6), not the dice with 60 sides (she didn’t roll 1, 2, 3, 4, 6, 7, 8, 9, 10, . . . 58, 59 or 60).
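For anyone who wants to see the arithmetic, here is a minimal sketch in Python of the dice example. It assumes both dice are fair and that each one was equally likely to be picked before the throw (the 50/50 starting point). The function name `posterior` and the dictionary labels are my own, not anything from the article.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Apply Bayes' Theorem to competing hypotheses.

    Each hypothesis is weighted by prior * likelihood, then the
    weights are rescaled so that they add up to 1.
    """
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(joint.values())
    return {h: joint[h] / total for h in joint}


# Assumption: before the throw, either dice is equally likely to be chosen.
priors = {"six-sided": Fraction(1, 2), "sixty-sided": Fraction(1, 2)}

# Chance of throwing a 5 with each dice, assuming both are fair.
likelihoods = {"six-sided": Fraction(1, 6), "sixty-sided": Fraction(1, 60)}

result = posterior(priors, likelihoods)
for name, p in result.items():
    print(f"{name}: {p} (about {float(p):.3f})")
```

Under those assumptions the six-sided dice comes out at 10/11, or roughly 91 percent, and the 60-sided dice at 1/11. That is the same 10-to-1 ratio McFadden describes, just turned into probabilities that add up to 1.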
Now it’s easier to understand explanations like this one from the Stanford Encyclopedia of Philosophy:
Bayes’ Theorem is a simple mathematical formula used for calculating conditional probabilities. It figures prominently in subjectivist or Bayesian approaches to epistemology, statistics, and inductive logic. Subjectivists, who maintain that rational belief is governed by the laws of probability, lean heavily on conditional probabilities in their theories of evidence and their models of empirical learning. Bayes’ Theorem is central to these enterprises both because it simplifies the calculation of conditional probabilities and because it clarifies significant features of subjectivist positions. Indeed, the Theorem’s central insight — that a hypothesis is confirmed by any body of data that its truth renders probable — is the cornerstone of all subjectivist methodology. . . .
To illustrate, suppose J. Doe is a randomly chosen American who was alive on January 1, 2000. According to the United States Center for Disease Control, roughly 2.4 million of the 275 million Americans alive on that date died during the 2000 calendar year. Among the approximately 16.6 million senior citizens (age 75 or greater) about 1.36 million died. The unconditional probability of the hypothesis that our J. Doe died during 2000, H, is just the population-wide mortality rate P(H) = 2.4M/275M = 0.00873. To find the probability of J. Doe’s death conditional on the information, E, that he or she was a senior citizen, we divide the probability that he or she was a senior who died, P(H & E) = 1.36M/275M = 0.00495, by the probability that he or she was a senior citizen, P(E) = 16.6M/275M = 0.06036. Thus, the probability of J. Doe’s death given that he or she was a senior is P_E(H) = P(H & E)/P(E) = 0.00495/0.06036 = 0.082. Notice how the size of the total population factors out of this equation, so that P_E(H) is just the proportion of seniors who died. One should contrast this quantity, which gives the mortality rate among senior citizens, with the “inverse” probability of E conditional on H, P_H(E) = P(H & E)/P(H) = 0.00495/0.00873 = 0.57, which is the proportion of deaths in the total population that occurred among seniors.
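The numbers in that passage are easy to check. The following is a rough sketch in Python that uses only the figures quoted above; the variable names are mine. It computes the probability of death given senior status, the "inverse" probability of senior status given death, and then shows how Bayes' Theorem converts one into the other.

```python
# Figures quoted in the passage, in millions of people.
population = 275.0
deaths = 2.4
seniors = 16.6
senior_deaths = 1.36

p_h = deaths / population               # P(H): died during 2000
p_e = seniors / population              # P(E): was a senior citizen
p_h_and_e = senior_deaths / population  # P(H & E): senior who died

# Conditional probabilities, each defined as joint / marginal.
p_h_given_e = p_h_and_e / p_e  # mortality rate among seniors
p_e_given_h = p_h_and_e / p_h  # share of all deaths that were seniors

# Bayes' Theorem: P(H given E) = P(E given H) * P(H) / P(E)
via_bayes = p_e_given_h * p_h / p_e

print(round(p_h, 5))          # 0.00873
print(round(p_h_given_e, 3))  # 0.082
print(round(p_e_given_h, 2))  # 0.57
print(round(via_bayes, 3))    # 0.082
```

The last two lines of output are the point: the mortality rate among seniors (0.082) can be recovered from the inverse probability (0.57) once you know the two unconditional rates, which is all the theorem says.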