Leo has been fascinated for some time by the probabilities of things. He’s telling us all the time about the probability that, for example, the light will turn red before we get to it, or a meteor will destroy the earth tomorrow. Of course, he’s totally making up the numbers, although he does know that a probability lies between 0 and 1 (inclusive), that when something can’t possibly happen, it’s p=0.0, and that when it either is sure to happen, or has already happened, it’s p=1.0. He also knows the general principle that the p of each of n equally-likely events is 1/n, and he sort of gets the p of complex things generally in the right range. (For example, he puts the probability of the meteor at 0.0000…lots of zeros…and then a 1.)

Given that he’s fascinated by probabilities, my main principle of teaching, “Catch ’em while they’re interested!”, suggests taking this to the next level, so today we started into Bayesian probability updating. In order to do this, I had to devise a particularly simple example. Biomedical tests (the typical textbook example) aren’t really gonna work for him. So I invented a little card game that goes like this:

Take N ordered cards … I’ll start with 3 for these examples … so, say J Q K of the same suit. Could be 1 2 3, but I don’t want to get all confused with the numbers, so actually, I’m just gonna call them A B C. Shuffle them and lay them face down. Now, the question (hypothesis, in Bayesian terminology) is whether they have been laid out in order: ABC. p(H) at this point is obviously 1/6 (could be any of ABC ACB BAC BCA CAB CBA) — this is our prior.
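If you want to sanity-check that prior, a few lines of Python will do it (the variable names here are mine, just for illustration): enumerate every ordering of the three cards and count how many match ABC.

```python
# Enumerate all orderings of the three cards and check the prior p(ABC).
from itertools import permutations
from fractions import Fraction

orderings = list(permutations("ABC"))
matches = sum(1 for o in orderings if o == ("A", "B", "C"))
prior = Fraction(matches, len(orderings))
print(len(orderings), prior)   # 6 orderings, prior = 1/6
```

Using exact fractions rather than floats keeps the arithmetic looking like the hand calculations below.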

Okay, so we turn over the first card. In the easy case, it’s not an A. Intuitively, p(ABC|first<>A) should be zero, and indeed, if we use Bayes’ rule, the likelihood p(first<>A|ABC) = 0, so the whole equation just obviously becomes zero, regardless of the other terms.

What? Oh, you’ve forgotten Bayes’ rule….Here you go:

p(H|D) = (p(H)*p(D|H))/p(D)

So, one more time, p(D|H) = p(ABC|first<>A) = 0, so regardless of the other numbers in the equation, the result is 0. (Assuming p(D) isn’t zero, which it never is in any real problem.)
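That zero-out behavior is easy to see if you write the rule as a tiny function (a sketch; the function name and arguments are my own, nothing standard):

```python
# Bayes' rule: p(H|D) = p(H) * p(D|H) / p(D)
from fractions import Fraction

def bayes(prior, likelihood, p_data):
    """Posterior from prior p(H), likelihood p(D|H), and p(D)."""
    return prior * likelihood / p_data

# The "first card isn't an A" case: likelihood is 0, so the posterior
# is 0 no matter what the prior or p(D) were.  (p(first<>A) = 2/3 here.)
posterior = bayes(Fraction(1, 6), 0, Fraction(2, 3))
print(posterior)   # 0
```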

Okay, now suppose first = A. What’s the “posterior”, that is, the new p(ABC|first=A)? Now we have to ask about p(D) and p(D|H). p(D|H) = p(first=A|ABC) = 1, because the first card had to be an A if the hypothesis was true. Okay, so what’s p(D)? Well, that’s just 1/3, since there are three letters (ABC).

(You might be lured into thinking that p(D) is 1/2, since you’ve used up one of the letters (A), but that’s the *next* stage’s p(D). It’s common to confuse the order of explanation with the order of term evaluation in the equation! At *this* stage p(D) is unconditional: just the probability of the data given the structural properties of the situation, which here means all three letters, so 1/3, whereas at the *next* stage it’ll be only two letters.)

Recall that p(H) was 1/6th (six possible sequences, as above).

So you end up with:

p(ABC|first=A) = (1/6 * 1)/(1/3) = 3/6 = 1/2

(Remember how to divide fractions? Invert and multiply!)

So there’s now a 1/2 = 0.5 = 50:50 chance that it’s the right sequence, i.e., ABC given that the first card is an A. That is: p(ABC|first=A)=0.5. This accords with intuition since there are now only two possible orders for the B and C cards.
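Here’s that first update checked with exact fractions (again, names are mine):

```python
# First-card update: p(ABC | first=A) = p(H) * p(D|H) / p(D)
from fractions import Fraction

prior = Fraction(1, 6)        # p(ABC): one of six orderings
likelihood = Fraction(1)      # p(first=A | ABC): certain if H is true
p_data = Fraction(1, 3)       # p(first=A): three equally likely letters
posterior = prior * likelihood / p_data
print(posterior)              # 1/2
```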

Now we turn over the next card and see that it’s B. At this stage, p(D) = 1/2 (only two possible letters!), which could also be written p(second=B|first=A); p(D|H) = 1; and p(H) = 1/2, the posterior from the previous stage (that stage’s posterior is this stage’s prior!). So we have p(ABC|first=A & second=B) = (1/2 * 1)/(1/2) = 1. Tada!

And, of course, if we’d turned over C here, again p(D|H) would be 0 and the whole thing would come to a crashing zero again, which, again, correctly accords with intuition!
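The whole game can be sketched as a loop, using the fact that at stage k there are n−k unseen cards, so p(D) = 1/(n−k), and p(D|H) is 1 while the dealt cards match the hypothesis and 0 the moment they don’t. (This is my own sketch of the procedure above, not anything Leo and I wrote down.)

```python
# Sequential Bayesian updating through a dealt sequence of cards.
from fractions import Fraction
from math import factorial

def update_through(dealt, hypothesis):
    n = len(hypothesis)
    p = Fraction(1, factorial(n))      # prior: 1/n! equally likely orderings
    for k, card in enumerate(dealt):
        likelihood = 1 if card == hypothesis[k] else 0
        p_data = Fraction(1, n - k)    # n-k letters remain at this stage
        p = p * likelihood / p_data    # this posterior is the next prior
    return p

print(update_through("A", "ABC"))      # 1/2
print(update_through("AB", "ABC"))     # 1
print(update_through("AC", "ABC"))     # 0 -- crashing zero!
```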

It’s fun to go to very large numbers, which Leo also likes. That’s easy to do using cards by just asking exactly the same question as above, but using *all 52* cards, that is, p(ABCDEF…). We know that, given a fair shuffle, this is very very very very small. (Here’s a terrific video about how very very very very small a given ordering from a shuffle really is!) I’ll call this “epsilon” (a number very very very very very close to zero!)

So in this case p(H) = epsilon (~=0), and p(D) = 1/52 the first time through. So although getting an Ace of Spades (assuming that’s what you are counting as the first ordered card) multiplies the probability 52 times, it’s still very very very very small, and will continue to be so for a very very very long time!
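For the 52-card case the fractions get absurd, which is exactly the point. A quick sketch:

```python
# The 52-card version: the prior is 1/52!, and the first matching card
# multiplies it by 52 -- which still leaves it unimaginably tiny.
from fractions import Fraction
from math import factorial

prior = Fraction(1, factorial(52))         # "epsilon": roughly 1.2e-68
posterior = prior * 1 / Fraction(1, 52)    # p(D|H)=1, p(D)=1/52
print(float(prior))
print(posterior == Fraction(1, factorial(51)))   # 52x bigger, still tiny
```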

So, now I have to start engaging Leo when he generates random probabilities, for example of meteorites hitting the earth tomorrow, to reinforce his Bayesian thinking.

Postscript: Just a couple days after I posted this, the medical case actually came up in conversation. Leo was reading a book about a little girl who became deaf due to meningitis, and he became worried about whether he had meningitis, and how he would know. So we talked about how very unlikely it is in the first place (p(H) [prior] is very small), and how if he just had a headache, that made it more likely, but not a whole lot more likely (Although p(D|H) is high, p(D) for a headache is also pretty high, so they more-or-less cancel.) But that if there were two symptoms, like a stiff neck AND a headache … etc … and we sort of verbally worked through the equation, at least intuitively. (This was a car conversation, so I wasn’t about to whip out a pencil and calculator, much less real incidence data about headaches and meningitis.) He ended up being a lot less worried about having meningitis, so I guess that Bayes’ rule is actually good for kids to understand, at least intuitively.
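If you want to see the “more-or-less cancel” effect numerically, here’s the headache conversation with completely made-up numbers (these are NOT real incidence figures for meningitis or headaches; they’re invented purely to show the shape of the arithmetic):

```python
# Invented, purely illustrative numbers -- not real medical data.
prior = 0.0001          # p(meningitis): "very small"
p_d_given_h = 0.9       # p(headache | meningitis): "high"
p_d = 0.5               # p(headache): "also pretty high"

posterior = prior * p_d_given_h / p_d
print(posterior)        # more likely than the prior, but not a whole lot
```

Because p(D|H) and p(D) are both biggish, their ratio is close to 1, so the posterior barely budges from the tiny prior. A second symptom with a much smaller p(D) (like a stiff neck AND a headache) is what would really move the number.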