Bayes Rule for 8-year-olds


, , ,

Leo has been fascinated for some time by the probabilities of things. He’s telling us all the time about the probability that, for example, the light will turn red before we get to it, or a meteor will destroy the earth tomorrow. Of course, he’s totally making up the numbers, although he does know that a probability lies between 0 and 1 (inclusive), and that when something can’t possibly happen, it’s p=0.0, and when it either is sure to  happen, or has already happened, it’s p=1.0 He also knows the general principle that p of n equally-likely events is 1/n, and he sort of gets the p of complex things generally in the right range. (For example, he puts the probability of the meteor at 0.0000…lots of zeros…and then a 1.)

Given that he’s fascinated by probabilities, my main principle of teaching, “Catch ’em while they’re interested!”, suggests taking this to the next level, so today we started into Bayesian probability updating. In order to do this, I had to devise a particularly simple example. Biomedical tests (a typical example) aren’t really gonna work for him. So I invented a little card game, that goes like this:

Take N ordered cards … I’ll start with 3 for these examples … so, say J Q K of the same suit. Could be 1 2 3, but I don’t want to get all confused with the numbers, so actually, I’m just gonna call them A B C. Shuffle them and lay them face down. Now, the question (hypothesis, in Bayesian terminology) is whether they have been laid out in order: ABC.  p(H) at this point is obviously 1/6 (could be any of ABC ACB BAC BCA CAB CBA) — this is our prior.

Okay, so we turn over the first card. In the easy case, it’s not an A. Intuitively, p(ABC|first<>A) should be zero, and indeed, if we use Bayes’ rule to compute p(first<>A|ABC) = 0, so the whole equation just obviously becomes zero, regardless of the other terms.

What? Oh, you’ve forgotten Bayes’ rule….Here you go:

p(H|D) = (p(H)*p(D|H))/p(D)

So, one more time, p(D|H) = p(ABC|first<>A) = 0, so regardless of the other numbers in the equation, the result is 0. (Assuming p(D) isn’t zero, which it never is in any real problem.)

Okay, now suppose first = A. What’s the “posterior”, that is the new p(ABC|first=A)? Now we have to ask about p(D) and p(D|H). p(D|H) = p(first=A|ABC) = 1 because it had to be an A if the hypothesis was true. Okay, so what’s p(D). Well, that’s just 1/3rd since there are three letters (ABC).

(You might be lured into thinking that P(D) is 1/2, since you’ve used up one of the letters (A), but that would be P(D|first=A), but at this stage we have all three letters. It’s common to confuse the order of explanation with the order of term evaluation in the equation! Just p(D) is unconditional at this stage, and is just the probability of the data given just the structural properties of the situation. At this stage that is all three letters, so 1/3, whereas at the next stage, it’ll be only two letters.)

Recall that p(H) was 1/6th (six possible sequences, as above).

So you end up with:

p(ABC|first=A) = (1/6 * 1)/(1/3) = 3/6 = 1/2

(Remember how to divide fractions? Invert and multiply!)

So there’s now a 1/2 = 0.5 = 50:50 chance that it’s the right sequence, i.e., ABC given that the first card is an A. That is: p(ABC|first=A)=0.5. This accords with intuition since there are now only two possible orders for the B and C cards.

Now, when we turn over the next card, and see that it’s B. At this stage, p(D) = 1/2 (only two possible letters!), and p(D|H) = 1, and the p(H) = 1/2 from the previous stage (that stage’s posterior is this stage’s prior!) = 1/2. (This could also be written p(D|first=A)). So we have p(ABC|first=A & second=B) = (1/2)/(1/2) = 2/2 = 1. Tada!

And, of course, if we’d turned over C here, again p(D|H) would be 0 and the whole thing would come to a crashing zero again, which, again, correctly accords with intuition!

It’s fun to go to very large numbers, which Leo also likes. That’s easy to do using cards by just asking exactly the same question as above, but using all 52 cards, that is, p(ABCDEF…). We know that, given a fair shuffle, this is very very very very small. (Here’s a terrific video about how very very very very small a given ordering from a shuffle really is!) I’ll call this “epsilon” (a number very very very very very close to zero!)

So in this case p(H) = epsilon (~=0), and p(D) = 1/52 the first time through.So although getting an Ace of Spades (assuming that’s what you are counting as the first ordered card) multiplies the probability 52 times, it’s still very very very very small, and will continue to be so for a very very very long time!

So, now I have to start engaging Leo when he generates random probabilities, for example of meteorites hitting the earth tomorrow, to reenforce his Bayesian thinking.

Postscript: Just a couple days after I posted this, the medical case actually came up in conversation. Leo was reading a book about a little girl who became deaf due to meningitis, and he became worried about whether he had meningitis, and how he would know. So we talked about how very unlikely it is in the first place (p(H) [prior] is very small), and how if he just had a headache, that made it more likely, but not a whole lot more likely (Although p(D|H) is high, p(D) for a headache is also pretty high, so they more-or-less cancel.) But that if there were two symptoms, like a stiff neck AND a headache … etc … and we sort of verbally worked through the equation, at least intuitively. (This was a car conversation, so I wasn’t about to whip out a pencil and calculator, much less real incidence data about headaches and meningitis.) He ended up being a lot less worried about having meningitis, so I guess that Bayes’ rule is actually good for kids to understand, at least intuitively.


The Young Lisper Meets Infinite Ravens


, ,

When I was a young lisper, of, say, 16 (which at the time was quite young, although these days would be considered quite old!), I read The Little Lisper (TLL). Of course, very few folks actually read TLL to learn Lisp; It’s mostly a curiosity among those who already know Lisp, or at least among quasi-adults trying to learn Lisp efficiently.

So usually, when one read TLL one usually actually just read through the whole book in one sitting, the whole exercise taking a whopping hour. When approached in this way, TLL seems a bit silly; After a couple pages, one says to oneself: “Oh come on! This is silly! Just give me a damned language manual!” (Indeed, I was one who said exactly these words, so I wrote my own!)

Now, however, I think that I — and pretty much everyone — was approaching TLL the wrong way. Obviously it’s not meant for those who know Lisp already. But I also don’t think that it’s meant for adults (or even quasi-adults) who want to learn Lisp efficiently. I contend that TLL is actually meant for CHILDREN (Soylent green is people!), or those of us who can put our mind in the mode of being a child, which I think is actually extremely difficult.

Brief aside: When you apply for at teaching job, they ask you to describe your “educational (or teaching) philosophy”. As one might expect, this is a difficult question, and I’m not going to try to give my whole teaching philosophy. But one of its pillars is this: You only get people’s (esp. children’s!) attention for a couple of minutes at a time, so be sure to do tiny fun things, and build them up over days, weeks, months, and years to reach where you want to go.

So when I decided that it was time for Leo to learn Lisp — specifically, when he turned 8, and was proficient at those “baby languages”, like scratch and hopscotch — I deployed my afore(partially)described teaching philosophy: I started with atoms, then lists, and so on, building up Lisp concepts and programming, no “lesson” consisting of more than about 10 minutes, and usually containing only one of a couple of concepts.

This worked exceptionally well with Leo (as you’ll see below). But just today, after a couple weeks at this, and now writing some actually vaguely-interesting programs, it just hit me that I was doing The Little Lisper ALMOST EXACTLY! Of course, I wasn’t actually using TLL; rather, my teaching method enacted TLL! This is when the aforementioned realization that Soylent Green is… er… that TTL is written for children (or, again, more likely, adults with child-like minds — which probably describes Lispers of a certain age).

Okay, so this long winding dissertation is all by way of introducing Leo’s first substantial Lisp program, which we have called…


First we create two variables WORDS and PATTERNS:

(setf words '(
          Once upon a midnight dreary  while I pondered  weak and weary
          Over many a quaint and curious volume of forgotten lore
          While I nodded  nearly napping  suddenly there came a tapping
          As of some one gently rapping  rapping at my chamber door
          Tis some visitor  I muttered  tapping at my chamber door

(setf patterns '(
          (Once upon a midnight dreary  while I pondered  weak and weary  )
          (Over many a quaint and curious volume of forgotten lore  )
          (    While I nodded  nearly napping  suddenly there came a tapping  )
          (As of some one gently rapping  rapping at my chamber door  )
          ( Tis some visitor  I muttered  tapping at my chamber door  )
          (            Only this and nothing more  )
          (    Ah  distinctly I remember it was in the bleak December  )
          (And each separate dying ember wrought its ghost upon the floor  )
          (    Eagerly I wished the morrow  vainly I had sought to borrow )
          (    From my books surcease of sorrow sorrow for the lost Lenore  )
          (For the rare and radiant maiden whom the angels name Lenore  )
          (            Nameless here for evermore  )
          (    And the silken  sad  uncertain rustling of each purple curtain )
          (Thrilled me filled me with fantastic terrors never felt before  )
          (    So that now  to still the beating of my heart  I stood repeating )
          (     Tis some visitor entreating entrance at my chamber door  )
          (Some late visitor entreating entrance at my chamber door   )
          (            This it is and nothing more  )

(Yeah, yeah, we could have computed words from patterns…cut the 8 year old a break! He’s only been doing Lisp for about an hour total, over a couple weeks!)

Okay, so you can see where this is going:

(defun random-raven (w p)
  (loop for line in p
        do (print (loop for word in line
                        collect (nth (random (length w)) w)))))

Et voila!


And so on! If read with the correct Vincent Price horror movie voice and prosodics, this stuff sounds both great and hysterically funny!

Next week… Eliza?! 🙂


Leo and Jeff’s Zillion Notations



In other posts I’ve talked about Leo being fascinated by giant numbers, and a while back, based on his interest, I proposed a joke number called 11Z, called “eleventy zillion”. Today we put some specific mathematical meat on this notation, in two different ways, which we agreed to call “Leo Numbers” v. “Jeff Numbers”.

Jeff’s Z Numbers:

Both are based on Jeff’s Numbers, which is the following exponentiation ladder:

nZ = n^n^n^n… n times.

For example:

3Z = 3^3^3

Now, there is immediately a problem, which is which way to evaluate the ladder.

Does 3Z = (3^3)^3 = 27^3 = 19,683, or is it = 3^(3^3) = 3^27 = 7,625,597,484,987

I think that the mathematical default if left-to-right, i.e., up the ladder, but I hate these defaults because they aren’t explicit, so I prefer to have an explicit notion telling you which way to evaluate the ladder.

So to clarify this, we use arrows as:

3Z→ = (3^3)^3 = 27^3 = 19,683– called “3 zillion up”

3Z← = 3^(3^3) = 3^27 = 7,625,597,484,987 — called “three zillion down”

Now, Leo, who like scientific notation and thinks in terms of Googol and Googolplex, thinks of Z notion slightly differently.

Leo’s Z Numbers:

3Z→ = 3e19,683


3Z← = 3e7,625,597,484,987

So, just as Z-down notation is way bigger than Z-up notation, Leo numbers are way bigger than Jeff numbers!

A weird observation on powers of 2…


, ,

(…until you think about it for a few seconds, then it’s not so weird anymore.)

Leo’s been fascinated recently by powers of 2, mostly because weird things happen in MineCraft at boundaries of powers of 2, because of floating point overflow and such-like phenomena. (Actually, weird things happen in all computers at boundaries of powers of 2, for the same reason.) as a result of his interest in powers of 2, he spends inordinate periods playing with calculators, and with Wolfram Alpha, and we’ve done a bunch of Lisp, which has built-in Bignums that let you do arbitrary precision arithmetic. This has all actually been very mathematically and computationally educational, but today he noticed something apparently very odd and surprising about powers of 2.

We all know that multiplying by 2 repeatedly gives powers of two, as: 2, 4, 8, 16, … And if you keep dividing by two, the rational representation is, unsurprisingly, the inverse, as: 1/2, 1/4, 1/8, 1/16, …. Leo noticed that the decimal representation of these progressive divisions by 2 seem to be powers of 5, not 2, as: 0.5, 0.25, 0.125, 0.0625, and that in order to get decimal equivalents of the increasing powers of 2 (2, 4, 8,…) you have to divide by 5, not by 2 as: 1/5=0.2, /5=0.04, /5=0.008, /5=0.0016, etc!

A moment’s thought will reveal this to be not very surprising, as 2 and 5 are inextricably bound together in base 10 arithmetic (because 10 = 2*5). So, even though the mathematical reason for this is obvious (after a moment’s thought), it remains slightly weird that 2, 4, 8, … are the same as 2.0, 4.0, 8.0, …. but 1/2, 1/4, and 1/8, … are 0.5, 0.25, 0.125, …. Put even more starkly: 2/1, 4/1 , 8/1, … invert naturally to 1/2, 1/4, 1/8, … but the decimal representations don’t “invert” in the same easy way: 2.0 -> 0.5, 4.0 -> 0.25, 8.0 -> 0.125, …. My friend, Eric, notes that: “Dividing by 2 is like multiplying by 5 and dividing by 10. Hence the successive powers of 5 and convenient shifting of the decimal point.” This just re-emphasizes the fact that rational representations (i.e., fractions, as 2/1 and 1/2) are very different than decimal, place-value representations; they just work differently is all there is to it, regardless of the faux amis of 2.0 (that is, decimal 2.0) and 2 (i.e., the number 2), and our short hand for rationals with unit denominator (as: 2/1 => 2) having similar looking overt.

Continued thoughts: Fractions (rational notation, as in 2=2/1=10/5, etc.) isn’t place-value! The separate components (numerator and denominator) are place value, but the whole thing isn’t! This leads to the weird effect that if you keep dividing by 2 from, say, 4, you get a nice balanced progressions: 4, 2, 1, 1/2, 1/4, and so on, whereas when you divide place-value notation by 2s you get 4.0, 2.0, 1.0, 0.5, 0.25, etc. The former (rational — fractional) notation is really just multiplying the denominator by 2 (i.e., dividing by 2), so the above sequence could, more obviously, be written: 4/1, 4/2, 4/4, 4/8, 4/16, etc. (you could, of course, start anywhere you wanted: 8/1, etc.). You can do the same trick in decimal notation by multiplying by, say, 1000: 8000, 4000, 2000, 1000, 500, 250, 125, etc. That looks a lot less odd, and you can see that it becomes “5-ish” when you have to divide what is an odd number: 1!


Klein Clothing (Soft Topology)


Leo came to me the other day and told me that he couldn’t put on his pants because they were “Klein pants”. Um…whaaaaa?

Turns out that he had managed to get one leg inserted into the other in such a way that it really did seems like his pants were working something was a Klein Bottle: You could put your legs down different leg holes at the top, and they would come out in the same place!

Now, one obvious way to get this effect is to simply put your legs into the bottoms instead of the top, and then they’ll both come out the single waist. But the tangle that Leo had managed to get his pants into was actually more (unintentionally) sophisticated than that. It really did seem for a moment like there was something impossible about what he’d managed to do with them.

I thought about taking a picture of it, but it’s sort of hard to see; like trying to see what’s special about a Klein bottle if it wasn’t clear, so instead I’ll simply explain that the way you get this effect: Simply stick one leg down the other leg from the inside. The effect is that if you start by putting both legs into different holes, as normal, they end up coming out the same hole at the bottom! This is initially very weird, but it’s obvious what’s going on once you look at it for a moment…well, so is a Klein Bottle, I guess.

[Edit: A reader points out that all topology is “soft” by definition, in that anything that it’s specified as a topological assertion, in, by definition, flexible! This is why you get to do things like turn spheres inside out, morph mugs, and such-like transformations, as long as you retain their defining properties.]


Fun Minimum Description Length Game


As I’ve noted before, Leo is fascinated by huge numbers. Tonight we were trying to figure out what Tree(3) means, and we got all into recursive graph representations. That was all a little too complex (even for me, a least in trying to explain it to a 7-year-old!), so I decided to try to flip the problem on its head. We created a game called…well, computer scientists would call it the Minimum Description Length game, but we just called it the Shortest Equation game.

You start with a pretty big number, say 748. The goal is to use any normal math operations that a 2nd grader would know (i.e., +-/*, powers and roots, !, and that sort of thing) to create the shortest equation that gets you to that number, using only single digit numbers (and allowing 10). So, you can’t just say, for example 748-=748, or 700+48, but need to break it down to single-digit numbers (and 10s).

Score each digit, and operation, including parens, as a character. (And we were a little fast and loose about whether you can rely upon order-of-operations, which I usually don’t like, and would rather it never be taught, but then you end up with a LOT of parens!) You can play against one another, or collaboratively.

Here’s our whiteboard after playing 748 for a few turns:


So, Leo started with (7*10*10)+(10*9)+8 for 18 characters, and we got all the way down to 8 chars with 3^3^2+10+9 (which doesn’t have the up-arrows if you use superscript notation…and notice that this is one place where we’re playing a little fast-and-loose with order of operations, so that we’re reading 3^3^2 as 3^27, whereas you could also read it as 9^2, so we really should have used two parens, but it’s still only 10 long! 🙂


Matrix Wars (1.01)


, , ,

Continuing in our exploration of gamified higher math, Leo and I programmed up a version of space invaders in HopScotch that depends on matrix multiplication. I took only a few hours to create a pretty interesting game. Below are some screenshots from the game. It’s a bit hard to explain the game play; I recommend playing it yourself here!


The matrix (Vmn) is constantly being multiplied by the current target (Txy) to create the new values (Nxy):


You click on the V(mn) entries to roll them through -1, 0, 1, and then when you have an Nxy result you like, click “Fire” to load the new (Nxy) to the target (Txy). Since the matrix multiplication is being continuously, as soon as you load the Nxy into the Txy, the new Nxy are re-computed. It’s easy to create matrices that move the target around in more-or-less any way you like.


Meanwhile, the invader is getting closer and closer. If you hit the invader before it hits earth, you win.



PythonCraft: Chasing the Sort Front


, ,

Leo watched an extremely good TedEd about sorting algorithms, and wanted to try it out himself, so we wrote a simple bubble sort in Python, and we deployed it through the PythonCraft API to run in MineCraft. This turned out to be more fun and educational than I thought it was going to be.

To begin with, here’s the code (slightly cleaned up for display here; you’ll have to add the right imports, for example):


Unsurprisingly, we had the longest discussion about why you need the “Hold” variable in the swap step(s). There are two errors in this code, that I’ll reveal below. (The highly motivated reader can try to figure these out; Actually, any semi-competent programmer can probably guess what the errors are without even looking at the code!) Regardless, it does the basic job.

So, it’s actually extremely cool when it runs, because you can watch the sort in action, and actually chase down the sort front! Here are a couple of videos of us doing that:

(Note the sheep on the blocks!)

You can see (and I mention) one of the bugs in the video: The “Obi wan” (Off by One) error, which leaves on of the blocks unsorted at the far end of the array.

The other bug is more subtle, until I point it out: We actually intended to sort by rainbow order (ROYGBIV), but that would have required another dereference, and when I tried to explain that, it went right over Leo’s head, him already being well snowed by the Hold/Swap problem, so I simply let it go. As a result, it’s actually sorting by whatever the color codes happen to be in MineCraft, which seem to be random (or at least I can’t tell what the logic is that relates MC color IDs to rainbow order).


NewtonCraft: Rocket Engine Blocks


, , , , , ,

This year I’ve been making a concerted effort to “climb the mountain of algebra” with Leo through physics (like some sort of Newtonian Julie Andrews*). One way into this that works extremely well in motivational terms is (unfortunately) via MineCraft. Here’s an early example of using MineCraft to motivate some algebra. This week we did an interesting problem on Newton’s second law. (That’s the F=MA one, in case you don’t remember the order!) The pages are reproduced below, and if you click any of them, they link to a downloadable powerpoint which contains a bunch of “fun” math worksheets, including these (although, of course, “fun” is in the eye of the student).


By the way, although there really is a MineCraft Mod called “Mekanism”, which has a lot of cool stuff, like lasers and a fusion reactor, it doesn’t actually have an REB … I made that up!

(*) This is an oblique reference to the wonderful movie “In The Loop”. Search for “Climb the mountain of conflict”, and watch the youtube video. I won’t link to it because it’s slightly NSFW (“strong language”, as they say), and definitely NSFK! But if you’re an adult, it’ll make you truly lol!


Fun (and Educational) Fluid Physics


, , ,

Off of quantum mechanics for a moment, and back to the world of macroscopic (newtonian) fluid physics. We’ve been playing a lot with fluid (particle-based) simulations lately, looking at lift, drag, and turbulence, the bernoulli effect, eddies, and other such-like chaotic phenomena.  (“Macroscopic” would be putting it a bit too strongly; Maybe “mesoscopic” is the right word?)

I had in mind actually making a Navier-Stokes game, but it turns out that there are already a bunch of great apps out there that simulate these phenomena perfectly well, and are also very fun! I’ve tried out at least a dozen — I only bother with the free ones! — and two are of special note.

The first is Wind Tunnel (Free), by Algorizk (which I assume is either a name or a pun). This is a simple but extremely nice app with all the right capabilities. You can draw arbitrary shapes, view in particle, smoke, pressure, or speed modes, and calculate overall lift and drag. Here are some examples:


There are a bunch of pre-drawn objects (although not many), and you can draw your own objects. My only complaint about this app is that you can’t rotate solid bodies, like the wing above, in order to experiment with angle-of-attach (that is, the position of the object with respect to the flow).

The other great example of a fluid sim is Powder Game. The free version is ad-supported, but the ads are along the bottom, and not too annoying. In addition to being an newtonian fluid simulator, Powder Game has tons of special types of particles:


This lets you do a ton of very fun experiments, like exploding things and watching the chaotic dynamics on Nitro!


You can spend hours with either of these apps in the perfect paradigm of learning-by-playing.

In another post I’ll talk more about some great newtonian sims that are a truly macroscopic, which is fun (and educational) of a very different sort, at a very different level.