The Monty Hall problem is one of those brain teasers which has confused me on the three previous occasions that I read about it. The (many) articles on line explaining it didn’t resonate with me, so I decided to work it through again and write some code to prove it.
Once I’d written the code I had an epiphany and realised how incredibly simple it was, so this blog entry is me explaining the Monty Hall problem and why probability works out how it does. Hopefully, if you’re like me and you’ve struggled with this before, this blog will make it simple and clear.
I’m not going to explain the Monty Hall problem, because Wikipedia does an excellent job. If you’re not familiar, see you in a few minutes!
A brief summary of the incorrect thought process is as follows:
- There are three doors: X, Y and Z.
- The contestant picks door X. There was a 1/3 chance of the contestant picking this door. There was a 2/3 chance of then NOT picking door X.
- Door Y is revealed to be a goat, therefore the Cadillac is behind door X or door Z.
- The Cadillac is behind door X or door Z, i.e. there are only two permutations of goat, Cadillac and doors there it’s a 50/50 chance of finding the Cadillac.
So, why given a seemingly 50/50 chance of either having a goat or a Cadillac behind the two remaining doors is the probably of finding the Cadillac on a switch 2/3?
Let’s start with some simple code that emulates the problem. This code is deliberately naive to make it easy to follow.
using System; using System.Security.Cryptography; namespace MontyHall { internal class Program { private static Random _randomGenerator = new Random(); private static void Main(string[] args) { Console.WriteLine("{0,-12}{1,12}{2,12}", "Iterations", "Stick Wins", "Switch Wins"); for (int i = 0; i < 10; i++) { int iterations = 1000000; int winsIfStickCount = EmulateGame(iterations); int winsIfSwitchCount = iterations - winsIfStickCount; Console.WriteLine("{0,-12}{1,12}{2,12}", iterations, winsIfStickCount, winsIfSwitchCount); } } private static int EmulateGame(int iterations) { int winIfStickCount = 0; for (int i = 0; i < iterations; i++) { if (EmulateGame()) { winIfStickCount++; } } return winIfStickCount; } private static bool EmulateGame() { // Create 3 boxes, each with a goat string[] boxes = new[] { "Goat", "Goat", "Goat" }; // Replace a goat with a Cadillac, randomly boxes[_randomGenerator.Next(3)] = "Cadillac"; // Now pick a random door int randomDoor = _randomGenerator.Next(3); // Now Monty opens a door containing a goat int knownGoat = -1; while (knownGoat == -1) { /* Keep picking a door at random until Monty finds one that the contenstant didn't pick * and contains a goat. This isn't efficient, but it keeps the code very simple */ int montysDoor = _randomGenerator.Next(3); if (montysDoor == randomDoor || boxes[montysDoor] == "Cadillac") continue; knownGoat = montysDoor; } /* If we originally picked the Cadillac, then we won. * If we didn't, then as Monty has revealed the other goat the * other door MUST contain the Cadillac */ return (boxes[randomDoor] == "Cadillac"); } } }
Run the code, and ten million iterations of the Monty Hall problem will be run in batches of one million. Here’s some sample output
Iterations Stick Wins Switch Wins 1000000 333788 666212 1000000 333780 666220 1000000 332643 667357 1000000 333629 666371 1000000 333797 666203 1000000 333219 666781 1000000 333363 666637 1000000 332921 667079 1000000 333100 666900 1000000 333255 666745
Pretty convincing. After 10,000,000 iterations we’re seeing a very definite pattern that switching doubles your chances of finding the Cadillac.
Why?
The reason why became obvious once I had realised that this chunk of code was totally superfluous to whether I won or not.
// Now Monty opens a door containing a goat int knownGoat = -1; while (knownGoat == -1) { /* Keep picking a door at random until Monty finds one that the contenstant didn't pick * and contains a goat. This isn't efficient, but it keeps the code very simple */ int montysDoor = _randomGenerator.Next(3); if (montysDoor == randomDoor || boxes[montysDoor] == "Cadillac") continue; knownGoat = montysDoor; }
The line of code that determines whether you win by sticking or switching is totally independent:
return (boxes[randomDoor] == "Cadillac");
Once you understand that this single, simple line is responsible for the probability of winning or losing, I think it becomes really easy to understand. I.e.
- There is a 1/3 chance the contestant picks the Cadillac.
- There is a 2/3 chance that the contestant hasn’t picked the Cadillac.
- Therefore by switching there is a 2/3 chance of finding the Cadillac because if one of the two unpicked doors contains the Cadillac you are guaranteed to find it.
One of the difficulties with understanding this problem is the scale of the numbers involved. It becomes much easier with bigger numbers:
Imagining trying to win the UK National Lottery draw jackpot. The jackpot is won by picking 6 distinct numbers between 1 and 49 and matching them all. There is a 1/13,983,816 chance of matching all 6 numbers with a single ticket (see http://lottery.merseyworld.com/Info/Chances.html).
- Imagine that you have 13,983,816 lottery tickets, each with a different set of numbers on. That means that the winning lottery ticket is in there… somewhere.
- You pick a ticket at random. There’s a 1/13,983,816 chance of finding the winning ticket.
- A magician arrives, waves his magic wand and all of the remaining tickets disappear EXCEPT for one. The magician promises that this remaining ticket is the winning ticket unless you’ve already picked it up (remember there’s a 1/13,983,816 chance of that, so pretty small).
- The magician then offers you a swap: do you want to keep your ticket (which has a 1/13,983,816 chance of being the winning ticket), or take his ticket?
Still reckon it’s a 50/50 chance of winning the National Lottery? Me neither.