I came across this puzzle via an episode of the podcast Modern Wisdom, during host Chris Williamson’s interview with another podcaster/YouTuber called Alex O’Connor. Alex O’Connor credits a blog post, which itself credits Scott Aaronson’s quantum computing course. Scott Aaronson credits philosopher John Leslie, but in my brief search I didn’t find Leslie’s version of this, so I’m not certain where this originates.
The puzzle is quite straightforward to state, and is presented as a paradox. I like it because it exposes a way in which the mind can allow itself to be tripped up, and it turns out to succumb to reason when we are sufficiently careful and precise.
The ‘Paradox’
A maniac is on the loose! He commits bizarre and horrific crimes, but he is, in a way honest. He is far more interested in subjecting people to dilemmas than anything else, and when he announces that he will commit a particular dastardly deed, he does so, to the letter. I’m going to assume he’s Heath Ledger’s Joker from The Dark Knight, since that character rather fits with the level of insanity this plot involves.
The Joker announces he will kidnap someone, blindfold them, and roll a die to determine their fate. If the die lands on six1, he will kill them, and his scheme will be complete. If it lands on any other number, he will let this person go, and then kidnap ten people. He will again roll a single die, and if it lands on six he will kill these ten people, and then stop. But if it lands on any other number, he will release these ten people and then go out and kidnap one hundred people.
You see the general pattern here. Each round he kidnaps ten times as many people as the previous round, with the game being brought to an end when he rolls a 6 and kills the people in that round.
But the Joker adds a twist. He announces that each kidnap victim will also be given a button they can press. With a 50% probability the button will kill the person pressing it. But if it does not kill them they will be released, and will avoid being killed should the die land on a 6 in this round.
You wake up blindfolded. You cannot see how many other people have been kidnapped along with you, so you do not know which round you are in.2 Should3 you press the button?
Argument against pressing the button: Whichever round you are in, your probability of dying if you do not press the button is 1/6, because that is probability of the dice landing on a 6. This is smaller than 50%, the probability of death from pressing the button. Clearly you are more likely to survive if you do not press it.
Argument for pressing the button: The round that gets killed based on the dice roll is the last round the Joker kidnaps. Since each round is ten times larger than the previous one, over 90% of the kidnapped people will be in the last round. When you wake up blindfolded you know that you are a kidnap victim, and you are equally likely to be any of the kidnap victims. So you have a 90% chance of being in the final round, and dying based on the roll of the die. Clearly you are more likely to survive if you do press the button. (Or put another way: If no-one presses their button, over 90% of the kidnap-victims will be killed. But if everyone presses their button, approximately half of them will be killed.)
At the end of the next section I’m going to give the answer, so if you prefer to stop and puzzle over this for a little bit first, this is the place to do so.
First Thoughts
I asked this question to a handful of friends, none of whom had heard of it before. Most of them believed the first argument (the one against pressing the button), and I suspect that’s because the reasoning is simpler.
This isn’t a bad reason to prefer it! We are presented with two contradictory claims. One employs complex reasoning, and the other much simpler reasoning. You should look for flaws in both those chains of reasoning. But if you’re unable to find any flaws, then it’s more likely that one is hiding in the complex chain of reasoning than the simple one.
Two questions often arose:
- Can you be re-kidnapped after surviving an earlier round?
- What happens if the Joker runs out of people?
I’m going to assume that you cannot be re-kidnapped. If you allow being re-kidnapped, you have some probability that you wake up blindfolded thinking “Aww, hell! Not this again…”, and that’s a different situation where you have more information that simply “I have been kidnapped”.
If the Joker runs through the entire population without rolling a 64 then I’m assuming the game is done. This fits nicely with the “no re-kidnappings” rule, because the game cannot continue at this point without some re-kidnappings.
As it turns out, the fact that the population is finite will turn out to be crucial for resolving this paradox. The author of Rising Entropy seems to be aware from this post that an infinite population causes problems with deciding who to kidnap, but presses on regardless. Scott Aaronson’s lecture notes indicate he got a question about whether the population is infinite, but his reply indicates that he believes that the finite population version of this game is very similar. I think that most people believe that the finite-population versions can be turned into an infinite-population version by considering a very large population, since the probability of running out of people becomes extremely small for a large enough population. We’ll see that this is not actually the probability that we need to become small, and so these two cases really are different.
In the interests of not being click-baity, I’m going to tell you the resolution here: In a finite population with no re-kidnappings you should not press the button. The anthropic argument relies on an incorrect application of the law of total probability.
The Finite Population Version
We’ll first pick some notation.5 Let \(p\) be the probability that the killer kills all of the people in a given round. In the description above we had \(p=\frac{1}{6}\), and Rising Entropy and Scott Aaronson had \(p = \frac{1}{36}\). At each round, the killer kidnaps \(g\) times as many people as he did in the previous round. In the description above \(g=10\). We’ll assume that \(g\) is an integer (if you’ve kidnapped half a person, the question of whether or not you murder them is rendered moot) and that \(g > 1\) (otherwise the last round is no larger than the first). I’ll still use phrases like “rolls a 6” below, but in the equations I’ll use \(p\) instead of 1/6.
To make this computation simpler I’m going to assume that the population, \(m\), can be exactly subdivided into \(r_0\) kidnapping-rounds. That is, for some \(r_0\),6 \[\begin{align} m = g^0 + g^1 + g^2 + ... + g^{r_0 - 1} = \frac{g^{r_0} - 1}{g - 1} \end{align}\]
Let’s then define two events. \(S\) is the event that you are selected. \(K\) is the event that are selected in the round that is killed.7 We’ll also define the random variable \(T\), the terminal round, taking values in \(\left\{1, 2, \dots, r_0, r_0 + 1 \right\}\). \(T\) is the round at which the killer finally rolls a 6. Remember that round \(r_0\) contains \(g^{r_0-1}\) people and is the last round before the Joker runs out of victims. \(T = r_0 + 1\) denotes the special case that the die did not come up six during any of the kidnapping rounds. i.e. it is the case in which the Joker gives up and kills no-one.
We want to compute the probability that we are in the round that will be killed, given that we have been selected \[\begin{align} \mathbb{P}(K | S) = \frac{\mathbb{P}(K \cap S)}{\mathbb{P}(S)} = \frac{\mathbb{P}(K)}{\mathbb{P}(S)} \end{align}\] The first equality above is simply the definition of conditional probability, and the second follows from the fact that \(K \subset S\). i.e. if you are killed that is a subset of being selected. So to say “you are killed and selected” is the same as saying “you are killed”.
The math that follows is just establishing that the argument against pressing the button is true. i.e. that if you wake up blindfolded you really do only have probability \(p\) of being killed if you do not press the button. If you already buy this, and just want to understand where the anthropic argument breaks, feel free to skip ahead to Why Does The Anthropic Argument Fail? below.
To compute these two probabilities we’ll use a neat trick. I look at this and say “well, it would be much easier to compute the probability that I’m selected if I knew that the game went for exactly \(t\) rounds”. Thankfully there’s an equation that lets us do this:8 \[\begin{align} \mathbb{P}(S) = \sum_{t=1}^{r_0+1} \mathbb{P}(S | T = t) \mathbb{P}(T = t) \end{align}\]
Suppose the game went for one round. What is probability that you were selected? We’ve assumed the Joker chooses uniformly from the population of \(m\) people, so this would be \[\begin{align} \mathbb{P}(S | T = 1) = \frac{1}{m} \end{align}\] What if the game went for two rounds? There are now \(1 + g\) people selected, uniformly without replacement, from a population of \(m\) people, so \[\begin{align} \mathbb{P}(S | T = 2) = \frac{1 + g}{m} \end{align}\] We can pretty clearly see the pattern, and remembering that \(T=r_0+1\) is the special case where the Joker runs out of population, \[\begin{align} \mathbb{P}(S | T = t) = \begin{cases} \frac{1}{m} \sum_{t^{\prime}=1}^{r_0} g^{t^{\prime}-1} &\text{if } t \le r_0 \\ 1 &\text{if } t = r_0 + 1 \end{cases} \end{align}\] Finding \(\mathrm{P}(T = t)\) is easy: for \(t-1\) rounds in a row he must roll a non-6 (probability \((1-p)\)) followed by rolling a 6 (probability \(p\)), except for the case where \(T = r_0 + 1\), which corresponds to rolling \(r_0\) non-6s in a row: \[\begin{align} \mathbb{P}(T = t) = \begin{cases} p (1-p)^{t - 1} &\text{if } t \le r_0\\ (1 - p)^{r_0} &\text{if } t = r_0 \end{cases} \end{align}\] So we an now write down an expression for \(\mathbb{P}(S)\), and we’re going to manipulate9 it into a slightly odd form, but it’ll make sense once we compute \(\mathbb{P}(K)\): \[\begin{align} \mathbb{P}(S) &= \sum_{t=1}^{r_0 + 1} \mathbb{P}(S | T = t) \mathbb{P}(T = t) \\ & = \left( \sum_{t=1}^{r_0}\left(\frac{1}{m}\sum_{t^{\prime}=1}^t g^{t^{\prime} - 1} \right) p (1-p)^{t-1} + (1-p)^{r_0} \right) \\ & = \sum_{t=1}^{r_0} \frac{g^{t-1}}{m} (1-p)^{t-1} \end{align}\]
\(\mathbb{P}(K)\) is a lot easier to compute, because \(\mathbb{P}(K | T = t)\) no longer involves a sum, since it’s the probability of you being kidnapped on exactly the \(t^{\text{th}}\) round \[\begin{align} \mathbb{P}(K | T = t) = \begin{cases} \frac{1}{m}g^{t-1} &\text{if } t \le r_0 \\ 0 &\text{if } t = r_0 + 1 \end{cases} \end{align}\] And so \[\begin{align} \mathbb{P}(K) = \sum_{t=1}^{r_0} \frac{g^{t-1}}{m} p (1-p)^{t-1} \end{align}\]
Scroll a few lines up and look at the expression we found for \(\mathbb{P}(S)\). The only difference is that \(\mathbb{P}(K)\) has an extra factor of \(p\). So we’ve proved that \[\begin{align} \mathbb{P}(K | S) = \frac{\mathbb{P}(K)}{\mathbb{P}(S)} = p \end{align}\] Supposed you wake up blindfolded, and don’t know anything other than the fact that you are one of the people kidnapped. Your probability of dying based on the dice roll is \(p\) (i.e. 1/6 for the version where the Joker rolls a single die), and pressing the button does not improve your odds of survival.
Why Does The Anthropic Argument Fail?
There’s something unsatisfying about just proving that the argument against pressing the button is correct, without showing how the argument in favour of pressing it is wrong. So let’s find where the hole in that argument lies.
Let’s refine this argument a little to the version that people give when they’re made aware of the finite-population issue: Regardless of how many rounds the game goes for, the final round is much larger than all the other rounds put together. Given that you’ve been kidnapped, you’re equally likely to be any one of the kidnap victims, so you’re very likely (>90%) to be in the final round, and hence to be murdered. It is possible that the Joker runs out of fresh people to kidnap before he rolls a 6, but if we make the population very large, this probability is very small, so we can neglect it.
In other words, this argument invites you to make the following decomposition: \[\begin{align} \mathbb{P} (K | S) = \sum_{t=1}^{r_0 + 1} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t) \end{align}\] and then argues that:
- \(\mathbb{P}(T = r_0 +1) = (1-p)^{r_0}\) is very small for large \(r_0\) (which is true) so the sum of all of the other \(\mathbb{P}(T = t)\) terms must be close to one (also true).
- \(\mathbb{P}(K | S \cap T = t) = g^{t-1} / \sum_{t^{\prime}=1}^t g^{t^{\prime} - 1} > 1 - \frac{1}{g}\) for \(t \le r_0\). For \(g=10\), \(1 - \frac{1}{g} = 90\%\). This is also true.
- Given (1) and (2), \(\sum_{t=1}^{r_0 + 1} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t) \approx 90\% \times \sum_{t=1}^{r_0} \mathbb{P}(T = t) \approx 90\%\). This is also true.
However, the decomposition this all proceeds from is not true! \[\begin{align} \require{cancel} \xcancel{\mathbb{P} (K | S) = \sum_{t=1}^{r_0 + 1} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t)} \end{align}\] This is just not the form of the law of total probability for a conditional probability. The correct form is:10 \[\begin{align} \mathbb{P} (K | S) = \sum_{t=1}^{r_0 + 1} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t | S) \end{align}\]
So the fact that \(\mathbb{P}(T = r_0 + 1)\) is small for a large population may be leading us astray. We should instead compute this \[\begin{align} \mathbb{P}(T=r_0 + 1 | S) &= \frac{\mathbb{P}(T = r_0 + 1 \cap S)}{\mathbb{P}(S)} = \frac{\mathbb{P}(T = r_0 + 1)}{\mathbb{P}(S)} \\ &= \frac{(1-p)^{r_0}}{ \sum_{t=1}^{r_0} \frac{g^{t-1}}{m} (1-p)^{t-1}} \end{align}\] We’ve taken \(\mathbb{P}(S) = \frac{1}{m} \sum_{t=1}^{r_0} (g (1-p))^{t-1}\) from the earlier section where we derived it.
This time we actually need to evaluate that geometric sum, but that’s fine we can do that. We get \[\begin{align} \mathbb{P}(T=r_0 + 1 | S) & = \frac{m (1-p)^{r_0} (g(1-p) - 1)}{(g(1-p))^{r_0} - 1} \\ & \ge \frac{m (1-p)^{r_0} (g(1-p) - 1)}{(g(1-p))^{r_0}} \\ & = \frac{m}{g^{r_0 - 1}} ((1-p) - 1/g) \\ & = \frac{1 - g^{-r_0}}{1-1/g} ((1-p) - 1/g) \\ & \approx \frac{1 - p - 1/g}{1 - 1/g} \end{align}\] In the last line we’ve dropped the \(g^{-r_0}\) term, which is extremely small in the large population limit. Note that \(\mathbb{P}(T = r_0 + 1 | S)\) is not small! For \(g=10\) and \(p=1/6\) this is roughly \(81.5\%\). i.e. just a little less than 5/6.
Why is this significant? We must have \(\sum_{t=1}^{r_0+1}\mathbb{P}(T = t|S) = 1\). So the other \(\mathbb{P}(T=t | S)\) terms for \(t \le r_0\) must sum to just a little over 1/6. The \(\mathbb{P}(K | S \cap T = t)\) terms must each be \(\le 1\), since they are probabilities. And we also have that \(\mathbb{P}(K | S \cap T = r_0 + 1) = 0\), since none of the rounds get killed if \(T = r_0 + 1\). So we can write \[\begin{align} \mathbb{P} (K | S) &= \sum_{t=1}^{r_0 + 1} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t | S) \\ &= \sum_{t=1}^{r_0} \mathbb{P}(K | S \cap T = t) \mathbb{P}(T = t | S) + \mathbb{P}(K | S \cap T = r_0 + 1) \mathbb{P}(T = r_0 + 1 | S) \\ &\approx \left(\text{something } \le \frac{1}{6}\right) + \left(0\times \frac{5}{6}\right) \end{align}\]
…and this completely resolves the paradox. Given that you wake up blindfolded, you probably are in the largest round to be kidnapped. But given that you wake up blindfolded, it’s very likely that no round will be killed.
The Mayor’s Dilemma
Suppose now that you are the mayor of Gotham city.
You are widely respected and trusted, and once the Joker announces his diabolical plan the people look to you for guidance. If they find themselves kidnapped, should they press the button? When reporters interview people on the street, the find that everyone intends to simply do what their beloved mayor advises.
If you advise them not to press the button (and remember, we’ve found above that if you wake up blindfolded, your best chance of survival is by not pressing it) then the most likely outcome is that the Joker rolls a 6 at some point, kills over 90% of his kidnap victims. The larger the population, the most likely this outcome becomes. In this scenario you could look pretty bad in the eyes of the city - if you’d advised people to press the button, only 50% of them would be dead.
However, the unlikely cases where the Joker gets as far as the final round have, by far, the most lives at stake. If you, as mayor, want to minimize the expected number of deaths, you should still be advising them to not press the button. How you explain this to the people is another matter.
The Infinite Population Version
Most formulations of this problem seem to ask about the infinite population case. Or at least they want to, but sometimes resort to saying “well let’s take the population to be finite but very large and it will essentially be the same”.
The problem is harder to formulate in the infinite case because there is no discrete uniform distribution on an infinite set of elements. Partway through this post the author of Rising Entropy draws an analogy with the continuous uniform distribution, saying “Imagine that you’re randomly selecting a real number between 0 and 1. The probability that you select any particular number is zero. But at the same time, you will end up selecting some number. Whichever number you end up selecting, is a number that you would have said had a 0% chance of being selected!”.
I think this is problematic. Our infinite population probably should probably still be countably infinite.11 There is no uniform distribution on a countably infinite set.
It’s tempting to consider this alternative scheme:
- The Joker does all of his dice rolls first, and figures out how many people he would have kidnapped.
- He goes out and kidnaps that many people.
- He splits them into groups of size 1, \(g\), \(g^2\), etc., assigning them uniformly at random.
- He lets everyone go except those in the largest group, who he kills (aside from people killed/release as a result of pressing the button).
This is deceptive because, in the finite population case, if step (2) involves kidnapping people uniformly at random, and he just gives up if he runs out of people, then this is equivalent to the original scenario. But that’s not true for the infinite population case, because there is no way of uniformly picking people to kidnap in step (2).
Can we say anything interesting at all about the infinite case? Yes, we can. Suppose that the Joker does not do some kind of “pre-rolling” scheme like what we’ve described above. With a bit of careful thought you can convince yourself12 that the events of being selected in any round (\(S\)) and of being killed (\(K\)), conditioned on the terminal round (\(T\)), must obey \[\begin{align} \mathbb{P}(S | T = t+1) = \mathbb{P}(S | T= t) + \mathbb{P}(K | T=t+1) \end{align}\] Then we can write \[\begin{align} \mathbb{P}(S) &= \sum_{t=1}^{\infty} \mathbb{P}(S | T = t) \mathbb{P}(T = t) \\ &= \sum_{t=1}^{\infty} \left( \mathbb{P}(S | T = t + 1) - \mathbb{P}(K | T = t + 1) \right) \mathbb{P}(T = t) \\ &= \sum_{t=1}^{\infty} \left( \mathbb{P}(S | T = t + 1) - \mathbb{P}(K | T = t + 1) \right) \frac{\mathbb{P}(T = t + 1)}{(1-p)} \\ &= \frac{1}{1-p} \sum_{t=1}^{\infty} \left( \mathbb{P}(S | T = t) - \mathbb{P}(K | T = t) \right) \mathbb{P}(T = t) \\ &= \frac{\mathbb{P}(S) - \mathbb{P}(K)}{1-p} \end{align}\] Note that the relabelling from \(t+1\) to \(t\) only works because the \(t=1\) term vanishes, since \(\mathbb{P}(S | T = 1) = \mathbb{P}(K | T = 1)\). Rearranging we get \[\begin{align} \mathbb{P}(K | S) = \frac{\mathbb{P}(K \cap S)}{\mathbb{P}(S)} = \frac{\mathbb{P}(K)}{\mathbb{P}(S)} = p \end{align}\]
So we have our answer in the infinite case too: for any scheme where re-kidnappings are not allowed, and where your probability of being kidnapped at round \(t\) does not depend on later dice rolls, the probability of death based on the dice roll must be \(p\). But the distribution you have been selected from cannot be a uniform one.
Intuitively what is going on here is that there must be some point where you become more likely to be selected. The idea that you are most likely to be in the final round, because that contains the greatest number of people, simply is not true.
Closing Thoughts
I find this problem extremely interesting because I think it shows us several traps that we can fall into.
Firstly, I suspect that people who are more mathematically sophisticated are more, not less, likely to fall into the trap of buying the “push the button” argument.13 That argument contains a few elements that they can immediately see to be true (the final round containing the majority of people, the extremely small chance of running out of people in a large population) and I think this builds its credibility. This is what Daniel Kahneman would call the cognitive bias of substitution: instead of answering the difficult question of “does this argument contain any steps which are false?” our mind substitutes the easier problem of “does this argument contain many steps that are true?” But I don’t think this is the whole story. The second argument flatters us. Its says: “Look at this sophisticated argument that you are clever enough to follow. You don’t have to be like all of those fools who accept the simpler argument.”14
Secondly, we can easily be convinced by the mayor’s dilemma argument. We’re able to see in the case the (correct) argument that you’re very likely to save more people by advising them to press the button. What does not immediately jump out at us is that the (rare) case in which pressing the button kills more people than it saves is so much higher stakes in terms of lives.15
If you believe there are mistakes in my argument, or the exposition is unclear, or even if you just want to talk about this or similar problems, please feel free to contact me.