There was some discussion in comments recently about the distinction between aleatoric uncertainty (physical probabilities such as coin flips) and epistemic uncertainty (representing ignorance rather than an active probability model).
We’ve talked about this before, but not everyone was reading this blog 15 years ago, so I’ll cover it again here. For a very similar take, I also recommend this article by Tony O’Hagan from 2004. These ideas weren’t new back then either!
The paradox
Consider the following two probabilities:
(1) p1 = the probability that a particular coin will land “heads” in its next flip;
(2) p2 = the probability that the world’s greatest boxer would defeat the world’s greatest wrestler in a fight to the death.
The first of these probabilities is, for all practical purposes, exactly 1/2. Let us suppose, for the sake of argument, that the second probability is also 1/2. Or, to put it more formally, suppose our uncertainty about p2 is such that we would bet on either outcome at even odds; in a Bayesian sense, the mean of our prior distribution, E(p2), equals 1/2.
In Bayesian inference, p1 = p2 = 1/2. Which doesn’t seem quite right, since we know p1 much better than we know p2. More generally, this seems like a problem with the representation of uncertainty by probability. To put it another way, the integral of a probability is a probability, and once we’ve integrated out the uncertainty in p2, it’s just plain 1/2.
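To spell out that integration step (writing π(p2) for our prior density on p2, notation I’m introducing here): the Bayesian predictive probability that the boxer wins is

$$\Pr(\text{boxer wins}) = \int_0^1 p_2 \, \pi(p_2) \, dp_2 = E(p_2) = \tfrac{1}{2},$$

exactly the same number as for the coin.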
Resolution of the paradox
The resolution of the paradox is that probabilities, and decisions, do not take place in a vacuum. If the only goal were to make a statement, or a bet, about the outcome of the coin flip or the boxing/wrestling match, then yes, p=1/2 is what you can say. But the events occur within a context. In particular, the coin flip probability p1 remains at 1/2, pretty much no matter what information you provide (before the actual flipping occurs, of course). In contrast, one could imagine gathering lots of information (such as in the photo above) that would refine one’s beliefs about p2. “Uncertainty in p2” corresponds to potential information we could learn that would tell us something about p2.
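To make the asymmetry concrete, here’s a minimal simulation sketch. It assumes a uniform Beta(1,1) prior on p2 and some entirely made-up data on comparable fights; the point is only that hypothetical intermediate information can move E(p2) a lot while p1 sits at 1/2 no matter what.

```python
import numpy as np

rng = np.random.default_rng(0)

# Coin: p1 is essentially a known physical probability.
p1 = 0.5  # no realistic pre-flip information will move this

# Boxer vs. wrestler: uniform Beta(1, 1) prior on p2, so E(p2) = 1/2.
a, b = 1.0, 1.0

# Hypothetical intermediate information: outcomes of 10 comparable
# boxer-vs-wrestler matches (entirely made-up data for illustration).
fights = rng.binomial(1, 0.8, size=10)  # suppose boxers tend to win

# Conjugate Beta-Binomial update.
a_post = a + fights.sum()
b_post = b + len(fights) - fights.sum()

print(f"E(p2) before information: {a / (a + b):.2f}")
print(f"E(p2) after information:  {a_post / (a_post + b_post):.2f}")
print(f"p1 before and after:      {p1:.2f} (unchanged)")
```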
This understanding of the difference between epistemic and aleatoric uncertainty is also useful for thinking about intermediate cases. There are uncertainties that are mostly but not completely aleatoric (for example, the outcome of a roll of a die that might have some imperfections) and uncertainties that are mostly but not completely epistemic (for example, the outcome of a telephone survey, where response rates are well below 10%). One way to understand this sliding scale is to think about how your probabilities would likely change as intermediate information becomes available. Again, the point is that “epistemic or aleatoric” is not a property of the event; it’s a property of the intermediate information that could become available.
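One rough way to operationalize that sliding scale (a sketch with made-up Beta priors, both centered at the same value of 1/6 but with very different concentrations): simulate potential intermediate data from the prior predictive and look at how much the posterior mean could move. A spread near zero is the mostly-aleatoric end of the scale; a large spread is the mostly-epistemic end.

```python
import numpy as np

rng = np.random.default_rng(1)

def sd_of_posterior_means(a, b, n_obs, n_sims=5000):
    """Spread of E(p | potential data) under the prior predictive:
    near zero means mostly aleatoric, large means mostly epistemic."""
    p = rng.beta(a, b, size=n_sims)         # draw p from the prior
    y = rng.binomial(n_obs, p)              # potential data given p
    post_means = (a + y) / (a + b + n_obs)  # conjugate Beta update
    return post_means.std()

# Slightly imperfect die, chance of a six: tight prior around 1/6.
print(sd_of_posterior_means(a=100, b=500, n_obs=50))  # barely moves
# Low-response-rate survey estimand: wide prior, same mean of 1/6.
print(sd_of_posterior_means(a=2, b=10, n_obs=50))     # moves a lot
```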
Let me also add that models for aleatoric uncertainty can be important even for problems where the uncertainty is mostly epistemic. For example, in our election forecasting model, most of the uncertainty about polling error is epistemic. But it’s still important here to have a model for the aleatoric uncertainty. Why? Because it helps us understand why, even though the big concern with existing polls is nonsampling error rather than random sampling error, we wouldn’t want to start doing polls with N=4: at that point the random error would dominate. Aleatoric uncertainty is a lower bound on real-world uncertainty, and that’s one way in which probability models can be useful.
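As a back-of-the-envelope check (assuming simple random sampling, which real polls only approximate, and treating the numbers as purely illustrative):

```python
import math

def sampling_se(p=0.5, n=1000):
    """Standard error of a proportion from a simple random sample."""
    return math.sqrt(p * (1 - p) / n)

# Pure sampling error is small at N=1000 but swamps everything at N=4;
# compare to nonsampling errors typically on the order of a point or two.
print(f"N=1000 poll: sampling SE = {sampling_se(n=1000):.3f}")  # ~0.016
print(f"N=4 poll:    sampling SE = {sampling_se(n=4):.3f}")     # 0.250
```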
P.S. Shravan assures us that the above photo of Molly is completely unstaged.