## Subtleties of discretized density plots

Many people are familiar with the idea that reformatting a probability as a frequency can sometimes help people reason with it (as in classic Bayesian reasoning problems involving conditional probability). In a visualization context, discretizing a representation of uncertainty, or really any probability distribution, can be useful for other reasons. For instance, by animating draws from a target distribution, an approach I call hypothetical outcome plots, it becomes possible to visualize joint probabilities that would otherwise require many individual static plots, or an additional visual variable, like color lightness or opacity, layered onto already complex visualizations. By discretizing a representation of a pdf, which my collaborator Matt Kay termed a quantile dotplot, area judgments that people would otherwise have to make, and which are error-prone, can be avoided in favor of simpler perceptual judgments. For example, when tail probability judgments are of interest and a small number of outcomes is used (e.g., 20 or so), a person may not even need to count, instead relying on the visual system’s ability to recognize small counts of objects (four or fewer) at a glance, called subitizing.

One thing I like about this line of research is that despite sometimes feeling like I’ve sufficiently explored the space, I still often end up being surprised by things I hadn’t seriously considered before. I think this stems from the inherent tension in trying to represent continuous-valued variables discretely. It’s happened a couple of times recently.

### Implications of approximation in representing continuous functions discretely

Consider quantile dotplots, which are Wilkinsonian dotplots (dotplots for continuous-valued variables) displaying predictive quantiles of the distribution you want to visualize. You can set the number of outcomes (dots) you want. If you have a predetermined bin size you want to use, you can do that too; the binning algorithm will no longer be Wilkinson’s, but the idea is the same. (By the way, if you want to make them in R, along with lots of other frequency-based and continuous visualizations of distributions, you should check out ggdist, Matt’s latest package, or you can do it in tidybayes.)
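In case it helps to see the construction, here is a minimal Python sketch of the quantile dotplot idea. This is my own toy version with a fixed bin width, not ggdist’s implementation, and the distribution and parameters are made up for illustration:

```python
import numpy as np
from scipy import stats

def quantile_dotplot(dist, n_dots=20, binwidth=1.0):
    """Place n_dots equal-probability quantiles into fixed-width bins."""
    # Midpoints of n_dots equal-probability intervals: each dot carries
    # 1/n_dots of the probability mass.
    p = (np.arange(n_dots) + 0.5) / n_dots
    q = dist.ppf(p)
    # Fixed-width binning (simpler than Wilkinson's algorithm): the bin
    # index for each dot; stacking counts per bin gives the dot heights.
    bin_idx = np.floor((q - q.min()) / binwidth).astype(int)
    return q, bin_idx

# e.g., a 20-dot plot of a hypothetical normal predictive distribution
q, bins = quantile_dotplot(stats.norm(loc=10, scale=1.5))
```

Each dot stands for 5% of the probability, so reading off a one-sided probability reduces to counting dots past a threshold.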

Here’s one with 20 dots:

We’ve done some controlled experiments, motivated by domains like deciding when to leave to catch a bus, and found that quantile dotplots can lead to more consistent probability judgments among lay people (bus riders) and better (more utility-optimal) decisions. But there’s a tension related to what we could call “expressiveness,” the term used in visualization research to refer, roughly speaking, to the extent to which a visualization shows the data in a way that leads people to form spontaneous impressions of it that align with its actual characteristics.

Say I showed you the above plot, telling you it showed hypothetical values a variable could take, and I asked you how probable it is that the process generates a value between 13 and 14. What would you say?

You should recognize that the dots toward the edges are meant to represent the tails and so the two stacked dots toward the right end don’t mean that there’s exactly a 10% probability of values in a bin from say, 11.8 to 12.7, and no probability between 13 and 14. It’s just that as you move toward the edges of the distribution, the bins are getting wider.
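To make the ambiguity concrete, here’s a small sketch. The distribution is an assumption of mine (a normal with mean 10 and sd 1.5, chosen only for illustration, not read off the actual plot). Counting dots in [13, 14] gives zero, even though the continuous distribution puts real probability there:

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=10, scale=1.5)   # assumed distribution for illustration

# The 20 dots sit at midpoints of equal-probability intervals, so the most
# extreme dot is the 97.5th percentile; nothing beyond it is drawn.
p = (np.arange(20) + 0.5) / 20
dots = dist.ppf(p)

# Interval probability read off the dots vs. off the continuous cdf:
dot_estimate = np.sum((dots >= 13) & (dots <= 14)) / 20
true_prob = dist.cdf(14) - dist.cdf(13)
```

Here `dot_estimate` comes out exactly 0 while `true_prob` is a bit under 2%: the literal dot-counting strategy we encourage elsewhere breaks down in the tails unless the reader mentally spreads the extreme dots out.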

This is pretty obvious if you understand probability distributions, yet this aspect of the design seems in tension with the aim of giving people a completely concrete representation that avoids the more error-prone area judgments required by continuous density plots, since now we are implicitly asking them to spread some dots out.

Does this matter? Probably not in many applications. But I was thinking about this recently because of a project where we are eliciting distributions after showing people an animated network model visualization, to see what they perceive to be the posterior distribution of various network properties (density, shortest path length, etc). To elicit their distributions we give them a distribution builder interface, which requires that they distribute 20 balls between pre-determined bins (based on what they say are minimum and maximum plausible estimates for the property and a preset number of bins held constant across different users).
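For concreteness, the “ideal” response under this reading would allocate the 20 balls the way a quantile dotplot would. A sketch, with an assumed long-tailed target distribution and made-up min/max bounds (none of this is the actual study interface):

```python
import numpy as np
from scipy import stats

def ideal_allocation(dist, lo, hi, n_bins=10, n_balls=20):
    """Allocate n_balls to equal-width bins on [lo, hi] so each ball
    carries 1/n_balls of the probability mass."""
    edges = np.linspace(lo, hi, n_bins + 1)
    p = (np.arange(n_balls) + 0.5) / n_balls
    q = np.clip(dist.ppf(p), lo, hi)     # quantiles, clipped to stated bounds
    counts, _ = np.histogram(q, bins=edges)
    return counts

# e.g., a long-tailed belief about a network property on a 0-10 scale
counts = ideal_allocation(stats.lognorm(s=0.8, scale=2.0), lo=0.0, hi=10.0)
```

A long-tailed belief correctly rendered this way leaves empty bins between the bulk and the most extreme ball, which is exactly the gap-leaving behavior we worry users may find unnatural.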

I found myself questioning what sorts of assumptions someone given an interface like this might make (even those who do analysis often, like the network analysts we’re evaluating with). Ideally, they would operate like the quantile dotplot algorithm, finding the best discretized version of the distribution they’re imagining. But I could also see someone finding it slightly unnatural to leave gaps to better represent a long-tailed distribution when given a set of balls and told to put them in bins according to probability. Maybe this is related to the Gestalt principle of good continuation: if you’re trying to “draw” a continuous distribution, it’s not necessarily intuitive to include gaps. Of course, if they treated the interface this way, then even if they perfectly perceived the posterior distribution, their response distributions could look like they perceived something different.

The lingering ambiguity bugs me, inconsequential as it may be, since our motivation for many of these discretized interfaces is that we’re trying to find more intuitive alternatives to continuous representations. Ideally these shouldn’t require a lot of instruction to be used properly, since then the representation itself isn’t really doing as much work. In this case, maybe what’s needed is a responsive distribution builder that doesn’t give pre-determined bins, just a number line, and does some inference to predict bins as the user places dots, then presents some feedback on the inference back to them so they can adjust the placement. But that’s getting complicated.

One visual way to try to make quantile dotplots more expressive would be to let the width of dots vary. But that just creates other expressiveness problems: it could easily invite misinterpretation of the amount of probability assigned to each dot due to the area differences, since area is often meaningful in plots of distributions. So, trade-offs. Similarly, adding an area encoding of the pdf as a layer behind the dots could reduce ambiguity, but then we’re back to continuous representations.

I find this quandary kind of interesting since I’m often thinking about discrete-versus-continuous representation trade-offs from the opposite direction, like the various strong tendencies we seem to have as humans to mentally discretize things like analysis results so they’re simpler to reason with. This example is sort of like the inverse, where it turns out it’s hard to avoid continuous thinking when you’re trying to keep things discrete.

### Risk perception and decision-making versus comprehension

In another project, we’re using animated scatterplots to show uncertainty in a regression fit based on an observed dataset of 50 data points. Each frame in the animation shows 50 points resampled from the original observations, with the least-squares line overlaid. Recently the question came up of how much it matters if users of the visualizations take the draws to be historical data instead of hypothetical observations. If, for instance, their degree of caution in making decisions based on the information was appropriate according to some defined standard, then how much should we worry if they didn’t grasp that the outcomes were hypothetical?
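As a sketch of the setup (synthetic data and parameters of my choosing, not the actual study materials), each frame resamples the observations with replacement and refits the line:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 50 observed points.
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 50)

def bootstrap_frames(x, y, n_frames=30):
    """One (slope, intercept) least-squares fit per resampled frame."""
    n = len(x)
    fits = []
    for _ in range(n_frames):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], 1)
        fits.append((slope, intercept))
    return fits

fits = bootstrap_frames(x, y)
```

Animating one fitted line per frame conveys the sampling variability in the fit, but every frame also shows 50 plausible-looking points, which is where the historical-versus-hypothetical confusion can creep in.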

For example, I could imagine some CEO being presented with plots like this in a presentation, or any discretized uncertainty visualization really, and making better decisions about how to allocate resources than if they had been shown more conventional continuous representations of uncertainty. How much should we worry if their decisions improve even though they don’t grasp that it’s not real data they’re seeing? (I’m setting a low bar for CEOs here, I know.) It doesn’t seem great to show someone a visualization that gives a false impression that their company has access to a lot more data than was actually available. But if we always did only what was comprehensible to the average layperson, or the average CEO, we might be pretty restricted.

The unavoidable tension here seems related to the leap that must be made from thinking in terms of observed data to thinking in terms of hypothetical outcomes and discretized representations. Discretized representations can make probability more concrete and make certain information easier to visualize, but can that concreteness then mislead people into thinking they’re seeing more data? One challenge we sometimes see with uncertainty visualization is that when people are under high cognitive load, there’s a risk of overwhelming them with information. So while we could spend a lot of time designing training for how to use new visualizations, in settings where people are eager to just make a decision, it might not always help.

1. Dale Lehman says:

This post made me immediately think of a presentation I just attended (an ASA luncheon about data storytelling). They were highlighting their (the consulting firm involved) development of the COVID dashboard for the city of Chicago. Some key numbers on the dashboard (weekly tests, cases, deaths) are color coded green or red, depending on whether they increased or decreased from the prior week. I asked if there was a threshold for the size of the change and they said no. This is an extreme form of discretization – 2 categories for the color. The problem, as I see it, is that viewers immediately get a “good” or “bad” image/feeling depending on the color, when some changes are large and others are not. Sort of like the financial reporting that says “stocks were up on the news of X; the DJIA went up by 0.2%.” In these cases, I think discretization is bad – precisely because it is so easy to digest.

There are compromises, of course. The green or red shading can vary in intensity. Better yet, there can be an up or down arrow, varying in length proportional to the size of the change. Even subconsciously, these provide visual cues that are more accurate than the discrete version. So, the tradeoff between simplicity and accuracy may not be that worrisome in some cases.

2. Back in pre-historic times, I tried using quadrature rules (say 7 quadrature points) as a way to make sense of continuous distributions. If I recall correctly it nicely highlighted the difference between Cauchy and Slash distributions.
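My reading of the seven-point idea, sketched for a standard normal using Gauss-Hermite nodes (the Cauchy/Slash comparison the comment mentions would need rules suited to those heavier tails):

```python
import numpy as np

# 7-point Gauss-Hermite rule for the probabilists' weight exp(-x^2/2),
# i.e., a 7-point discrete stand-in for a standard normal distribution.
nodes, weights = np.polynomial.hermite_e.hermegauss(7)
weights = weights / weights.sum()   # normalize the point masses to sum to 1

# The weighted point set recovers low-order moments of the continuous
# distribution essentially exactly.
mean = np.sum(weights * nodes)
var = np.sum(weights * nodes**2)
```

Seven weighted points are enough to match the normal’s moments up to high order, which is what makes such a small discretization informative.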

But these days, Monte Carlo is so fast, and the draws do replace a continuous distribution with a discrete one. Also, say 19 draws could be in grey and 1 in black. Possibly cycled through.

So the question is, why are particular discretizations being used?

• Interesting question. A few reasons motivated the quantile dotplot… we were thinking about bus riders who want to see some representation of the uncertainty in the point estimates of bus arrival times they would normally get, so they can decide things like when to leave for the bus stop. But they need it in a format they can more or less glance at to make decisions quickly. The thought was that since it’s often one-sided intervals they care about, with a small number of dots, as long as you know the total number it’s easy to quickly see your probability of catching or missing the bus.

Also, I guess we were assuming the average app designer would have an easier time implementing something static. But I’m a big fan of animation, so rapidly cycling through draws seems intuitive to me. I like how the weather app DarkSky visualizes precipitation predictions, which is something like this.

3. Andrew says:

I have a slightly different question, which is whether we should be visualizing densities at all. OK, sometimes we should, but I’m conjecturing that, as statisticians, we think about densities much more than we should. From this perspective, the relevant question is not, should we display histograms or smoothed densities or parametric densities or discrete visualizations, but rather, should we be trying to visualize the 1-d distribution in the first place?

One reason I ask this question is that I remember, years ago, teaching intro statistics and assigning a homework problem near the beginning of the semester that asked students to gather some data and plot a histogram. About half the students plotted time series as bar charts. At first I was annoyed, but then I started to think: why am I so sure that students should be learning about histograms anyway, given how non-intuitive they can be?

So, I’m not saying never to show a histogram or display a distribution, but now I’m happier with time series plots. In Jessica’s example above, we could plot bus waiting time vs. date.

• I agree sometimes densities are not what we need. In this case the probability of different arrival times is what the bus riders wanted, so we tried to show them that.

The time series with bar charts example reminds me of this Barbara Tversky paper on spontaneous tendencies among students to read bar charts as discrete data and line charts as continuous:
https://cpb-us-w2.wpmucdn.com/sites.wustl.edu/dist/e/952/files/2017/09/zacksmemcog99-12d5ktx.pdf
Maybe the results would have been different with your students.

• David Chorlian says:

I have a large number of observations with distributions which change with age. I’ve plotted time series of quantiles which give some sense of the change in distributions, but plotting the distributions for different age ranges seems to make changes in the skewness much more clear than looking at the quantiles. This pattern, for some variables, suggests a maturational process. Any suggestions for more informative visualizations?

• Luke says:

A time series plot for distributions is pretty much a trace plot from MCMC. Although I wouldn’t put it past people to start analysing time series plots like those tea-leaf readers in stocks doing “technical analysis”.

4. Tom Passin says:

When I looked at Jessica’s diagram with the dots and thought about the probability of drawing a value between 13 and 14, I immediately said to myself that the count there would be about 1 1/2. Then I started wondering whether I should bother counting all the dots, because I had forgotten that there were 20. I had just decided that I wasn’t motivated enough to count them when I remembered that there were 20. Of course, to get a probability estimate I would then have to divide the 1 1/2 by the total count.

If the bins were actually supposed to have been unequal, then I probably got it wrong anyway.

The whole process was distracting and I don’t think it was very intuitive, if you get right down to it.

The only intuitive thing for me was that the pdf was pretty much a typical single-peaked pdf, not Gaussian perhaps, but maybe much like a radial Gaussian. As I think back on it, though, it has to be idealized rather than real data, since there isn’t the real-life variation there would be with a histogram.

I hope these reactions are helpful for Jessica’s research.

5. Phil says:

On the broader subject of discretizing continuous data:

I’m looking at data on electricity outages and the fires that they cause. I have some potentially useful explanatory variables, such as estimates of fuel moisture content for various sizes of fuel (grass, small sticks, large sticks), and the current temperature and the mean temperature over the past 24 hours and past week, that sort of thing. I expect a relationship between ignition probability and fuel dryness, for a given cause of electricity outage, but I don’t expect that relationship to be linear (in either untransformed or logit space). It’s a large dataset — there are hundreds of thousands of outages per year, and a few hundred ignition events — and a lot of the methods I usually use for exploratory data analysis were not providing me with much or any insight.

One thing I found very useful was to divide many of the variables into quintiles and calculate ignition probability per quintile: what’s the probability that an outage causes a fire when it is extremely dry compared to when it isn’t? What about when it is extremely hot compared to when it isn’t? And so on. And of course I could combine these: what if it’s very hot and very dry; very hot and somewhat dry; and so on. Histograms and tabulation gave me a lot of understanding.
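The quintile tabulation Phil describes might look something like this in pandas. All data here is fabricated for illustration, with an assumed nonlinear relationship between dryness and ignition:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Fabricated outage records: a dryness index and a rare ignition outcome
# whose probability rises nonlinearly with dryness.
n = 100_000
dryness = rng.uniform(0, 1, n)
ignition = rng.random(n) < 0.001 + 0.02 * dryness**3

df = pd.DataFrame({"dryness": dryness, "ignition": ignition})
df["dryness_q"] = pd.qcut(df["dryness"], 5, labels=False)   # quintile index 0..4

# Ignition probability per dryness quintile.
rate = df.groupby("dryness_q")["ignition"].mean()
```

The per-quintile rates make the nonlinearity visible at a glance (most of the risk is concentrated in the driest quintile) without committing to a functional form, which is exactly what makes the discretization useful for exploration.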

Now that I have a good feel for what explanatory variables are most important and which ones seem to have interactions too big to ignore, I’m switching to continuous models, since I don’t want artificial boundaries between one dryness category and another, but for exploratory analysis the discretization really helped.