Archive of posts filed under the Statistical graphics category.

“Data in Wonderland”: A course on storytelling with data

Scott Spencer is teaching this class at Columbia. It looks really cool.

Tableau and the Grammar of Graphics

The first edition of Lee Wilkinson’s book, The Grammar of Graphics, came out in 1999. Whether or not you’ve heard of the book, if you’re an R user you’ve almost certainly indirectly heard about the concept, because . . . you know ggplot2? What do you think the “gg” in ggplot2 stands for? That’s right! […]

Color schemes in data graphics

Natesh Pillai points us to this recent article, “The misuse of colour in science communication,” which begins: The accurate representation of data is essential in science communication. However, colour maps that visually distort data through uneven colour gradients or are unreadable to those with colour-vision deficiency remain prevalent in science. These include, but are not […]

If you put an o on understo, you’ll ruin my thunderstorm.

Paul Alper writes: Here is a fascinating article by Matthew Cappucci from the Washington Post dealing with the difficulty experts have when trying to convey technical results to the lay public. In a nutshell, the categories the experts at the Storm Prediction Center use: marginal, slight, enhanced, moderate or high do not correspond to the […]

Subtleties of discretized density plots

Many people are familiar with the idea that reformatting a probability as a frequency can sometimes help people better reason with it (such as on classic Bayesian reasoning problems involving conditional probability). In a visualization context, discretizing a representation of uncertainty, or really any probability distribution, can be useful for other reasons. For instance, by […]

Sketching the distribution of data vs. sketching the imagined distribution of data

Elliot Marsden writes: I was reading the recently published UK review of food and eating habits. The above figure caught my eye as it looked like the distribution of weight had radically changed, beyond just its mean shifting, over past decades. This would really change my beliefs! But in fact the distributional data wasn’t available […]

xkcd: “Curve-fitting methods and the messages they send”

We can’t go around linking to xkcd all the time or it would just fill up the blog, but this one is absolutely brilliant. You could use it as the basis for a statistics Ph.D. I came across it in this post from Palko, which is on the topic of that Dow 36,000 guy who […]

Most controversial posts of 2020

Last year we posted 635 entries on this blog. Above is a histogram of the number of comments on each of the posts. The bars are each of width 5, except that I made a special bar just for the posts with zero comments. There’s nothing special about zero here; some posts get only 1 […]
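For readers who want to try this kind of binning themselves, here is a minimal sketch in Python. It is not the code behind the histogram in the post. It assumes the width-5 bars start at 1 (so 1-5, 6-10, and so on), which the excerpt does not specify, and the function names and sample data are made up for illustration.

```python
from collections import Counter


def bin_label(n_comments, width=5):
    """Return the label of the histogram bar that a comment count falls in.

    Zero gets its own bar. Positive counts go into bars of the given width.
    Assumption: bars start at 1 (1-5, 6-10, ...); the post does not say
    where the bar edges actually fall.
    """
    if n_comments < 0:
        raise ValueError("comment counts cannot be negative")
    if n_comments == 0:
        return "0"
    lo = ((n_comments - 1) // width) * width + 1
    return f"{lo}-{lo + width - 1}"


def binned_counts(comment_counts, width=5):
    """Count how many posts land in each bar."""
    return Counter(bin_label(n, width) for n in comment_counts)


# Hypothetical comment counts, not the blog's real data.
example = [0, 0, 1, 3, 5, 6, 10, 11, 42]
counts = binned_counts(example)
for label, n_posts in sorted(counts.items(), key=lambda kv: int(kv[0].split("-")[0])):
    print(f"{label:>6}: {n_posts}")
```

Giving zero its own bar keeps posts with no comments from being lumped in with posts that drew one to five, at the cost of one bar that is narrower than the rest.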

How many infectious people are likely to show up at an event?

Stephen Kissler and Yonatan Grad launched a Shiny app, Effective SARS-CoV-2 test sensitivity, to help you answer the question, How many infectious people are likely to show up to an event, given a screening test administered n days prior to the event? Here’s a screenshot. The app is based on some modeling they did with […]

Is there a middle ground in communicating uncertainty in election forecasts?

Beyond razing forecasting to the ground, over the last few days there’s been renewed discussion online about how election forecast communication again failed the public. I’m not convinced there are easy answers here, but it’s worth considering some of the possible avenues forward. Let’s put aside any possibility of not doing forecasts, and assume the […]

I like this way of mapping electoral college votes

This post is by Phil Price, not Andrew.  I like maps — everybody likes maps; who doesn’t like maps? — but any map involves compromises. For mapping electoral votes, one thing you sometimes see is to shrink or expand states so they have area proportional to electoral votes (or to population, which is almost, but […]

Why is this graph actually ok? It’s the journey, not just the destination.

Josh Miller was in my office and started flipping through Kieran Healy’s book on data visualization, a book that I like a lot—I even use it in my class, replacing Cleveland’s Elements of Graphing Data which is wonderful but things have changed in 35 years so time for a new book. Josh noticed Figure 8.17 […]

Interactive analysis needs theories of inference

Jessica Hullman and I wrote an article that begins, Computer science research has produced increasingly sophisticated software interfaces for interactive and exploratory analysis, optimized for easy pattern finding and data exposure. But assuming that identifying what’s in the data is the end goal of analysis misrepresents strong connections between exploratory and confirmatory analysis and contributes […]

Follow-up on yesterday’s posts: some maps are less misleading than others.

Yesterday I complained about the New York Times coronavirus maps showing sparsely-populated areas as having a case rate very close to zero, no matter what the actual rate is. Today the Times has a story about the fact that the rate in rural areas is higher than in more densely populated areas, and they have […]

All maps of parameter estimates are (still) misleading

I was looking at this map of coronavirus cases, pondering the large swaths with seemingly no cases. I moused over a few of the gray areas. The shading is not based on counties, as I assumed, but on some other spatial unit, perhaps zip codes or census blocks or something. (I’m sure the answer is […]

Sleep injury spineplot

Antony Unwin sends along the above graph in response to this recent post. The data are kinda crap, but I agree with Antony that this plot is a good way of showing the number of cases corresponding to each histogram bar.

Misrepresenting data from a published source . . . it happens all the time!

Following up on yesterday’s post on an example of misrepresentation of data from a graph, I wanted to share a much more extreme example that I wrote about a while ago, about some data misrepresentation in an old statistics textbook: About fifteen years ago, when preparing to teach an introductory statistics class, I recalled an enthusiastic […]

Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph?

How many data points are in each bar of the top graph above? (See here for background.) It’s from this article: Milewski MD, Skaggs DL, Bishop GA, Pace JL, Ibrahim DA, Wren TA, Barzdukas A. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. Journal of Pediatric Orthopaedics. 2014 Mar 1;34(2):129-33. […]

Information, incentives, and goals in election forecasts

Jessica Hullman, Christopher Wlezien, Elliott Morris, and I write: Presidential elections can be forecast using information from political and economic conditions, polls, and a statistical model of changes in public opinion over time. However, these “knowns” about how to make a good presidential election forecast come with many unknowns due to the challenges of […]

“Pictures represent facts, stories represent acts, and models represent concepts.”

I really like the above quote from noted aphorist Thomas Basbøll. He expands: Simplifying somewhat, pictures represent facts, stories represent acts, and models represent concepts. . . . Pictures are simplified representations of facts and to use this to draw a hard and fast line between pictures and stories and models is itself a simplified […]