Alexey Guzey plays Stat Detective: How many observations are in each bar of this graph?

How many data points are in each bar of the top graph above? (See here for background.)

It’s from this article: Milewski MD, Skaggs DL, Bishop GA, Pace JL, Ibrahim DA, Wren TA, Barzdukas A. Chronic lack of sleep is associated with increased sports injuries in adolescent athletes. Journal of Pediatric Orthopaedics. 2014 Mar 1;34(2):129-33.

Here’s the information we have to work with:

This study was conducted at a combined high school/middle school in a large metropolitan area. . . . Eligible participants included any male or female student at the school who was entering grades 7 to 12 who had participated and planned to continue to participate in at least one sport during the previous and upcoming year. . . . Informed consent for participation in our study was obtained from 160 student athletes and their parents. All consenting students were sent a copy of the survey by school-registered email. Of the 160, 112 student athletes (54 male and 58 female athletes) completed the survey . . . Of the 112 athletes studied, 64 athletes (57%) sustained a total of 205 injuries; 48 athletes (43%) were not injured. . . . Sixty-five percent of athletes (56/86) who reported sleeping < 8 hours per night were injured, compared with 31% of athletes (8/26) who reported sleeping ≥ 8 hours per night.

This is like one of those logic puzzles they gave us when we were kids!

Let’s label the proportions in the bar graph as y5/n5, y6/n6, y7/n7, y8/n8, y9/n9.

What do we know? From the text:

• n5 + n6 + n7 + n8 + n9 = 112

• n5 + n6 + n7 = 86

• n8 + n9 = 26 (this is redundant information but it’s good to check that 26 + 86 = 112 so we’re not missing anybody)

• y5 + y6 + y7 = 56

• y8 + y9 = 8

From the graph:

• y5/n5 = 60%

• y6/n6 = 75%

• y7/n7 = 62% (Guzey counted the pixels: 310/499 = 0.621)

• y8/n8 = 35%

• y9/n9 = 16.7%

• n5 + n6

Given all this information, Guzey solved the puzzle. I’ll paraphrase:

• The easiest place to start is with the last two bars, which represent a total of 8 injuries out of 26 kids. It’s gotta be 7/20 and 1/6, as those are the only numbers that give anything close to 35% and 16.7%. For example, 6/14 and 2/12 won’t work.

• Next you can figure out the first three bars, which represent a total of 56 injuries out of 86 kids. The only numbers that are consistent with the above information are 3/5, 15/20, and 38/61. You can play around with other possibilities and they just don’t work out.
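Guzey’s elimination can be reproduced with a short brute-force search. Below is a sketch in Python (not Guzey’s actual code; the tolerances of 1% for the rounded percentages and 0.5% for the pixel-counted 62.1% bar are my own guesses at how precisely the bar heights can be read):

```python
# Brute-force check of the puzzle. Tolerances are assumptions, not Guzey's.

def candidates(p, tol, max_n):
    """All fractions y/n with 1 <= n < max_n that land within tol of p."""
    return [(y, n) for n in range(1, max_n) for y in range(n + 1)
            if abs(y / n - p) < tol]

# Last two bars: y8 + y9 = 8 injuries, n8 + n9 = 26 kids.
right = [(y8, n8, 8 - y8, 26 - n8)
         for y8, n8 in candidates(0.35, 0.01, 26)
         if 0 <= 8 - y8 <= 26 - n8
         and abs((8 - y8) / (26 - n8) - 0.167) < 0.01]
print(right)  # [(7, 20, 1, 6)] -- unique, as claimed

# First three bars: y5 + y6 + y7 = 56 injuries, n5 + n6 + n7 = 86 kids.
left = [(y5, n5, y6, n6, 56 - y5 - y6, 86 - n5 - n6)
        for y5, n5 in candidates(0.60, 0.01, 86)
        for y6, n6 in candidates(0.75, 0.01, 86 - n5)
        if 0 <= 56 - y5 - y6 <= 86 - n5 - n6
        and abs((56 - y5 - y6) / (86 - n5 - n6) - 0.6212) < 0.005]
print(left)  # includes (3, 5, 15, 20, 38, 61)
```

On these tolerances the search confirms that 7/20 and 1/6 are the only possibility for the last two bars. For the first three bars it returns Guzey’s 3/5, 15/20, 38/61 along with a couple of other ratio-compatible triples, so presumably additional features of the graph (such as the relative bar widths) helped rule those out.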

So now we can ask, why did Matthew Walker in his book Why We Sleep remove the left bar of that graph (3 injuries out of 5 kids) but keep the right bar (1 injury out of 6) so as to produce the second graph shown above?

The answer is clear. He cut off the left bar because it was based on N=5. He kept the right bar because it was based on N=6. Walker was following the well-known “remove all bars for which N is less than 6” rule.

Ok, just kidding. I think the real reason he removed the left bar and kept the right bar is that the left bar didn’t support his story that lack of sleep is bad for you, but the right bar did support his theory. He didn’t want to confuse the reader with the complexities of reality.

Not wanting to confuse the reader with the complexities of reality is ok if you know the underlying truth, but it’s not such a great idea if you don’t have a direct line to God and if, like everyone else, you have to rely on empirical evidence to make your conclusions.

To step back a moment, how much can you really conclude about sleep and injuries based on a one-time study at one school? So, from that point of view, you’d have to say that Walker must have already had strong beliefs about the danger of sleeping less than 8 hours a night, and these data provide confirmation rather than evidence. They’re an illustration of his point rather than evidence for his point. Fair enough; still, if you want to present data, you shouldn’t cheat.

The larger point

The larger point here is not about cheating or research misconduct; it’s about research more generally. It’s about learning from data.

We’ve been talking a lot about Walker because his book and TED talk have received a lot of attention. But the practice of misrepresenting evidence to make a clearer point when teaching . . . that happens all the time. Tomorrow I’ll post another example.

I think that the people who do this really don’t think they’re doing anything wrong: they’re just cleaning up the data to tell a better story. Just like Marc Hauser did when he insisted on coding all his data himself: he knew the story and he didn’t want any random variation to get in the way of it.

The trouble is, once you start mucking with the data, you move from what Thomas Basbøll and I called “stories” to what we called “parables.” Data, and good stories, are immutable—indeed, it’s this immutability that allows us to learn from data. When you start trimming your data to fit your preconceptions or to fit the story you want to tell, you’ve given up your ability to learn. You’re no longer doing science—and you might not be doing good education either. In the immortal words of John Clute, “End of novel. Beginning of job.”

1. JDK says:

I like the concept of parables! My field is fisheries science and I have an example of a published article that trimmed a 50-year data set to only the most recent 6 years or so, because those years showed a trend that fit with the authors’ preconceptions and favoured storyline. That it wasn’t pointed out in peer review is another debacle. I fully support the concept that messing with the data to fit your narrative means you have basically given up science. You are now a lawyer/advocate.

• Andrew says:

JDK:

What’s this example? Can you share a link?

• JDK says:

Sorry for the delay in responding (and for screwing up the reply); it is down in the comments below. A bit involved, but interesting in how the original article cherry-picked data and how the authors responded to being called out on it.

I think part of the problem is the use of a simple bar chart. The lower rate for 5 hours is probably a fluke. Putting error bars would likely show that the 5hr injury rate is probably not smaller than the 6hr injury rate, and could easily be larger (higher upper bound on CI). And then he would be less likely to throw out the 5hr bar.
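For what it’s worth, the commenter’s point can be sketched numerically. Here is a quick Python illustration using 95% Wilson score intervals (one standard choice for small-sample binomial proportions; not anything from the paper itself) on the counts reconstructed in the post (3/5, 15/20, 38/61, 7/20, 1/6):

```python
from math import sqrt

def wilson(y, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion y/n."""
    p = y / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Counts as reconstructed from the graph and the paper's totals
for label, y, n in [("5 hr", 3, 5), ("6 hr", 15, 20), ("7 hr", 38, 61),
                    ("8 hr", 7, 20), ("9 hr", 1, 6)]:
    lo, hi = wilson(y, n)
    print(f"{label}: {y}/{n} = {y/n:.0%}, 95% CI ({lo:.0%}, {hi:.0%})")
```

The 5-hour interval runs from roughly 23% to 88%, comfortably overlapping the 6-hour bar (and every other bar), so the apparent dip at 5 hours is well within noise.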

• Dan Riley says:

+1

I can understand leaving off the error bars in a popularization, but I do not understand how it’s allowed in a serious professional journal. I guess surgeons don’t talk to statisticians much?

• Martha (Smith) says:

Dan said,

“I guess surgeons don’t talk to statisticians much?”

Maybe only when the statisticians are under anesthetic? :~)

• jim says:

“Putting error bars would likely show that the 5hr injury rate is probably not smaller than the 6hr injury rate, ”

I don’t see the point in putting error bars on data that’s self-reported from memory. The whole chart is a fluke. It’s not worth even discussing in a scientific context.

Talking about this chart in terms of the missing bar is also a waste of effort. The bar is just as meaningless as the whole chart. It’s all garbage from square one. Not science.

If people want to do science, then they need to get reliable measurements, not self-reported and Mechanical Turk junk data.

• Andrew says:

Disappointed that Retraction Watch didn’t work in any sort of sleep-related pun in their headline or their article. They’re usually good with that.

• Martha (Smith) says:

Maybe “they were asleep on the job”?

(Could apply either to the Retraction Watch folks, or might have been usable in the headline.)

3. Dale Lehman says:

Concerning Andrew’s more general concern – I think the problem goes even deeper. I don’t like most textbooks (especially American ones – the British textbooks are often better). One of my main complaints is that the examples are mostly cleaned up ones, so that the point (e.g., conducting a t-test; interpreting a regression coefficient, etc.) is clear and students can show they have mastered the basic concept. While those goals are worthwhile, there is an unintended side effect – which is not so harmless in my opinion. By providing examples that remove the messiness of real data, we teach that conclusions are often unambiguous. But they never are. The “real” story is always ambiguous. So, even if you don’t muck up your data, when you provide clean data that leads to unambiguous results, the effect is similar. You can justify it by thinking you are just making things clearer for the reader/student. But you may inadvertently be creating a dangerous illusion of certainty.

• jim says:

” But you may inadvertently be creating a dangerous illusion of certainty.”

Excellent point.

But worse yet, if textbooks aren’t even discussing data that doesn’t perfectly fit some ideal outcome, then they’re not even considering the processes that generate the data. They’re just talking about how the mathematical tools operate under ideal conditions.

• Agree with Dale. Thanks for that.

4. Chris says:

But when does using a simulation as a simplification of reality become a parable instead of a story? If one is clear that the simulation’s purpose is to simplify, and one intentionally adds interventions to understand the results, is that a story or a parable? Perhaps a good parable?

• Martha (Smith) says:

Chris,

I’m not at all clear on what you’re trying to say. Can you explain what you mean by “using a simulation as a simplification of reality”?

5. jonathan says:

Maybe I can make this fit an actual parable. Take The Good Samaritan. It’s a simple story of a guy lying in the road. The first two men to come along don’t do anything but the third one does. The story is often presented as the old ‘law’ being wrong, while the new law is good, as seen by the Samaritan’s actions, which takes the Jewish context and twists it into Christian meaning. If you restore the Jewish context, the point shifts to mirror closely the Story of Ruth, in which a foreigner, an actual Moabite, one of the few groups listed as an enemy, is accepted into the ‘people’ because she demonstrates her worth through devotion to her mother-in-law Naomi. The Jewish point would be that the Samaritan should be ‘accepted’ because he did the right thing. A key difference is that in the Jewish context, the first two did nothing wrong: they couldn’t touch the man because that would ritually defile them and their community. That left it up to the Samaritan to help, though he had no obligation other than as a person. So, by removing the data of the Jewish context, you can actually flip the meaning over so it becomes a condemnation of legalistic Jews instead of a plea to accept the non-Jew as a righteous person.

This seems fairly similar to the way the story of the data, at least, says that 5 hours of sleep means you’re less likely to have an accident, in direct opposition to the claim that shorter sleep means more. (I’d love to see if staying up all night improved things! It never worked well for me in college.) Presentation of the data is thus like channeling a story into a version that says what you want it to say, even if that version is an opposite, if not the opposite, of the original meaning.

For non-Jews, in the Jewish context, the Samaritan story is similar to any number of stories about rules and who needs to abide by them and when. One of my favorites is Hassidic: it involves a great rabbi who doesn’t go to worship on Yom Kippur because he was taking care of someone who was ill. That’s an explicit ranking of obligations. Since obligation is equivalent to commandment (to mitzvot), the religion constantly focuses on the meaning of adherence. One of those meanings is how you treat people who act righteously though they may also disagree with you. Moabites were actual enemies, but of course Ruth as a woman wasn’t the same as a male Moabite. Samaritans are still around in small numbers, still fundamentalists who only accept a few parts of scripture. They’ve been at odds with ‘Judaism’ all along.

6. Martha (Smith) says:

Andrew said,
“Not wanting to confuse the reader with the complexities of reality is ok if you know the underlying truth, but it’s not such a great idea if you don’t have a direct line to God and if, like everyone else, you have to rely on empirical evidence to make your conclusions.”

Maybe “Direct Line to God” needs to go into The Lexicon?

7. Antony Unwin says:

A spineplot might be better. I have emailed one to you, Andrew.

8. Kaiser says:

“I think that the people who do this really don’t think they’re doing anything wrong: they’re just cleaning up the data to tell a better story.”

I’ve been calling this “story-first thinking,” contrasted with “data-first.” It’s widely practiced: the data are fitted to the story rather than the other way round.

Nice example!

9. JDK says:

My bad, I did not remember it well, it being from way back in 2007, but it is interesting anyway:
https://science.sciencemag.org/content/318/5857/1772.abstract – original article
https://salmonfarmscience.files.wordpress.com/2012/02/sealice_2008_sea_lice_extinction_hypothesis_fails.pdf – comment on original article
https://www.tandfonline.com/doi/abs/10.1080/10641260802013692?journalCode=brfs20 – response to comment
https://science.sciencemag.org/content/322/5909/1790.2 – another rebuttal to original article
http://www.math.ualberta.ca/~mlewis/Publications%202009/Krkosek-Ford-Morton-Lele-Lewis_WildSalmon—.pdf – another rebuttal to comment

Maybe too much? But it is another example of the back and forth, and of digging in and defending the original conclusions as not wrong…