
The Road Back

Paul Kedrosky points us to this news article by Liam Mannix, “Cold water poured on scientific studies based on ‘statistical cult’.”

Here’s what I wrote about this when it came up last year:

The whole thing seems pretty pointless to me. I agree with Kristin Sainani that the paper on MBI does not make sense. But I also disagree with all the people involved in this debate in that I don’t think that “type 1 error rate” has any relevance to sports science, or to science more generally. See for example here and here.

I think scientists should be spending more time collecting good data and reporting their raw results for all to see, and less time trying to come up with methods for extracting a spurious certainty out of noisy data. I think this whole type 1, type 2 error thing is a horrible waste of time which is distracting researchers from the much more important problem of getting high quality measurements.


  1. jim says:

    What I like about the article is this:

    “The claim that foam rollers could help with sore muscles, made by a team in 2015 that included a Charles Sturt University researcher, was based on a study of just eight people.”

    And, as it’s phrased, it’s irrefutable. Foam rollers really *could* help with sore muscles! So could a beer and a hamburger (applied directly to the muscles of course). Eight people, eight hundred, eight hundred thousand! Hardly matters, I could do a study with no people and reach the same conclusion. Foam rollers + turmeric, in a 180°C sauna (to kill pathogens in the lungs) with a beer and a hamburger.

    Andrew’s point about measurements and data is a good one, but wow … if you’re just trying to find magic, why waste the public’s money gathering data?

  2. Mark Samuel Tuttle says:

    While I have no illusions that it will have any larger impact, when I mentor wanna-be data scientists I try to get them to avoid the words “bias,” “bad data,” “noisy data,” etc. Similarly, I don’t think “good” data is a helpful label. If one knows that some data are good and other data are bad, it means that you have knowledge that you have not encoded in your model. This kind of thinking is the source of the assertion that “noise” is just data from a different distribution.

    On a related note, because of the coronavirus mess that we are in I’ve been thinking about two thought experiments to share with my medical colleagues.

    First, I want to use social media to track exposure or lack of exposure. It’s hard to imagine messier or more problematic data, but I think it’s important to try. Further, if associations are found, it’s not clear what such would mean, obviously; but still …

    Second, I want to work with my medical colleagues on the “denominator problem”. As you’ve suggested in an earlier post, arguing about a single definition for a denominator will not prove helpful at present. Instead, I want my doctor friends to help me identify points in exposure and disease progression that could be used as part of a sequence of denominators. Again, the question is which denominators prove useful, rather than academic.

    • Martha (Smith) says:

      “Again, the question is which denominators prove useful, rather than academic.”

      To me, the question is which denominators make sense for the question being studied. (Which may require a lot of thought and transparency.)

  3. Justin says:

    I’m not at all convinced about a single study with p less than .05. If there were several replications with well-designed experiments and p less than .05, then that may be a different story.

    But yet again, the same criticisms apply to any single Bayes factor, posterior probability, or whatever other statistic is your favorite, and these are also not immune to QRPs.

    The article shows that small sample sizes may be an issue. Also, as n gets larger, the likelihood tends to swamp any Bayesian prior.
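    Justin’s point about the likelihood swamping the prior can be sketched with a conjugate Beta-Binomial model (a hypothetical illustration with made-up numbers, not anything from the article):

    ```python
    # Sketch: as n grows, the data dominate even a strongly
    # opinionated prior in a conjugate Beta-Binomial model.

    def beta_binomial_posterior_mean(a, b, successes, n):
        """Posterior mean of theta under a Beta(a, b) prior after
        observing `successes` out of `n` Bernoulli trials."""
        return (a + successes) / (a + b + n)

    # A confident prior centered at 0.9 ...
    a, b = 90, 10

    # ... but the data keep coming in at a 50% success rate.
    for n in (10, 100, 10_000):
        successes = n // 2
        mean = beta_binomial_posterior_mean(a, b, successes, n)
        print(n, round(mean, 3))  # moves from near 0.9 toward 0.5
    ```

    With n = 10 the posterior mean sits near the prior (about 0.86); by n = 10,000 it is essentially the data’s rate (about 0.50).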


    • Anonymous says:

      “I’m not at all convinced about a single study with p less than .05”

      But why? Your frequentist hero Mayo was fond of parroting real statisticians’ talk of “frequentist guarantees” for a while. Even way back when, Fisher talked about “rarely being in error” using such methods. So what happened to those “guarantees”?

  4. Michael Nelson says:

    This is the second time in a week and a half I’ve gotten whiplash reading this blog.

    Blog entry from a year ago (quoted above) and most other discussion here on the topic of NHST: Focusing on misuse of NHST just distracts from the fact that NHST is inherently problematic.

    Blog entry from ten days ago (“What is the conclusion of a clinical trial where p=0.6?”): It is a “disastrous” failing of the statistics community that PhD students misinterpret NHST results, one we must all “take responsibility” for fixing by adopting improved pedagogy.

    Blog entry from today: Focusing on misuse of NHST just distracts from the fact that NHST is inherently problematic.

    Either we urgently need to teach people to correctly understand and apply NHST, or doing so is a distraction from teaching them not to do NHST at all. If the former, getting people to understand why MBI is bad is important; if the latter, then we shouldn’t be nearly as disturbed that the PhD students (in the post from 10 days ago) misinterpreted a p-value as that the professor reporting the anecdote asked them to do so in the first place. Which is it?

    • Andrew says:


      I do want to improve statistics teaching, and I’m working hard on it, all the time. I’m writing books, developing teaching materials, and spending lots of time trying to figure out good default methods and workflows. In the meantime, things come up. I think it’s fine for people to explain in detail why MBI is a bad idea; I just think that it would be best for such explanations to be focused on outcomes, not on so-called type 1 error rates.

  5. Bob says:

    Sports science, as most people understand it, doesn’t exist. Most of these ‘studies’ are run by graduate students who need a positive result for a dissertation or thesis. NHST is the least of your problems: if they don’t get a positive outcome, they dump the results and start again.

    No knowledgeable practitioner takes these things seriously; they’re for YouTube personal trainers and equipment salesmen. It’s not how professional coaches do things anyway. Leaving aside the fact that untrained college subjects have no relevance for trained athletes, people work empirically. You do something and your sprinters get faster. You do something else and they get slower. You do something different and half your rugby squad is injured. Nobody is running t-tests.

    Remember also that professional athletes are paid to promote products; think power plates or physio tape.
