We want certainty even when it’s not appropriate

Remember the stents example? An experiment comparing two medical procedures found a difference with a p-value of 0.20 (0.09 after a corrected analysis), and so it was declared that the treatment had no effect.

In other cases, of course, “p less than 0.10” is enough for publication in PNAS and multiple awards. This is deterministic thinking for you: it’s no effect or a big scientific finding; no opportunity for the study to just be inconclusive.

This is a big, big problem: interpreting lack of statistical significance as no effect. The study was also difficult to interpret because of the indirect outcome measure. But that’s a standard problem with medical studies: you can’t measure long-term survival or quality of life, so you measure something like treadmill times instead. No easy answers on this one.
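A quick simulation makes the point concrete (this is my sketch, not an analysis of the actual trial; the effect size and sample size are made-up numbers): when a study is underpowered, a perfectly real effect routinely fails to clear p < 0.05, so "not significant" cannot be read as "no effect."

```python
# Sketch (hypothetical numbers): a true, nonzero effect can easily
# yield p >= 0.05 in an underpowered study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.3   # genuine difference between arms, in SD units (assumed)
n = 50              # patients per arm -- small for this effect size (assumed)
sims = 2000

nonsig = 0
for _ in range(sims):
    control = rng.normal(0.0, 1.0, n)
    treated = rng.normal(true_effect, 1.0, n)
    _, p = stats.ttest_ind(treated, control)
    if p >= 0.05:
        nonsig += 1

# A large fraction of these simulated studies come out "not significant"
# even though the treatment effect is real in every single one of them.
print(f"share of studies with p >= 0.05: {nonsig / sims:.2f}")
```

Declaring "no effect" in each of those non-significant replications would be wrong every time; the honest summary is simply that each study was inconclusive.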

Anyway, Doug Helmreich saw this press release on treatments for ischemia, and it reminded him of the stents example:

Here we go again?

I [Helmreich] only glanced through the results… the all-cause death rates are virtually the same. In other cases there’s some evidence for invasive procedures but because it did not meet the p-value threshold it is treated as “no difference”. I guess shades of gray make for much more difficult storytelling… An article titled “evidence for invasive procedures not overwhelming” is harder to write.

I don’t have the energy to follow the links and slides in detail (see P.S. below) but on quick glance I see where Helmreich is coming from. Here are the summary results:

It’s all deterministic; no uncertainty. I understand: as a heart patient myself, I just want to be told what to do. But, given that we’re making conclusions based on statistical patterns in data, this sort of deterministic reporting is a problem.

P.S. Before you haters jump on me for writing about a study I haven’t read, please recall that any press release is, in large part, intended for people who are not going to read the study. So, yes, press releases matter. And until labs stop releasing press releases, and until reporters stop relying on press releases, I’m going to insist that I have every right to record my reaction to a press release.

Also from the press release:

Many doctors routinely use an invasive approach in addition to medical therapy to treat IHD; however, it is not known if this approach is better than medical therapy alone as the initial treatment of patients with stable ischemic heart disease (SIHD), moderate to severe ischemia. ISCHEMIA is designed to find the answer.

“The answer,” huh?


  1. Michael Weissman says:

    Coincidentally, this paper was one that the medical residents’ journal club that I help with just read. Over the particular time-span chosen, the death-from-any-cause result really is so close to the null that there’s no need to worry about arbitrary p-value cutoffs, etc. But the real results tell a very different story. Initially, unsurprisingly, the invasive strategy is riskier. Then over time those who survive it keep doing better. It’s probably not just an artifact of removing the most vulnerable from the invasive group. Total cardiovascular outcomes (death, infarction) are lower by the end of the period for the invasive group. So it looks like there’s a tradeoff between initial risk and long-term risk that will play out differently for different patients. Or, as it’s often summarized, “there’s no effect”.

    • Matt Skaggs says:

      “the invasive strategy is riskier. Then over time those who survive it keep doing better”

      Fascinating, thanks. On one hand, this is a good example of “statistics are hard.” On the other hand, there is no good reason why the model did not control for this outcome.

      Reminds me of the Tamiflu example, which I think I mentioned here once before. Either you take it early and don’t get sick, or you take it too late and it has no effect. But since you don’t even know how long you’ve had the flu when you take it, the stated benefit is summarized as the average, something like “reduces duration of symptoms by one day.”
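      The averaging problem in this comment can be sketched in two lines (hypothetical numbers, not actual Tamiflu data): if the drug helps only early takers, pooling produces an "average" benefit that no individual patient actually experiences.

```python
# Sketch with assumed numbers: a bimodal effect hidden by an average.
early_takers = 0.5      # fraction who take the drug early (assumed)
benefit_if_early = 2.0  # days of symptoms avoided if taken early (assumed)
benefit_if_late = 0.0   # no effect if taken too late

avg = early_takers * benefit_if_early + (1 - early_takers) * benefit_if_late

# The pooled summary is "1 day", yet every patient gets either 2 days or 0.
print(f"reported average benefit: {avg:.0f} day(s)")
```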

  2. Zhou Fang says:

    I’m reminded of an article from 2016.

    “I Just Want Nate Silver to Tell Me It’s All Going to Be Fine”

  3. Mark Samuel Tuttle says:

    Good post!

    This is so complicated …

    For example, I am a “statin failure” because in me they work too well – I can’t metabolize them so they just build up and give me distressing symptoms. Presumably, this is pharmacogenomic – my liver metabolism is different, whatever that means.

    Anyway, sorting by genomic risk might, or might not, provide additional insight. Though, as you would observe, this will require a larger population to sort out the many potential and actual effects.

    Another example: Use of coumadin (warfarin, rat poison used as a blood thinner – anticoagulant) before pharmacogenomics was problematic (longer story). After pharmacogenomics – different folks metabolize it differently – it’s STILL problematic. E.g., eating salad – vitamin K – has as much effect as liver metabolism.

    To your point – it is all deterministic but only on an individual patient basis … !

  4. Anoneuoid says:

    Look at the number needed to treat stats for cardiovascular drugs:

    Almost all of these “statistically significant” benefits amount to helping at most 1 out of 10 patients (most are much less)? So doctors are prescribing interventions with 10% or less chance of working? Is that the correct interpretation of NNT?
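    The arithmetic behind NNT helps here (illustrative event rates, not figures from the linked page): NNT is the reciprocal of the absolute risk reduction, so "NNT = 25" means one extra good outcome per 25 patients treated, relative to control. That is a population-level statement; reading it as "a 1-in-25 chance the drug does anything for a given patient" only holds under strong assumptions about who benefits.

```python
# Sketch of the NNT arithmetic with assumed event rates.
control_event_rate = 0.12   # e.g., 12% of untreated patients have an event (assumed)
treated_event_rate = 0.08   # e.g., 8% of treated patients have an event (assumed)

arr = control_event_rate - treated_event_rate   # absolute risk reduction
nnt = 1 / arr                                   # number needed to treat

# One extra event is prevented per NNT patients treated, versus control.
print(f"ARR = {arr:.2f}, NNT = {nnt:.0f}")
```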

  5. Nick Adams says:

    The sensible course of action based on this evidence is to not have a policy to automatically stent everyone with symptoms, but to individualize the treatment approach in consultation with the patient. So I don’t think interpretation of this study is difficult but individualization on the other hand might be. Every human is different. Get a good doctor, discuss the options.
    Anoneuoid’s link above lists the interventions for which the evidence is so strong that they are usually protocolised and automatically applied. The rest of the stuff is nuanced.

  6. jim says:

    It’s one thing to say that there is some cutoff of efficacy that must be achieved for use in treatment. It’s a completely different thing to say that, because a treatment didn’t reach the level of efficacy required for prescribing, there is no effect! My goodness, people with MDs and MSs and PhDs can’t and don’t make that distinction?

  7. Chetan says:

    This post reminds me of an old talk by Nassim Taleb where he mentioned that one of the main historical purposes of religion was to keep you away from the doctor.

    Going to the Temple of Apollo, and fasting is better than bloodletting.

  8. deb says:

    “This is a big, big problem: interpreting lack of statistical significance as no effect.”

    This is an excellent, excellent point. Academic journals kind of fetishize “p<.05”. And maybe there is a human tendency to impose binary categories on continuous variables. If p<.05, then YES; if p>.05, then NO.

    One of my favorite Far Side cartoons shows a man stranded on a desert island. He has spelled out the word HELF. A plane flies overhead. The pilot says something like “No need to stop. It just says HELF.”

  9. Shravan says:

    Not only do reviewers want certainty, they insist that for an investigation to be worthwhile there has to be a clear outcome. Recently a reviewer objected to a model comparison we did on the grounds that there was no conclusive winner among the models. The reviewer thinks there has to be one clear winner. Another objection was that in previous work we already showed with a different dataset that one of the models was a winner; why did we bother to investigate this same set of models with new data? I’m getting tired of doing psycholinguistics. It’s a bit like wading through mud.
