Why We Sleep—a tale of non-replication.

Good to have a non-coronavirus post that I can put on delay . . .

After reading our recent post, “Why We Sleep — a tale of institutional failure”, David Shanks wrote:

You may be interested to know that a little while ago we were completely unable to replicate a key result by Walker and colleagues published in Nature. See

OK. I’m starting to think that the answer to Alexey Guzey’s question, “Is Matthew Walker’s ‘Why We Sleep’ Riddled with Scientific and Factual Errors?”, is . . . Yes.

Research misconduct and work that doesn’t replicate . . . two great tastes that taste great together!


  1. John N-G says:

    “Delay” also allows time for post-publication review!

    Full comment and reply available from the PNAS link. Excerpts:

    “…Three methodological differences suggest potential explanations for this partial failure to replicate…In conclusion, the thoughtful report by Hardwicke et al. (cit. om.) provides an important replication of our earlier work and, through contrasting design features, suggests nuanced limits to measurable human motor memory reconsolidation.” – Walker & Stickgold, 2020

    “…To conclude, we disagree that the moderators proposed by Walker and Stickgold (3) can be viewed as “boundary conditions” on reconsolidation theory because they (i) are at present only post hoc conjectures and (ii) do not provide a compelling account of the extant data. Nevertheless, this discussion has generated a number of testable hypotheses that can be empirically verified with new data.” – Hardwicke & Shanks, 2020

  2. Not Trampis says:

    Maybe he did the research whilst asleep!

  3. Shravan says:

    I guess one way to think about this is: why should anything replicate? Every dataset is a unique product of its moment and serves to tell a particular story. It doesn’t have to be a true story, it’s just what those data tell us.

    • Anonymous says:

      If it doesn’t replicate, I’d say it does not generalize. Psychologists loooove to generalize though, so I think that is the problem. When doing science (for example, social science) we want to be able to generalize, but if my result about X -> Y found in setting A does not hold for setting B, what else can I say about settings C, D, E, …? If only for generalizability (there are also ethical concerns, especially in psychology), we should be REALLY worried about lack of replicability in psychology.

      • I’m trying to do the best I can, and I even understand statistics to some extent, at least better than I did in 2002 when I finished my PhD. But despite my very best efforts, which include things I have never done before, like a willingness to wait four years to get enough data, I am generally unable to replicate anything, if by replicate we mean that a significant effect comes out significant. If by replicate we mean that I generally get consistent patterns of estimates (differences between two means or groups of means coming out with the same sign repeatedly), then I am much more successful. This is not very impressive IMO; it would be better if others were trying to replicate my work (more generally: different teams need to try to replicate the same effect). But there isn’t much enthusiasm for this because there’s no novelty value; people don’t want to hold up their labs’ work in this way. Ending the year on this happy note…

        • > by replicate we mean that a significant effect comes out significant.
          But that’s a mug’s game, taking the difference in significance as significant and all that nonsense.

          It’s beyond me where this “significant effect comes out significant” came from. We certainly were not the first, but: “The statistical issues include _consistency (homogeneity)_ of study outcomes” (Meta-Analysis in Clinical Research, 1987).

          Definitely better if another group does the replicate work.

        • AllanC says:

          Focusing strictly on replicating a significant result misses the mark. For one, as Keith mentions, the difference between significant and not significant is unlikely itself to be significant. For two, if you are testing substantive theories there is a hell of a lot more going on in the conjunct that you are testing than just the core theory. This is explicated quite nicely in Meehl’s 1989 lectures (available on his website): when testing a theory by way of experiment or study, you are almost always testing the conjunct of T (the core theory) + At (peripheral postulates) + Cp (ceteris paribus) + Ai (auxiliary theories of whatever instrumentation you are using) + Cm (general conditions of the experiment / study / data collection).

          The consequence of the above is that if a subsequent study produces an embarrassing result – the prediction made by the above conjunct does not turn out to be true after the dust settles – it can be a consequence of the falsity of any member of the conjunct. In other words, the core of the theory need not be wrong, and a negative result or even a series of negative results does not necessarily demonstrate that to be the case. Meehl refers to this as the Lakatosian retreat.

          Of course, the precondition for clinging to a theory in the above scenario is that it has previously been corroborated by making risky predictions in other settings. If all the theory has ever enabled you to do is explain some piddling phenomenon (say, the sign of an effect, such as whether or not mentorship for kids at a young age aids their economic output later in life), and nothing else, then sure, failed replications probably mean the theory is quite poor. However, if you have made even one risky prediction using the theory (risky in the sense that absent the theory the prediction would be highly improbable), then a failed replication or a series of such replications need not dissuade you from believing that the theory has some verisimilitude.

          Admittedly, replications are nice to have though!

          • This is a very clear way to put it. Thanks for commenting on this.

            I would add that it is important to lay out in advance how much uncertainty surrounds the risky prediction, in the model implied by the theory. Often the theory has so much freedom that its predictions allow pretty much all outcomes.

            Also, the weight on Cm in this equation may be a bit too high for comfort.

          • jim says:

            “Focusing strictly on replicating a significant result misses the mark.”

            Isn’t the point of science to create an experiment that can be *replicated* to test a hypothesis? If one is using statistical significance to test a hypothesis, then the experiment should replicate by the same method. If statistical significance isn’t the appropriate method to analyze the experiment, then don’t use it.

            If At + Cp + Ai + Cm are causing the experiment not to replicate then it’s time to get out the drawing board again and come up with an experiment that actually tests the hypothesis.
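
Two statistical points raised in the thread above can be checked with a little arithmetic: that the difference between a significant and a non-significant result need not itself be significant, and that "the estimate has the same sign" is a much easier target than "the effect comes out significant again". The following is a minimal sketch, not anything taken from the studies discussed. All numbers are hypothetical, and it assumes normally distributed estimates with known standard errors, a two-sided test at the 0.05 level, and (for the second point) a true effect equal to one standard error.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_sided_p(estimate, se):
    """Two-sided p-value for a normally distributed estimate (assumed known SE)."""
    return 2.0 * (1.0 - normal_cdf(abs(estimate / se)))


# Point 1: two hypothetical studies with the same standard error.
est_a, se_a = 25.0, 10.0   # hypothetical original study
est_b, se_b = 10.0, 10.0   # hypothetical replication

p_a = two_sided_p(est_a, se_a)
p_b = two_sided_p(est_b, se_b)

# The difference of two independent estimates has variance se_a^2 + se_b^2.
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
p_diff = two_sided_p(est_a - est_b, se_diff)

# Point 2: assume the true effect is one standard error in size (hypothetical).
delta = 1.0
z_crit = 1.96  # two-sided 0.05 threshold

# Probability that a new study is significant (in either direction).
p_significant = (1.0 - normal_cdf(z_crit - delta)) + normal_cdf(-z_crit - delta)

# Probability that a new study's estimate merely has the true (positive) sign.
p_positive_sign = normal_cdf(delta)

print(f"study A p = {p_a:.3f}, study B p = {p_b:.3f}, difference p = {p_diff:.3f}")
print(f"P(new study significant)       = {p_significant:.2f}")
print(f"P(new study has the true sign) = {p_positive_sign:.2f}")
```

Under these assumed numbers, study A clears the 0.05 threshold and study B does not, yet the difference between them is nowhere near significant. With a true effect of one standard error, a new study comes out significant only about 17% of the time, while its estimate has the right sign about 84% of the time.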
