
The multiverse in action!

In a recent paper, “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking,” Jelte Wicherts, Coosje Veldkamp, Hilde Augusteijn, Marjan Bakker, Robbie van Aert, and Marcel van Assen write:

The designing, collecting, analyzing, and reporting of psychological studies entail many choices that are often arbitrary. The opportunistic use of these so-called researcher degrees of freedom aimed at obtaining statistically significant results is problematic because it enhances the chances of false positive results and may inflate effect size estimates. In this review article, we present an extensive list of 34 degrees of freedom that researchers have in formulating hypotheses, and in designing, running, analyzing, and reporting of psychological research. The list can be used in research methods education, and as a checklist to assess the quality of preregistrations and to determine the potential for bias due to (arbitrary) choices in unregistered studies.

34 different degrees of freedom! That’s a real multiverse; it can get you to a p-value of 2^-34 (that is, 0.00000000006) in no time. And it’s worse than that: Wicherts et al. write, “We created a list of 34 researcher DFs, but our list is in no way exhaustive for the many choices that need be made during the different phases of a psychological experiment.”
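As a rough illustration (my own sketch, not from the paper): if each of k binary researcher degrees of freedom doubles the number of candidate analyses, and we idealize each analysis as yielding an independent Uniform(0,1) p-value under the null, then the expected minimum p-value over 2^k analyses is 1/(2^k + 1) — which is how 34 choices can drive the “best” p-value toward 2^-34:

```python
import random

# Idealized sketch: k binary researcher degrees of freedom give 2**k
# candidate analyses; under the null, assume each yields an independent
# Uniform(0,1) p-value.  The expected minimum of n such p-values is
# 1/(n+1), so the opportunistically chosen "best" p-value shrinks fast.

random.seed(1)

def best_p_value(k_choices, trials=2000):
    """Average of the smallest p-value across 2**k_choices analyses."""
    n = 2 ** k_choices
    total = 0.0
    for _ in range(trials):
        total += min(random.random() for _ in range(n))
    return total / trials

for k in (1, 4, 8):
    print(k, round(best_p_value(k), 4), round(1 / (2 ** k + 1), 4))
```

Of course, real forking paths are correlated rather than independent, so the shrinkage is slower in practice — but the direction is the same.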

Preregistration is fine, but let me remind all readers that the most important steps in any study are valid and reliable measurements and, where possible, large and stable effect sizes. All the preregistration in the world won’t save you if your measurements are not serious or if you’re studying an effect that is tiny or highly variable. Remember the kangaroo problem.

As I wrote here, “I worry that the push toward various desirable procedural goals can make people neglect the fundamental scientific and statistical problems that, ultimately, have driven the replication crisis.”


  1. Chris Chambers says:

    Hi Andrew – I have always found it a slightly odd argument that drawing attention to procedural aspects of science, such as preregistration, could somehow draw vital attention away from issues of measurement or theory or reliability. The argument – which I have heard many times from critics of preregistration – seems premised on the assumption that procedural and more fundamental scientific challenges are orthogonal, and that attention to each draws from some finite resource pool in a zero-sum way. But from editing Registered Reports, my strong impression is that the reverse is true: when authors invest the time and intellectual energy into carefully planning their study design and analysis plan, and furthermore adjust them based on statistical and specialist feedback before they begin, they are much more likely to also consider deeper issues of theory and measurement. The investment is simply higher across the board.

    I would say that preregistration is far more than “fine” as a solution to the degrees of freedom problem. It is close to being essential; or at least, the onus falls on the critic to identify a better way of controlling bias, or to present evidence that preregistration somehow hampers other critical aspects of science (rather than enhancing them).

    • It would help greatly if there were studies that could be characterized as stellar. Most of the expert comments suggest that hardly any study would be ranked highly against established criteria.

    • Andrew says:


      I think preregistration is fine; see here for my thoughts on the matter.

      Regarding the last paragraph of my post above: This comes up all the time. For example, I criticize some study based on it being so noisy as to be hopeless, power = 0.06, etc. And sometimes people will ask if I think there should be a preregistered replication. I usually respond, No, I don’t think it’s worth the time. If someone wants to do such a replication, fine, I won’t stop them. But my recommendation is just about always to rethink the design and measurement.
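To see where a number like power = 0.06 can come from (a hedged sketch with made-up numbers, not the data of any particular study): when the true effect is small relative to the standard error of the estimate, the power of a two-sided z-test sits barely above the 5% significance level.

```python
from math import erf, sqrt

# Illustrative only: power of a two-sided z-test at alpha = 0.05 when the
# estimate is Normal(effect, se).  The numbers below are assumptions chosen
# to show how a noisy study can have power near the significance level.

def phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(effect, se):
    """P(|estimate| / se > 1.96) when estimate ~ Normal(effect, se)."""
    z = effect / se
    return (1 - phi(1.96 - z)) + phi(-1.96 - z)

# A true effect less than a third the size of the standard error:
print(round(power(0.3, 1.0), 2))  # about 0.06
```

With power that low, a “significant” result is almost as likely under pure noise as under the hypothesized effect, which is why a preregistered replication of the same design buys so little.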

    • Keith O'Rourke says:


      Part of the challenge may be how non-experts learn from experts.

      Some argue they identify a community of experts, assess whether that community functioned well enough to reach a consensus, and choose to adopt the consensus without grasping the why of it.

      If so, they are just picking up the procedures rather than grasping the challenges they address and how they attempt to address them. Then they play by these rules thinking that keeps them safe from criticism from others or even being frustrated by reality.

      There are many things non-experts successfully address that way (fortunately, as most of us are non-experts in many things), but I believe doing research ain’t one of them.
