
Dan’s Paper Corner: Can we model scientific discovery and what can we learn from the process?

Jesus taken serious by the many
Jesus taken joyous by a few
Jazz police are paid by J. Paul Getty
Jazzers paid by J. Paul Getty II

Leonard Cohen

So I’m trying a new thing because like no one is really desperate for another five thousand word essay about whatever happens to be on my mind on a Thursday night in a hotel room in Glasgow. Also, because there’s a pile of really interesting papers that I think it would be good and fun for people to read and think about.

And because if you’re going to do something, you should jump right into an important topic, may I present for your careful consideration Berna Devezer, Luis G. Nardin, Bert Baumgaertner, and Erkan Ozge Buzbas’ fabulous paper Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity. (If we’re going to talk about scientific discovery and reproducibility, you better believe I’m going to crack out the funny Leonard Cohen.)

I am kinda lazy so I’m just going to pull out the last paragraph of the paper as a teaser. But you should read the whole thing. You can also watch Berna give an excellent seminar on the topic. Regardless, here is that final paragraph.

Our research also raises questions with regard to reproducibility of scientific results. If reproducibility can be uncorrelated with other possibly desirable properties of scientific discovery, optimizing the scientific process for reproducibility might present trade-offs against other desirable properties. How should scientists resolve such trade-offs? What outcomes should scientists aim for to facilitate an efficient and proficient scientific process? We leave such considerations for future work.

I like this paper for a pile of reasons. A big one is that a lot of discussion that I have seen around scientific progress is based around personal opinions (some I agree with, some I don’t) and proposed specific interventions. Both of these things are good, but they are not the only tools we have. This paper proposes a mechanistic model of discovery encoding some specific assumptions and investigates the consequences. Broadly speaking, that is a good thing to do.

Some random observations:

  • The paper points out that the background information available for a replicated experiment is explicitly different from the background information from the original experiment, in that we usually know the outcome of the original. That the set of replications is not a random sample of all experiments is very relevant when making statements like “x% of experiments in social psychology don’t replicate”.
  • One of the key points of the paper is that reproducibility is not the only scientifically relevant property of an experiment. Work that doesn’t reproduce may well lead to a “truth” discovery (or at least a phenomenological model that is correct within the precision of reasonable experiments) faster than work that does reproduce. An extremely nerdy analogy would be that reproducible work is like a random walk towards the truth, while work that doesn’t reproduce can help shoot closer to it.
  • Critically, proposals that focus on reproducibility of single experiments (rather than stability of experimental arcs) will most likely be inefficient. (Yes, that includes preregistration, the current Jesus taken serious by the many.)
  • This is a mathematical model so everything is “too simple”, but that doesn’t mean it’s not massively informative. Some possible extensions would be to try to model more explicitly the negative effect of persistent-but-wrong flashy theories. Also the effect of incentives. Also the effect of QRPs, HARKing, Hacking, Forking, and other deviations from The Way The Truth and The Life.
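To put some numbers on that first bullet, here is a toy calculation of my own (not from the paper): if “replicates” means “the replication is also significant at the 0.05 level”, then even for an effect that is completely real, the expected replication rate is just the power of the replication study. All the numbers below are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy numbers (my assumptions, not the paper's): a real standardized
# effect of 0.3, two groups of 50, and a two-sided z-test at 0.05.
effect, n, sims = 0.3, 50, 100_000
se = np.sqrt(2 / n)  # standard error of a difference in means

# Simulate many original/replication pairs of the same experiment.
original = rng.normal(effect, se, sims)
replication = rng.normal(effect, se, sims)
sig_orig = np.abs(original) / se > 1.96
sig_rep = np.abs(replication) / se > 1.96

# Among significant originals, the share of "successful" replications
# is just the power of the replication study, even though the effect
# is real in every single simulated pair.
rate = sig_rep[sig_orig].mean()
print(f"replication 'success' rate with a true effect: {rate:.2f}")
```

With these made-up numbers the power is around a third, so roughly two thirds of exact replications of a perfectly true finding get chalked up as failures. The headline “x% don’t replicate” can’t be read as “x% were false”.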
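And the random-walk analogy in the second bullet can be sketched as a toy search problem. To be clear, this is entirely my own construction, not anything resembling the paper’s actual model: cautious replication-sized steps diffuse slowly towards the truth, while bold proposals are mostly wrong and get discarded, but the survivors cover real ground.

```python
import numpy as np

rng = np.random.default_rng(1)
THETA = 5.0  # the unknown "truth" in this toy (an assumption)

def run_search(step_sd, n_steps=500):
    """Greedy search guided by noisy 'experimental' evidence."""
    x = 0.0
    best = abs(x - THETA) + rng.normal(0, 0.5)
    for _ in range(n_steps):
        proposal = x + rng.normal(0, step_sd)
        evidence = abs(proposal - THETA) + rng.normal(0, 0.5)
        if evidence < best:  # keep only proposals the noisy evidence favors
            x, best = proposal, evidence
    return abs(x - THETA)

# Cautious, replication-sized steps: a slow random walk towards the truth.
small_steps = np.mean([run_search(step_sd=0.1) for _ in range(100)])
# Bold proposals: most are wrong and discarded, but survivors jump far.
bold_jumps = np.mean([run_search(step_sd=2.0) for _ in range(100)])
print(f"mean final error, small steps: {small_steps:.2f}; bold jumps: {bold_jumps:.2f}")
```

The bold strategy produces lots of individual “failures” along the way, yet ends up much closer to the truth on average, which is the whole point of not optimizing single-experiment reproducibility in isolation.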

I’ll close out with a structurally but not actually related post from much-missed website The Toast: Permission To Play Devil’s Advocate Denied by the exceptional Daniel Mallory Ortberg (read his books. They’re excellent!)

Our records indicate that you have requested to play devil’s advocate for either “just a second here” or “just a minute here” over fourteen times in the last financial quarter. While we appreciate your enthusiasm, priority must be given to those who have not yet played the position. We would like to commend you for the excellent work you have done in the past year arguing for positions you have no real interest or stake in promoting, including:

  • Affirmative Action: Who’s the Real Minority Here?
  • Maybe Men Score Better In Math For A Reason
  • Well, They Don’t Have To Live Here
  • I Think You’re Taking This Too Personally
  • Would It Be So Bad If They Did Die?
  • If You Could Just Try To See It Objectively, Like Me



  1. Andrew says:


    Thanks for the post.

    Regarding “x% of experiments in social psychology don’t replicate”: A major concern here, even beyond what you mentioned, is that it’s generally a mistake to think of “replicate” as a true/false statement, in part because whether a result is statistically significant (the usual standard for “successful replication”) is itself so noisy, in part because it’s typically a mistake to think of an experiment as having just one result. The way we handled this in the replication study we did of one of our earlier papers was to explicitly state that our results were not a single result. We wrote:

    We began this study with no particular concern about [our earlier published] results, but a bit of replication would, we believe, give us a better sense of uncertainty about the details. In addition, one can always be worried about opportunistic interpretations of statistical results. . . .


    [The analysis in our original paper was] exploratory (and thus a replication cannot be simply deemed successful or unsuccessful based on the statistical significance of some particular comparison).

    Just in general, we shouldn’t take off our statistical-common-sense hat just because we’re talking about replication. A replication may be preregistered, but being preregistered doesn’t mean turning yourself into a Neyman-style Stepford wife of statistical methods. Just as we can do randomized experiments without restricting our analyses to classical t-tests and ANOVAs, we can do random sampling without restricting our analyses to narrow classical sampling-theory procedures. Preregistration, random assignment, and random sampling are great design ideas which go well with sophisticated and careful analyses as well as with off-the-shelf classical procedures.

    (The above paragraphs do not represent any disagreement with your post; rather, take them as elaborations of your points.)

    • Dan Simpson says:

      Exactly! I was trying very hard not to write a gazillion words, but one of the things that I almost wrote about is that we can’t just try to reproduce “significant” results, because any number of things that don’t pass that threshold under a particular experiment with a particular design will pass the threshold under a different replication or a different design.

      Broadly speaking, I think that exact replications of positive results (or replications with more data) are not massively useful. At their very best they are a tiny corner of the space of interesting things. And I think this paper bears this out! (Also it’s not done under a hypothesis testing framework, which is always nice).

      Like chairs, tables, and governments, the real object of interest is stability.

      • Yup, what matters is what’s actually common (or common in distribution, AKA random effects/parameters) across all relevant studies so far.

        My sense is that commonness is assessed by the likelihoods with the prior set to be common for all studies. Mike Evans disagrees and thinks the different priors that may have been used should also be assessed for commonness and appropriately pooled.

  2. Joshua Pritikin says:

    I just want to second that Berna’s talk is great. If you don’t have time to look at the paper, it is possible to get a decent summary from the talk’s audio alone.
