
This one’s important: Bayesian workflow for disease transmission modeling in Stan

Léo Grinsztajn, Elizaveta Semenova, Charles Margossian, and Julien Riou write:

This tutorial shows how to build, fit, and criticize disease transmission models in Stan, and should be useful to researchers interested in modeling the COVID-19 outbreak and doing Bayesian inference. Bayesian modeling provides a principled way to quantify uncertainty and incorporate prior knowledge into the model. What is more, Stan’s main inference engine, Hamiltonian Monte Carlo sampling, is amenable to diagnostics, which means we can verify whether our inference is reliable. Stan is an expressive probabilistic programming language that abstracts the inference and allows users to focus on the modeling. The resulting code is readable and easily extensible, which makes the modeler’s work more transparent and flexible. In this tutorial, we demonstrate with a simple Susceptible-Infected-Recovered (SIR) model how to formulate, fit, and diagnose a compartmental model in Stan. We also introduce more advanced topics which can help practitioners fit sophisticated models; notably, how to use simulations to probe our model and our priors, and computational techniques to scale ODE-based models.

Mathematical models of epidemic spread are affecting policy, and rightly so.

It’s fine to say that policy should be based only on data, not on models, but models are necessary to interpret data. To paraphrase Bill James, the alternative to a good model is not “no model,” it’s a bad model. Tell it to the A/Chairman.

But some influential models have had problems. And there are different models to choose from.

One useful step is to write these models in a common language. We have such a language: it’s called mathematics, and it’s really useful. But there are lots of steps between the mathematical model and the fit to data, and that’s where things often break down. The mathematical models have parameters, sometimes the parameters need constraints, and sometimes when you try to read a paper you get lost in the details of the fitting.

Bayesian inference and Stan are not the only ways of fitting SIR models, but they give us a common language, and they also give flexibility: Once you’ve fit a model, it’s not hard to expand it. That’s important, because model expansion is often a good way to react to criticism.

The above-linked paper by Léo, Liza, Charles, and Julien should be useful for three audiences:

– People who want to fit SIR models and their extensions, and would like to focus on the science and the data analysis rather than have computing and programming be a limiting factor.

– People who have already fit SIR models and their extensions but not in Stan, and who’d like to be able to communicate their models more easily and who’d like to be able to extend their models, adding hierarchical components, etc.

– People who are unfamiliar with these models and would like to learn about them from scratch.
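
For readers in that third group, here is a rough sketch of what a SIR model is, written in plain Python rather than Stan and not taken from the paper. It only simulates the three compartments forward in time; it does no fitting or inference. The function names (`simulate_sir`, `doubling_time`), the fixed-step Runge-Kutta integrator, and all parameter values are illustrative assumptions chosen for this example, not estimates for any real outbreak.

```python
import math


def simulate_sir(beta, gamma, n_pop, i0, days, dt=0.01):
    """Integrate the SIR equations with a fixed-step RK4 scheme.

        dS/dt = -beta * S * I / N
        dI/dt =  beta * S * I / N - gamma * I
        dR/dt =  gamma * I

    Returns a list of (S, I, R) tuples, one per whole day, day 0 included.
    """
    def deriv(state):
        s, i, _ = state
        new_infections = beta * s * i / n_pop
        recoveries = gamma * i
        return (-new_infections, new_infections - recoveries, recoveries)

    def shifted(state, slope, h):
        # Move each compartment by h times its slope.
        return tuple(x + h * k for x, k in zip(state, slope))

    steps_per_day = round(1 / dt)
    state = (float(n_pop - i0), float(i0), 0.0)
    daily = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = deriv(state)
            k2 = deriv(shifted(state, k1, dt / 2))
            k3 = deriv(shifted(state, k2, dt / 2))
            k4 = deriv(shifted(state, k3, dt))
            state = tuple(
                x + dt / 6 * (a + 2 * b + 2 * c + d)
                for x, a, b, c, d in zip(state, k1, k2, k3, k4)
            )
        daily.append(state)
    return daily


def doubling_time(beta, gamma):
    """Early-epidemic doubling time, valid while S is close to N.

    With S ~ N the infected compartment obeys dI/dt ~ (beta - gamma) * I,
    which is plain exponential growth at rate beta - gamma.
    """
    return math.log(2) / (beta - gamma)


# Illustrative (assumed) values: beta = 0.5/day, gamma = 0.2/day.
trajectory = simulate_sir(beta=0.5, gamma=0.2, n_pop=1_000_000, i0=10, days=120)
peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d][1])
print(f"early doubling time: {doubling_time(0.5, 0.2):.2f} days")
print(f"infections peak on day {peak_day}")
```

While almost everyone is still susceptible, the infected count in this sketch grows exponentially at rate `beta - gamma`; the curve bends only once the susceptible pool is noticeably depleted. Fitting such a model to data, which is what the tutorial is about, means putting priors on parameters like `beta` and `gamma` and a measurement model on the observed counts.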


  1. Kyle Cranmer says:

    I very much agree that probabilistic programming should be part of the overall strategy for developing and fitting models for disease transmission. I thought I would point to some related efforts on probabilistic programming. This includes recent work to interface probabilistic programming inference engines with already existing scientific simulator code bases. There are advantages and disadvantages to porting code to a dedicated system like Stan.

    Simulation-Based Inference for Global Health Decisions
    Christian Schroeder de Witt, Bradley Gram-Hansen, Nantas Nardelli, Andrew Gambardella, Rob Zinkov, Puneet Dokania, N. Siddharth, Ana Belen Espinosa-Gonzalez, Ara Darzi, Philip Torr, Atılım Güneş Baydin

    Planning as Inference in Epidemiological Models
    Frank Wood, Andrew Warrington, Saeid Naderiparizi, Christian Weilbach, Vaden Masrani, William Harvey, Adam Scibior, Boyan Beronov, Ali Nasseri

    Hijacking Malaria Simulators with Probabilistic Programming
    Bradley Gram-Hansen, Christian Schröder de Witt, Tom Rainforth, Philip H.S. Torr, Yee Whye Teh, Atılım Güneş Baydin

    PPX protocol

  2. jim says:

    “the alternative to a good model is not “no model,” it’s a bad model.”

    Yet the fact that you’ve written some equations or run some code and spit out some plots doesn’t make the model de facto better than a conceptual or eyeball model. Despite all the bits that have been processed on transmission models during this epidemic, we’re still at square one.

    “Mathematical models of epidemic spread are affecting policy, and rightly so.”

    Really? As far as I know there’s no model that has produced a consistently accurate forecast. How is that any better than a guess? It’s not, and in fact it could be a lot worse. Models should not be used for policy unless they are verified to a specified level of accuracy.

    • Andrew says:


      Even what you’re calling “eyeball” uses more math than you may realize. For example, a few months ago, people were looking at graphs and seeing the rates doubling every few days. A statement like, “If we don’t do something, it will keep doubling and then the hospitals will be overwhelmed,” is itself a mathematical model, in this case exponential growth, which in turn can be viewed as a special case of a SIR model. Or when people talk about herd immunity, that’s another mathematical model. I don’t think there are any alternatives to modeling here.

      • trystero says:

        Not to mention that some models have performed quite well; they just lack much news coverage. E.g., Youyang Gu’s at

      • confused says:

        I think it’s the difference between “model” as a way of understanding a phenomenon vs. “model” as a particular set of equations or computer program.

        It is possible that you could have gotten results back on, say, March 1 as good as the university-published models were then* by looking at the results of past pandemics, saying “well, this looks to be less deadly than Spanish flu but somewhat deadlier than 1957 and 1968”, and adjusting based on that, rather than using a SEIR model or whatever. But that’s a “model” in the broader sense as well.

        *some current models seem to be doing quite a bit better, but I think those are ones that post-date the realization that the less-dense parts of the US weren’t blowing up nearly as badly as had been expected in mid-March.

      • jim says:

        “Even what you’re calling “eyeball” uses more math than you may realize.”

        It doesn’t use more than I realize. :) I’m humble about my math skills but that doesn’t mean I don’t have any. I’m humble about math because I’ve noticed throughout my life that people that know math but don’t know anything else frequently screw things up badly. That happens because they don’t know that math is just math. It has no necessary connection to the real world, no matter how much theory it’s based on.

        That’s why the eyeball model is better than the computer model in situations where so little is known. The eyeball model is firmly rooted in reality. The computer model is a mash-up of ideas that may or may not be true, expressed via mathematics, which may or may not represent the ideas faithfully, and written in code which may or may not reproduce the mathematics correctly, and fed with data that may or may not be biased in some way. The opportunities for unrecognized systematic errors are piling up.

        • Paolo Inglese says:


          That’s why papers are not just written in math but also in plain English.
          Bayesian statistics has the advantage of forcing you to make explicit your justification for the choice of model.
          Then we have the quantitative analysis, which is done in math. So people well trained in math can judge the model based on its implications about reality and on what the authors say in English.
          Even with no model, your brain is using the same principles of pattern recognition to make an educated guess.

    • Evan says:

      “really? as far as I know there’s no model that has produced a consistently accurate forecast. How is that any better than a guess?”

      You came to the right place to have that question answered. It turns out a model can make predictions that are somewhere between guessing and 100% accurate. Assuming you were referring to models forecasting total number of deaths, guesses could have been anywhere from 0 to roughly 7 billion people living on Earth. Without any prior knowledge or model for the death rate, the expectation for the “guessing” model should be roughly 3.5B deaths. I’m pretty sure most models that I have seen have been more accurate than that.

      Bottom line: if your “guess” is better than most models, then it likely is a model of some sort. Sorry, there’s no escaping models!

      • jim says:

        The “everything-you-do-or-think-is-a-model” claim is patently obvious, so spare me the strawman criticisms.

        The claim that “everyone died” and “no one died” is a starting point for a “model” is also ridiculous. The claim has no relevance to the argument about modelling but it is a statement about how some scientists dramatically over-rate their own knowledge and therefore over-rate the value they bring to any issue.

  3. Andrew Jaffe says:

    > To paraphrase Bill James, the alternative to a good model is not “no model,” it’s a bad model.

    Hey, Andrew, I know this is from a while back, but what’s the original Bill James quote you’re referring to?
