
Bayesian analysis of Santa Clara study: Run it yourself in Google Colab, play around with the model, etc.!

The other day we posted some Stan models of the coronavirus infection rate, fit to data from the Stanford study in Santa Clara County.

The Bayesian setup worked well because it allowed us to directly incorporate uncertainty in the specificity, sensitivity, and underlying infection rate.
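To make that concrete, here is a sketch of the kind of model involved. The symbols are illustrative and not necessarily the names used in the posted Stan programs: π is the infection rate, sens and spec are the test's sensitivity and specificity, y is the number of positives among n survey tests, y_sens is the number of positives among n_sens known-positive validation samples, and y_spec is the number of negatives among n_spec known-negative validation samples.

$$
\begin{aligned}
p &= \pi \cdot \mathrm{sens} + (1 - \pi)(1 - \mathrm{spec}) \\
y &\sim \mathrm{Binomial}(n,\ p) \\
y_{\mathrm{sens}} &\sim \mathrm{Binomial}(n_{\mathrm{sens}},\ \mathrm{sens}) \\
y_{\mathrm{spec}} &\sim \mathrm{Binomial}(n_{\mathrm{spec}},\ \mathrm{spec})
\end{aligned}
$$

The first line is where the uncertainty enters: a positive result comes either from an infected person who is correctly detected (probability π·sens) or from an uninfected person who is misclassified (probability (1 − π)(1 − spec)). Because sens and spec are parameters informed by their own data rather than fixed constants, their uncertainty carries through to the posterior for π.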

Mitzi Morris put all this in a Google Colab notebook so you can run it online.


To run it in Python, go here:

Click on Open in Colab and log in using your Gmail account, and you’ll see this:

Then just run the code online, one cell at a time, by clicking the open brackets [ ] at the top left of each cell, working your way down the page.

The first time you click, you’ll get this annoying warning message:

Just click on Run Anyway and it will work.

The first few cells install CmdStan and load the model and data. After that, the fun begins and you can run the models.
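Loading the data means giving CmdStan values for the model's data block, and CmdStan can read data in JSON format. Below is a minimal sketch of writing such a file yourself, for instance to try your own numbers. The variable names and counts are hypothetical placeholders, not the notebook's actual file or the study's data; they must match the data block of whatever Stan program you run.

```python
import json
from pathlib import Path


def write_stan_data(path, data):
    """Write a dict of Stan data variables to a JSON file that CmdStan can read."""
    out = Path(path)
    # Stan ints and reals map directly onto JSON numbers.
    out.write_text(json.dumps(data, indent=2))
    return out


# Hypothetical variable names and placeholder counts, for illustration only.
data = {
    "y_sample": 40, "n_sample": 3000,  # positives among survey participants
    "y_sens": 90, "n_sens": 100,       # positives among known-positive samples
    "y_spec": 395, "n_spec": 400,      # negatives among known-negative samples
}
```

In CmdStanPy, `CmdStanModel.sample(data=...)` accepts either the path to a JSON file like this or the dictionary itself, so writing the file is mostly useful when you want to upload it or keep it alongside the model.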


Same thing, just start here.

Altering the Stan programs online

There was a way on these Colab pages to go in and alter the code and then re-run the model, which was really helpful for understanding what was going on, as it allowed you to play around with the Stan code. I can’t figure out how to do this with the above pages, but for now you can find everything on GitHub.

P.S. Loki (pictured above) wants to push this commit, but first he wants you to do some unit tests and clean his litter box.


  1. Mitzi says:

    hi Andrew,

    I added a section to the CmdStanPy notebook which shows you how to upload your own files to the Colab notebook so that you can play around with the programs and the data.

    Just a reminder: Colab gives you a virtual machine that runs for at most 12 hours, so these notebooks are useful for playing around, but they’re no substitute for a compute cluster.

  2. Mitzi says:

    Follow-up to question “how can I play around with model and data files?”

    The Colab notebook interface has a “files” utility, activated by clicking on the small folder icon on the left-hand side of the main window. It will let you open and edit files with known extensions, but unfortunately not files with the extension “.stan”. As a workaround, you can rename “.stan” files to “.txt”, edit them, and then rename them back. So it’s easy to edit the data files, but extremely clunky if you want to play around with program files. Still looking for a more elegant solution; suggestions welcome.

  3. btnaughton says:

    By coincidence, I did the same thing, but lighter weight, using PyStan. PyStan is pre-installed in Colab!

    • Mitzi says:

      I think CmdStanPy is also pre-installed on Colab because it’s used by fbprophet, but the installed version is 0.4.0, and CmdStanPy is inching towards version 1.0.

      CmdStanPy provides faster compilation and Stan 2.23 goodness.

  4. Mitzi says:

    There’s a case study that goes into more detail on how to put R Jupyter notebooks up on Colab:

    My goal in writing these is to help the folks who would normally be doing their teaching in R, and are now faced with the challenge of doing this online, perhaps without a lot of institutional bandwidth or IT support, in which case Colab notebooks seem like a good way to go.

  5. In a similar vein, I came up with an algorithm that directly computed the prevalence posterior for an imperfect test. You can run it at

    There’s also a preprint describing the math and the algorithm:

    It’s essentially doing the same thing as the R version, but using pure Monte Carlo, which converges much faster: fast enough that the Monte Carlo integration runs in the browser. I am hoping research groups can use this to get quick estimates of prevalence on the fly, without needing to be well versed in Python, R, Bayesian stats, etc. There are several examples linked at the site: all the various versions of the Santa Clara study and one from Kobe, Japan.

    The implementation is on GitHub:
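The comment above describes computing the prevalence posterior directly by Monte Carlo instead of MCMC. The commenter's own algorithm is in their preprint and repository. What follows is *not* that algorithm, just a minimal sketch of one generic way to do the same kind of calculation, under these assumptions: uniform priors on prevalence, sensitivity, and specificity (so the validation counts give Beta posteriors for sensitivity and specificity), a grid over prevalence, and plain Monte Carlo averaging of the survey likelihood over sensitivity/specificity draws. The function names are hypothetical.

```python
import math
import random


def prevalence_posterior(y, n, y_sens, n_sens, y_spec, n_spec,
                         draws=2000, grid=400, upper=1.0, seed=1):
    """Grid approximation to p(prevalence | data), integrating out
    sensitivity and specificity by plain Monte Carlo.

    Assumes uniform priors, so sens ~ Beta(y_sens + 1, n_sens - y_sens + 1)
    and spec ~ Beta(y_spec + 1, n_spec - y_spec + 1) given the validation data.
    `upper` truncates the grid; keep the posterior mass well inside it.
    """
    rng = random.Random(seed)
    sens = [rng.betavariate(y_sens + 1, n_sens - y_sens + 1) for _ in range(draws)]
    spec = [rng.betavariate(y_spec + 1, n_spec - y_spec + 1) for _ in range(draws)]

    pis = [upper * (i + 0.5) / grid for i in range(grid)]  # grid midpoints
    log_marg = []
    for pi in pis:
        # Log binomial likelihood of the survey data under each draw
        # (the binomial coefficient is constant and dropped).
        ll = []
        for se, sp in zip(sens, spec):
            p = pi * se + (1 - pi) * (1 - sp)  # P(test positive)
            p = min(max(p, 1e-12), 1 - 1e-12)
            ll.append(y * math.log(p) + (n - y) * math.log1p(-p))
        # Log of the Monte Carlo average of the likelihood, via log-sum-exp.
        m = max(ll)
        log_marg.append(m + math.log(sum(math.exp(v - m) for v in ll) / draws))

    # With a flat prior on prevalence, the posterior is proportional to this
    # averaged likelihood; normalize it over the grid.
    m = max(log_marg)
    w = [math.exp(v - m) for v in log_marg]
    total = sum(w)
    return pis, [v / total for v in w]


def summarize(pis, probs, level=0.95):
    """Posterior mean and an approximate central interval from grid weights."""
    mean = sum(p * w for p, w in zip(pis, probs))
    lo_q, hi_q = (1 - level) / 2, 1 - (1 - level) / 2
    cum, lo, hi = 0.0, None, None
    for p, w in zip(pis, probs):
        cum += w
        if lo is None and cum >= lo_q:
            lo = p
        if hi is None and cum >= hi_q:
            hi = p
    return mean, (lo, hi)
```

Usage, with placeholder counts rather than any study's data:

```python
pis, probs = prevalence_posterior(y=40, n=3000, y_sens=90, n_sens=100,
                                  y_spec=395, n_spec=400, upper=0.05)
mean, (lo, hi) = summarize(pis, probs)
```

Averaging over draws from the validation-data posteriors is what "integrating out" sensitivity and specificity means here; the grid keeps the one remaining dimension cheap.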
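Mitzi's second comment describes renaming “.stan” files to “.txt” in the Colab file browser, editing them, and renaming them back. The same renames can also be done from a notebook cell. A minimal sketch with Python's pathlib; the helper names and the file `model.stan` are hypothetical:

```python
from pathlib import Path


def _swap_suffix(path, old, new):
    """Rename path from suffix `old` to `new`, refusing to overwrite a file."""
    src = Path(path)
    if src.suffix != old:
        raise ValueError(f"expected a {old} file, got {src.name}")
    dst = src.with_suffix(new)
    if dst.exists():
        # Path.rename would silently replace an existing file on POSIX.
        raise FileExistsError(dst)
    src.rename(dst)
    return dst


def to_editable(stan_file):
    """model.stan -> model.txt, so the Colab file editor will open it."""
    return _swap_suffix(stan_file, ".stan", ".txt")


def to_stan(txt_file):
    """model.txt -> model.stan, before compiling the edited program."""
    return _swap_suffix(txt_file, ".txt", ".stan")
```

For example, `to_editable("model.stan")` before editing and `to_stan("model.txt")` afterwards. Refusing to overwrite is a deliberate choice, so a stray `model.txt` data or notes file is never clobbered.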
