Archive of posts filed under the Statistical computing category.

We need better default plots for regression.

Robin Lee writes: To check for linearity and homoscedasticity, we are taught to plot residuals against y fitted value in many statistics classes. However, plotting residuals against y fitted value has always been a confusing practice that I know that I should use but can’t quite explain why. It is not until this week I […]

New Within-Chain Parallelisation in Stan 2.23: This One’s Easy for Everyone!

What’s new? The new and shiny reduce_sum facility released with Stan 2.23 is far more user-friendly and makes it easier to scale Stan programs with more CPU cores than it was before. While Stan is awesome for writing models, as the size of the data or complexity of the model increases it can become impractical […]

Bayesian analysis of Santa Clara study: Run it yourself in Google Colab, play around with the model, etc!

The other day we posted some Stan models of coronavirus infection rate from the Stanford study in Santa Clara county. The Bayesian setup worked well because it allowed us to directly incorporate uncertainty in the specificity, sensitivity, and underlying infection rate. Mitzi Morris put all this in a Google Colab notebook so you can run […]

MRP with R and Stan; MRP with Python and Tensorflow

Lauren and Jonah wrote this case study which shows how to do Mister P in R using Stan. It’s a great case study: it’s not just the code for setting up and fitting the multilevel model, it also discusses the poststratification data, graphical exploration of the inferences, and alternative implementations of the model. Adam Haber […]

Webinar on approximate Bayesian computation

X points us to this online seminar series which is starting this Thursday! Some speakers and titles of talks are listed. I just wish I could click on the titles and see the abstracts and papers! The seminar is at the University of Warwick in England, which is not so convenient—I seem to recall that […]

Conference on Mister P online tomorrow and Saturday, 3-4 Apr 2020

We have a conference on multilevel regression and poststratification (MRP) this Friday and Saturday, organized by Lauren Kennedy, Yajuan Si, and me. The conference was originally scheduled to be at Columbia but now it is online. Here is the information. If you want to join the conference, you must register for it ahead of time; […]

More coronavirus research: Using Stan to fit differential equation models in epidemiology

Seth Flaxman and others at Imperial College London are using Stan to model coronavirus progression; see here (and I’ve heard they plan to fix the horrible graphs!) and this Github page. They also pointed us to this article from December 2019, Contemporary statistical inference for infectious disease models using Stan, by Anastasia Chatzilena et al. […]

Fit nonlinear regressions in R using stan_nlmer

This comment from Ben reminded me that lots of people are running nonlinear regressions using least squares and other unstable methods of point estimation. You can do better, people! Try stan_nlmer, which fits nonlinear models and also allows parameters to vary by groups. I think people have the sense that maximum likelihood or least squares […]

Estimates of the severity of COVID-19 disease: another Bayesian model with poststratification

Following up on our discussions here and here of poststratified models of coronavirus risk, Jon Zelner writes: Here’s a paper [by Robert Verity et al.] that I think shows what could be done with an MRP approach. From the abstract: We used individual-case data from mainland China and cases detected outside mainland China to estimate […]

Prior predictive, posterior predictive, and cross-validation as graphical models

I just wrote up a bunch of chapters for the Stan user’s guide on prior predictive checks, posterior predictive checks, cross-validation, decision analysis, poststratification (with the obligatory multilevel regression up front), and even bootstrap (which has a surprisingly elegant formulation in Stan now that we have RNGs in transformed data). Andrew then urged me to […]

“A Path Forward for Stan,” from Sean Talts, former director of Stan’s Technical Working Group

Sean Talts was talking about his ideas of how Stan should move forward, given anticipated developments in the probabilistic programming infrastructure. I encouraged him to write up his ideas in some sort of manifesto form, and he did so. Here it is. The title is “A Path Forward for Stan,” and it begins: Stan has […]

100 Things to Know, from Lane Kenworthy

The sociologist has this great post: Here are a hundred things worth knowing about our world and about the United States. Because a picture is worth quite a few words and providing information in graphical form reduces misperceptions, I [Kenworthy] present each of them via a chart, with some accompanying text. This is great stuff. […]

Naming conventions for variables, functions, etc.

The golden rule of code layout is that code should be written to be readable. And that means readable by others, including you in the future. Three principles of naming follow: 1. Names should mean something. 2. Names should be as short as possible. 3. Use your judgement to balance (1) and (2). The third […]

Computer-generated writing that looks real; real writing that looks computer-generated

You know that thing where you stare at a word for long enough, it starts to just look weird? The letters start to separate from each other, and you become hyper-aware of the arbitrariness of associating a concept with some specific combination of sounds? There’s gotta be a word for this. Anyway, I was reminded […]

Is data science a discipline?

Jeannette Wing, director of the Columbia Data Science Institute, sent along this link to this featured story (their phrase) on their web site. Is data science a discipline? Data science is a field of study: one can get a degree in data science, get a job as a data scientist, and get funded to do […]

What can we do with complex numbers in Stan?

I’m wrapping up support for complex number types in the Stan math library. Now I’m wondering what we can do with complex numbers in statistical models. Functions operating in the complex domain: The initial plan is to add some matrix functions that use complex numbers internally: fast Fourier transforms, asymmetric eigendecomposition, and Schur decomposition. The eigendecomposition […]

Deep learning workflow

Ido Rosen points us to this interesting and detailed post by Andrej Karpathy, “A Recipe for Training Neural Networks.” It reminds me a lot of various things that Bob Carpenter has said regarding the way that some fitting algorithms are often oversold because the presenters don’t explain the tuning that was required to get good […]

Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC

With Aki, Dan, Bob, and Paul: Markov chain Monte Carlo is a key computational tool in Bayesian statistics, but it can be challenging to monitor the convergence of an iterative stochastic algorithm. In this paper we show that the convergence diagnostic R-hat of Gelman and Rubin (1992) has serious flaws. R-hat will fail to correctly […]

An article in a statistics or medical journal, “Using Simulations to Convince People of the Importance of Random Variation When Interpreting Statistics.”

Andy Stein writes: On one of my projects, I had a plot like the one above of drug concentration vs response, where we divided the patients into 4 groups. I look at the data below and think “wow, these are some wide confidence intervals and random looking data, let’s not spend too much time more […]

A normalizing flow by any other name

Another week, another nice survey paper from Google. This time: Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S. and Lakshminarayanan, B., 2019. Normalizing Flows for Probabilistic Modeling and Inference. arXiv 1912.02762. What’s a normalizing flow? A normalizing flow is a change of variables. Just like you learned way back in calculus and linear algebra. Normalizing […]
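
To make the "change of variables" remark concrete, here is a minimal sketch of the simplest possible flow: an affine map x = scale * z + shift applied to a standard normal z. It is not taken from the survey paper; the function names (`standard_normal_logpdf`, `affine_flow_logpdf`, `normal_logpdf`) and the numbers are hypothetical, chosen only for illustration. The density of x is the base density evaluated at the inverse transform, minus the log absolute Jacobian, and for an affine map that should agree with the usual normal density with mean `shift` and standard deviation `|scale|`.

```python
import math


def standard_normal_logpdf(z):
    """Log density of the base distribution, Normal(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def affine_flow_logpdf(x, scale, shift):
    """Log density of x = scale * z + shift with z ~ Normal(0, 1).

    Change of variables: log p_x(x) = log p_z(f^{-1}(x)) - log |dx/dz|.
    """
    z = (x - shift) / scale                    # inverse transform f^{-1}(x)
    log_abs_jacobian = math.log(abs(scale))    # log |dx/dz| for an affine map
    return standard_normal_logpdf(z) - log_abs_jacobian


def normal_logpdf(x, mean, sd):
    """Closed-form Normal(mean, sd) log density, used only as a check."""
    return (-0.5 * ((x - mean) / sd) ** 2
            - math.log(sd)
            - 0.5 * math.log(2.0 * math.pi))


# Hypothetical values for illustration.
x, scale, shift = 1.3, 2.0, -0.5
flow_value = affine_flow_logpdf(x, scale, shift)
direct_value = normal_logpdf(x, shift, abs(scale))
print(flow_value, direct_value)  # the two values should agree up to rounding
```

Flows used in practice compose many invertible maps and sum their log Jacobian terms; the single affine map here is just the one-step case.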