Archive of posts filed under the Statistical computing category.

## epidemia: An R package for Bayesian epidemiological modeling

Jamie Scott writes: I am a PhD candidate at Imperial College, and have been working with colleagues here to write an R package for fitting Bayesian epidemiological models using Stan. We thought this might interest readers of your blog, as it is based on work previously featured there. The package is similar in spirit to […]

## More on absolute error vs. relative error in Monte Carlo methods

This came up again in a discussion from someone asking if we can use Stan to evaluate arbitrary integrals. The integral I was given was the following: $latex \displaystyle \alpha = \int_{y \in R\textrm{-ball}} \textrm{multinormal}(y \mid 0, \sigma^2 \textrm{I}) \, \textrm{d}y$ where the $latex R$-ball is assumed to be in $latex D$ dimensions so that […]
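As a rough illustration of the absolute-versus-relative-error distinction for this integral, here is a minimal plain Monte Carlo sketch in Python. It is not Stan code and not the post's method. The function names are hypothetical, and the check is limited to $latex D = 2$, where the integral has the closed form $latex 1 - \exp(-R^2 / (2\sigma^2))$.

```python
import math
import random


def ball_probability_mc(R, D, sigma, n_draws, seed=0):
    """Estimate alpha = Pr[||y|| <= R] for y ~ multinormal(0, sigma^2 I) in D dimensions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Squared Euclidean norm of one D-dimensional draw.
        sq_norm = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(D))
        if sq_norm <= R * R:
            hits += 1
    return hits / n_draws


def ball_probability_exact_2d(R, sigma):
    """Closed form for D = 2 only: ||y||^2 / sigma^2 is chi-square with 2 degrees of freedom."""
    return 1.0 - math.exp(-R * R / (2.0 * sigma * sigma))


estimate = ball_probability_mc(R=1.0, D=2, sigma=1.0, n_draws=20000)
exact = ball_probability_exact_2d(R=1.0, sigma=1.0)
abs_error = abs(estimate - exact)
rel_error = abs_error / exact
print(f"estimate={estimate:.4f} exact={exact:.4f} "
      f"abs_error={abs_error:.4f} rel_error={rel_error:.4f}")
```

Because the estimator is a binomial proportion, its standard error is about $latex \sqrt{\alpha (1 - \alpha) / n}$. The absolute error therefore shrinks as $latex \alpha$ gets small, while the relative error grows like $latex 1 / \sqrt{n \alpha}$.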

## Regression and Other Stories translated into Python!

Ravin Kumar writes in with some great news: As readers of this blog likely know, Andrew Gelman, Jennifer Hill, and Aki Vehtari have recently published a new book, Regression and Other Stories. What readers likely don’t know is that there is an active effort to translate the code examples written in R and the rstanarm […]

## StanCon 2020 is on Thursday!

To all who registered for the conference, THANK YOU! We, the organizers, are truly moved by how global and inclusive the community has become. We are currently at 230 registrants from 33 countries. And 25 scholarships were provided to people in 12 countries. Please join us. Registration is \$50. We have scholarships still available (more […]

## Some things do not seem to spread easily – the role of simulation in statistical practice and perhaps theory.

Unlike Covid19, some things don’t seem to spread easily, and the role of simulation in statistical practice (and perhaps theory) may well be one of those. In a recent comment, Andrew provided a link to an interview about the new book Regression and Other Stories by Aki Vehtari, Andrew Gelman, and Jennifer Hill. An interview that covered […]

## The typical set and its relevance to Bayesian computation

[Note: The technical discussion w.r.t. Stan is continuing on the Stan forums.] tl;dr The typical set (at some level of coverage) is the set of parameter values for which the log density (the target function) is close to its expected value. As has been much discussed, this is not the same as the posterior mode. […]
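To make the tl;dr concrete, here is a small sketch, assuming a standard normal in 100 dimensions as the target (an illustrative choice, not taken from the post). It compares the log density of random draws with its expected value and with the log density at the mode. All names are hypothetical.

```python
import math
import random


def std_normal_log_density(y):
    """Log density of a standard multivariate normal evaluated at the point y."""
    d = len(y)
    return -0.5 * d * math.log(2.0 * math.pi) - 0.5 * sum(v * v for v in y)


D = 100          # dimension (assumed for illustration)
N_DRAWS = 2000   # number of independent draws
rng = random.Random(1)

draws = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N_DRAWS)]
log_densities = [std_normal_log_density(y) for y in draws]

mean_log_density = sum(log_densities) / N_DRAWS
# Since E[||y||^2] = D, the expected log density is available in closed form.
expected_log_density = -0.5 * D * math.log(2.0 * math.pi) - 0.5 * D
# The mode is the origin, where the log density is highest.
mode_log_density = std_normal_log_density([0.0] * D)
mean_distance_from_mode = sum(
    math.sqrt(sum(v * v for v in y)) for y in draws
) / N_DRAWS

print(f"mean log density of draws: {mean_log_density:.2f}")
print(f"expected log density:      {expected_log_density:.2f}")
print(f"log density at the mode:   {mode_log_density:.2f}")
print(f"mean distance from mode:   {mean_distance_from_mode:.2f}")
```

In this setup the draws sit at a distance of roughly $latex \sqrt{D}$ from the mode, and their log densities cluster around the expected value rather than around the much higher value at the mode.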

## Ugly code is buggy code

People have been (correctly) mocking my 1990s-style code. They’re right to mock! My coding style works for me, kinda, but it does have problems. Here’s an example from a little project I’m working on right now. I was motivated to write this partly as a followup to Bob’s post yesterday about coding practices. I fit […]

## Drunk-under-the-lamppost testing

Edit: Glancing over this again, it struck me that the title may be interpreted as being mean. Sorry about that. It wasn’t my intent. I was trying to be constructive and I really like that analogy. The original post is mostly reasonable other than on this one point that I thought was important to call […]

## Shortest posterior intervals

By default we use central posterior intervals. For example, the central 95% interval is the (2.5%, 97.5%) quantiles. But sometimes the central interval doesn’t seem right. This came up recently with a coronavirus testing example, where the posterior distribution for the parameter of interest was asymmetric so that the central interval is not such a […]
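A minimal sketch of the difference, assuming posterior draws are available as a plain list: the central interval takes equal-tailed quantiles, while the simplest empirical shortest interval slides a window over the sorted draws and keeps the narrowest one that contains the required fraction. This is the basic empirical version, not necessarily the estimator the post goes on to describe. The function names and the exponential test distribution are illustrative assumptions.

```python
import math
import random


def central_interval(draws, prob=0.95):
    """Equal-tailed interval from empirical order statistics."""
    s = sorted(draws)
    n = len(s)
    tail = (1.0 - prob) / 2.0
    lower = s[int(math.floor(tail * (n - 1)))]
    upper = s[int(math.ceil((1.0 - tail) * (n - 1)))]
    return lower, upper


def shortest_interval(draws, prob=0.95):
    """Narrowest window of sorted draws containing at least prob of them."""
    s = sorted(draws)
    n = len(s)
    k = int(math.ceil(prob * n))  # number of draws the interval must contain
    best = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[best], s[best + k - 1]


# Asymmetric example: draws from an exponential distribution with rate 1.
rng = random.Random(2)
posterior_draws = [rng.expovariate(1.0) for _ in range(5000)]

central = central_interval(posterior_draws)
shortest = shortest_interval(posterior_draws)
print(f"central 95%:  ({central[0]:.3f}, {central[1]:.3f})")
print(f"shortest 95%: ({shortest[0]:.3f}, {shortest[1]:.3f})")
```

For a right-skewed distribution like this one, the shortest interval starts near zero and is narrower than the central interval, which cuts off the high-density region next to the boundary.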

## Validating Bayesian model comparison using fake data

A neuroscience graduate student named James writes in with a question regarding validating Bayesian model comparison using synthetic data: I [James] perform an experiment and collect real data. I want to determine which of 2 candidate models best accounts for the data. I perform (approximate) Bayesian model comparison (e.g., using BIC – not ideal I […]

## Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors

Yuling, Aki, and I write: When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms can have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. And, even if the most important modes can be found, it is difficult to evaluate their relative weights in the […]

## Aki’s talk about reference models in model selection in Laplace’s demon series

I (Aki) talk about reference models in model selection in the Laplace’s Demon series on 24 June at 15:00 UTC (Finland 18:00, Paris 17:00, New York 11:00). See the seminar series website for a registration link, the schedule for other talks, and the list of the recorded talks. The short summary: 1) Why a bigger model helps inference for […]

## “Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale” and my answers to their questions

Here’s the description of the online seminar series: Machine learning is changing the world we live in at a breakneck pace. From image recognition and generation, to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. In this seminar series we ask […]

## Challenges to the Reproducibility of Machine Learning Models in Health Care; also a brief discussion about not overrating randomized clinical trials

Mark Tuttle pointed me to this article by Andrew Beam, Arjun Manrai, and Marzyeh Ghassemi, Challenges to the Reproducibility of Machine Learning Models in Health Care, which appeared in the Journal of the American Medical Association. Beam et al. write: Reproducibility has been an important and intensely debated topic in science and medicine for the […]

## Faster than ever before: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation

Charles Margossian, Aki Vehtari, Daniel Simpson, Raj Agrawal write: Gaussian latent variable models are a key class of Bayesian hierarchical models with applications in many fields. Performing Bayesian inference on such models can be challenging as Markov chain Monte Carlo algorithms struggle with the geometry of the resulting posterior distribution and can be prohibitively slow. […]

## Super-duper online matrix derivative calculator vs. the matrix normal (for Stan)

I’m implementing the matrix normal distribution for Stan, which provides a multivariate density for a matrix with covariance factored into row and column covariances. The motivation: A new colleague of mine at Flatiron’s Center for Comp Bio, Jamie Morton, is using the matrix normal to model the ocean biome. A few years ago, folks in […]
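For reference, the textbook form of the matrix normal density is given below. The symbols are assumptions introduced here rather than notation from the post: $latex Y$ and $latex M$ are $latex N \times P$ matrices, $latex U$ is the $latex N \times N$ row covariance, and $latex V$ is the $latex P \times P$ column covariance. The parameterization used in the Stan implementation may differ.

```latex
\textrm{MatrixNormal}(Y \mid M, U, V)
  = \frac{\exp\!\left( -\tfrac{1}{2} \,
      \textrm{tr}\!\left[ V^{-1} (Y - M)^{\top} U^{-1} (Y - M) \right] \right)}
         {(2\pi)^{NP/2} \, |V|^{N/2} \, |U|^{P/2}}
```

This is equivalent to $latex \textrm{vec}(Y) \sim \textrm{multinormal}(\textrm{vec}(M), V \otimes U)$. The covariance is "factored" in the sense that the full $latex NP \times NP$ covariance matrix never has to be formed.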

## Statistical Workflow and the Fractal Nature of Scientific Revolutions (my talk this Wed at the Santa Fe Institute)

Wed 3 June 2020 at 12:15pm U.S. Mountain time: Statistical Workflow and the Fractal Nature of Scientific Revolutions. How would an A.I. do statistics? Fitting a model is the easy part. The other steps of workflow (model building, checking, and revision) are not so clearly algorithmic. It could be fruitful to simultaneously think about automated inference and […]

## Stan pedantic mode

This used to be on the Stan wiki but that page got reorganized so I’m putting it here. Blog is not as good as wiki for this purpose: you can add comments but you can’t edit. But better blog than nothing, so here it is. I wrote this a couple years ago and it was […]

## It’s “a single arena-based heap allocation” . . . whatever that is!

After getting 80 zillion comments on that last post with all that political content, I wanted to share something that’s purely technical. It’s something Bob Carpenter wrote in a conversation regarding implementing algorithms in Stan: One thing we are doing is having the matrix library return more expression templates rather than copying on return as […]

## Laplace’s Demon: A Seminar Series about Bayesian Machine Learning at Scale

David Rohde points us to this new seminar series that has the following description: Machine learning is changing the world we live in at a breakneck pace. From image recognition and generation, to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. […]