Stephen Kissler and Yonatan Grad launched a Shiny app, Effective SARS-CoV-2 test sensitivity, to help you answer the question, How many infectious people are likely to show up to an event, given a screening test administered n days prior to the event? Here’s a screenshot. The app is based on some modeling they did with […]

## How to describe Pfizer’s beta(0.7, 1) prior on vaccine effect?

Now it’s time for some statistical semantics. Specifically, how do we describe the prior that Pfizer is using for their COVID-19 study? Here’s a link to the report. A PHASE 1/2/3, PLACEBO-CONTROLLED, RANDOMIZED, OBSERVER-BLIND, DOSE-FINDING STUDY TO EVALUATE THE SAFETY, TOLERABILITY, IMMUNOGENICITY, AND EFFICACY OF SARS-COV-2 RNA VACCINE CANDIDATES AGAINST COVID-19 IN HEALTHY INDIVIDUALS Way […]

## UX issues around voting

While Andrew’s worrying about how to measure calibration and sharpness on small N probabilistic predictions, let’s consider some computer and cognitive science issues around voting. How well do elections measure individual voter intent? What is the probability that a voter who tries to vote has their intended votes across the ballot registered? Spoiler alert. It’s […]

*The Economist* not hedging the predictions

Andrew’s hedge that he’s predicting vote intent and not accounting for any voting irregularities either never made it to the editorial staff at The Economist or they chose to ignore it. Their headline still reports that they’re predicting the electoral college, not voter intent: Predicting voter intent is a largely academic exercise in that all […]

## Hiring at all levels at Flatiron Institute’s Center for Computational Mathematics

We’re hiring at all levels at my new academic home, the Center for Computational Mathematics (CCM) at the Flatiron Insitute in New York City. We’re going to start reviewing applications January 1, 2021. A lot of hiring We’re hoping to hire many people for each of the job ads. The plan is to grow CCM […]

## More on absolute error vs. relative error in Monte Carlo methods

This came up again in a discussion from someone asking if we can use Stan to evaluate arbitrary integrals. The integral I was given was the following: $latex \displaystyle \alpha = \int_{y \in R\textrm{-ball}} \textrm{multinormal}(y \mid 0, \sigma^2 \textrm{I}) \, \textrm{d}y$ where the $latex R$-ball is assumed to be in $latex D$ dimensions so that […]

## Probabilities for action and resistance in Blades in the Dark

Later this week, I’m going to be GM-ing my first session of Blades in the Dark, a role-playing game designed by John Harper. We’ve already assembled a crew of scoundrels in Session 0 and set the first score. Unlike most of the other games I’ve run, I’ve never played Blades in the Dark, I’ve only […]

## Drunk-under-the-lamppost testing

Edit: Glancing over this again, it struck me that the title may be interpreted as being mean. Sorry about that. It wasn’t my intent. I was trying to be constructive and I really like that analogy. The original post is mostly reasonable other than on this one point that I thought was important to call […]

## Super-duper online matrix derivative calculator vs. the matrix normal (for Stan)

I’m implementing the matrix normal distribution for Stan, which provides a multivariate density for a matrix with covariance factored into row and column covariances. The motivation A new colleague of mine at Flatiron’s Center for Comp Bio, Jamie Morton, is using the matrix normal to model the ocean biome. A few years ago, folks in […]

## Make Andrew happy with one simple ggplot trick

By default, ggplot expands the space above and below the x-axis (and to the left and right of the y-axis). Andrew has made it pretty clear that he thinks the x axis should be drawn at y = 0. To remove the extra space around the axes when you have continuous (not discrete or log […]

## Prior predictive, posterior predictive, and cross-validation as graphical models

I just wrote up a bunch of chapters for the Stan user’s guide on prior predictive checks, posterior predictive checks, cross-validation, decision analysis, poststratification (with the obligatory multilevel regression up front), and even bootstrap (which has a surprisingly elegant formulation in Stan now that we have RNGs in trnasformed data). Andrew then urged me to […]

## Naming conventions for variables, functions, etc.

The golden rule of code layout is that code should be written to be readable. And that means readable by others, including you in the future. Three principles of naming follow: 1. Names should mean something. 2. Names should be as short as possible. 3. Use your judgement to balance (1) and (2). The third […]

## Is data science a discipline?

Jeannette Wing, director of the Columbia Data Science Institute, sent along this link to this featured story (their phrase) on their web site. Is data science a discipline? Data science is a field of study: one can get a degree in data science, get a job as a data scientist, and get funded to do […]

## What can we do with complex numbers in Stan?

I’m wrapping up support for complex number types in the Stan math library. Now I’m wondering what we can do with complex numbers in statistical models. Functions operating in the complex domain The initial plan is to add some matrix functions that use complex numbers internally: fast fourier transforms asymmetric eigendecomposition Schur decomposition The eigendecomposition […]

## A normalizing flow by any other name

Another week, another nice survey paper from Google. This time: Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S. and Lakshminarayanan, B., 2019. Normalizing Flows for Probabilistic Modeling and Inference. arXiv 1912.02762. What’s a normalizing flow? A normalizing flow is a change of variables. Just like you learned way back in calculus and linear algebra. Normalizing […]

## Monte Carlo and winning the lottery

Suppose I want to estimate my chances of winning the lottery by buying a ticket every day. That is, I want to do a pure Monte Carlo estimate of my probability of winning. How long will it take before I have an estimate that’s within 10% of the true value? It’ll take… There’s a big […]

## Abuse of expectation notation

I’ve been reading a lot of statistical and computational literature and it seems like expectation notation is absued as shorthand for integrals by decorating the expectation symbol with a subscripted distribution like so: $latex \displaystyle \mathbb{E}_{p(\theta)}[f(\theta)] =_{\textrm{def}} \int f(\theta) \cdot p(\theta) \, \textrm{d}\theta.$ This is super confusing, because expectations are properly defined as functions of […]

## Rao-Blackwellization and discrete parameters in Stan

I’m reading a really dense and beautifully written survey of Monte Carlo gradient estimation for machine learning by Shakir Mohamed, Mihaela Rosca, Michael Figurnov, and Andriy Mnih. There are great explanations of everything including variance reduction techniques like coupling, control variates, and Rao-Blackwellization. The latter’s the topic of today’s post, as it relates directly to […]

## Royal Society spam & more

Just a rant about spam (and more spam) from pay-to-publish and closed-access journals. Nothing much to see here. The latest offender is from something called the “Royal Society.” I don’t even know to which king or queen this particular society owes allegiance, because they have a .org URL. Exercising their royal prerogative, they created an […]

## Beautiful paper on HMMs and derivatives

I’ve been talking to Michael Betancourt and Charles Margossian about implementing analytic derivatives for HMMs in Stan to reduce memory overhead and increase speed. For now, one has to implement the forward algorithm in the Stan program and let Stan autodiff through it. I worked out the adjoint method (aka reverse-mode autodiff) derivatives of the […]