To what extent does the Laplace approximation solve the issues with ADVI? (in terms of never knowing when the estimates are not to be trusted)

I’d go for the aesthetic purity of self-referential symmetry.

Keep Figure 1, but label it Figure 1a. For Figure 1b, rescale the y axis so that the adjoint method curve in 1b matches the benchmark curve in 1a, and the benchmark curve in 1b becomes a straight vertical line.

Gauss’ citation count would certainly be impressive, but wasn’t the normal distribution first introduced by De Moivre and then Laplace before Gauss?

Are there lognormal latent variable models?

Yours Sincerely

This will work for multivariate probit models (which augment the data with unobserved latent normally distributed variables), right?

Are there plans to implement this into Stan?

Yes, one way or another.

There’s also a proposal by Philip Greengard et al. in their paper, A Fast Linear Regression via SVD and Marginalization. The title’s a bit misleading because it’s considering a hierarchical model, not a simple linear regression.

And then there are longstanding proposals from Andrew (don’t know if there’s a public reference anywhere) similar to what we use for autodiff variational inference (ADVI), which look like generic forms of the Markov chain Monte Carlo Expectation Maximization (MCMC-EM) algorithm, but with a Laplace approximation in the middle.

Does this mean we’re going to get a Laplace approximation in the Stan interface?

I’d also suggest fixing the y axis.

With a linear scale, it’s impossible to read anything from the graph other than the growth of the non-adjoint method and that the adjoint method is a lot faster. How much faster? Can’t tell. A log scale for y would let us read that off of the plot.

I like to go one step further and normalize the baseline comparison system to 0. Then the y axis is just time as a fraction of baseline time, so you can directly read off the speedups. That’s how I presented results in the autodiff paper.
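As a minimal sketch of that normalization (the timing numbers below are made up for illustration, not taken from the paper): divide each method’s time by the baseline time at the same problem size, so the baseline becomes a flat reference line (at 0 on a log scale, since log 1 = 0) and each other curve reads directly as a fraction of baseline time.

```python
# Hypothetical wall-clock times (seconds) at increasing problem sizes;
# all numbers are invented for illustration.
baseline = [1.2, 4.8, 19.5, 80.1]
adjoint = [0.3, 0.6, 1.1, 2.0]

# Normalize by the baseline at the same problem size: the baseline maps to
# a constant 1 (0 on a log scale), and each adjoint value is the fraction
# of baseline time, so speedup is just the reciprocal.
normalized = [a / b for a, b in zip(adjoint, baseline)]
speedups = [b / a for a, b in zip(adjoint, baseline)]
print(normalized)
print(speedups)
```

Plotting `normalized` on a log-scaled y axis then lets the speedup at every problem size be read straight off the graph.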

To be more descriptive, the title would have to be paragraphs long!

Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Monte Carlo: technique for computing integrals based on random numbers

Hamiltonian Monte Carlo: an efficient form of Markov chain Monte Carlo (MCMC) that uses gradients of the log posterior; this is what Stan does.

Laplace approximation: an approximation of a density by a multivariate normal centered at the density’s mode

Bayesian inference: this is all about computing posterior expectations, which are expectations of quantities of interest conditioned on observation, and include predictions for future quantities, parameter estimates, and event probability estimates.

Gaussian model: one where parameters get a normal distribution (I’d like to see Gauss’s citation count!)

latent Gaussian model: one where there are unobserved parameters that get a normal distribution

adjoint differentiated: adjoints are derivatives of final values w.r.t. intermediate values; reverse-mode autodiff is a form of adjoint algorithm
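As a toy illustration of the Laplace approximation defined above (a sketch, not the paper’s algorithm): approximate a Gamma(10, 1) density, standing in for a posterior, by a normal centered at its mode, with variance given by the inverse curvature of the negative log density there. Both the mode and the curvature are computed numerically, and both have simple closed forms to check against.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import gamma, norm

a = 10.0  # Gamma shape; the target density is our stand-in "posterior"

def neg_log_p(x):
    # Unnormalized negative log density of Gamma(a, 1).
    x = np.asarray(x, dtype=float)
    return float(-((a - 1.0) * np.log(x) - x).sum())

# 1. Find the mode numerically (analytically it is a - 1 = 9).
mode = minimize(neg_log_p, x0=np.array([5.0])).x[0]

# 2. The curvature of the negative log density at the mode is the
#    approximating normal's precision (analytically sd = sqrt(a - 1) = 3),
#    estimated here with a central second difference.
h = 1e-4
curv = (neg_log_p(mode + h) - 2.0 * neg_log_p(mode) + neg_log_p(mode - h)) / h**2
sd = 1.0 / np.sqrt(curv)

laplace = norm(loc=mode, scale=sd)  # the Laplace approximation
exact = gamma(a)                    # the density being approximated
```

Near the mode the two densities agree closely; the approximation degrades in the tails, where the Gamma is skewed and the normal is not.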

Now what this all does is use the Laplace approximation to marginalize the latent Gaussians out of a model. So if your model is p(alpha, beta) and beta are latent Gaussian parameters, then we want to compute p(alpha) by marginalizing out beta. That’s a lot easier with the Laplace approximation. It’s what INLA does for Bayes and lme4 does for max marginal likelihood. Having an adjoint algorithm for differentiating p(alpha), the marginalized form of p(alpha, beta), is huge.
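A small sanity check of that marginalization idea (a toy sketch under simplifying assumptions, not the paper’s method): in a conjugate normal-normal model the Laplace-approximate marginal is exact, because the joint is Gaussian, so we can verify the Laplace formula against the closed-form marginal.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Illustrative latent Gaussian model (names and numbers are invented):
#   beta ~ N(0, tau^2)           latent Gaussian parameter
#   y | beta ~ N(beta, sigma^2)  observation
# Exact marginal: y ~ N(0, tau^2 + sigma^2).
tau, sigma, y = 1.5, 0.7, 2.0

def log_joint(beta):
    return norm.logpdf(beta, 0.0, tau) + norm.logpdf(y, beta, sigma)

# Mode of the joint in beta.
beta_hat = minimize_scalar(lambda b: -log_joint(b)).x

# Curvature (negative second derivative of log_joint at the mode),
# via a central second difference.
h = 1e-4
curv = -(log_joint(beta_hat + h) - 2.0 * log_joint(beta_hat)
         + log_joint(beta_hat - h)) / h**2

# Laplace marginal: log p(y) ~ log p(y, beta_hat) + (1/2) log(2*pi / curv)
log_marginal = log_joint(beta_hat) + 0.5 * np.log(2.0 * np.pi / curv)
exact = norm.logpdf(y, 0.0, np.sqrt(tau**2 + sigma**2))
```

For non-Gaussian likelihoods (Poisson counts, probit outcomes, etc.) the same recipe gives an approximation rather than the exact marginal, which is the setting INLA and this paper address.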

If you put a college math syllabus in a word blender, it might produce this result.
