## Webinar on approximate Bayesian computation

X points us to this online seminar series which is starting this Thursday! Some speakers and titles of talks are listed. I just wish I could click on the titles and see the abstracts and papers!

The seminar is at the University of Warwick in England, which is not so convenient—I seem to recall that to get there you have to take a train and then a bus, or something like that!—but it seems that they will be conducting the seminar remotely, which is both convenient and (nearly) carbon neutral.

1. The only thing I don’t love about ABC is the name. In general I don’t think we should think of it as approximate anything. It’s a kind of model in which we provide a probability that a summary statistic will fall within a certain distance of a predicted value using a computational predictor… It should be called something that doesn’t make it seem like it’s “just an approximation” to a “real model”.

• Phil says:

I’m not 100% sure that I disagree with this (Daniel) but I think I disagree. I think ABC refers to situations in which you’d like to fit a certain model, but it’s too complicated, so you settle for an approximation to that model instead. I acknowledge that the model that you actually fit is, itself, a model, which you can look at as “an exact version of itself” rather than “an approximation to what you want to fit.” But just because you _can_ look at it that way doesn’t mean it’s a helpful way to look at it. If you want to fit model A but have to settle for approximate results rather than full convergence on the full model, I think it’s fair to say you’ve done an ‘approximate’ computation.

But I’m not 100% sure I have this right.

• Giacomo Petrillo says:

ABC can in principle compute a posterior with arbitrary accuracy, much like MCMC. Maybe it is more approximate in this sense: if you start an MCMC, the computation converges to the result as you increase the number of samples, while with ABC, if you increase the number of Monte Carlo samples without also tightening the acceptance tolerance, it converges to a convolution of the result with the acceptance kernel.
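To make the convergence point concrete, here is the standard way of writing the distribution that kernel-based ABC samples from at a fixed tolerance. The symbols are assumptions introduced for illustration, not notation from this thread: $\theta$ is the parameter, $s_{\text{obs}}$ the observed summary, $p(s \mid \theta)$ the distribution of simulated summaries, and $K_\epsilon$ an acceptance kernel of width $\epsilon$.

$$
\pi_\epsilon(\theta \mid s_{\text{obs}}) \;\propto\; \pi(\theta) \int K_\epsilon(s, s_{\text{obs}})\, p(s \mid \theta)\, ds
$$

More Monte Carlo samples at fixed $\epsilon$ bring you closer to $\pi_\epsilon$, whose likelihood is smoothed by the kernel. Only shrinking $\epsilon$ as well moves $\pi_\epsilon$ toward $\pi(\theta \mid s_{\text{obs}})$.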

I agree with Daniel that the name is not so good. I had known the method for a long time before discovering it was called ABC, because every time I glanced at or heard “ABC” somewhere, or even read the acronym expansion, it would not connect in any way to the thing.

• Phil, often there is nothing that you are approximating. The usual context is that you have a deterministic or pseudorandom computational simulator. From this you can draw a single prediction or a small ensemble of predictions. How do you turn this into a posterior distribution? You must rate the plausibility of the parameter based on the agreement between the deterministic computational prediction and the data. So you summarize the data and the prediction, and come up with a kernel that is a peaked nonnegative function of the closeness of the two summaries. This isn’t an approximation of anything, it’s just a model of agreement.

I think this situation is far more common than any situation in which you have an underlying “real” model that you try to approximate explicitly.

• Phil says:

Isn’t this how all Bayesian computation is done, though? You generate samples from a distribution, and either accept or reject them?

Eh, don’t bother explaining, at least not on my account: I probably have the wrong end of the stick about this method I don’t know anything about, and I’m too lazy to read up on it. I should at least read the Wikipedia page or something.

• Haha!

Here’s the ABC method in a nutshell…

1) Use a prior to generate a parameter vector with an RNG.

2) Use a computational method to generate a detailed prediction about outcomes with the parameter vector (example: a weather prediction, or a finite element prediction, or an agent based model of electricity production at all windmills in all the various wind farms…)

3) Take the detailed output of the computational model and summarize it into a smaller dimensional summary: mean wind speed over 1km patches, temperature at 4 key locations where sensors are placed, net wind power of all windmills for each farm.

4) Here’s where some people see the “approximate” part: either

a) accept the parameter vector if the predicted summary is *exactly the same as the actual data to 64 bit floating point precision*

or

b) accept the parameter vector with probability proportional to K(Ssim, Smeas) where Ssim is the summary of the simulations and Smeas is the same summary from measured values and K is some nonnegative kernel function.

repeat.
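The four steps above can be sketched in a few lines of Python. This is a minimal illustration of option 4b, not a reference implementation. The toy problem (an unknown mean observed through 20 noisy readings), the Gaussian kernel, its bandwidth, and every function name are assumptions made up for the example.

```python
import math
import random


def gaussian_kernel(s_sim, s_meas, bandwidth=0.2):
    """Peaked nonnegative kernel K(Ssim, Smeas), scaled so its maximum is 1."""
    return math.exp(-0.5 * ((s_sim - s_meas) / bandwidth) ** 2)


def abc_sample(prior_draw, simulate, summarize, s_meas, kernel, n_accept, rng):
    """Repeat steps 1-4b until n_accept parameter values have been kept."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior_draw(rng)        # 1) draw a parameter from the prior
        output = simulate(theta, rng)  # 2) run the computational model
        s_sim = summarize(output)      # 3) reduce the output to a summary
        # 4b) accept with probability K(Ssim, Smeas), which lies in [0, 1]
        if rng.random() < kernel(s_sim, s_meas):
            accepted.append(theta)
    return accepted


# Hypothetical toy problem: the parameter is the mean of 20 noisy readings.
def prior_draw(rng):
    return rng.uniform(0.0, 10.0)


def simulate(theta, rng):
    return [rng.gauss(theta, 1.0) for _ in range(20)]


def summarize(output):
    return sum(output) / len(output)


rng = random.Random(0)
draws = abc_sample(prior_draw, simulate, summarize, 3.0, gaussian_kernel, 500, rng)
print(sum(draws) / len(draws))  # estimated posterior mean, close to 3
```

Option 4a corresponds to replacing the kernel with an exact equality test, which for continuous summaries would almost never accept anything.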

Now, in my opinion 4a is a misguided idea of what it means to have a model. Your deterministic computational model is not telling you “there’s a delta function probability that you’ll get exactly what this simulation predicts”… But if you consider that to be true then 4b is an “approximation” of this delta function distribution as a peaked kernel function.

I disagree almost all the time, and think that your deterministic model is just a “maximum probability” prediction, and there should always be some kind of “probability of error” which the kernel represents. USUALLY, 4b is the actual model you want to fit, not 4a.

• Phil:
You are right and wrong and Daniel is a bit righter.

With small samples and discrete outcomes, the naive ABC method is exactly full Bayes. In other situations, limited computation forces the method to be approximate in order to work.

Now, many people don’t see right away, or without a lot of effort, that P(u|x) ~ P(u) P(x|u) just means: sample u from the prior, then sample x from P(x|u), and reject unless it matches (keep u only if the sampled x equals the observed x).
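A small check of that claim, as a sketch under assumed numbers: a three-point prior on a success probability u, five Bernoulli trials, and three observed successes. None of these values come from the discussion; they are chosen only so that exact matches are common enough to count.

```python
import random
from collections import Counter
from math import comb


def exact_posterior(prior, n_trials, x_obs):
    """Bayes' rule by direct enumeration for a binomial likelihood."""
    unnorm = {
        u: p * comb(n_trials, x_obs) * u**x_obs * (1 - u) ** (n_trials - x_obs)
        for u, p in prior.items()
    }
    total = sum(unnorm.values())
    return {u: w / total for u, w in unnorm.items()}


def rejection_posterior(prior, n_trials, x_obs, n_keep, rng):
    """Sample u from the prior, simulate x given u, keep u only if x == x_obs."""
    values = list(prior)
    weights = [prior[u] for u in values]
    kept = Counter()
    n_kept = 0
    while n_kept < n_keep:
        u = rng.choices(values, weights=weights)[0]
        x = sum(rng.random() < u for _ in range(n_trials))
        if x == x_obs:  # exact match, no tolerance
            kept[u] += 1
            n_kept += 1
    return {u: kept[u] / n_keep for u in values}


prior = {0.2: 1 / 3, 0.5: 1 / 3, 0.8: 1 / 3}  # hypothetical discrete prior
print(exact_posterior(prior, 5, 3))
print(rejection_posterior(prior, 5, 3, 5000, random.Random(1)))
```

The two printed dictionaries should agree up to Monte Carlo noise, because the kept values of u are exact draws from the posterior.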

When I first started doing this in 2005, people would tell me I was wrong (some here on this blog back then).

Rasmus Baath is fixing that with this video, Introduction to Bayesian data analysis – part 1: What is Bayes? https://www.youtube.com/watch?v=3OJEae7Qb_o

150,000 views so far.

Now, you are right in that you are unlikely to see any of that in the webinar at the University of Warwick.

• I’m hoping to see the recorded video from the first talk, but it’s not up yet. However, the paper seems to have a nice idea. Basically, come up with a flexible parameterized family (a normalizing flow) that can represent the posterior, sample from that, then importance sample from the sample to get a more correct sample, then retrain your normalizing flow based on that importance sample, and sample from the new normalizing flow… lather, rinse, repeat.

Each step gets you closer to a proposal distribution that “looks like” the real posterior, and then the importance sampling throws away the samples that are too “atypical”. This is an idea I really like, and it is sort of similar to some things I’ve tried in the past: for example, doing MCMC on a “simpler” distribution to get an ensemble that “covers” the posterior, and then doing a kind of diffusive Monte Carlo plus simulated tempering to get what is essentially an importance sample.
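The sample, reweight, refit loop can be illustrated with a much simpler proposal family. The sketch below swaps the normalizing flow for a one-dimensional Gaussian and refits it by weighted moments, so it is plain adaptive importance sampling and not the method from the talk or paper. The target density, starting values, and function names are all assumptions for the example.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def adaptive_importance_sampling(log_target, mu, sigma, n_samples, n_rounds, rng):
    """Refit a Gaussian proposal to importance-weighted draws, round after round."""
    for _ in range(n_rounds):
        xs = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        log_w = [log_target(x) - log_normal_pdf(x, mu, sigma) for x in xs]
        max_lw = max(log_w)  # subtract the max for numerical stability
        w = [math.exp(lw - max_lw) for lw in log_w]
        total = sum(w)
        w = [wi / total for wi in w]  # self-normalized importance weights
        # "Retrain" the proposal: weighted mean and standard deviation.
        mu = sum(wi * x for wi, x in zip(w, xs))
        sigma = math.sqrt(sum(wi * (x - mu) ** 2 for wi, x in zip(w, xs)))
    return mu, sigma


def log_target(x):
    """Hypothetical unnormalized log posterior: Normal(3, 0.5^2)."""
    return -0.5 * ((x - 3.0) / 0.5) ** 2


mu, sigma = adaptive_importance_sampling(log_target, 0.0, 3.0, 2000, 5, random.Random(1))
print(mu, sigma)  # the proposal should end up close to mean 3, sd 0.5
```

Starting from a deliberately wide proposal matters here: importance weights behave badly when the proposal is much narrower than the target, so a practical version would need some guard against the fitted proposal collapsing.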

• Not sure which talk you are referring to, but Mike Evans (University of Toronto) did speculate about 5 to 10 years ago that, eventually, importance sampling would replace MCMC.

• Shravan says:

I suggest Sensible Estimation as an alternative.

Although I like it that people like Xi’an (IIRC) exploited the ABC name to give talks titled the ABCs of ABC. That was something I wanted to do but he beat me to it so I’m dejected.

2. Hi Andrew,
following your post, you can now find the relevant papers by clicking on the titles. Abstracts of the talks have also been added, and links to the recorded talks are planned as well.

Since the UK and most of the world are currently under lockdown, all seminars will be online, with speakers talking from their homes.

Best, Massimiliano

3. I signed up for info on the mailing list but I’m now realizing that this is going to be at 3:30 AM Pacific time in the US… Will these be recorded so we can watch them at a decent hour?