Stacking and multiverse

It’s a coincidence that there is another multiverse posting today.

Recently Tim Disher asked a question on the Stan discussion forum: “Multiverse analysis – concatenating posteriors?”

Tim refers to a paper “Increasing Transparency Through a Multiverse Analysis” by Sara Steegen, Francis Tuerlinckx, Andrew Gelman, and Wolf Vanpaemel. The abstract says

Empirical research inevitably includes constructing a data set by processing raw data into a form ready for statistical analysis. Data processing often involves choices among several reasonable options for excluding, transforming, and coding data. We suggest that instead of performing only one analysis, researchers could perform a multiverse analysis, which involves performing all analyses across the whole set of alternatively processed data sets corresponding to a large set of reasonable scenarios. Using an example focusing on the effect of fertility on religiosity and political attitudes, we show that analyzing a single data set can be misleading and propose a multiverse analysis as an alternative practice. A multiverse analysis offers an idea of how much the conclusions change because of arbitrary choices in data construction and gives pointers as to which choices are most consequential in the fragility of the result.

In that paper the focus is on examining the possible results from the multiverse of forking paths, but Tim asked whether it would “make sense at all to combine the posteriors from a multiverse analysis in a similar way to how we would combine multiple datasets in multiple imputation”?

After I (Aki) thought about this, my answer is

  • in multiple imputation the different data sets are posterior draws from the missing-data distribution and thus are usually equally weighted
  • I think multiverse analysis is similar to the case of having a set of models with different variables, variable transformations, interactions, and non-linearities, as in our Stacking paper (Yao, Vehtari, Simpson, Gelman), where we have different models for the arsenic well data (section 4.6). Then stacking would be a sensible way to combine *predictions* (as we may have different model parameters for differently processed data) with non-equal weights. Stacking is a good choice for model combination here as
    1. we don’t need to assign prior probabilities to the different forking paths
    2. stacking favors paths which give good predictions
    3. it avoids the “prior dilution problem” if some processed datasets happen to be very similar to each other (see fig. 2c in the Stacking paper)
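To make the idea concrete, here is a minimal sketch (not the Stacking paper’s code, and with made-up toy data) of how stacking weights can be computed: each forking path k supplies pointwise log predictive densities, and we choose simplex weights maximizing the summed log score of the pooled predictive distribution.

```python
# Hypothetical sketch of stacking weights over "forking paths".
# Input: lpd, an (n_obs, K) array of pointwise log predictive densities
# (e.g. leave-one-out densities), one column per processed-data/model path.
# Stacking solves: max_w sum_i log( sum_k w_k * exp(lpd[i, k]) ), w on the simplex.
import numpy as np
from scipy.optimize import minimize

def stacking_weights(lpd):
    n, K = lpd.shape
    # Subtract the per-observation max for numerical stability; this shifts
    # the objective by a constant and does not change the optimal weights.
    dens = np.exp(lpd - lpd.max(axis=1, keepdims=True))

    def neg_log_score(z):
        w = np.exp(z) / np.exp(z).sum()  # softmax keeps w nonnegative, summing to 1
        return -np.sum(np.log(dens @ w))

    res = minimize(neg_log_score, np.zeros(K), method="BFGS")
    return np.exp(res.x) / np.exp(res.x).sum()

# Toy example: three "paths" modeled as normal predictive densities with
# different means; the data actually come from the second one, so stacking
# should put most of the weight on it.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=200)
means = [0.0, 1.0, 5.0]
lpd = np.column_stack([-0.5 * np.log(2 * np.pi) - 0.5 * (y - m) ** 2
                       for m in means])
w = stacking_weights(lpd)
```

In practice one would use leave-one-out log predictive densities (e.g. from PSIS-LOO) rather than in-sample densities, so that the weights reward out-of-sample prediction.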


  1. Keith O'Rourke says:

    Would agree for prediction but not for anything explanatory.

    One of the primary roles of multiverse analyses may be to assess whether some findings are common among the differing analyses (e.g. treatment effect always positive). Similar to meta-analysis, where the most important step is to anticipate and critically assess what should be common in the different studies. (Even though the entry puts it as _in addition to_ providing an estimate of the unknown common truth – it’s primary.)

    Interestingly stacking is “ideal when the K different models being fit have nothing in common”!

    Similar (harder) challenges arise in the _combining multiple priors_ literature from the 1980s (e.g. Aggregating opinions through logarithmic pooling), which I know some folks are now trying to recast.

    Given that multiple posteriors are tomorrow’s multiple priors, that perspective might be helpful. For instance: multiple experts, mathematical and behavioural aggregation, pooling methods.
