Andy Timm writes:

I’m curious if you have any suggestions for dealing with item nonresponse when using MRP. I haven’t seen anything particularly compelling in a literature review, but it seems like this has to have come up. It seems like a surprisingly large number of papers just go for a complete-case analysis, or don’t mention clearly how missingness in predictors was handled. I can generally see how that could make sense, but I’m not sure it does in my case.

Specifically, I’m dealing with a set of polls on support for a border wall with some refusals/unknowns on demographic questions like race, income, and so forth. The refusals seem very informative—those who refused at least 1 demographic question are ~5 points less likely to oppose a border wall, driven mostly by people who refused the income question.

If I were poststratifying to the voter file, I could just do what you did with Yair Ghitza recently, and model the unknowns as their own category, and poststratify to the unknowns in the voter file. But for poststratification to PUMS or similar, where we don’t have poststratification data for the unknowns, I believe this indicator-variable strategy wouldn’t help.

The alternative would be something like multiple imputation, but imputation for data that is all of 1) likely missing not at random, 2) multilevel, and 3) coming from surveys with weights seems like a challenge. I found some work that has imputation strategies for multilevel survey data with weights when you have most of the design information, but that’s not really the case with a large set of polls from iPoll, where the construction of weights isn’t fully explained.

My thought is some sort of multiple imputation where the uncertainty could be propagated forward would be the optimal solution (and most conceptually elegant one), so for now I’m continuing to try various imputation models to see what produces sensible imputations.

My reply:

1. Yeah, this comes up a lot! The right approach has got to be to jointly model the predictors and the outcome and then do the imputations using this joint model. The multilevel structure would be included in this model. Liu, King, and I actually did this in 1998 using a hierarchical multivariate normal model, and it seemed to work well in our application (see also here). The model and code for this example are so old that it would be best just to redo it from scratch.

2. You mention survey weights. This should present no problem at all: just include in your regression model all the variables used in the survey weights, and then it’s appropriate to do an unweighted analysis. The information in the weights is encoded in population totals that you would use for poststratification for your ultimate inferences for your population of interest.

3. You also mention selection bias (missing not at random). If this missingness only depends on the variables in the model, you’ll be ok with the joint modeling approach described above. If the missingness can depend on unobserved variables, then you’d want to include these as latent variables in your multivariate model.

4. Another issue that can arise is imputation of discrete missing variables. This can be done using a conditional imputation approach (as we discuss here) or using a joint model with latent continuous variables.
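As a rough illustration of the latent-continuous-variable idea in point 4, here is a minimal Python sketch for a single binary item. It assumes a probit form: each missing value gets a latent normal draw centered at a linear predictor, and the imputed value is 1 when that draw is positive. The function name `impute_binary_latent` and the linear predictors are hypothetical; in a real joint model the latent variable would be one component of the multilevel multivariate normal, not a standalone draw.

```python
import random


def impute_binary_latent(linear_predictors, rng):
    """Impute 0/1 values by thresholding a latent normal draw at zero (probit form)."""
    return [1 if rng.gauss(eta, 1.0) > 0 else 0 for eta in linear_predictors]


rng = random.Random(2024)

# Hypothetical linear predictors for respondents who refused a yes/no item.
# A value of 0.0 corresponds to a 50% chance of imputing a 1.
eta_missing = [0.0] * 20000

imputed = impute_binary_latent(eta_missing, rng)
share_imputed_one = sum(imputed) / len(imputed)
print(round(share_imputed_one, 2))
```

Repeating the draw with fresh random numbers gives a different completed dataset each time, which is what multiple imputation needs.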

It’s been a long time since I’ve done this sort of hierarchical multivariate imputation modeling, but it seems like the right thing to do, actually!

Considering this as a research project, I’d proceed along two tracks:

– A generic multilevel multivariate normal model for imputing missing values from multiple polls, using latent continuous variables for discrete responses. This modeling-based approach can be compared with off-the-shelf multiple imputation programs that don’t include the multilevel structure.

– A Bayesian latent-variable model for informative nonresponse, focusing on this border wall question.

Then, when each of these two parts is working, you can put them together. It should be possible to do this all in Stan.

At this point, you can approach the questions of interest (distribution of survey responses given demographics, poststratified across the population) directly, by including this regression in your big Stan model that you’re fitting; or you can use the above procedure as a method for multiple imputation, then construct some imputed datasets, and go on and do MRP with those.
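To make the poststratification step concrete, here is a minimal Python sketch. The cell labels, estimates, sample sizes, and population counts are all made up for illustration; in practice the cell estimates would come from the fitted multilevel model and the counts from a source such as PUMS.

```python
def poststratify(cell_estimates, cell_counts):
    """Average cell-level estimates, weighting each cell by its count."""
    total = sum(cell_counts[cell] for cell in cell_estimates)
    weighted = sum(cell_estimates[cell] * cell_counts[cell] for cell in cell_estimates)
    return weighted / total


# Hypothetical model-based estimates of support within three demographic cells.
cell_estimates = {"cell_a": 0.40, "cell_b": 0.60, "cell_c": 0.50}

# Hypothetical population counts for the same cells.
population_counts = {"cell_a": 500, "cell_b": 300, "cell_c": 200}

# Hypothetical sample sizes; cell_c is heavily oversampled relative to the population.
sample_sizes = {"cell_a": 100, "cell_b": 100, "cell_c": 300}

# Weighting by sample sizes mimics the raw sample composition.
sample_average = poststratify(cell_estimates, sample_sizes)
# Weighting by population counts gives the poststratified estimate.
population_estimate = poststratify(cell_estimates, population_counts)
print(round(sample_average, 3), round(population_estimate, 3))
```

With these invented numbers the sample-composition average is 0.50 while the poststratified estimate is 0.48. The gap comes entirely from the oversampled cell, which is the kind of adjustment survey weights would otherwise be making.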

I sent the above note to Timm, and he responded:

Here are some initial results from testing out a few imputation models based on your suggestions and some more recent multilevel imputation literature.

Your point about the weights makes perfect sense, and simplifies things quite a bit. Since the imputation models use a superset of the variables that were used to build the weights, that part should be resolved for the most part.

In addition to your MIMS paper, Stef van Buuren’s chapter and recommendations on multilevel imputation strategies and this simulation study paper from Grund, Lüdtke, and Robitzsch (2018) were helpful, particularly in suggesting FCS approaches with passive imputation of group means and interaction terms. Grund, Lüdtke, and Robitzsch mention that estimates of interaction terms from JM imputation can be biased, hence the preference for FCS in this context. I’m not sure JM vs. FCS would make a huge difference given otherwise similar models, but I’m trusting the above authors’ recommendation on that. So I took their suggestion for model form, but also roughly followed what you did in your MIMS paper, pooling over surveys.

The 4 models I tried were:

1. A simple FCS model in mice with no interaction terms (to start with a baseline model that should have problems)

2. Similar to your paper’s idea, an FCS model with a random intercept at the survey level, and lots of two-way interactions using passive imputation.

3. Similar to 2, but also with a random intercept on state.

4. A random-forest-based imputation with predictive mean matching.

As you might expect, the 1st model didn’t work too well. For example, it struggled to impute a larger proportion of Hispanics in polls with Hispanic oversamples. The other three imputation models all performed fairly well, suggesting that a major gain in imputation reasonableness came from including the interactions. Building 50 such imputed datasets, fitting MRP models in brms on them, and then mixing the draws across imputations appears to have worked well for making the final inferences.
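The "mixing the draws" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the draws below are simulated stand-ins for posterior draws of one poststratified estimate, the three centers are invented, and `pool_draws` is a hypothetical helper, not part of brms or mice. It assumes every imputed dataset contributes the same number of draws, so each imputation gets equal weight.

```python
import random
import statistics


def pool_draws(draws_by_imputation):
    """Concatenate posterior draws from the models fit to each imputed dataset."""
    pooled = []
    for draws in draws_by_imputation:
        pooled.extend(draws)
    return pooled


rng = random.Random(1)

# Simulated stand-ins: 3 imputed datasets, 1000 draws each, with invented
# posterior centers that differ slightly from one imputation to the next.
centers = [0.44, 0.46, 0.48]
within_sd = 0.01
draws_by_imputation = [
    [rng.gauss(center, within_sd) for _ in range(1000)] for center in centers
]

pooled = pool_draws(draws_by_imputation)
pooled_mean = statistics.fmean(pooled)
pooled_sd = statistics.pstdev(pooled)
print(len(pooled), round(pooled_mean, 3), round(pooled_sd, 3))
```

The pooled standard deviation comes out larger than the within-imputation value of 0.01 because the pooled draws also reflect how much the estimate moves between imputed datasets. That is how the imputation uncertainty gets propagated into the final intervals.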

Since there’s no population-level ground truth for “support for a border wall,” unlike, say, vote share, it’s hard to rigorously compare the quality of the final predictions from brms models built on top of each type of imputation. Thus, I’m currently presenting them in a sort of “robustness check” framework, where the final predictions I’m most interested in are fairly robust to different imputation models.