Someone who wishes to remain anonymous writes:

Lacking proper experience with multilevel modeling, I have a question regarding a nationwide project on hospital mortality that I’ve recently come into contact with. The primary aim of the project is to benchmark hospital performance in terms of mortality (a binary outcome) while controlling for “case mix”, that is, the particular mix of patients (in terms of, e.g., age, gender, comorbidity, …) in the hospital. But it is not unthinkable that the project could also lead to, e.g., early warning systems. In short:

The data:

– about 1e6 records covering all patients over the last 10 years in all participating hospitals, including, among other things, the outcome, diagnosis, age, gender, comorbidity, treatment year, …

– there are two obvious higher-level groupings: hospitals (n ≈ 30) and diagnosis groups (≈ 300 groups, all containing at least about 1000 observations)

The analysis at this point:

– a separate logistic regression for each diagnostic group

– leave-one-out cross-validation (LOOCV) for variable selection in each regression

– Hospitals are not included as a factor in the analyses

The benchmark:

– Hospitals are benchmarked for outcomes in a particular diagnostic group by applying the appropriate regression model to their patients in the selected period (thus, the prediction is tailored to their patients) and comparing observed and predicted mortality rates.

Having some background in statistics (in a previous life I read your book on multilevel modeling), I wondered whether:

– multilevel analysis can add anything over and above the current approach. I can imagine there will not be much shrinkage given the large numbers in each group, but still, I have the itching intuition that some effects (e.g., age) can be better gauged using all the data. Thus, a model with varying intercepts (diagnostic groups) and at least some varying slopes (for example, age, age^2, comorbidity, …).

– excluding the hospitals makes sense. My first reaction was: include them in the regression, evaluate the effects, and, if required, make predictions without their coefficients.
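To make both questions concrete, here is a minimal sketch in plain Python (all numbers, group sizes, and variance parameters below are invented for illustration; a real analysis would estimate them in a fitted multilevel model): partial pooling pulls each diagnosis group’s intercept toward the overall mean, with less shrinkage for larger groups, and a hospital’s expected mortality can be computed with its own estimated effect set to zero.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def partial_pool(raw_logits, group_ns, mu, tau, sigma=2.0):
    """Normal-approximation partial pooling on the log-odds scale.

    Each group's raw intercept is pulled toward the overall mean mu;
    the weight on the raw estimate grows with the group's sample size
    (sigma = within-group sd, tau = between-group sd, both assumed here).
    """
    pooled = []
    for y_j, n_j in zip(raw_logits, group_ns):
        w = (n_j / sigma**2) / (n_j / sigma**2 + 1 / tau**2)
        pooled.append(w * y_j + (1 - w) * mu)
    return pooled

def expected_deaths(patient_logits, hospital_effect=0.0):
    """Case-mix-adjusted expected deaths for one hospital: the sum of
    predicted probabilities, by default with the hospital's own effect
    zeroed out, so observed minus expected reflects the hospital."""
    return sum(inv_logit(x + hospital_effect) for x in patient_logits)

# Two diagnosis groups, both with 8% raw mortality vs. a 5% overall mean:
mu = logit(0.05)
raw = [logit(0.08), logit(0.08)]
ns = [1000, 50]  # one large group, one small group
pooled = partial_pool(raw, ns, mu, tau=0.5)
# The n=1000 group barely moves; the n=50 group is shrunk toward mu.
```

This matches the letter writer’s intuition: with about 1000 observations per group there is indeed little shrinkage, but the pooling machinery still helps for the smaller cells that arise once hospital and year interactions enter the model.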

My reply:

You could have a billion or a trillion observations but you’d still have only 10 years of data, so I’d expect this variation over time to drive your uncertainty. So, yeah, I’d put in varying intercepts for time, hospital, and their interaction, for sure. Probably some varying slopes would make sense too.
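A quick simulation shows why the 10 years dominate (pure Python; the 5% base rate and the year-to-year sd of 0.01 on the probability scale are invented for illustration): with real between-year variation, the uncertainty in a hospital’s long-run rate is governed by the number of years, not the number of patients.

```python
import math
import random

random.seed(1)

n_years, patients_per_year = 10, 20_000
base_rate, year_sd = 0.05, 0.01  # assumed values for illustration

# Simulate one hospital's yearly mortality rates.
yearly_rates = []
for _ in range(n_years):
    p = base_rate + random.gauss(0, year_sd)
    deaths = sum(random.random() < p for _ in range(patients_per_year))
    yearly_rates.append(deaths / patients_per_year)

mean_rate = sum(yearly_rates) / n_years

# Naive binomial standard error, pretending all patients are exchangeable:
se_binomial = math.sqrt(mean_rate * (1 - mean_rate) / (n_years * patients_per_year))

# Standard error based on the 10 yearly means -- this is what actually
# limits precision, and adding patients within a year barely reduces it:
var_years = sum((r - mean_rate) ** 2 for r in yearly_rates) / (n_years - 1)
se_years = math.sqrt(var_years / n_years)
```

Here `se_years` is several times larger than `se_binomial`, and doubling the patients per year would shrink only the (already negligible) binomial part.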

There are two things to worry about here: modeling and computation.

Setting computation aside for a moment, you’d definitely want to put in those varying intercepts as a start. One thing we haven’t worked out so well is workflow: what varying slopes to include, how much to build out your model before it becomes so large that it’s hard to understand, etc. It could also make sense to use informative priors for the group-level variances. And once you’ve fit your model, you can plot time series of those varying intercepts, for each hospital. Also plot those intercepts for the diagnostic groups.
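The plotting step can be sketched in plain Python (the intercept values below are invented placeholders standing in for fitted posterior means, indexed by hospital and year):

```python
# Placeholder estimates for fitted varying intercepts on the log-odds
# scale, indexed by (hospital, year):
intercepts = {
    ("A", 2010): -0.10, ("A", 2011): -0.05, ("A", 2012): 0.02,
    ("B", 2010): 0.20,  ("B", 2011): 0.15,  ("B", 2012): 0.25,
}

# Collect each hospital's intercepts into a year-sorted time series:
series = {}
for (hospital, year), value in sorted(intercepts.items()):
    series.setdefault(hospital, []).append((year, value))

# Each series[h] is a list of (year, intercept) pairs, ready to plot as
# one line per hospital, e.g. years, values = zip(*series["A"]); the same
# reshaping works for the diagnostic-group intercepts.
```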

The other thing is computing. Lots of issues here. Again, we need a clear workflow.

**P.S.** I wrote the above a few months ago. Since then, my colleagues and I wrote this article on Bayesian workflow.