## Question on multilevel modeling reminds me that we need a good modeling workflow (building up your model by including varying intercepts, slopes, etc.) and a good computing workflow

Someone who wishes to remain anonymous writes:

Lacking proper experience with multilevel modeling, I have a question regarding a nation-wide project on hospital mortality that I’ve recently come into contact with. The primary aim of the project is to benchmark hospital performances in terms of mortality (binary outcome) while controlling for “case mix”, that is, the particular mix of patients (in terms e.g., age, gender, comorbidity, …) in the hospital. But it is not unthinkable that the project can lead to e.g., early warning systems. In short:

The data:
– about 1e6 records for all patients for the last 10 years in all participating hospitals, including among other things the outcome, diagnosis, age, gender, comorbidity, treatment year, …
– there are two obvious higher level groupings: hospitals (n = +-30) and diagnosis groups (+-300 groups, all containing at least about 1000 observations)

The analysis at this point:
– a separate logistic regressions for each diagnostic group
– LOOCV for variable selection for each regression
– Hospitals are not included as factor in the analyses

The benchmark:
– Hospitals are benchmarked for outcomes in a particular diagnostic group using the appropriate regression model on the basis of their patients in the selected period (thus, the prediction is tailored to their patients), comparing observed and predicted mortality rate.

Having some background in statistics (in a previous life I have read your book on multilevel modeling), I wondered whether:
– multi-level analysis can add anything over and above the current approach. I can imagine there will not be much shrinkage given the large numbers in each group, but still, I have the itching intuition that some effects (e.g., age) can be better gauged using all the data. Thus, a model with varying intercepts (diagnostic groups) and at least some varying slopes (for example, age, age^2, comorbidity, …)
– excluding the hospitals makes sense? My first reaction was: use them in the regression, evaluate the effects, and if required, make predictions without their coefficient.​