Thanks.

For the question about simulating data from the prior (last paragraph), I was actually referring to a model with many group level effects but not hierarchical. Like disease ~ time + (1|group1) + (1|group2) + (1|group3), where the groups are not nested. I had a model like this where I set what I thought were reasonable priors for the sd of each group level effect (reasonable in the sense that they constrained the sd of each group to something within the realm of possibility). However, when I simulated data from the model, sampling from the priors, a lot of the simulated data was far outside the realm of possibility. This was because while any one particular group in the real world might reasonably have an sd of, say, 2.5, not all groups would. When you simulate data from the priors, though, it’s possible to get a draw where the sampled sd was 2.5 for every group, which in something like a negative binomial model will add up fast to produce wonky data.
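To make this concrete, here is a minimal sketch of the prior predictive simulation Justin describes, translated into Python/numpy for illustration (the half-normal(2.5) priors on the group sds, the intercept of 1, and the three crossed grouping factors are all stand-in assumptions, not anyone’s actual model):

```python
import numpy as np

rng = np.random.default_rng(1)

def prior_predictive_draw(n_groups=3, sd_prior_scale=2.5):
    # Draw one sd per grouping factor from a half-normal prior
    sds = np.abs(rng.normal(0.0, sd_prior_scale, size=n_groups))
    # Draw one random intercept per factor (a single level of each group)
    effects = rng.normal(0.0, sds)
    # Linear predictor on the log scale: intercept + sum of group effects
    log_mu = 1.0 + effects.sum()
    # Expected count for one observation (log link, as in negative binomial)
    return np.exp(log_mu)

draws = np.array([prior_predictive_draw() for _ in range(10_000)])
print(np.median(draws))        # the typical draw is tame
print(np.quantile(draws, 0.99))  # but the upper tail is astronomically large
```

Even though each sd prior looks individually reasonable, the draws where several sds come out large at once push the summed log-scale effect into territory where exp() produces absurd expected counts, which is exactly the "adds up fast" problem.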

It just seems like the more group level effects you have, the tighter the priors on their sds may need to be when simulating data from the model by sampling from the priors (especially for something like a negative binomial model).

Does that make any sense? I’ve actually asked this before somewhere else on this blog, but never got a response. So I figured it was a dumb question/observation or poorly explained by me.

N(mu, s), with

mu ~ N(m1, s1)
m1 ~ N(m2, s2)
m2 ~ N(m3, s3)
m3 ~ N(m4, s4)
m4 ~ N(m5, s5)
m5 ~ N(m6, s6)
m6 ~ N(m7, s7)
m7 ~ N(m8, s8)
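One way to see what this chain implies: marginalizing out the intermediate means, mu ~ N(m8, sqrt(s1^2 + … + s8^2)), so every added level inflates the prior sd of mu. A quick numerical check (the choice of s1 = … = s8 = 1 and m8 = 0 is arbitrary, just to keep the arithmetic visible):

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary stand-in scales for s1..s8; the top-level mean m8 is set to 0
scales = [1.0] * 8

def draw_mu():
    m = 0.0  # m8
    # Walk down the chain: m7 ~ N(m8, s8), ..., finally mu ~ N(m1, s1)
    for s in reversed(scales):
        m = rng.normal(m, s)
    return m

mus = np.array([draw_mu() for _ in range(100_000)])
# Marginal sd of mu should be sqrt(s1^2 + ... + s8^2) = sqrt(8) ~ 2.83
print(mus.std())
```

So even when every level looks narrow on its own, the variances add, and the implied prior on mu can be much wider than any single sN suggests.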

Justin

]]>I think you may have answered your first question with your second, and vice versa.

> what you mean by a “model with a lot of parameters”?

Setting a prior on the upper level of a hierarchy automatically and implicitly sets priors for all parameters at lower levels. You provide an example of such a model in your second question:

> a model with lots of varying effects

And you point out the danger involved in such a model, wherein seemingly benign prior choices at higher levels cascade downward into wonky predictions by virtue of their joint effects at lower levels. So this is a situation in which it is absolutely critical to simulate data from the prior because that’s the only way to understand the consequences of the priors you’ve set at the higher level.

To be clear, I’m not saying that complex hierarchical models are bad, just that simulation is a universally valuable tool for wrapping one’s head around them. And this is particularly so *because* it is generally not possible to set explicit priors at every level of the hierarchy.

]]>Here, when you point out that each point in that simulated joint distribution is a valid fake reality that can be used to check the veracity* of your current Bayesian analysis model (ideally one somewhat different from the model that generated the joint distribution, to reflect that models are always wrong), most folks seem to get why/how unexamined priors (and data generating models) can be disastrous.

* veracity as habitual truthfulness (and a way to avoid mentioning frequency of error explicitly).

]]>I would very much like a follow-up to: https://gelmanstatdev.wpengine.com/2019/12/03/whats-wrong-with-bayes/ if you feel inspired to type out those technical criticisms of Bayesian methods.

I like where this is going but it feels fishy that u* is fixed and isn’t going to u_{true} as n -> infinity or whatnot.

]]>I’m almost done writing this and related material up for the Stan User’s Guide. It’s going in a whole new part with chapters covering prediction, simulation-based calibration, prior and posterior predictive checks, cross-validation, and the bootstrap. The current version is still only a pull request until I finish the x-val chapter.

]]>“And that should be a bit of a concern. Because if you’ve got a model with a lot of parameters”

Does this depend on what you mean by a “model with a lot of parameters”? It seems like you are talking about a situation where you set a prior for each parameter individually. When I run a hierarchical model in brms, for example, disease ~ time + (time|state/county/site); I have a lot of parameters (and maybe few data points per group). But setting the prior involves a prior on the sd of group level intercepts and slopes, not each varying intercept or slope…like setting a prior on the partial pooling. Do you mean a situation where you have a lot of population level or ‘fixed’ effects?

For step 2, with a model with lots of varying effects, like disease ~ time + (1|group1) + (1|group2) + (1|group3), do you still simulate data from the prior? The reason I ask is that the priors for the sd of group level effects often seem much too large if you have a bunch of effects. Especially if the model is on the log scale, those effects add up quickly, and if you simulate data from those priors it is often relatively easy to get data that would be outside the realm of possibility. I think this could happen even if you set priors with the same sd as the actual data, because when you simulate, it is possible to get all-high or all-low values across the groups.
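To put rough numbers on the "adds up quickly on the log scale" worry (a back-of-the-envelope sketch; the value 2.5 is just an example sd, not from any fitted model):

```python
import numpy as np

# On a log link, a group effect of size k multiplies the mean by exp(k).
one_factor = np.exp(2.5)        # one group effect at +2.5: ~12x the baseline
three_factors = np.exp(3 * 2.5)  # three crossed effects all at +2.5: ~1800x
print(one_factor, three_factors)
```

A multiplier that any single group could plausibly have becomes absurd when several groups happen to draw it simultaneously, which is why prior predictive simulation catches what eyeballing each prior separately does not.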

]]>