I’m a bit surprised by the advice to interact the treatment indicator with covariates. Is there somewhere I can read more about this? And also about the equivalence to matching?

]]>That said, too many modelers (everyone?) rely on a single loss-minimization metric as the sole criterion for model evaluation and selection. While this approach has a long history in statistics and has the advantage of being easily implemented, breaking out of the box of exclusive reliance on a single metric seems like a good idea. For instance, why wouldn’t an analyst want to know whether the model that minimizes error is also, at the same time, the model that maximizes dependence? Expanding the evaluation from one metric into a cloud of metrics which includes not just alternative measures of error (robust, asymmetric, etc.) but also metrics of linear and nonlinear dependence should only be insightful and salubrious.
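To make this concrete, here’s a minimal sketch (simulated data, made-up candidate models) of scoring two fits on a small cloud of metrics instead of a single loss: squared error, a robust alternative, and both linear and rank-based dependence between predictions and outcomes.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.5, size=200)

# Two candidate fits: linear and quadratic, both by least squares.
pred_lin = np.polyval(np.polyfit(x, y, 1), x)
pred_quad = np.polyval(np.polyfit(x, y, 2), x)

def rankdata(a):
    # simple ranks (no tie handling needed for continuous data)
    return np.argsort(np.argsort(a)).astype(float)

def metrics(y, yhat):
    resid = y - yhat
    return {
        "rmse": float(np.sqrt(np.mean(resid**2))),       # squared-error loss
        "mae": float(np.mean(np.abs(resid))),            # robust alternative
        "pearson": float(np.corrcoef(y, yhat)[0, 1]),    # linear dependence
        "spearman": float(np.corrcoef(rankdata(y), rankdata(yhat))[0, 1]),  # monotone dependence
    }

table = {"linear": metrics(y, pred_lin), "quadratic": metrics(y, pred_quad)}
```

In this toy setup the two loss metrics and the two dependence metrics happen to agree on the ranking, but with real data they need not, which is exactly the point of looking at the whole cloud.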

If the goal is to choose a single ‘best’ model from the set, depending on the breadth and depth of the models developed and the metrics employed, it’s easy to imagine a summary matrix of models by metrics which would lend itself to some variant of multivariate dimension reduction, e.g., scored results from PCA, KMapper, correspondence analysis, whatever. Pick the top-scoring model.
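A toy version of that summary-matrix idea, using PCA as the dimension-reduction step (the models, metric values, and orientation rule here are all made up for illustration):

```python
import numpy as np

models = ["ols", "ridge", "gbm", "nnet"]
# rows = models; cols = [rmse, mae, pearson, spearman] (hypothetical numbers)
M = np.array([
    [1.10, 0.85, 0.72, 0.70],
    [1.05, 0.83, 0.74, 0.73],
    [0.90, 0.70, 0.81, 0.80],
    [0.95, 0.74, 0.79, 0.78],
])
M[:, :2] *= -1                             # flip error columns so bigger = better
Z = (M - M.mean(axis=0)) / M.std(axis=0)   # standardize each metric

# PCA via SVD; score models by their projection on the first component
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Vt[0]
if pc1.sum() < 0:                          # orient PC1 so higher score = better
    pc1 = -pc1
scores = Z @ pc1
best = models[int(np.argmax(scores))]      # pick the top-scoring model
```

Correspondence analysis or KMapper would slot into the same place as the SVD step; the point is just that the models-by-metrics matrix is an object you can reduce and rank.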

Note, however, that any form of multiverse modeling does not lend itself to automated machine learning algorithms, the bread and butter of all applied environments. These are custom, ad hoc suggestions which few are likely to explore since they are time- and CPU-intensive, a deal breaker for most.

]]>Also, I think these comparisons are a bit weird to frame in terms of, e.g., OLS vs. matching since common advice now is to center covariates and interact them with treatment. With categorical covariates this is equivalent to post-stratification — aka many-to-many (potentially coarsened) exact matching.
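A small simulated demonstration of that equivalence, assuming a single binary covariate: OLS with the covariate centered and interacted with treatment returns exactly the post-stratification estimate (stratum-wise differences in means, weighted by stratum size).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
g = rng.binomial(1, 0.3, n)          # binary covariate / stratum
t = rng.binomial(1, 0.5, n)          # randomized treatment
# treatment effect differs by stratum: 0.5 if g=0, 2.0 if g=1
y = 1.0 + 2.0 * g + (0.5 + 1.5 * g) * t + rng.normal(size=n)

# Post-stratification: diff-in-means within stratum, weighted by P(g = s)
effects = [y[(g == s) & (t == 1)].mean() - y[(g == s) & (t == 0)].mean()
           for s in (0, 1)]
weights = [(g == s).mean() for s in (0, 1)]
ate_ps = sum(w * e for w, e in zip(weights, effects))

# OLS with centered covariate interacted with treatment
gc = g - g.mean()
X = np.column_stack([np.ones(n), t, gc, t * gc])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
ate_ols = beta[1]    # treatment coefficient = effect at the covariate mean
```

With a saturated model like this the two estimates agree up to floating-point error, since the regression reproduces the four cell means exactly and the centering puts the marginal stratum weights on the treatment coefficient.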

]]>Ryan—What you’re looking for in the traditional p-value literature is “post-selection inference”. This tries to adjust p-values for the effect of things like variable selection in a regression model.

There’s also a literature on exploratory data analysis (EDA). That isn’t necessarily model-specific in a formal way, but it plays much the same role as fitting simple models—you can see basic patterns in the data. If you do EDA before selecting a model, it introduces the same kind of data-dependent model-selection bias.

So the real question is whether the kind of continuous model expansion and prior enrichment we propose in the Bayesian workflow paper is going to bias our inferences, and if so, how? Our working assumption is that as long as we work with posterior predictions and only introduce as much model as the data supports (something like a penalized complexity prior as a way of life—it’s the theory behind Andrew’s Fantabulous Unfolding Flower), then we should be OK. Just moving to a cross-validation viewpoint from a posterior predictive check viewpoint is helpful here, as is moving away from null hypothesis significance testing toward measuring more practical decision-theoretic utility. I seriously believe that the re-orientation from hypothesis testing to prediction is the main reason why machine learning has been eating statistics’ lunch for the past couple of decades.

]]>Very interesting explanation, and thanks for the reference.

]]>There was a series of discussion pieces not long ago in Computational Brain & Behavior that addressed the topic of “robust modeling”:

https://www.springer.com/journal/42113

Transparency of the kind you describe is one of the many ideas that is brought up in that discussion. Another point that comes up in those discussions is that there are many different types of models developed for different purposes, and the same suggestions don’t necessarily apply to all of them.

For example, a statistical model meant to describe the data might be a variant of an “off-the-shelf” approach, like logistic regression. In that case, it would be straightforward to describe the different members of that model family because the choices you can make within that framework are somewhat more circumscribed.

On the other hand, we also build models as representations of scientific theories, and here it is almost never obvious what *all* the choice points are. Of course, there still might be several “leading contender” models that the authors would do well to describe. But in general, with such a wide-open field, I think it’s better for the authors to make clear what they did rather than what they might have done. A big problem is that often a choice needs to be made and there is little compelling reason to pick a single option.

For example, I might say something like, “my theory is that a certain decision gets made by accumulating evidence to a threshold,” and then say, “I will model evidence arrivals as a Poisson process, but this is just an assumption to be able to derive predictions and other models of evidence arrival are possible.” I can justify the choice of Poisson if it actually fits the data, but that’s it. Nonetheless, I’ve also made clear to a reader that it was a choice point that others might amend if additional constraints are found.
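As a toy version of that example (rate and threshold values invented purely for illustration): under the Poisson-arrival assumption, the decision time is the waiting time to the k-th arrival, i.e., a sum of exponential inter-arrival times, so its mean is threshold/rate.

```python
import numpy as np

rng = np.random.default_rng(2)

def decision_time(rate, threshold, rng):
    # time to the k-th arrival of a Poisson process with intensity `rate`
    # = sum of k exponential inter-arrival times (a gamma variate)
    return rng.exponential(1.0 / rate, size=threshold).sum()

times = np.array([decision_time(rate=5.0, threshold=10, rng=rng)
                  for _ in range(20000)])
# mean decision time should be close to threshold / rate = 2.0
```

Swapping in a different evidence-arrival model (the choice point flagged above) would change the shape of this distribution, which is exactly what fitting the data can adjudicate.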

This way, I’m still being transparent, but I don’t need to describe every possible variant one might pick.

]]>@Anoneuoid: I think a logistic regression intercept has an interpretation even if the model’s not well specified. But it’s a different interpretation than the same model with a new predictor. If we standardize predictors, then the interpretation scale changes for coefficients even if we change the prior to compensate. I’ve been discussing just this issue with one of the Columbia grad students, but didn’t want to put them on the spot.
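A quick simulated illustration of that interpretation shift (the data-generating numbers and the simple Newton fitter are my own, not from the discussion): standardizing the predictor doesn’t change the fitted model, but the intercept now means the log-odds at the predictor’s mean rather than at raw zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(loc=2.0, scale=1.5, size=n)
p = 1 / (1 + np.exp(-(-1.0 + 0.8 * x)))
y = rng.binomial(1, p)

def logit_fit(X, y, iters=25):
    # plain Newton/IRLS iterations for logistic regression
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ b))
        W = mu * (1 - mu)
        b += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - mu))
    return b

X_raw = np.column_stack([np.ones(n), x])
X_std = np.column_stack([np.ones(n), (x - x.mean()) / x.std()])
b_raw = logit_fit(X_raw, y)
b_std = logit_fit(X_std, y)
# Same fitted probabilities, different parameterization:
# b_std[0] == b_raw[0] + b_raw[1] * mean(x), the log-odds at the mean of x.
```

So even with an unchanged prior, the coefficients live on a different scale after standardizing, which is the interpretation issue above.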

]]>That’s why it needs to be derived from a set of agreed-upon assumptions if you want to interpret it rather than just use the model to predict something.
