Responding to Richard Morey on p-values and inference

Jonathan Falk points to this post by Richard Morey, who writes:

I [Morey] am convinced that most experienced scientists and statisticians have internalized statistical insights that frequentist statistics attempts to formalize: how you can be fooled by randomness; how what we see can be the result of biasing mechanisms; the importance of understanding sampling distributions. In typical scientific practice, the “null hypothesis significance test” (NHST) has taken the place of these insights.

NHST takes the form of frequentist significance testing, but not its function, so experienced scientists and statisticians rightly shun it. But they have so internalized its function that they can call for the general abolition of significance testing. . . .

Here is my basic point: it is wrong to consider a p value as yielding an inference. It is better to think of it as affording critique of potential inferences.

I agree . . . kind of. It depends on what you mean by “inference.”

In Bayesian data analysis (and in Bayesian Data Analysis) we speak of three steps:
1. Model building,
2. Inference conditional on a model,
3. Model checking and improvement.
Hypothesis testing is part of step 3.

So, yes, if you follow BDA terminology and consider “inference” to represent statements about unknowns, conditional on data and a model, then a p-value (or, more generally, a hypothesis test or a model check) is not part of inference; it is a critique of potential inferences.

But I think that in the mainstream of theoretical statistics, “inference” refers not just to point estimation, interval estimation, prediction, etc., but also to hypothesis testing. Using that terminology, a p-value is a form of inference. Indeed, in much of statistical theory, null hypothesis significance testing is taken to be fundamental, so that virtually all inference corresponds to some transformations of p-values and families of p-values. I don’t hold that view myself (see here), but it is a view.
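
One textbook illustration of that view, sketched here with made-up numbers, is the confidence interval obtained by inverting a family of tests: the 95% interval is the set of null values that are not rejected at the 5% level. The data, the known standard deviation `sigma`, the grid of candidate means, and the helper `z_test_p_value` are all assumptions chosen for the example, not anything from Morey's post.

```python
from statistics import NormalDist, mean

def z_test_p_value(y, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, assuming a known sd sigma."""
    z = (mean(y) - mu0) / (sigma / len(y) ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

# Made-up measurements and an assumed known standard deviation.
y = [4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 5.0]
sigma = 0.5

# A family of p-values, one per candidate null value on a grid.
grid = [4.0 + 0.001 * k for k in range(2001)]
accepted = [m for m in grid if z_test_p_value(y, m, sigma) >= 0.05]
interval_from_tests = (min(accepted), max(accepted))

# The usual closed-form 95% interval, for comparison.
half_width = NormalDist().inv_cdf(0.975) * sigma / len(y) ** 0.5
interval_direct = (mean(y) - half_width, mean(y) + half_width)

print(interval_from_tests)
print(interval_direct)
```

Up to the grid resolution, the two intervals coincide, which is the sense in which an interval estimate can be read as a transformation of a family of p-values.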

The other thing I want to emphasize is that the important idea is model checking, not p-values. You can do everything that Morey wants to do in his post without ever computing a p-value, just by doing posterior predictive checks or the non-Bayesian equivalent, comparing observed data to their predictions under the model. The p-value is one way to do this, but I think it’s rarely a good way to do it. When I was first looking into posterior predictive checks, I was computing lots of p-values, but during the decades since, I’ve moved toward other summaries.
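
To make the idea concrete, here is a minimal sketch of a posterior predictive check. It assumes a normal model with the noninformative prior p(mu, sigma^2) ∝ 1/sigma^2, synthetic data with two low outliers planted on purpose, and the sample minimum as the test quantity; the function name `replicated_minima` and all the numbers are hypothetical choices for illustration.

```python
import random
import statistics

def replicated_minima(y, n_draws, rng):
    """Minimum of each replicated data set drawn from the posterior predictive
    distribution of a normal model with prior p(mu, sigma^2) ∝ 1/sigma^2."""
    n = len(y)
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)  # sample variance with n - 1 in the denominator
    minima = []
    for _ in range(n_draws):
        # sigma^2 | y is scaled inverse chi-square: (n - 1) s^2 / chi2_{n-1}.
        chi2 = rng.gammavariate((n - 1) / 2.0, 2.0)
        sigma2 = (n - 1) * s2 / chi2
        # mu | sigma^2, y is normal with mean ybar and variance sigma^2 / n.
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        sigma = sigma2 ** 0.5
        y_rep = [rng.gauss(mu, sigma) for _ in range(n)]
        minima.append(min(y_rep))
    return minima

rng = random.Random(0)
# Synthetic data: 60 standard normal draws plus two planted low outliers.
y = [rng.gauss(0.0, 1.0) for _ in range(60)] + [-9.0, -7.5]

t_obs = min(y)
t_rep = replicated_minima(y, 2000, rng)

# Summary 1: where does the observed minimum sit relative to the replications?
cuts = statistics.quantiles(t_rep, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
lo, hi = cuts[0], cuts[-1]
print(f"observed min = {t_obs:.2f}; 95% of replicated minima in [{lo:.2f}, {hi:.2f}]")

# Summary 2: the posterior predictive p-value, one number among several.
p_value = sum(t <= t_obs for t in t_rep) / len(t_rep)
print(f"posterior predictive p-value = {p_value:.4f}")
```

The comparison of the observed minimum to the spread of replicated minima carries the model check; the tail-area probability is just one way of summarizing it, and a histogram of `t_rep` with `t_obs` marked would serve the same purpose.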
