The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time

Kevin Lewis points us to this article by Joachim Vosgerau, Uri Simonsohn, Leif Nelson, and Joseph Simmons, which begins:

Several researchers have relied on, or advocated for, internal meta-analysis, which involves statistically aggregating multiple studies in a paper . . . Here we show that the validity of internal meta-analysis rests on the assumption that no studies or analyses were selectively reported. That is, the technique is only valid if (a) all conducted studies were included (i.e., an empty file drawer), and (b) for each included study, exactly one analysis was attempted (i.e., there was no p-hacking).

This is all fine, and it’s consistent with the general principle that statistical analysis must take into account data collection, in particular that you should condition on all information involved in measurement and selection of observed data (see chapter 8 of BDA3, or chapter 7 of the earlier editions, for derivation and explanation from a Bayesian perspective).

I just want to point out one little thing.

This bit is wrong:

“exactly one analysis was attempted (i.e., there was no p-hacking)”

There is still a problem even if only one analysis was performed on the given data. What is required is that the analysis would have been done the same way, had the data been different (i.e., there were no forking paths). As Eric Loken and I put it, multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time.
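To make this concrete, here is a minimal simulation sketch. It is my own illustration, not anything from Vosgerau et al. or from the paper with Loken, and the setup is an assumption chosen for simplicity: two hypothetical subgroups, pure-noise data with known unit variance, and a z-test. The "forking" analyst looks at the data, sees which subgroup shows the larger difference, and runs exactly one test on that subgroup. The "preregistered" analyst always tests subgroup 0. Both perform a single analysis per dataset.

```python
import math
import random


def mean_diff(treated, control):
    """Difference in sample means between treated and control."""
    return sum(treated) / len(treated) - sum(control) / len(control)


def z_test_p(treated, control):
    """Two-sided z-test p-value, assuming known unit variance (an illustrative simplification)."""
    se = math.sqrt(1 / len(treated) + 1 / len(control))
    z = mean_diff(treated, control) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n_sims=4000, n=20, alpha=0.05, seed=1):
    """Return (preregistered, forking) false-positive rates when there is no true effect."""
    rng = random.Random(seed)
    fixed_hits = 0
    forking_hits = 0
    for _ in range(n_sims):
        # Two hypothetical subgroups, all data pure noise: no effect anywhere.
        subgroups = []
        for _ in range(2):
            treated = [rng.gauss(0, 1) for _ in range(n)]
            control = [rng.gauss(0, 1) for _ in range(n)]
            subgroups.append((treated, control))

        # Preregistered path: the choice of subgroup does not depend on the data.
        if z_test_p(*subgroups[0]) < alpha:
            fixed_hits += 1

        # Forking path: the data determine which single test gets run.
        chosen = max(subgroups, key=lambda g: abs(mean_diff(*g)))
        if z_test_p(*chosen) < alpha:
            forking_hits += 1

    return fixed_hits / n_sims, forking_hits / n_sims


if __name__ == "__main__":
    fixed_rate, forking_rate = simulate()
    print(f"preregistered: {fixed_rate:.3f}, forking: {forking_rate:.3f}")
```

Under these assumptions the preregistered analysis should reject about 5% of the time, while the data-dependent one should reject at roughly 1 − 0.95² ≈ 10%, even though no one ever ran a second test. Real forking paths are rarely this tidy, but the mechanism is the same.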

Vosgerau et al. clarify this point in the last sentence of their abstract, where they emphasize that “preregistrations would have to be followed in all essential aspects,” so I know they understand the above point about forking paths. I just wouldn’t want people to read only the first part and mistakenly conclude that, because they did only one analysis on their data, they’re not “p-hacking” and so have nothing to worry about.
