
Authors repeat same error in 2019 that they acknowledged and admitted was wrong in 2015

David Allison points to this story:

Kobel et al. (2019) report results of a cluster randomized trial examining the effectiveness of the “Join the Healthy Boat” kindergarten intervention on BMI percentile, physical activity, and several exploratory outcomes. The authors pre-registered their study and described the outcomes and analysis plan in detail previously, which are to be commended. However, we noted four issues that some of us recently outlined in a paper on childhood obesity interventions: 1) ignoring clustering in studies that randomize groups of children, 2) changing the outcomes, 3) emphasizing results that were statistically significant from a host of analyses, and 4) using self-reported outcomes that are part of the intervention.

First and most critically, the statistical analyses reported in the article were inadequate and deviated from the analysis plan in the study’s methods article – an error the authors are aware of and had acknowledged after some of us identified it in one of their prior publications about this same program. . . .
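To see why issue (1) is "most critical": when children are randomized in clusters (e.g., by kindergarten) but analyzed as if they were randomized individually, the effective sample size is overstated and false-positive rates balloon. Here is a minimal simulation sketch of that phenomenon — not the authors' data or their actual analysis, just hypothetical numbers (20 clusters of 25 children, intraclass correlation 0.1, no true treatment effect) chosen for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_trial(n_clusters=20, n_per_cluster=25, icc=0.1):
    """One null cluster-randomized trial: no true treatment effect."""
    sigma_b = np.sqrt(icc)       # between-cluster SD
    sigma_w = np.sqrt(1 - icc)   # within-child SD
    cluster_effects = rng.normal(0, sigma_b, n_clusters)
    y = (cluster_effects[:, None] +
         rng.normal(0, sigma_w, (n_clusters, n_per_cluster)))
    arm_a, arm_b = y[:n_clusters // 2], y[n_clusters // 2:]
    # WRONG: t-test on individual children, ignoring clustering
    _, p_naive = stats.ttest_ind(arm_a.ravel(), arm_b.ravel())
    # One simple valid alternative: t-test on cluster means
    _, p_cluster = stats.ttest_ind(arm_a.mean(axis=1), arm_b.mean(axis=1))
    return p_naive, p_cluster

p_vals = np.array([simulate_trial() for _ in range(2000)])
print("false-positive rate, child-level test:  ",
      (p_vals[:, 0] < 0.05).mean())
print("false-positive rate, cluster-means test:",
      (p_vals[:, 1] < 0.05).mean())
```

Under these assumed settings the naive child-level test rejects far more than 5% of the time even though the intervention does nothing, while the cluster-means test stays near its nominal level. That's the sense in which ignoring clustering manufactures statistical significance.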

Second, the authors switched their primary and secondary outcomes from their original plan. . . .

Third, while the authors focus on an effect of the intervention of p ≤ 0.04 in the abstract, controlling for migration background in their full model raised this to p = 0.153. Because inclusion or exclusion of migration background does not appear to be a pre-specified analytical decision, this selective reporting in the abstract amounts to spinning of the results to favor the intervention.

Fourth, “physical activity and other health behaviours … were assessed using a parental questionnaire.” Given that these variables were also part of the intervention itself, with the control having “no contact during that year,” subjective evaluation may have resulted in differential, social-desirability bias, which may be of particular concern in family research. Although the authors mention this in the limitations, the body of literature demonstrating the likelihood of these biases invalidating the measurements raises the question of whether they should be used at all.

This is a big deal. The authors of the cited paper knew about these problems—to the extent of previously acknowledging them in print—but then did them again.

The authors did this thing of making a strong claim and then hedging it in their limitations. That’s bad. From the abstract of the linked paper:

Children in the IG [intervention group] spent significantly more days in sufficient PA [physical activity] than children in the CG [control group] (3.1 ± 2.1 days vs. 2.5 ± 1.9 days; p ≤ 0.005).

Then, deep within the paper:

Nonetheless, this study is not without limitations, which need to be considered when interpreting these results. Although this study has an acceptable sample size and body composition and endurance capacity were assessed objectively, the use of subjective measures (parental report) of physical activity and the associated recall biases is a limitation of this study. Furthermore, participating in this study may have led to an increased social desirability and potential over-reporting bias with regards to the measured variables as awareness was raised for the importance of physical activity and other health behaviours.

This is a limitation that the authors judge to be worth mentioning in the paper but not in the abstract or in the conclusion, where the authors write that their intervention “should become an integral part of all kindergartens” and is “ideal for integrating health promotion more intensively into the everyday life of children and into the education of kindergarten teachers.”

The point here is not to slam this particular research paper but rather to talk about a general problem with science communication, involving over-claiming of results and deliberate use of methods that are problematic but offer the short-term advantage of allowing researchers to make stronger claims and get published.

P.S. Allison follows up by pointing to this Pubpeer thread.


  1. The necessary inclusion of limitations in the abstract was raised in this Panel (I believe by Magdalena Skipper, Editor in Chief, Nature):

    Building Trust: Evidence and its Communication
    Panelists: Amy Abernethy (Principal Deputy Commissioner of the U.S. FDA), Patti Brennan (Director of the National Library of Medicine, NIH), Magdalena Skipper (Editor in Chief, Nature), Deborah Nelson (Associate Professor of Investigative Journalism, Univ. of Maryland), Roni Caryn Rabin (science reporter for the New York Times)

    Moderator: George Hripcsak

  2. A. Tasso says:

    This thing you do where you say “I’m not trying to beat up on these particular authors for doing this bad thing, I’m just trying to talk about a general thing” is pretty tiresome. If you don’t want to beat up on specific people, then just don’t beat up on specific people. It’s not that difficult. Or, if you really can’t resist calling out specific authors, then collect a few examples before posting. That would lend a little more credibility to your “I’m not here to slam anyone specifically” claims.

    • Andrew says:


      I’m not beating up on anyone. I’m talking about two published papers. It’s not personal. I only mentioned the name of one of the authors because it was in Allison’s note that I was quoting.

      Speaking more generally, I hate hate hate hate hate the attitude that criticizing technical details in a published paper is “beating up” on someone. People criticize my work too. They’re not beating up on me. They’re pointing out my mistakes. That’s what science is all about. Science isn’t science if you can’t point out mistakes, or if pointing out mistakes is considered “beating up on specific people.”

    • Andrew says:

P.S. Regarding your request that I “collect a few examples before posting”: I’ve collected lots and lots of examples! Lots and lots of blogging on the “general problem with science communication, involving over-claiming of results and deliberate use of methods that are problematic but offer the short-term advantage of allowing researchers to make stronger claims and get published.” I’m not going to put all these examples in one post. Regular readers will be familiar with these examples; intermittent readers can do some googling.
