
These 3 problems destroy many clinical trials (in context of some papers on problems with non-inferiority trials, or problems with clinical trials in general)

Paul Alper points to this news article in Health News Review, which says:

A news release or story that proclaims a new treatment is “just as effective” or “comparable to” or “as good as” an existing therapy might spring from a non-inferiority trial.

Technically speaking, these studies are designed to test whether an intervention is “not acceptably worse” in terms of its effectiveness than what’s currently used. . . .

These trials have proliferated as drug and device makers find it harder to improve upon existing treatments. So instead, they devise products they hope work just as well but with an extra benefit, such as more convenient dosing, lower cost, or fewer side effects.

If a company can show its product is just as effective as the current standard treatment but with an added perk, it might gain a marketing edge.

Sounds like no problem so far: Why not have some drug that performs as well as its competitor but is better in some secondary way?

But the article continues:

Problem is, the studies used to generate that edge often aren’t considered trustworthy.

Generally speaking, non-inferiority trials are considered less credible than a more common trial design, the superiority trial, which determines whether one treatment outperforms another treatment or a placebo. That’s because non-inferiority trials are often based on murky assumptions that could favor the new product being tested.

Rarely do non-inferiority trials conclude that a new treatment is not non-inferior . . . That scarcity of negative findings “raises the provocative questions of whether industry-sponsored non-inferiority trials offer any value—aside from capturing market share,” wrote Vinay Prasad, MD, in an editorial in the Journal of Internal Medicine entitled “Non-Inferiority Trials in Medicine: Practice Changing or a Self-Fulfilling Prophecy?”

In a separate concern, ethical issues have been raised about whether some non-inferiority trials should be conducted at all, because they might expose patients to potentially worse treatments in order to advance a commercial goal.

From an article, “Non-inferiority trials: are they inferior? A systematic review of reporting in major medical journals,” by Sunita Rehal et al.:

Reporting and conduct of non-inferiority trials is inconsistent and does not follow the recommendations in available statistical guidelines, which are not wholly consistent themselves.

There’s a lot of discussion of “type 1 error rate,” which I don’t care about. True effects, or population differences, are never zero.

The general point is that non-inferiority trials, like clinical trials in general, can be gamed, and they are gamed.

The way it looks to me is that non-inferiority trials do have a lot of problems, and that these are problems that “regular” clinical trials have also. The problems include:
1. A statistical framework that is focused on the uninteresting question of zero true effect and zero systematic error,
2. A desire and an expectation to come up with certain conclusions from noisy data,
3. Incentives to cheat.

Regarding point 2: it’s worse than you might think. It’s not just that “statistical significance” is typically taken as tantamount to a certain claim that a treatment is effective. It’s also that non-significance is commonly taken as a certain claim that a treatment has no effect (see for example our discussion of stents). Since every result is either statistically significant or not, this gives you automatic certainty, no matter what the data are!
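To make the dichotomy concrete, here is a minimal sketch using a normal approximation. The estimate and standard error are hypothetical numbers chosen for illustration, not taken from any trial, and `interval_and_verdict` is just an illustrative helper name.

```python
from statistics import NormalDist


def interval_and_verdict(estimate, std_error, level=0.95):
    """Return (lower, upper, significant) under a normal approximation."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = estimate - z * std_error
    upper = estimate + z * std_error
    # "Significant" here only means the interval excludes zero.
    significant = lower > 0 or upper < 0
    return lower, upper, significant


# Hypothetical numbers, chosen only for illustration.
lower, upper, significant = interval_and_verdict(estimate=15.0, std_error=10.0)
print(f"95% interval: ({lower:.1f}, {upper:.1f}); significant: {significant}")
```

With these made-up inputs the interval includes zero, so the result would be labeled "not significant," yet it also includes effects more than twice the point estimate. Reading that label as "no effect" throws away most of what the interval says.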

P.S. Full disclosure: I’ve had business relationships with Novartis, AstraZeneca and other drug companies.


  1. DK says:

    In other words, there is nothing inherently wrong with non-inferiority trials but there are tons of shit clinical trials in general. Well then, what do we have the FDA for?

    • Duh how are you going to use regulatory capture to reduce competition if there is no regulatory body to capture?

    • Keith O’Rourke says:


      Given the FDA approached me and Jim Berger and others at SAMSI in 2008 to review non-inferiority trials I can comment freely on that.

      What took us a bit to get a grasp of is that the real concern is not whether an intervention is “not acceptably worse” but rather is it actually worse than placebo. At least if the margin of equivalence (say 80 percent of currently believed effect on the current treatment) is not fixed beforehand. Without a fixed margin of equivalence the analysis is focused on an indirect estimate of new drug versus placebo. From a likelihood perspective there is a missing likelihood component for that comparison so a likelihood component is re-used (a real double use of data). Under usual assumptions that gives rise to the same formulas that many use for indirect comparisons and network meta-analysis. From a missing data perspective these amount to a single imputation of a whole missing placebo group. This underrepresents the true uncertainties (random and systematic). It’s one of those cases where it is hard to see how to fix it without bringing an appropriate prior to replace the borrowed likelihood component. Think Stephen Senn just published something on it a few years ago.

      On the other hand, my understanding is that within the FDA process problems 1, 2 and 3 are usually effectively dealt with. There are penalties for cheating, a conclusion of not sufficient evidence of benefit is always on the table and default significance levels can sometimes be overridden with judgement.

      Now what gets published in journals on the same studies often bears no resemblance to the final submission to the FDA. Believe I read they are starting to let the journals know about this.
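The indirect new-drug-versus-placebo estimate described in the comment above can be sketched as follows. This assumes the standard additive indirect comparison on a common scale, where the new-versus-old estimate from the current trial is added to a historical old-versus-placebo estimate and their variances sum. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def indirect_new_vs_placebo(d_new_vs_old, se_new_vs_old,
                            d_old_vs_placebo, se_old_vs_placebo,
                            level=0.95):
    """Indirect estimate of new vs placebo, borrowing a historical comparison."""
    estimate = d_new_vs_old + d_old_vs_placebo
    # Variances add only if the two sources are independent and the
    # historical effect still holds in the current trial population.
    std_error = sqrt(se_new_vs_old ** 2 + se_old_vs_placebo ** 2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, std_error, (estimate - z * std_error, estimate + z * std_error)


# Hypothetical inputs: new is slightly worse than old; old beat placebo historically.
est, se, (low, high) = indirect_new_vs_placebo(-1.0, 1.5, 5.0, 2.0)
print(f"indirect estimate {est:.1f}, s.e. {se:.1f}, interval ({low:.1f}, {high:.1f})")

# Treating the historical effect as a fixed number (s.e. of zero) shrinks the
# standard error, which is one way the random uncertainty gets understated.
_, se_fixed, _ = indirect_new_vs_placebo(-1.0, 1.5, 5.0, 0.0)
print(f"s.e. if historical effect is treated as known: {se_fixed:.1f}")
```

Even the wider standard error covers only random variation. The systematic part, whether the historical effect of the old treatment carries over to today's patients, is not in the formula at all.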

  2. Austin Fournier says:

    Wait, the Type I error rate? Isn’t it the Type II error rate that’s relevant here? It is non-significance people are looking for in these cases, is it not?

    • a reader says:

      No, these studies don’t try to show non-significance. Rather, they try to show that treat_effect_new is greater than alpha times treat_effect_old, with alpha being some “acceptable” level (e.g., 0.8 or something like that).
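A minimal sketch of the comparison described in this reply, assuming a fixed-margin analysis with a normal approximation. Here `retained_fraction` plays the role of the commenter's alpha, `effect_old` is a historical old-versus-placebo effect treated as known, and all names and numbers are hypothetical. If treat_effect_old is fixed, then treat_effect_new > alpha × treat_effect_old is the same as (new minus old) > −(1 − alpha) × treat_effect_old.

```python
from statistics import NormalDist


def non_inferiority_check(diff_new_minus_old, se_diff, effect_old,
                          retained_fraction=0.8, level=0.95):
    """Return (non_inferior, margin, lower_bound) for a fixed-margin check."""
    # Largest loss of effect still considered acceptable.
    margin = (1 - retained_fraction) * effect_old
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower_bound = diff_new_minus_old - z * se_diff
    # Non-inferiority is declared when the whole interval sits above -margin.
    return lower_bound > -margin, margin, lower_bound


# Hypothetical trial result: new is estimated to be 0.5 units worse than old.
ok_80, margin_80, lower = non_inferiority_check(-0.5, 0.6, effect_old=10.0,
                                                retained_fraction=0.8)
ok_90, margin_90, _ = non_inferiority_check(-0.5, 0.6, effect_old=10.0,
                                            retained_fraction=0.9)
print(f"lower bound {lower:.2f}; margin {margin_80:.1f} -> non-inferior: {ok_80}")
print(f"lower bound {lower:.2f}; margin {margin_90:.1f} -> non-inferior: {ok_90}")
```

The same made-up data pass with one choice of retained fraction and fail with another, which is why the choice of margin, and when it is fixed, matters so much.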

  3. Paul Alper says:

    Andrew writes “see for example our discussion of stents” in which he criticized the statistical analysis associated with the so-called ORBITA trial which concluded that stents failed to live up to expectations. Andrew did not mention either the cost of stents or the subsequent problems which stents may cause. For an update on problems due to stents:

    “If you are a patient with coronary artery disease and your doctor is recommending a stent, you should put up the stop sign and ask your doctor to reconsider his/her premise. Given the issues and unanswered questions that attend the use of any stent, is a stent really necessary? Are other treatments available that can be applied before resorting to a stent?”

    “However, stents should be avoided whenever possible. In addition to the risk involved with the performance of the PCI procedure itself, the presence of a stent creates a long-term management problem, for both the doctor and the patient, whose ultimate resolution remains unclear. Namely, is it ever safe to stop the powerful anti-platelet drugs needed after PCI? (Notably, several patients in the ORBITA trial who had the sham procedure suffered major bleeding episodes during follow-up.)”

    For a wider look at implants:

    “Pacemakers, artificial hips, contraceptives and breast implants are among the devices that have caused injuries and resulted in patients having to undergo follow-up operations or in some cases losing their lives.

    In some cases, the implants had not been tested in patients before being allowed on to the market.”

    “Among the concerns raised by the Implant Files project are that manufacturers are in charge of testing their own products after faults have developed – and are allowed to shop around for approval to market their products, without declaring any refusals.

    The Guardian has also heard about doctors who have close industry ties or seem eager to be early adopters of the latest devices to enhance their professional standing.”

  4. Martha (Smith) says:

    From the Gelman et al article linked:

    “In ORBITA, exercise time in a standardized treadmill test—the primary outcome in the preregistered design—increased on average by 28.4 sec in the treatment group compared to an increase of only 11.8 sec in the control group….

    … ORBITA was never meant to be definitive in a broad sense—it was designed to find a physiological effect of stenting on mean exercise time, without clarity on the clinical relevance of this outcome. Indeed, a likely reason why the study was limited to this endpoint was because this is all that could have passed an ethical board given the novelty of the placebo procedure in this setting.”

    My experience with people getting a stent is limited to one person who got a stent because of coronary blockage detected when the patient was being diagnosed with congestive heart failure. The stent insertion was followed up by an exercise rehab class, lasting for several weeks, to strengthen the heart muscle through exercise. So to me, asking if the stent itself increased exercise time (I assume in a single “test” on the treadmill) seems artificial. So the ORBITA trial sounds pretty artificial to me.

  5. a reader says:

    To be honest, I’m very disappointed with this blog post.

    There’s no discussion of any evidence, and no presentation of either how non-inferiority studies work or why they are important. Given that non-inferiority studies are how generics get approved, and that’s the mechanism to bring down drug prices, I think it goes without saying that if you’re going to say there’s a problem with non-inferiority studies, please come with evidence.

    I know you quote the “low rates of failure” as evidence that these studies are somehow being cheated, but please, try to be Bayesian about it; it’s much easier to reverse engineer something (i.e., create a generic) than come up with something novel. So the low rate of failure isn’t even surprising when you stop to think about it.
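The "be Bayesian about it" argument in this comment can be written out as a short calculation. This is a sketch under stated assumptions: the prior probability that a candidate really is non-inferior, the trial's power, and the false-pass rate are all hypothetical numbers, and the function names are illustrative.

```python
def pass_rate(prior_non_inferior, power, false_pass_rate):
    """Overall share of trials that declare non-inferiority."""
    return (prior_non_inferior * power
            + (1 - prior_non_inferior) * false_pass_rate)


def posterior_given_pass(prior_non_inferior, power, false_pass_rate):
    """Probability the product truly is non-inferior, given that it passed."""
    return (prior_non_inferior * power
            / pass_rate(prior_non_inferior, power, false_pass_rate))


# Hypothetical scenario A: most candidates are close copies of the original.
rate_high_prior = pass_rate(0.9, power=0.9, false_pass_rate=0.025)
# Hypothetical scenario B: only half of the candidates are truly non-inferior.
rate_even_prior = pass_rate(0.5, power=0.9, false_pass_rate=0.025)
post_high_prior = posterior_given_pass(0.9, power=0.9, false_pass_rate=0.025)

print(f"pass rate with prior 0.9: {rate_high_prior:.4f}")
print(f"pass rate with prior 0.5: {rate_even_prior:.4f}")
print(f"P(truly non-inferior | pass), prior 0.9: {post_high_prior:.4f}")
```

Under these made-up inputs a high pass rate follows from a high prior alone. The observed pass rate by itself cannot separate that explanation from gaming, since a higher false-pass rate would also push the pass rate up.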

  6. Is there no way to do away with problems 2 and 3 by making drug testing completely independent of the drug companies? Why can’t a researcher spend the same research budget without any desire for a particular conclusion?

  7. Z says:

    Here’s a paper discussing adjustment for non-compliance in equivalence trials:
