The rise and fall and rise of randomized controlled trials (RCTs) in international development

Gil Eyal sends along this fascinating paper coauthored with Luciana de Souza Leão, “The rise of randomized controlled trials (RCTs) in international development in historical perspective.” Here’s the story:

Although the buzz around RCT evaluations dates from the 2000s, we show that what we are witnessing now is a second wave of RCTs, while a first wave began in the 1960s and ended by the early 1980s. Drawing on content analysis of 123 RCTs, participant observation, and secondary sources, we compare the two waves in terms of the participants in the network of expertise required to carry out field experiments and the characteristics of the projects evaluated. The comparison demonstrates that researchers in the second wave were better positioned to navigate the political difficulties caused by randomization.

What were the key differences between the two waves? Leão and Eyal start with the most available explanation:

What could explain the rise of RCTs in international development? Randomistas tend to present it as due to the intrinsic merits of their method, its ability to produce “hard” evidence as compared with the “softer” evidence provided by case studies or regressions. They compare development RCTs to clinical trials in medicine, implying that their success is due to the same “gold standard” status in the hierarchy of evidence: “It’s not the Middle Ages anymore, it’s the 21st century … RCTs have revolutionized medicine by allowing us to distinguish between drugs that work and drugs that don’t work. And you can do the same randomized controlled trial for social policy” (Duflo 2010).

But they don’t buy it:

This explanation does not pass muster and need not detain us for very long. Econometricians have convincingly challenged the claim that RCTs produce better, “harder” evidence than other methods. Their skepticism is amply supported by evidence that medical RCTs suffer from numerous methodological shortcomings, and that political considerations played a key role in their adoption. These objections accord with the basic insight of science studies, namely, that the success of innovations cannot be explained by their prima facie superiority over others, because in the early phases of adoption such superiority is not yet evident.

I’d like to unpack this argument, because I agree with some but not all of it.

I agree that medical randomized controlled trials have been oversold; and even if I accept the idea of RCTs as a gold standard, I have to admit that almost all my own research is observational.

I also respect Leão and Eyal’s point that methodological innovations typically start with some external motivation, and it can take some time before their performance is clearly superior.

On the other hand, we can port useful ideas from other fields of research, and sometimes new ideas really are better. So it’s complicated.

Consider an example that I’m familiar with: Mister P. We published the first MRP article in 1997, and I knew right away that it was a big deal—but it indeed took something like 20 years for it to become standard practice. I remember in fall 2000, standing up in front of a bunch of people from the exit poll consortium, telling them about MRP and related ideas, and they just didn’t see the point. It made me want to scream—they were so tied into classical sampling theory, they seemed to have no idea that something could be learned by studying the precinct-by-precinct swing between elections. It’s hard for me to see why two decades were necessary to get the point across, but there you have it.

My point here is that my MRP story is consistent with the randomistas’ story and also with the sociologists’. On one hand, yes, this was a game-changing innovation that ultimately was adopted because it could do the job better than what came before. (With MRP, the job was adjusting for survey nonresponse; with RCT, the job was estimating causal effects; in both cases, the big and increasing concern was unmeasured bias.) On the other hand, why did the methods become popular when they did? That’s for the sociologists to answer, and I think they’re right that the answer has to depend on the social structure of science, not just on the inherent merit or drawbacks of the methods.
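To make the "unmeasured bias" point concrete, here is a toy simulation of my own (invented numbers, not from the paper): when an unobserved variable drives both treatment take-up and outcomes, the naive observational comparison is off, while the randomized difference in means recovers the true effect.

```python
import random
from statistics import fmean

random.seed(0)

# Hypothetical setup: each unit has a latent "need" that raises its baseline
# outcome and, in the observational scenario, also drives selection into
# treatment. The true treatment effect is set to 2.
units = []
for _ in range(100_000):
    need = random.gauss(0, 1)
    y0 = need + random.gauss(0, 1)   # outcome without treatment
    y1 = y0 + 2.0                    # outcome with treatment
    units.append((need, y0, y1))

# Observational comparison: high-need units select into treatment, so the
# difference in means is confounded (it comes out well above 2).
obs_est = (fmean(y1 for need, _, y1 in units if need > 0)
           - fmean(y0 for need, y0, _ in units if need <= 0))

# Randomized comparison: a coin flip assigns treatment, so the same
# difference in means is an unbiased estimate of the effect (near 2).
treated, control = [], []
for _, y0, y1 in units:
    if random.random() < 0.5:
        treated.append(y1)
    else:
        control.append(y0)
rct_est = fmean(treated) - fmean(control)
```

This is the whole argument for randomization in miniature; the variation-in-treatment-effects point mentioned above is exactly what this constant-effect toy leaves out.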

As Leão and Eyal put it, any explanation of the recent success of RCTs within economics must “recognize that the key problem is to explain the creation of an enduring link between fields” and address “the resistance faced by those who attempt to build this link,” while avoiding “too much of the explanatory burden on the foresight and interested strategizing of the actors.”

Indeed, if I consider the example of MRP, the method itself was developed by putting together two existing ideas in survey research (multilevel modeling for small area estimation, and poststratification to adjust for nonresponse bias), and when we came up with it, yes I thought it was the thing to do, but I also thought the idea was clear enough that it would pretty much catch on right away. It’s not like we had any strategy for global domination.
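For what it’s worth, those two ingredients can be sketched in a few lines. This is a deliberately crude stand-in (toy data I made up; pseudo-count shrinkage in place of a fitted multilevel model), just to show the shape of the idea: estimate within cells with partial pooling, then reweight by known population cell shares.

```python
from statistics import fmean

def mrp_sketch(sample, pop_shares, k=5.0):
    """sample: cell -> list of responses; pop_shares: cell -> known population share."""
    grand = fmean(y for ys in sample.values() for y in ys)
    cell_est = {}
    for cell, ys in sample.items():
        # Partial pooling: shrink each cell mean toward the grand mean,
        # with pseudo-count k standing in for a real multilevel model.
        cell_est[cell] = (len(ys) * fmean(ys) + k * grand) / (len(ys) + k)
    # Poststratification: reweight cell estimates by population shares.
    return sum(share * cell_est[cell] for cell, share in pop_shares.items())

# Toy example: "young" respondents are overrepresented in the sample
# relative to the population, so the raw sample mean is biased.
sample = {"young": [1, 1, 1, 0, 1, 1, 0, 1], "old": [0, 1]}
pop_shares = {"young": 0.4, "old": 0.6}

raw = fmean(y for ys in sample.values() for y in ys)  # 0.70
adjusted = mrp_sketch(sample, pop_shares)             # pulled toward "old"
```

In real MRP the cell estimates come from a fitted multilevel regression rather than pseudo-count shrinkage, but the pooling-then-reweighting structure is the same.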

The first wave of RCT for social interventions

Where Leão and Eyal’s article really gets interesting, though, is when they talk about the earlier push for RCTs, several decades ago:

While the buzz around RCTs certainly dates from the 2000s, the assumption—implicit in both the randomistas’ and their critics’ accounts—that the experimental approach is new to the field of international development—is wrong. In reality, we are witnessing now a second wave of RCTs in international development, while a first wave of experiments in family planning, public health, and education in developing countries began in the 1960s and ended by the early 1980s. In between the two periods, development programs were evaluated by other means.

Just as an aside—I love the sentence above with its three dashes. Dashes are great punctuation, way underused in my opinion.

Anyway, they now set up the stylized fact, the puzzle:

Instead of asking, “why are RCTs increasing now?” we ask, “why didn’t RCTs spread to the same extent in the 1970s, and why were they discontinued?” In other words, how we explain the success of the second wave must be consistent with how we explain the failure of the first.

Good question, illustrating an interesting interaction between historical facts and social science theorizing.

Leão and Eyal continue:

The comparison demonstrates that the recent widespread adoption of RCTs is not due to their inherent technical merits nor to rhetorical and organizational strategies. Instead, it reflects the ability of actors in the second wave to overcome the political resistance to randomized assignment, which has bedeviled the first wave, and to forge an enduring link between the fields of development aid and academic economics.

As they put it:

The problem common to both the first and second waves of RCTs was how to turn foreign aid into a “science” of development. Since foreign aid is about the allocation of scarce resources, the decisions of donors and policy-makers need to be legitimized.

They argue that a key aspect of the success of the second wave of RCTs was the connection to academic economics.

Where next?

I think RCTs and causal inference in economics and political science and international development are moving in the right direction, in that there’s an increasing awareness of variation in treatment effects, and an increasing awareness that doing an RCT is not enough in itself. Also, Leão and Eyal talk a lot about “nudges,” but I think the whole nudge thing is dead; serious economists have moved past it. The nudge people can keep themselves busy with TED talks, book tours, and TV appearances while the rest of us get on with the real work.


  1. Josh says:

    Hi Andrew,

    I agree that nudges are over-hyped but I don’t think the idea is completely dead. Have you seen this recent study on the redesign of summons forms in NYC?

  2. I wonder, though, whether RCTs are going in the right direction in medicine.

  3. Martin says:

    Hi Andrew,

    I’m having a hard time putting a finger on what exactly bothers you about nudges. Is it:
    a. The replication issue?
    b. The overhyping of not-so-significant results for non-scientific audiences?
    c. The fact that you don’t believe that the potential impact of nudge-like interventions is large enough to matter for the policy world?
    d. All of the above?

  4. John Williams says:

    On a quick skim, the sentence that made the most sense to me was “Resistance is less significant also because second wave RCTs typically evaluate interventions that are much shorter and smaller in scale than in the past.”

    As an environmental scientist dealing mostly with salmon and salmon habitats, I’ve struggled with how to develop better evidence to inform management. In the environmental field, this usually gets called “adaptive management,” originally by analogy to adaptive control theory in engineering. Doing this well has been hard, in large part because of political resistance to management experiments, and smaller and shorter experiments have been easier to sell. However, shorter and smaller experiments generally don’t answer the important questions. To my mind, focusing on better data collection and using hierarchical Bayesian modeling have been the most successful approaches.

  5. A.P. Salverda says:

    I love dashes and agree that they are underused, but the third dash in the sentence below is illegitimate:

    “While the buzz around RCTs certainly dates from the 2000s, the assumption—implicit in both the randomistas’ and their critics’ accounts—that the experimental approach is new to the field of international development—is wrong.”

    • Martha (Smith) says:

      I think that the problem is using dashes at all; I think a better way to write the sentence would be:

      “While the buzz around RCTs certainly dates from the 2000s, the assumption (implicit in both the randomistas’ and their critics’ accounts) that the experimental approach is new to the field of international development is wrong.”

      • anon e mouse says:

        Although I personally love parentheticals, pretty much anyone I’ve ever known with an editorial background has thought they should almost never be used in scientific writing. “If it belongs in parentheses, you don’t need to say it and should strike it; if it doesn’t, take it out of parentheses.”

        • Martha (Smith) says:

          “Although I personally love parentheticals, pretty much anyone I’ve ever known with an editorial background has thought they should almost never be used in scientific writing. “If it belongs in parentheses, you don’t need to say it and should strike it; if it doesn’t, take it out of parentheses.””

          Wow! That sounds extreme to me. Perhaps my view is influenced by being a mathematician, where parentheses are used to distinguish between different possible interpretations of a string of symbols.

        • Phil says:

          I used to use too many parenthetical statements, or at least that was my instinct: often I would remove them in editing. Now I’m not 100% opposed but I try hard to avoid them (and when I do use a parenthetical note I put it at the end of a sentence, not in the middle). But I do think it’s usually best to get rid of them. If I want the reader to read it, what’s the point of the parentheses? And if I don’t want the reader to read it, why am I writing it?

  6. Nick Adams says:

    I do love judicious use of punctuation marks, but I would have my red pencil out for that triple-dash sentence.
    With regard to RCTs: they are very difficult to do in many circumstances, and are often not done well, but a properly designed and performed RCT is logically unassailable.

  7. Steve says:

    Nothing is logically unassailable, not even logic. And “properly designed and performed” depends on the problem and population you are studying, which you can’t evaluate until you do the study. You can’t know if your randomization failed until you have results. You cannot know whether your recruitment and follow-up on subjects was “proper” until the study is over.

  8. Jonathan (another one) says:

    I think a more parsimonious answer as to why the first wave of RCTs failed is that the interventions proposed back then were thought to be prima facie efficacious. There was a certain arrogance that solutions could be imposed from the superior Western viewpoint and that native resistance was a problem, but a short-run problem. So the only measurement of success required was the observable change in aggregate well-being of the aid recipients.

    But then we had the rise of dozens of articles suggesting that this was *all* wasted money. At that point, proving that the money wasn’t going to be wasted became a thing. In the prior stage, the notion that the money was going to be wasted was a sufficiently minority position to be safely ignored.

    • Jonathan (another one) says:

      (I hit submit by accident.) This is all mentioned in the paper at page 21, with discussion of the Sachs-Easterly controversy in the late ’90s. What I’m not clear on is why it needs any more explanation than that.

  9. Luke says:

    I find their argument dismissing the appeal of RCTs based on merit disingenuous. Are RCTs perfect? No. Are there many cases where observational studies are superior? Of course. But a well-identified observational study is often not an actual choice when it comes to evaluating many development interventions. The actual choice is between an RCT and nothing. If the rise of RCTs means a diversion of money away from organizations like Heifer International and towards organizations like GiveDirectly, that strikes me as a real improvement. It ain’t perfect, but at least there’s some evidence rather than none at all.

  10. Fafa says:

    There are some really interesting descriptive elements of this paper, but the analysis is a real mess. What seems to trip up the authors the most is the extent to which seemingly fixed categories–academics, development organizations, developing countries–have changed so much between the two eras. Obviously, there are not going to be development RCTs in Taiwan now, and the type of person who would have run such an RCT is much more likely to be an academic because most development agencies have turned into contractors. So the authors are constantly distracted by meaningless differences between the two waves.

    But there is certainly something to the political economy story. Unfortunately, they miss many of the institutional incentives on the researcher side, which is why the vision of one economist (Duflo) somehow becomes transmogrified in their telling as the vision of all development economists who have participated in an RCT.

    It’s telling that the authors close the article by misrepresenting the ‘Worms’ debate and the strength of the claims against the validity of the original study, even as their substantive point about the prominence of that paper is correct.
