Pipeline Designs and Equipoise: How Can They Go Together?

I am writing about phase-in / pipeline designs. Again. I’ve already done it here, and more here, but there is more to say.

The premise of a pipeline or phase-in design is that groups will be randomized or otherwise experimentally allocated to receive a given intervention earlier or later. The ‘later’ group can then serve as the comparison for the ‘early’ group, allowing for a causal claim about impact to be made. I am specifically talking about phase-in designs premised on the idea that the ‘later’ group is planned (and has perhaps been promised) to receive the intervention later. I take this to be a ‘standard’ approach to phase-in designs.

I’d like to revisit the issue of phase-in designs from the angle of equipoise, which implies some sense of uncertainty about the causal impact of a given intervention. This uncertainty provides the justification for making use of an ex ante impact evaluation. Equipoise literally translates to equal weight / force / interest. Here, the force in question is the force of argument about the impact of an intervention and which direction it will go (or whether there will be one at all).

There have already been some great conversations, if not decisive answers, as to whether, in social science research, the justification for using experimental allocation of an intervention needs to meet the standards of clinical equipoise or policy equipoise.* The key difference is the contrast between ‘a good impact’ (clinical equipoise) and ‘the best impact achievable with the available resources’ (policy equipoise). In either case, it is clear that some variant of equipoise is considered a necessary justification. For theoretical and/or empirical reasons, it just isn’t clear whether an intervention is (a) good (investment).

Whichever definition of equipoise you pursue, the underlying premise is one of a genuine uncertainty and an operational knowledge gap about how well a certain intervention will work in a certain setting at a certain point in time and at what degree of relative resource efficiency. This uncertainty is what lends credibility to an ex ante impact evaluation (IE) and the ethical justification for a leave-out (‘business as usual’ or perhaps ‘minimal/basic package’) comparison group. Hence, no RCTs on parachutes.

Uncertainty implies that the impact results could plausibly, if not with fully equal likelihood, come back positive, negative, null or mixed. At least some of those outcomes imply that a program is not a good use of resources, if not actually generating adverse effects. Such a program, we might assume, should be stopped or swapped for some alternative intervention (see Berk’s comments here).

To move forward from the idea of uncertainty, the following two statements simply do not go together despite often being implicitly paired:

  1. We are uncertain about the impact our intervention will bring about / cause, so we are doing an (any type of ex ante) IE.
  2. We plan to scale this intervention for everyone (implicitly, at least, because we believe it works – that is, the impacts are largely in the desired direction). Because of resource constraints, we will have to phase it in to the population over time.

Yes, the second point could be, and often is, extended to say, ‘this offers a good opportunity to have a clean identification strategy and therefore to do IE.’ But this doesn’t actually square the circle between the two statements. It still requires the type of sleight of hand around the issue of uncertainty that I raised here about policy champions.

Unless there are some built-in plans to modify (or even cancel) the program along the phase-in process, the ethics of statement 2 rests solely on the resource constraint (relative to actual or planned demand), not on any variant of equipoise. This is an important point when justifying the ethics of ex ante IE. And it is worth noting how few development programs have been halted because of IE results. It would be a helpful global public good if someone would start compiling a list of interventions that have been stopped, plausibly, because of IE outcomes, perhaps making note of the specific research design used. Please and thank you.

Moreover, unless there is some built-in planning about improving, tweaking or even scrapping the program along the way, it is not clear that the ex ante IE based on a phase-in design can fully claim to be policy relevant. This is a point I plan to elaborate in a future post but, for now, suffice it to say that I am increasingly skeptical that being about a policy (being ‘policy adjacent’ by situating a study in a policy) is the same as informing decisions about that policy (being ‘decision relevant’).

To me, the latter has stronger claims on being truly policy relevant and helping to make wise and informed decisions about the use of scarce resources – which I think is the crux of this whole IE game anyway. IEs of phase-in designs without clear potential for mid-course corrections (i.e. genuine decision points) seem destined for policy adjacency, at best. Again, the underlying premise of a phase-in design is that it is a resource constraint, not an evidence constraint, which is dictating the roll-out of the program. But the intention to make a decision at least partly based on the evidence generated by an IE again rests on the premise of ex ante uncertainty about the potential for (the most cost-efficient) impact.

To come back to the issue of equipoise and phase-in designs: if the ethics of much of the work we do rests on a commitment to equipoise, then more needs to be done to clarify how we assess it and whether IRB/ethics review committees take it seriously when considering research designs. What information does a review board need to make that assessment?

Moreover, it requires thinking hard about which types of research designs align with the agreed concept of equipoise (whichever that may be). My sense is that phase-in designs can only be commensurate with the idea of equipoise if they are well-conceived, where ‘well-conceived’ means that uncertainty about impact is genuinely recognized and contingencies are planned for in a meaningful way – that is, that the intervention can be stopped or altered during the phase-in process.

* I don’t propose to settle this debate between clinical and policy equipoise here, though I am sympathetic to the policy equipoise argument (and would be more so if more ex ante IEs tended towards explicitly testing two variants of an intervention against one another to see which proves the better use of resources moving forward – because forward is the general direction people intend to move in development).

wait, we’re experimenting, right?

many of the descriptions of the ideal next World Bank president – at least the ones with which I agree – have called for a little more humility about how much we actually know about economic & human development and poverty reduction.

so it’s frustrating to see articles like this, which imply a low level of humility about the work we are doing and an unclear commitment to learning what actually does and does not work (regardless of felt commitment to poverty reduction & development).

a large part of the reason that experiments and impact evaluations in development have become popular is that we weren’t getting as far as we needed with theory, intuition or observation alone. money and other resources were being put into programs when we didn’t know if they were effective (even if things seemed to be changing in the presence of the program), let alone how they compared to other programs in terms of efficacy or cost-efficiency. process and implementation evaluations that could have improved subsequent program interventions were not being conducted and/or shared.

it seems like we need to pause and think about how and why we are experimenting.

  • we experiment because we don’t know what works – or whether something that works in one location will work in another. if we knew what worked, we would potentially be under some ethical obligation to do that thing for all people in all places we thought it would work. when we don’t know what works, or when there is at least genuine disagreement about the best approaches, an experimental design is justified. in short, we need to bring equipoise into social science research. in part, this means that we should be testing our new (experimental) idea against the best known or available intervention with a similar goal. new drugs are usually tested against a placebo and a regularly used treatment.
  • because we are experimenting, we should encourage the publication of null findings and laud these as equally important learning experiences. this requires funders recognizing such reporting as essential to the accountability of studies and program implementations. it also requires changing the strong bias of journal editors and reviewers toward publishing only statistically significant findings. confidence intervals aside, null findings may be just as “significant” for our understanding of what works and doesn’t work in development as statistically significant results (see the sketch after this list).
  • evaluations probably need to start to look more like programs that could be scaled up. there are good experimental reasons for manipulating only one or two key variables at a time and trying to limit all other contamination, but there has to be increasing movement toward learning what works in situ, even if that means there is more than one moving part. and if it is really unclear how the findings from an experiment would be scaled up in a program or policy, then the experiment likely needs to be re-thought.
  • also, we need to think more about the ethics of doing social science experiments in low- and middle-income countries. there are increasing obligations for clinical research by large pharmaceutical or academic institutions, if the drug proves effective, to – at a minimum – make the drug available to – again, at a minimum – the host community.* this is because the host community bore some risk in participating in an experimental intervention — but more generally because any intervention alters biological and social patterns that will remain changed after the trial ends and the researchers leave the community to publish their results in scientific journals.
  • experimenting is good in a context in which we aren’t sure what works. NGO- and state-run programs need to be linked with evaluation efforts. there are roughly a bajillion graduate students interested in program evaluation, development economics, and so on and there are a large number of programs that are being run by governments or NGOs without any rigorous evaluation or clear delineation of ‘lessons learned’ – or at least evaluations that get talked about in the technocratic space. none of these programs will offer a perfect experimental design but, hey, that’s where the complex statistics come in. all we need is a yenta to link grad students to programs (and evaluation funding) and we’re set.
  • experiments, programs, policies, etc, need to allow us to learn about the implementation process as well as the outcomes. deviations from initial design and unexpected hurdles along the way should be reported so that everyone can learn from them. yes, the reality of actually running these programs may make it more difficult to make causal inference with certainty – but these aren’t just aberrations in an experimental design, they’re part of the reality into which any scaled-up effort would be plugged. this is similar to the distinction between “efficacy” and “effectiveness” in clinical research. knowing how an intervention performs under ideal experimental conditions (efficacy) may not tell us how the same intervention performs under real-world circumstances or when scaled up to other communities (effectiveness).
  • replication is central to the natural sciences but still largely under-utilized in the social sciences and development research. we need to recognize the importance of replication in confirming or dis-confirming the results of program implementation studies and encourage greater publication of replication studies.
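
to illustrate the point about null findings above: here is a minimal sketch (hypothetical, simulated data; a standard difference-in-means with a normal-approximation confidence interval, not any particular study’s method) of why a precisely estimated null can be just as decision relevant as a “significant” result.

```python
# Minimal sketch with simulated (hypothetical) data: a precisely estimated
# null result can be as informative for decisions as a significant one.
import numpy as np

rng = np.random.default_rng(0)

# Simulated outcomes where the intervention truly has zero effect.
control = rng.normal(loc=50.0, scale=10.0, size=2000)
treated = rng.normal(loc=50.0, scale=10.0, size=2000)

# Difference in means and a 95% confidence interval (normal approximation).
effect = treated.mean() - control.mean()
se = np.sqrt(treated.var(ddof=1) / treated.size +
             control.var(ddof=1) / control.size)
ci_low, ci_high = effect - 1.96 * se, effect + 1.96 * se

print(f"estimated effect: {effect:.2f}")
print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
# A tight interval around zero rules out effects large enough to justify
# scale-up -- a "null" finding that informs the decision rather than a failure.
```

the design point is that the width of the interval, not just the p-value, is what tells a decision-maker whether the study can rule out impacts worth paying for.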

*see, for example, “Moral standards for research in developing countries: from ‘reasonable availability’ to ‘fair benefits’” or “What makes clinical research in developing countries ethical? The benchmarks of ethical research”

*other inspiration

**big thanks to Mike for helping me sort through a lot of the ideas in this post