“Politically Robust” Experimental Design in Democracies and a Plea For More Experience Sharing

Sometimes I re-read a paper and remember how nice a sentence or paragraph was (especially when thinking that a benevolent or benign dictator might make research so much easier, as though ease were the main goal of research).

So it is with the paper by Gary King and colleagues (2007) on “a ‘politically robust’ experimental design for public policy evaluation, with application to the Mexican universal health insurance program”.

Scholars need to remember that responsive political behavior by political elites is an integral and essential feature of democratic political systems and should not be treated with disdain or as an inconvenience. Instead, the reality of democratic politics needs to be built into evaluation designs from the start — or else researchers risk their plans being doomed to an unpleasant demise. Thus, although not always fully recognized, all public policy evaluations are projects in both policy analysis and political science.

What would be nice is if researchers would share more of their experiences and lessons learned, not just in robust research design (though this is critical) but also in working to persuade (and failing to persuade) local political leaders to go along with randomization schemes and to hold off on any kind of scale-up until the results are in… and only if they are promising!

Pipeline Designs and Equipoise: How Can They Go Together?

I am writing about phase-in / pipeline designs. Again. I’ve already done it here, and more here. But.

The premise of a pipeline or phase-in design is that groups will be randomized or otherwise experimentally allocated to receive a given intervention earlier or later. The ‘later’ group can then serve as the comparison for the ‘early’ group, allowing for a causal claim about impact to be made. I am specifically talking about phase-in designs premised on the idea that the ‘later’ group is planned (and has perhaps been promised) to receive the intervention later. I take this to be a ‘standard’ approach to phase-in designs.
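To make the mechanics concrete, here is a minimal simulation sketch (Python, with an invented sample size and effect size, not drawn from any actual study) of how random assignment to ‘early’ versus ‘later’ cohorts lets the not-yet-treated group serve as the comparison during the first phase:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 100 communities, half randomized to receive the
# intervention in phase 1 ('early'), half in phase 2 ('later').
n = 100
early = rng.permutation(np.repeat([True, False], n // 2))

# Simulated outcome measured at the end of phase 1, before the 'later'
# group has been phased in. The true effect size here is made up.
baseline = rng.normal(50, 10, n)
true_effect = 3.0
outcome = baseline + true_effect * early + rng.normal(0, 5, n)

# Because assignment to 'early' vs 'later' was random, a simple
# difference in means identifies the phase-1 impact; the 'later' group
# is a valid comparison only until it is itself phased in.
impact = outcome[early].mean() - outcome[~early].mean()
se = np.sqrt(outcome[early].var(ddof=1) / early.sum()
             + outcome[~early].var(ddof=1) / (~early).sum())
print(f"estimated phase-1 impact: {impact:.2f} (SE {se:.2f})")
```

Note that the comparison disappears once the ‘later’ group is phased in, which is part of why the questions below about uncertainty and mid-course decision points matter.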

I’d like to revisit the issue of phase-in designs from the angle of equipoise, which implies some sense of uncertainty about the causal impact of a given intervention. This uncertainty provides the justification for studying the intervention through an ex ante impact evaluation. Equipoise literally translates to equal weight / force / interest. Here, the force in question is the force of argument about the impact of an intervention and which direction it will go (or whether there will be one at all).

There have already been some great conversations, if not decisive answers, as to whether, in social science research, the justification for using experimental allocation of an intervention needs to meet the standards of clinical equipoise or policy equipoise.* The key difference is the contrast between ‘a good impact’ (clinical equipoise) and ‘the best impact achievable with the given resources’ (policy equipoise). In either case, it is clear that some variant of equipoise is considered a necessary justification. For theoretical and/or empirical reasons, it just isn’t clear whether an intervention is (a) good (investment).

Whichever definition of equipoise you pursue, the underlying premise is one of genuine uncertainty and an operational knowledge gap about how well a certain intervention will work in a certain setting at a certain point in time, and at what degree of relative resource efficiency. This uncertainty is what lends credibility to an ex ante impact evaluation (IE) and provides the ethical justification for a leave-out (‘business as usual’ or perhaps ‘minimal/basic package’) comparison group. Hence, no RCTs on parachutes.

Uncertainty implies that the impact results could plausibly, if not with fully equal likelihood, come back positive, negative, null or mixed. At least some of those outcomes imply that a program is not a good use of resources, if not actually generating adverse effects. Such a program, we might assume, should be stopped or swapped for some alternative intervention (see Berk’s comments here).

To move forward from the idea of uncertainty, the following two statements simply do not go together despite often being implicitly paired:

  1. We are uncertain about the impact our intervention will bring about / cause, so we are doing an (any type of ex ante) IE.
  2. We plan to scale this intervention for everyone (implicitly, at least, because we believe it works – that is, the impacts are largely in the desired direction). Because of resource constraints, we will have to phase it in over time to the population.

Yes, the second point could be, and often is, extended to say, ‘this offers a good opportunity to have a clean identification strategy and therefore to do an IE.’ But this doesn’t actually square the circle between the two statements. It still requires the type of sleight of hand around the issue of uncertainty that I raised here about policy champions.

Unless there are some built-in plans to modify (or even cancel) the program along the phase-in process, the ethics of statement 2 rests solely on the resource constraint (relative to actual or planned demand), not on any variant of equipoise. This is an important point when justifying the ethics of ex ante IE. And it is worth noting how few development programs have been halted because of IE results. It would be a helpful global public good if someone would start compiling a list of interventions that have been stopped, plausibly, because of IE outcomes, perhaps making note of the specific research design used. Please and thank you.

Moreover, unless there is some built-in planning about improving, tweaking or even scrapping the program along the way, it is not clear that the ex ante IE based on a phase-in design can fully claim to be policy relevant. This is a point I plan to elaborate in a future post but, for now, suffice it to say that I am increasingly skeptical that being about a policy (being ‘policy adjacent’ by situating a study in a policy) is the same as informing decisions about that policy (being ‘decision relevant’).

To me, the latter has stronger claims on being truly policy relevant and helping to make wise and informed decisions about the use of scarce resources – which I think is the crux of this whole IE game anyway. IEs of phase-in designs without clear potential for mid-course corrections (i.e. genuine decision points) seem destined for policy adjacency, at best. Again, the underlying premise of a phase-in design is that it is a resource constraint, not an evidence constraint, which is dictating the roll-out of the program. But the intention to make a decision at least partly based on the evidence generated by an IE again rests on the premise of ex ante uncertainty about the potential for (the most cost-efficient) impact.

To come back to the issue of equipoise and phase-in designs: if the ethics of much of the work we do rests on a commitment to equipoise, then more needs to be done to clarify how we assess it and whether IRB/ethics review committees take it seriously when considering research designs. What information does a review board need to make that assessment?

Moreover, it requires giving a good think to what types of research designs align with the agreed concept of equipoise (whichever that may be). My sense is that phase-in designs can only be commensurate with the idea of equipoise if they are well-conceived, with ‘well-conceived’ meaning that uncertainty about impact is indeed recognized and contingencies are planned for in a meaningful way – that is, that the intervention can be stopped or altered during the phase-in process.

* I don’t propose to settle this debate between clinical and policy equipoise here, though I am sympathetic to the policy equipoise argument (and would be more so if more ex ante IEs tended towards explicitly testing two variants of an intervention against one another to see which proves the better use of resources moving forward – because forward is the general direction people intend to move in development).

bringing in the state for experiments and development efforts — when? how?

there have been a lot of mentions of governments, experiments, ownership, & development in the past two weeks that sparked a few ideas in my head. the underlying theme is that we need to start considering not just the political economy of the contexts in which we work but how to actually bring political and economic considerations – and interests in long-term sustainability, accountability, and ‘ownership’ – into program design and implementation, as well as into the experiments to test those programs. i first consider lessons from a totally hypothetical RCT. then, two quick reviews of new programs related to public sector involvement in development efforts.

first, the political economy of experimentation: lessons stemming from a totally hypothetical nation-wide RCT run in conjunction with the government of a totally hypothetical state.

as suggested before, experiments should often be designed with the (likely) ultimate implementer in mind – ideally, in consultation with them. because much of development deals with public goods and market failures, there is a good chance the state would ultimately be involved in an experiment-born or experiment-tested program/policy in order to bring it to scale, institutionalize it, & sustain it.

this requires experimental design to start to look more like program design. research design cannot substitute for program or policy design. program & policy design requires inputs from the local context and from the field of study & practice relevant to the content of the intervention.

these designs must also account for politics. working with the government – at a large and visible scale – means what once may have been a more ‘neutral’ or ‘local’ experiment is now political. who gets access to the intervention, when, and how (Lasswell, 1936) becomes critical. this will often be inherently at odds with randomized evaluation (King et al., 2007; Fox & Reich, forthcoming). it’s not impossible, it just takes a good deal of planning & savvy in both program and research design & implementation.

even in the face of political, technical, and social challenges to implementation of the program & the study, there will always be important lessons to be learned — but only if solid monitoring is taking place and process evals are reported. all stakeholders need to be on board with that up front (tough!).

  • when does the government come in?
    • option 1: the experiment starts because of a program/policy the government wants to try or tweak. thus, it is government- or implementer-initiated. these experiments are, i believe, along the lines of what David Brooks suggests here.
    • option 2: researchers/technocrats have an idea that they want to try at a large-scale, and therefore approach the government. the gov/implementer should be on board before the design is finalized. they will have a much clearer idea than almost any researcher of what is politically feasible and what incentives will need to be offered to get the buy-in of on-the-ground implementers and bureaucratic structures. that is, the program design needs to be rigorous, with technical and theoretical aspects adjusted to the local political economy and an understanding of different stakeholders’ interests and capacity.
    • in either case: if the implementer seems unwilling to scale-up an idea in the design phase, the experiment should be re-considered and re-designed. if the government lacks capacity or is stretched too thin to scale-up and sustain the idea, then the extent to which capacity-building can and should be built into the experiment needs to be strongly considered (as well as whether an alternative approach is needed).
  • how can the state & other stakeholders be brought in?
    • a wide variety of stakeholders should be involved in deciding how to measure an acceptable level of progress and ‘success.’ all stakeholders (including politicians, bureaucrats) should articulate what will make the experiment worthwhile to them and what they would need to see in order to ‘be convinced’ that the program/policy is worth continuing to pursue. treatment effect will be far from the only thing that matters.
    • responsibility, political risk, & political timelines need to be discussed explicitly. having technocrats and researchers involved may shield politicians from some blame if the program does not work as intended – but a plan to laud politicians for engaging in evidence-building policy, and to allow them to take some credit when things go well, will be important (Fox & Reich, forthcoming).
  • other lessons that could be learned from this hypothetical study:
    • for a variety of appearance & funding reasons – and because of a commitment to public sector involvement – we may want it to appear that the government is running the experimental program. but if that’s not what’s happening on the ground, re-assess and either build capacity or adapt the program to reality.
    • proof-of-concept experiments will still have a role in helping certain ideas seem less risky to stakeholders – because large-scale policy changes are risky, both in terms of domestic politics and global politics. these small-scale experiments are also the best testing ground for multiple treatment arms, rather than proliferating treatment arms when working at national scale. however, even in these experiments, we should consider what outcomes will be politically relevant and what technical kinks are encountered, so that: (1) the results are more easily ‘sell-able’ to politicians and other implementers down the line and (2) it will be clearer what capacities and resources are necessary for the program/policy in question.

moving on from our hypothetical example (whew!)… on to two non-research-based examples of designing and implementing development efforts in ways to encourage state involvement and to increase its accountability & sustainability. what lessons about incentives & partnering could the experimental world – with an increasing eye toward scale-up and sustainability – learn from these?

the Gates Foundation has an interesting (and uncharacteristic) initiative on the table to get state governments in Nigeria more involved in vaccination and MDG efforts (h/t @KarenGrepin). this effort recognizes the key role of governmental leadership in the implementation of development programs (though these grants will go to the ‘implementing partner’; states will be able to demonstrate ownership in part through co-funding awarded projects (?)).

there are 11 process & outcome criteria on which programs will be judged, with awards going to the highest-performing state (and also to the most-improved) in each of Nigeria’s geo-political zones. ultimately, it is not clear to me exactly how this will reward political effort or encourage the design of programs that can be institutionalized (vaccines – you have to keep giving them year after year!) – but it is kind of a cool idea to have all those government- & partner-designed pilots running simultaneously and rewarding some outcomes. it would be great to know what other sorts of evaluation will be going on outside of the 11 award criteria, so that we may all learn from challenges and successes.

meanwhile, Water for People (@NedBreslin) has an interesting commentary (h/t How Matters) on sustainability and ownership. NB suggests that the fiscal and operational capacity and discipline of the implementers need to be central considerations in program design and implementation. NB suggests that programs and capital investments need to be designed and tailored to the demonstrated ability-to-pay of the community in question. that is, co-financing between a development agency (including researchers?), communities, and local government is required before new infrastructure is installed. the general idea seems to be this: ‘we (researchers(?)/philanthropists/etc) want to work towards achieving X outcome in your [geographic area]; achieving X outcomes requires sustained effort and investment. we have a mix-and-match menu of ways to put together a program to work towards X goal; we should determine which option will be best for this community based on, in part, your ability to invest some money up front, to the extent that your payments will be required to continue to achieve X over the long term.’ NB goes on to say, “free projects facilitate corruption… [gov] funding starts to be allocated by governments if NGOs (researchers?) use their finances as leverage, in financial partnership with host-country governments, rather than absolving them of their financial and developmental responsibilities.”

as experiments become more like programs (and programs more like experiments), and take place at larger scales and in greater collaboration with the public sector, we need to think carefully about what is required for sound program design and implementation, such that we can still have impact after an experiment ends (though, of course, long-term measurement of process & outcomes should be supported).

experimenting with intention

this post revisits some issues i have touched on before.

first of all, good find by roving bandit. the gist is that an experimental program undertaken in ‘ideal’ (NGO-run) conditions did not show any effect when the same program was run by the government. oops.

i think this raises several possible questions related to carrying out experiments (i am sure there are more than i cover below):

  • before undertaking (getting funding for) an experimental intervention, how clear should we be on who would be sustaining the effort and/or taking it to scale? what kind of agreement would need to be in place? would we have some effect-size threshold that would mean that we would aim for scale and sustainability, below which an idea is scrapped?
  • how do we distinguish between proof-of-concept studies and if-this-works-it’s-going-to-scale studies? how many replications of the former would we want before we did the latter?
  • how involved should the putative implementer be in the design & conduct of the experiment?
  • how much training and capacity building with the future implementer should be built into the experimental process? would we start to consider ethical requirements in this regard (i.e. experimenters have some obligation to train as well, as needed)?
  • if something doesn’t work, what responsibility do we have to help enhance the public sector’s (or other implementer’s) capacity? i.e. is the response to a null finding a scrapping of the idea or a re-tooling of the implementer? or something else?
  • how much more process evaluation & monitoring should be put in place in ‘in situ’ experiments so that we can learn more about precisely what went right and wrong in implementation? how can we encourage the publication and sharing of these results, not just the treatment effect? (i swear i have an ‘in praise of process evaluation’ post coming soon. i have to atone for all the times i have denigrated it.)
  • even when a program doesn’t work, how do we make sure that the public sector (or other) implementer doesn’t get blamed for the effort, and how do we reward honesty instead of only exciting results?