pipeline designs and equipoise: how can they go together?

i am writing about phase-in / pipeline designs. again. i’ve already done it here. and more here. but.

.

the premise of a pipeline or phase-in design is that groups will be randomized or otherwise experimentally allocated to receive a given intervention earlier or later. the ‘later’ group can then serve as the comparison for the ‘early’ group, allowing for a causal claim about impact to be made. i am specifically talking about phase-in designs premised on the idea that the ‘later’ group is planned (and has perhaps been promised) to receive the intervention later. i take this to be a ‘standard’ approach to phase-in designs.
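to make the mechanics concrete, here is a minimal sketch (mine, not from any of the posts linked above) of the 'standard' phase-in logic: units are randomized to an early or a late cohort, and the late cohort serves as the comparison while it remains untreated. all numbers, including the assumed effect size, are made up for illustration.

```python
# illustrative sketch (not from the original posts): a two-cohort phase-in
# where the 'late' group serves as the comparison for the 'early' group.
import random
import statistics

random.seed(1)

n_units = 200
units = list(range(n_units))
random.shuffle(units)
early, late = units[: n_units // 2], units[n_units // 2 :]

# hypothetical outcome model: a baseline level plus a treatment effect
# for units already phased in. the effect size (0.4) is purely made up.
def outcome(treated):
    baseline = random.gauss(0, 1)
    return baseline + (0.4 if treated else 0.0)

# midline survey: early cohort treated, late cohort not yet treated,
# so the difference in means is the impact estimate.
y_early = [outcome(treated=True) for _ in early]
y_late = [outcome(treated=False) for _ in late]
impact_estimate = statistics.mean(y_early) - statistics.mean(y_late)
print(f"estimated impact at midline: {impact_estimate:.2f}")

# once the late cohort is phased in, the comparison group is gone --
# which is why mid-course decision points matter for what follows.
```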

.

i’d like to revisit the issue of phase-in designs from the angle of equipoise, which implies some sense of uncertainty about the causal impact of a given intervention. this uncertainty provides the justification for making use of an ex ante impact evaluation. equipoise literally translates to equal weight / force / interest. here, the force in question is the force of argument about the impact of an intervention and which direction it will go (or whether there will be one at all).

.

there have already been some great conversations, if not decisive answers, as to whether, in social science research, the justification for using experimental allocation of an intervention needs to meet the standards of clinical equipoise or policy equipoise.* the key difference is the contrast between ‘a good impact’ (clinical equipoise) and ‘the best impact achievable with the available resources’ (policy equipoise). in either case, it is clear that some variant of equipoise is considered a necessary justification: for theoretical and/or empirical reasons, it just isn’t clear whether an intervention is (a) good (investment).

.

whichever definition of equipoise you pursue, the underlying premise is one of a genuine uncertainty and an operational knowledge gap about how well a certain intervention will work in a certain setting at a certain point in time and at what degree of relative resource efficiency. this uncertainty is what lends credibility to an ex ante impact evaluation (IE) and the ethical justification for a leave-out (‘business as usual’ or perhaps ‘minimal/basic package’) comparison group. hence, no RCTs on parachutes.

.

uncertainty implies that the impact results could plausibly, if not with fully equal likelihood, come back positive, negative, null or mixed. at least some of those outcomes imply that a program is not a good use of resources, if not actually generating adverse effects. such a program, we might assume, should be stopped or swapped for some alternative intervention (see berk’s comments here).

.

to move forward from the idea of uncertainty, the following two statements simply do not go together despite often being implicitly paired:

  1. we are uncertain about the impact our intervention will bring about / cause, so we are doing an (any type of ex ante) IE.
  2. we plan to scale this intervention for everyone (implicitly, at least, because we believe it works – that is, the impacts are largely in the desired direction). because of resource constraints, we will have to phase it in over time to the population.

.

yes, the second point can be, and often is, extended to say, ‘this offers a good opportunity to have a clean identification strategy and therefore to do IE.’ but this doesn’t actually square the circle between the two statements. it still requires the type of sleight of hand around the issue of uncertainty that I raised here about policy champions.

.

unless there are some built-in plans to modify (or even cancel) the program along the phase-in process, the ethics of statement 2 rests solely on the resource constraint (relative to actual or planned demand), not on any variant of equipoise. this is an important point when justifying the ethics of ex ante IE. and it is worth noting how few development programs have been halted because of IE results. it would be a helpful global public good if someone would start compiling a list of interventions that have been stopped, plausibly, because of IE outcomes, perhaps making note of the specific research design used. please and thank you.

.

moreover, unless there is some built-in planning about improving, tweaking or even scrapping the program along the way, it is not clear that the ex ante IE based on a phase-in design can fully claim to be policy relevant. this is a point I plan to elaborate in a future post but, for now, suffice it to say that I am increasingly skeptical that being about a policy (being ‘policy adjacent’ by situating a study in a policy) is the same as informing decisions about that policy (being ‘decision relevant’).

.

to me, the latter has stronger claims on being truly policy relevant and on helping to make wise and informed decisions about the use of scarce resources – which I think is the crux of this whole IE game anyway. IEs of phase-in designs without clear potential for mid-course corrections (i.e. genuine decision points) seem destined for policy adjacency, at best. again, the underlying premise of a phase-in design is that it is a resource constraint, not an evidence constraint, which is dictating the roll-out of the program. but the intention to make a decision at least partly based on the evidence generated by an IE again rests on the premise of ex ante uncertainty about the potential for (the most cost-efficient) impact.

.

to come back to the issue of equipoise and phase-in designs: if the ethics of much of the work we do rests on a commitment to equipoise, then more needs to be done to clarify how we assess it and whether IRB/ethics review committees take it seriously when considering research designs. what information does a review board need to make that assessment?

.

moreover, it requires giving a good think to what types of research designs align with the agreed concept of equipoise (whichever that may be). my sense is that phase-in designs can only be commensurate with the idea of equipoise if they are well-conceived, with ‘well-conceived’ indicating that uncertainty about impact is indeed recognized and contingencies planned for in a meaningful way – that is, that the intervention can be stopped or altered during the phase-in process.

.

* i don’t propose to settle this debate between clinical and policy equipoise here, though I am sympathetic to the policy equipoise argument (and would be more so if more ex ante IEs tended towards explicitly testing two variants of an intervention against one another to see which proves the better use of resources moving forward – because forward is the general direction people intend to move in development).

on science, from eula biss’s on immunity.

a nice reminder from eula biss (via on immunity: an inoculation) that science is a series of building blocks, with small tests and then bigger ones to see if each brick helps us reach higher and see farther.

Science is, as scientists like to say, “self-correcting,” meaning that errors in preliminary studies are, ideally, revealed in subsequent studies. One of the primary principles of the scientific method is that the results of a study must be reproducible. Until the results of a small study are duplicated in a larger study, they are little more than a suggestion for further research. Most studies are not incredibly meaningful on their own, but gain or lose meaning from the work that has been done around them… This doesn’t mean that published research should be disregarded but that, as John Ioannidis concludes, “what matters is the totality of the evidence” (p. 133)…

Thinking of our knowledge as a body suggests the harm that can be done when one part of that body is torn from its context. Quite a bit of this sort of dismemberment goes on in discussions about vaccination, when individual studies are often used to support positions or ideas that are not supported by the body as a whole… When one is investigating scientific evidence, one must consider the full body of information (p. 135).
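as a toy illustration of what ‘the totality of the evidence’ can mean in practice (my own sketch, not anything from biss or ioannidis), here is a simple fixed-effect pooling of some made-up study estimates, weighted by their precision:

```python
# toy fixed-effect meta-analysis: pool several (made-up) study estimates
# by inverse-variance weighting, so no single small study dominates.
import math

# (estimate, standard error) for hypothetical studies of the same effect
studies = [(0.80, 0.40), (0.10, 0.15), (0.25, 0.20), (0.05, 0.10)]

weights = [1.0 / se**2 for _, se in studies]
pooled = sum(w * est for (est, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1.0 / sum(weights))

print(f"pooled estimate: {pooled:.2f} (se {pooled_se:.2f})")
# the large, noisy first study gets little weight relative to the
# tighter studies -- the body of evidence matters more than any one brick.
```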

center and periphery in doing development differently

i have spent almost three weeks back in TX, which was supposed to be, in part, a time of immense productivity in front of our fireplace (yes, it is chilly here. probably enough to warrant a fire but still.) i brought home big batches of reading but am taking back far more of it with me to delhi than i had planned.

.

nevertheless, i did finally make it through duncan green’s post on his immediate thoughts on doing development differently from matt andrews and team. so, that’s only three months behind schedule.

.

many things are, of course, striking and exciting about this movement, including the idea of rapid iterations to promote (experiential) learning and tweaks, the importance of morale and relationships, and the time horizon.

.

but the most striking thing had to do with immersion, deep study and deep play*.

deep study of the system, based on continuous observation and listening. In Nicaragua, UNICEF sent public officials out to try and access the public services they were administering, and even made the men carry 30lb backpacks to experience what it’s like being pregnant! This is all about immersion, rather than the traditional ‘fly in, fly out’ consultant culture.

the idea is, it seems, to strike a blow at the ‘consultant culture’ of folks from dc, london and geneva parachuting in to solve problems (there’s probably an interesting discussion to be had about the relevance of area studies in this approach). but that is for another time. what is most immediately striking is that duncan doesn’t report on unicef folks making consultants visiting nicaragua from NYC head out to remote areas and try to access services with pregnant-backpacks.

.

if i read the anecdote correctly (is there more written about this somewhere?), the target was public officials, which i take to mean nicaraguan civil servants and politicians based in the capital or another metropole. which is an important (re-)lesson. being from X country doesn’t automatically make you knowledgeable about all areas and details of X country (duh). probably many of us have sat with civil servants who talk about ‘the hinterlands’ and ‘backwards’ areas and who seem quite surprised at what they find there, if they visit at all. there is a vast difference between the high-level and the street-level, between big decisions about adopting and championing a policy and the many small decisions involved in implementing that idea. implementation is, as always, profoundly local. (this idea, incidentally, also applies to study design and the relationships between PIs, their research assistants and the field teams.)

.

this all suggests that, maybe, doing development differently (and probably doing evaluation differently) also has to do with shifting ideas about center and periphery (globally as well as nationally), about who has relevant knowledge, and thinking about immersion for program designers and decision-makers of a variety of types, whether from the country in question or not. this, in part, raises questions about who is doing the iteration and learning and how lessons are passed up as well as down different hierarchies (and spread horizontally). looking forward to hearing and thinking more.

.

*it’s hard to resist a geertz reference, since ‘continuous observation and listening’ sounds an awful lot like ‘participant-observation,’ a study technique that almost *never* comes up in ‘mixed-methods’ evaluation proposals.

Win a prize guessing how much trial registration reduces publication bias!

Originally posted on Berkeley Initiative for Transparency in the Social Sciences:

Does trial registration make an impact on publication bias? Knowing the answer could earn you a cash prize!


Macartan Humphreys (Columbia, Political Science) and collaborators Albert Fang and Grant Gordon are doing research on how publication (and publication bias) changed after the introduction of registration in clinical trials. They also want you to guess what the changes were. The bidder with the closest guess will win a $200 cash prize. Click here to read more and enter a guess.

Enthusiastic supporters of research transparency are often keen on advocating for the registration of trial experiments. But in the social sciences the practice remains fairly rare and its impact on publication bias is relatively unknown. Fortunately, social scientists can learn from their peers in the medical sciences who have been required to register their medical trials since 2005. The research of Humphreys et al. will look to see if there was a…


buffet of champions: what kind do we need for impact evaluations and policy?

this post is also cross-posted here.

i realize that the thesis of “we may need a new kind of champion” sounds like a rather anemic pitch for guardians of the galaxy. moreover, it may lead to inflated hopes that i am going to propose that dance-offs be used more often to decide policy questions. while i don’t necessarily deny that this is a fantastic idea (and would certainly boost c-span viewership), i want to quickly dash hopes that this is the main premise of this post. rather, i am curious why “we” believe that policy champions will be keen on promoting and using impact evaluation (and subsequent evidence syntheses of these) and to suggest that another range of actors, which i call “evidence” and “issue” champions, may be more natural allies.

.

there has been a recurring storyline in recent literature and musings on (impact) evaluation and policy- or decision-making:

  • first, the aspiration: the general desire of researchers (and others) to see more evidence used in decision-making (let’s say both judgment and learning) related to aid and development so that scarce resources are allocated more wisely and/or so that more resources are brought to bear on the problem.
  • second, the dashed hopes: the realization that data and evidence currently play a limited role in decision-making (see, for example, the report on the evidence on evidence-informed policy-making as well as here).
  • third, the new hope: the recognition that “policy champions” (also “policy entrepreneurs” and “policy opportunists”) may be a bridge between the two.
  • fourth, the new plan of attack: bring “policy champions” and other stakeholders in to the research process much earlier in order to get up-take of evaluation results into the debates and decisions. this even includes bringing policy champions (say, bureaucrats) on as research PIs.

.

there seems to be a sleight of hand at work in the above formulation and it is somewhat worrying in terms of equipoise and the possible use of the range of results that can emerge from an impact evaluation study. said another way, it seems potentially at odds with the idea that the answer to an evaluation is unknown at the start of the evaluation.

.

while i am not sure that “policy champion” has been precisely defined (and, indeed, this may be part of the problem), this has been done for the policy entrepreneur concept. so far as i can tell, the first attempt to articulate the entrepreneurial (brokering, middle-man, risk-taking) role in policy-making comes from david e. price in 1971. the idea was repeated and refined in the 1980s and then became more commonplace in 1990s’ discussions of public policy, in part through the work of john kingdon. (there is also a formative and informative 1991 piece by nancy roberts and paula king.) much of the initial discussion, it seems, came out of studying US national and state-level congressional politics but the ideas have been repeatedly shown to have merit in other deliberative settings. much of the initial work also focused on agenda-setting — which problems and solutions gain attention — but similar functions are also important in the adoption and implementation of policy solutions.

.

kingdon is fairly precise about the qualities of a policy entrepreneur — someone who has, as kingdon calls it, a pet policy that they nurture over years, waiting for good moments of opportunity to suggest their policy as the solution to a pressing problem:

  • first, such a person must have a “claim to a hearing” — that is, at least behind-the-scenes, people must respect and be willing to listen to this person on this topic (especially if this person is not directly in a position with decision-making power).
  • second, such a person must have networks and connections as well as an ability to bargain and negotiate within them. this is a person that can broker ideas across diverse groups of people, can “soften-up” people to the entrepreneur’s preferred policy solution, etc.
  • third, such a person must have tenacity, persistence and a willingness to risk personal reputation and resources for a policy idea.

in kingdon’s and others’ conception, a policy entrepreneur has to work at selling their idea over a long period of time (which is presumably why weissert (1991) also introduced the idea of policy opportunists, who only start to champion ideas once they make it to the deliberating table and seem likely to move forward).

.

in short, policy entrepreneurs (and, through the sloppy use of near-synonyms, policy champions) believe strongly in a policy solution, for some reason, and have put time, effort, and reputation into moving the idea forward. note the nebulous use of “some reason” — i have not found a definition that specifies that policy entrepreneurs must come to promote a policy through a particular impetus. glory, gold, god, goodness, and (g’)evidence all seem to be viable motivators under the definition.

.

my question is: is this what we need to support the use of research (and, specifically, impact evaluations and syntheses thereof) in decision-making? it is not clear to me that it is. policy entrepreneurs are people already sold on a particular policy solution, whereas the question behind much evaluation work is ‘is this the best policy solution for this context?’ (recognizing the importance of contextual and policy, if not clinical, uncertainty about the answer in order for an evaluation to be worthwhile).

.

it seems to me, then, that what we (researchers and evaluators) actually need are people deeply committed to one of two things: (1) the use of data and evidence, in general, as an important tool in sound decision-making (“evidence champions” or, at least loosely, technocrats) and/or (2) a particular issue or problem (“issue champions” — no doubt a sexier phrase is available). i’ll spend more time on the second.

.

an “issue champion,” for example, may be someone who has qualities similar to a policy entrepreneur’s but, rather than using claims to a hearing, a network, and tenacity to bring forward a policy solution, s/he uses these tools to bring attention to a problem — say, malaria mortality. this person feels that malaria is a problem that must be solved — and is open to finding the most (cost-)effective solution to the problem (or the means to do a good job of implementing that solution). s/he is not, by contrast, someone already committed to believing that prevention, diagnostics, or treatment in any particular form or at any particular price are the best way forward until s/he has seen evidence of this in a relevant context. this is different from a “policy champion” who has, for example, been pushing for universal bednet coverage for the past 20 years.

.

this is not to say that you don’t want the bednet champion to be well aware of your study and even to have input into defining the research questions and approving the research design (in fact, this seems vital in lending credibility and usefulness to the results). but the way the study is structured will matter for whether the bednet champion is open to taking up the range of possible results from your study. if your question is ‘does approach A or approach B result in more efficient distribution of bednets?’, then yes, both sets of results will be interesting to the bednet champion. but if the question is more of the type ‘are bednets the most cost-effective approach to addressing malaria mortality in our country?’, then the bednet champion is likely to be particularly interested in trumpeting only one set of results: those that are significantly in favor of bednets as a solution to the malaria problem.

the malaria champion (or general evidence enthusiast), on the other hand, may be more open to thinking about how to interpret and use the range of possible results from the study, which may also be mixed, inconclusive, or even negative. (throughout this discussion, i recognize that malaria, like most problems in human and economic development, doesn’t have a silver-bullet answer and that, therefore, “A or not-A”-type evaluation questions will only get us so far in getting the right mix of tools in the right place at the right time. i.e. the answer is likely neither that bednets do no good nor that they are the only thing needed to tackle malaria.)

.

the worry, then, with the policy champion is that they are already committed to a policy solution. will they change their mind on the basis of one study? probably not (nor, necessarily, should they. but a meta-analysis may not sway them either.) but insofar as “we” want decision-makers to learn about our evidence and to consider it in their deliberations, it may be issue, rather than policy, champions that are particularly important. they may make use of the results regardless of what they are. we cannot necessarily expect the same of the policy champion. of course, a small army of evidence champions is also helpful.

.

i do want to stress that it is critical to have policy champions and other stakeholders involved early in the research-design process, so that the right questions can be asked and the politically and contextually salient outcomes and magnitudes considered. but as an ally in the evaluation process and, say, a potential PI on an evaluation, it seems that the issue champions are the folks likely to stick with it.

.

and, yes, issue champions should probably have some moves ready, in case of a dance-off (as there will always be factors beyond evidence and data influencing decisions).

hey, you’re sick? can we know where you live? (simple experiments, complicated designs)

i have been working on a blog for 3ie (*now posted, here!), based on a delhi seminar talk i gave recently, in turn based on this paper and this blog on using text messages to increase completion of anti-malarial treatment. not all of the material i initially wrote will appear in the final blog — which is good, it is more focused now. nevertheless, i thought i would share the excised part that i wrote about the actual experimental set-up. part of the point i made in the seminar — and which i intend to make in this blog — was that a ‘simple’ and ‘hands-off’ intervention (meaning minimal input from either the surveyors or our hosts, medical vendors, in encouraging or facilitating take-up) can require an all-the-more complicated study design.

.

[photo: rainy season!]

the study was meant to be a proof-of-concept — it is the first randomized evaluation of a direct-to-patient (rather than to health workers) text messaging program for malaria in sub-saharan africa. as guided by our funder, CHAI (as an operational research project for the Affordable Medicines Facility – malaria (AMFm)), we wanted to learn about the minimum of supportive moving parts required to get patients enrolled into a text messaging system of reminders to complete treatment. a relatively simple mHealth intervention (much of the intervention work was done by a computer, a modem, some lines of code, and occasionally an electric generator – it was the rainy season, after all) required a fairly intricate experimental set-up.
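for a flavor of how little machinery the intervention itself needed, here is a hypothetical sketch (not the study’s actual code) of the sort of scheduling logic involved; send_sms() is a stand-in for the modem, and the reminder timing is an assumption on my part:

```python
# illustrative sketch only -- not the study's actual code. a hypothetical
# send_sms() stands in for the computer + modem that pushed reminders
# directly to patients to complete their anti-malarial course.
from datetime import datetime, timedelta

REMINDER_OFFSETS_HOURS = [24, 48]  # assumed timing; the real schedule may differ

def send_sms(phone_number: str, message: str) -> None:
    # placeholder for whatever the modem interface actually was
    print(f"[{datetime.now():%Y-%m-%d %H:%M}] to {phone_number}: {message}")

def schedule_reminders(phone_number: str, enrolled_at: datetime) -> list[datetime]:
    """Return the times at which completion reminders should go out."""
    return [enrolled_at + timedelta(hours=h) for h in REMINDER_OFFSETS_HOURS]

# example: a patient enrolled at the drug shop in the morning
due = schedule_reminders("+233200000000", datetime(2013, 6, 14, 10, 30))
for when in due:
    print("reminder due:", when)
```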

.

[photo: diligently organizing follow-up]

it was a deviation for ipa-ghana in terms of not-household-or-firm-based surveying and it took a lot of logistical coordination (as shown in the photo), attention to timetables, and a willingness on the part of the surveyors to set out without perfectly clear instructions.

.

things worked roughly like this: with participating anti-malarial vendors, we (the research team) worked to maintain a façade of a separate flyer system to invite people to enroll in a mobile health information system, distinct from the surveyor talking to people obtaining an anti-malarial medication. during that interview about the medicines purchased and the price paid, if the respondent proved willing and eligible, the surveyor asked how we could find them at home. and for their phone number. and for directions to a neighbor’s house. and the neighbor’s phone. (i have written previously about the fun of trying to track people down based on the directions they give.) just in case we decided to follow up to see how they were doing. and the surveyors were purposefully cagey about whether and when this would be happening and to what end.
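roughly, the ‘how do we find you later?’ information boils down to a small record per respondent. this is my own hypothetical sketch of that record — the field names are illustrative, not the study’s instrument:

```python
# hypothetical sketch of the minimal 'how do we find you later?' record
# collected at the drug shop -- field names are mine, not the study's.
from dataclasses import dataclass
from typing import Optional

@dataclass
class FollowUpContact:
    respondent_id: str
    phone: Optional[str]           # respondent's own number, if any
    home_directions: str           # free-text directions, often imprecise
    neighbor_phone: Optional[str]  # backup contact for the wild goose chases
    willing: bool = True           # agreed to a possible follow-up visit

contact = FollowUpContact(
    respondent_id="shop07-023",
    phone="+233200000000",
    home_directions="past the blue kiosk, third house after the mango tree",
    neighbor_phone=None,
)
print(contact)
```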

.

[i am still amazed that so many people agreed to this (thank you, participants!) — if someone approached me in a drug store while i was sick and asked me to chat and then how to find me at home, even my deepest beliefs in survey karma would probably be abandoned in favor of some fairly unbecoming language.]

.

but, in fact, we had a very precise purpose and timing in mind: to find the respondent between ~58 and 72 hours after we met them in a drug shop. with some basic assumptions about when they would start their anti-malarial medication, we wanted to land up at people’s homes just after they had completed their medication. respondents often didn’t give very precise directions to their homes, resulting in a lot of motorcycle-mounted wild goose chases – but we also achieved a follow-up rate of 99% within our specified time window. phew!

.

this is really a tribute to the commitment and perseverance of the survey team and our field managers.
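for the curious, the window arithmetic described above is simple enough to sketch (assumptions mine, not the study’s): given when we met a respondent at the shop, the target visit falls ~58–72 hours later, just after a (typically three-day) anti-malarial course would be finished.

```python
# back-of-the-envelope sketch (assumptions are mine, not the study's):
# given when we met a respondent at the drug shop, compute the window
# in which to show up at their home -- ~58 to 72 hours later.
from datetime import datetime, timedelta

FOLLOW_UP_START = timedelta(hours=58)
FOLLOW_UP_END = timedelta(hours=72)

def follow_up_window(met_at: datetime) -> tuple[datetime, datetime]:
    return met_at + FOLLOW_UP_START, met_at + FOLLOW_UP_END

# example: respondent met mid-morning; visit them a little over two days later
start, end = follow_up_window(datetime(2013, 6, 14, 10, 30))
print(f"visit the respondent between {start:%a %H:%M} and {end:%a %H:%M}")
```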