further thoughts on phase-in/pipeline designs for causal inference

not long back, i put down my thoughts (here) about pipeline or phase-in designs. my basic premise is that while they may allow for causal inference, it is not clear that they are usually designed to allow generated evidence to be used where it is most relevant — to that program itself. that seems bad from an evidence-informed decision-making point of view and potentially questionable from an ethical point of view.

i raised this issue during a recent conversation on the development impact blog about the ethics of randomization. i reproduce my comment and berk ozler’s kind reply below.

 

me

usually, the appealing premise of a phased-in design is that there is some resource constraint that would prevent simultaneous scale-up in any case. in this scenario, no matter how heavy the burden of waiting, there will have to be some rationing. In which case, why not randomization rather than something else, like patronage?

then things get odd. the suggestion seems to be that we may know, ex ante, that at least some types of people (elderly, immune-compromised) will benefit greatly from immediate receipt of the treatment. In which case, we are not in equipoise, and it is unclear whether an RCT (or at least unconditional randomization) is appropriate at all. things, of course, get trickier when no resource constraint prevents simultaneous scale-up.

second, I feel we should reflect on the purpose and ethics of a phased-in design, especially one with full information. again, a resource constraint may make it politically acceptable for a governor to say that she will roll-in health insurance randomly across the state, which can allow an opportunity to learn something about the impact of health insurance. so, she stands up and says everyone will get (this) health insurance at some point and here’s the roll-out schedule.

but the reason for making use of this randomization is to learn if something works (because we genuinely aren’t sure if it will, hence needing the experiment) and maybe to have ‘policy impact’. so what if what is learnt from comparing the Phase I and Phase II groups is that there is no impact, the program is rubbish or even harmful? or, at a minimum, it doesn’t meet some pre-defined criterion of success. is the governor in a position to renege on rolling out the treatment/policy because of these findings? does the fine print for everyone other than those in Phase I say “you’ll either get health insurance, or, if the findings are null, a subscription to a jelly-of-the-month club”? in some ways, a full-disclosure phased roll-in seems to pre-empt and prevent policy learning and impact *in the case under study* because of the pre-commitment of the governor.

i find phased roll-in designs without a plan to pause, analyse, reassess and at least tweak the design between Phases I and II to be ethically troubling. i’d be interested in your thoughts.

 

berk

in economics, unlike in medicine, many times the programs we have involve transferring something to individuals, households, or communities (assets, information, money, etc.). without negative spillovers, we don’t think of these as ever not increasing individual welfare, at least temporarily: if i give you a cow, this is great for you. if you don’t like it, sell it: your individual welfare will increase (would have been even higher if i just gave you the cash).

but, what if my program’s goal is not a temporary jump in your welfare, but you escaping poverty as close to permanently as possible? the program could be deemed unsuccessful even though it raised welfare of its beneficiaries for a short period.

the point is, it does seem wrong to break your promise to give something (something people would like to have) to people who drew Phase II in the lottery because you deemed your program unsuccessful for reaching its goals. you promised people that you’d give them the treatment at the outset, so i’d argue that if you’ll break your promise you have to give them something at least as good if not better. if you can come up with this (and the phase II group is happy with your decision), perhaps they can even become your phase I group in a new experiment — in a process where you experiment, tweak, experiment again, … kind of like what Pritchett et al. argue we should do: lot more experiments not less…

thinking of your examples. with the Oregon healthcare reform, it would be hard to push a stop or pause button with legislation. government action takes time and there is the credibility of your policymakers at stake. i don’t think you could really argue for a stop/pause because those impacts (even if unequivocal) are considered too small to treat the lottery losers.

in the case of a project that is giving cows, i am more optimistic: it might be possible for the project to find an alternative treatment that is of equal or higher value, that is acceptable to the phase II group, and that is feasible to roll out quickly. in such cases, i could see a tweak of the intervention between the two phases.

have evidence, will… um, erm? (3 of 6, relevant reasons)

this is a joint post with suvojit. it is also posted on people, spaces, deliberation.

.

in our last post, we wrote about factors – evidence and otherwise – influencing decision-making about development programmes. to do so, we have considered the premise of an agency deciding whether to continue or scale a given program after piloting it and including an accompanying evaluation commissioned explicitly to inform that decision. this is a potential ‘ideal case’ of evidence-informed decision-making. yet, the role of evidence in informing decisions is often unclear in practice.

what is clear is that transparent parameters for making decisions about how to allocate resources following a pilot may improve the legitimacy of those decisions. we have started, and continue in this post, to explore whether decision-making deliberations can be shaped ex ante so that, regardless of the outcome, stakeholders feel it was arrived at fairly. such pre-commitment to the process of deliberation could carve out a specific role for evidence in decision-making. clarifying the role of evidence would inform what types of questions decision-makers need answered and with what kinds of data, as we discussed here.

in considering deliberative processes, we are guided by Daniels’s normative “accountability for reasonableness” framework (A4R). Daniels proposes four criteria to bring legitimacy to deliberations and, he argues, consequent fairness to the decision.

.

relevant reasons

this post focuses on the first A4R criterion: the “relevant reasons” that, when considered, allow “the minority [to] at least assure itself that the preference of the majority rests on the kind of reason that even the minority must acknowledge appropriately plays a role in the deliberation.” our goal is not to assert which reasons, including evidence, provide legitimate grounds for deliberation.

rather, we outline possible categories of reasons that may, ex ante, be placed on or off the table for deliberation. we then briefly consider the role of stakeholders, arguing that their involvement is most critical for setting and vetting relevant reasons. finally, we briefly consider the implications of ‘relevant reasons’ for planning evaluations useful for decision-making.

.

efficacy and effectiveness

one set of reasons relates to the proven efficacy and safety of a program’s/policy’s content. have the materials or technologies used in a program received national regulatory approval? are they safe or appropriate for all sub-groups?

will the piloted program or portfolio of programs be judged solely on its absolute effectiveness, or will some threshold effect size for “success” be pre-set? in addition, will only average effects be considered or will effectiveness among certain sub-groups (e.g. historically disadvantaged) be made a separate reason?

will cost-effectiveness and affordability be considered and, if so, will benchmarks be set in advance? will decisions be taken on relative effectiveness (opportunity costs) of an intervention? if yes, relative to which other interventions (for example, other programs in the same sector or same portfolio or other programs from any sector but addressing similar outcomes)? (1)
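to make the idea of pre-set benchmarks concrete, here is a minimal sketch in python of what writing the effectiveness reasons down in advance might look like. everything in it is a hypothetical illustration rather than anything we or Daniels prescribe: the names (`DecisionCriteria`, `PilotResult`, `reasons_met`), the three reasons chosen, and all of the numbers are assumptions. the point is only that the thresholds are fixed before the results arrive, and that the check reports each reason separately rather than issuing a verdict, leaving the weighing to the deliberation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionCriteria:
    """hypothetical thresholds, agreed before the pilot results are known."""
    min_average_effect: float        # smallest average effect counted as "success"
    min_subgroup_effect: float       # smallest acceptable effect in a priority sub-group
    max_cost_per_unit_effect: float  # cost-effectiveness benchmark


@dataclass(frozen=True)
class PilotResult:
    """hypothetical summary of what the evaluation found."""
    average_effect: float
    subgroup_effect: float
    cost_per_participant: float


def reasons_met(criteria: DecisionCriteria, result: PilotResult) -> dict:
    """report, reason by reason, whether each pre-set threshold is met."""
    if result.average_effect > 0:
        cost_per_unit = result.cost_per_participant / result.average_effect
    else:
        # no positive effect: no finite cost per unit of effect
        cost_per_unit = float("inf")
    return {
        "average effect": result.average_effect >= criteria.min_average_effect,
        "sub-group effect": result.subgroup_effect >= criteria.min_subgroup_effect,
        "cost-effectiveness": cost_per_unit <= criteria.max_cost_per_unit_effect,
    }


# illustrative numbers only
criteria = DecisionCriteria(
    min_average_effect=0.10,
    min_subgroup_effect=0.05,
    max_cost_per_unit_effect=500.0,
)
result = PilotResult(average_effect=0.20, subgroup_effect=0.02, cost_per_participant=50.0)

for reason, met in reasons_met(criteria, result).items():
    print(f"{reason}: {'met' if met else 'not met'}")
```

with these made-up numbers, the average-effect and cost-effectiveness reasons are met but the sub-group reason is not, which is exactly the kind of mixed result a deliberation would then have to weigh.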

.

feasibility and logistics

are resources for scaling (financial, material, human) allocated and ring-fenced, in case a positive decision is reached? if needed, can decision-makers mobilize resources needed for scaling? what types of information about the resources needed for scaling, and the likelihood of their being mobilized, will be brought to the decision table?

given what was learned about efforts and costs in piloting the project/program/policy, is the relevant implementing agency capable of running it? at scale? if only some state or provincial agencies are capable or have the requisite infrastructure, how should this information be used in decision-making? do decision-makers want to pre-commit to an everywhere-or-nowhere decision?

will (and if so, how will) decision-makers distinguish genuine implementation capacity from isomorphic mimicry? will the potential to build capacity be considered in the deliberation or do decision-makers want to restrict themselves to considering what can be done with resources currently in existence?

what political considerations will be part of the deliberative process, including political realities of constituencies and lobbies in both the donor country and the country in which the pilot took place?

.

involvement of stakeholders: setting reasons and making decisions

stakeholders should have an important voice in which reasons are deemed relevant for decision-making, though the extent of “voice” is not clear (e.g.). there are two distinct roles for stakeholders in decision-making: those designated (elected, appointed, or otherwise empowered) to take certain decisions and those with a stake in what decision is taken (street-level implementers (and unions thereof) and intended beneficiaries). the reasons set as relevant may gain legitimacy if they are the product of negotiations between multiple types of stakeholders.

nevertheless, we stress that it is stakeholder involvement in setting and vetting reasons relevant to deliberation – rather than directly participating in the deliberation – that can foster fairness and legitimacy as well as feasibility and efficiency. representativeness of the decision-makers is neither necessary nor sufficient for a legitimate deliberative process that leads to a fair outcome. (2)

.

considerations in designing an evaluation for decision-making

to close, we circle back to a running theme in these posts: for decisions to be informed by evidence, evidence needs to be useful to decision-making. what, then, does the relevance of some reasons tell us about what kinds of data and evaluation questions are relevant?

we have come across numerous evaluations attached to pilot programs, designed to allow decision-makers to choose a way forward, including from among several evaluated options. has it been agreed in advance that one of those options must be chosen? putting off such questions until an evaluation’s results are analyzed – as we have seen done in practice – sets up the unhelpful cycle of not discussing what types of evidence are desired for decision-making and, therefore, not setting up the evaluation to collect that information.

when relevant reasons for deliberation are laid out in advance, they provide guidance on what types of data need to be collected and what types of questions need to be evaluated to inform a decision. while we don’t advocate for any particular reason to be deemed “relevant,” we believe the above discussion not only informs how fair decisions on resource allocation can be taken but also highlights again that evidence – whether generated quantitatively or qualitatively – needs deep consideration in order to be deemed ‘rigorous’ from the point of view of its usability in decision-making.

in our next post, we take up the second criterion in the A4R framework: publicity and transparency in decision-making, again reflecting on what it means for the legitimacy of deliberations as well as the implications for planning for and using evidence.

.

(1) here, Brock’s work on separate spheres and indirect benefits is of interest.

(2) more on ‘democracy’ as an unsolved rationing problem can be found here.


west african pirates

i’ve been slow on pirate news. sometimes my google alert for ‘pirate’ brings me good things; most of the time it is about sports teams.

in any case, something good came in today. a lieutenant commander in the french navy noted that west africa and the gulf of guinea

is a good place to be a pirate.

in my head, he says it just like mel brooks.

more importantly, the piracy is partially attributable to the state (in this case, nigeria) not distributing the rents it earns from resources.

many of the pirates targeting ships on the high seas come from the niger delta in southern nigeria, where indigenous groups are demanding a greater share of the region’s oil wealth.

losing the “different worlds” talk

this post is an elaboration of my tweet on the nytimes’ op-ed, “the end of the developing world”, by Dayo Olopade. the essay is good and important. imbibe it. here’s a sip:

it’s tough to pick a satisfying replacement. talk of first, second and third worlds is passé, and it’s hard to bear the Dickensian awkwardness of “industrialized nations.” forget, too, the more recent jargon about the “global south” and “global north.” it makes little sense to counterpose poor countries with “the West” when many of the biggest economic success stories in the past few decades have come from the East.

all of these antiquated terms imply that any given country is “developing” toward something, and that there is only one way to get there.

it’s time that we start describing the world as “fat” or “lean.”

countries. “fat” or “lean” countries. or nation-states. or regions. but not worlds.

olopade rightly points out the passé-ness of “talk of first, second and third worlds”. but it isn’t just passé, it’s dangerous. we use these phrases – first and third world, developing world, industrialized world, arab world, muslim world – all the time. how can such common terms be dangerous?

because we’re all distinctly on the same planet. we’re in this thing together. i’ve ridden through it’s a small world, so i know.

one ride,

one endlessly repeating song,

one world.

as a reminder of the origins of the phrase (with an embedded implicit plea to stop using it), i quote at length from matthew connelly’s fantastic fatal misconception. parentheses are page numbers in the book. emphases are mine.

In 1946, John Boyd Orr (a British authority on nutrition and first director-general of FAO [Food & Agriculture Organization]) had argued that population growth posed the choice of ‘one world or none.’ After the 1952 WHA [World Health Assembly], [Alfred] Sauvy [demographer, anthropologist, historian] proposed a new way of envisaging the globe… He argued that rather than one world, or even two, there were really three: the communist bloc, the capitalist West and what he called ‘the Third World.’ (Like the Third Estate of revolutionary France, the Third World was desperately poor and increasingly overcrowded.)

For Sauvy, ideological rivals East and West actually depended on each other because the conflict defined their identity. But as they continued along their paths to modernity, the distinctions between them would eventually disappear… The differences with the South were far more profound – for Sauvy, they inhabited a different universe. It was not the case, as American demographers optimistically assumed, that this Third World merely had to advance along some imagined continuum from tradition to modernity. According to Sauvy, ‘these countries have our mortality of 1914 and our natality of the 18th century.’ Saving lives was cheap but giving people something to live for was expensive (153).

“Do you not hear, on the Cote d’Azur, the cries of misery coming to us from the other side of the Mediterranean?” Sauvy asked. “The pressure is growing in the human furnace.” (154).

Equally significant was the metaphor Sauvy chose in describing how this Third World was emerging. He likened it to a “slow and irresistible push, humble and ferocious, towards life.” “Because,” Sauvy concluded, “in the end, this Third World, ignored, exploited and misunderstood like the Third Estate – it also wants to be something” (154).

Sauvy’s idea of three worlds – seemingly progressive in its critique of the Cold War and advocacy of development aid – was, in fact, deeply conservative. The whole point was to banish forever the thought that there might be only one world, in which all humanity shared mutual obligations (154).