have evidence, will… um, erm? (5 of 6, revisibility)

this is part of a series of joint posts with suvojit.

throughout this series of posts (1, 2, 3, 4), we have considered two main issues. first, how can evidence and evaluation be shaped to be made more useful – that is, directly usable – in guiding decision-makers to initiate, modify, scale up or drop a program? or, as recently pointed out by Jeff Hammer, how can we better evaluate opportunity costs between programs, to aid in making decisions? second, given that evidence will always be only part of a policy/programmatic decision, how can we ensure that decisions are made (and perceived to be made) fairly?

for such assurance, we primarily rely on Daniels’ framework for promoting “accountability for reasonableness” (A4R) among decision-makers. if the four included criteria are met, Daniels argues, it brings legitimacy to deliberative processes and, he further argues, consequent fairness to the decision and coherence to decisions over time.

the first two criteria set us up for the third: first, decision-makers agree ex ante to constrain themselves to relevant reasons (determined by stakeholders) in deliberation and, second, make public the grounds for a decision after the deliberation. these first two, we argue, can aid organizational learning and coherence in decision-making by setting and using precedent over time – an issue that has been bopping around the blogosphere this week.

these criteria, and an approach ensuring A4R more generally, are also a partial response to increasing calls for donor transparency, made loudly in Mexico City this week via the Global Partnership for Effective Development Co-operation. these calls focus on the public availability of data as the key ingredient of donor (and decision-maker) transparency. we concur on its importance. but we argue that data availability alone is incomplete without an inclusive process for setting relevant reasons about how those data are used (recognizing that they will always be only part of the process) and without making the decision criteria public as well.

the publicity and transparency around decision-making open the door for A4R’s third criterion (and the subject of this post): the possibility to appeal and revise decisions. as Daniels notes, this condition “closes the loop between decision-makers and those who are affected by their policies.”

as a quick reminder of our guiding scenario: we specifically focus on an agency deciding whether to sustain, scale, or shut down a given program after piloting it with an accompanying evaluation — commissioned explicitly to inform that decision.

in most decision-making of this kind, some stakeholders — often would-be beneficiaries — will not agree with the decision and may even feel, or be, adversely affected. while we suggest that stakeholders be involved in the earlier process of setting relevant reasons, a grievance-redressal or dispute-resolution mechanism, as provided by the revisibility criterion, gives these stakeholders an opportunity to voice their perspectives, based on the original grounds of the decision.

they can do this because the decision criteria are made public, via criterion 2. this “visible and public” space for further deliberation gives stakeholders a route “back into the policy formulation process.” stakeholders can use evidence available to them to advocate a certain way forward; the space also allows stakeholders to revisit the decision-making criteria and the decisions those criteria fostered. stakeholders therefore have the opportunity to make a case for a change in the decision.

why might past decisions be questioned? since the appeals process is largely based on the original decision criteria, appeals arise when circumstances around those reasons change. for example, in considering relevant reasons, feasibility was one category of criteria we proposed, such as the government’s capacity to scale a program or its interest in the program. one can imagine that over time, over changes in regime, and over changes in politics and policy, the original answers to these criteria could change, opening space for appeals. an additional set of proposed relevant reasons related to cost, effectiveness, and cost-effectiveness. the costs of technologies and materials may change over time, or fresh evidence could emerge about the long-term benefits of programs. this alters the original cost-benefit ratio, again opening space for appeals against the original decision.
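
to make this concrete, here is a minimal sketch in python (our illustration only; the function, benchmark and numbers are all hypothetical) of how a cost-effectiveness benchmark agreed ex ante could be re-checked once costs change, flagging that grounds for appeal exist:

```python
# hypothetical sketch: re-checking a pre-registered cost-effectiveness
# benchmark after circumstances change. all names and numbers are illustrative.

def cost_per_unit_effect(cost: float, effect: float) -> float:
    """cost per unit of measured effect; lower is better."""
    return cost / effect

# agreed ex ante (criterion 1) and published (criterion 2):
BENCHMARK = 150.0  # maximum acceptable cost per unit of effect

# at the original decision, the program misses the benchmark and is not scaled
original = cost_per_unit_effect(cost=60.0, effect=0.3)   # roughly 200

# two years later, input costs have fallen; the original reason may no longer hold
revised = cost_per_unit_effect(cost=36.0, effect=0.3)    # roughly 120

if revised <= BENCHMARK < original:
    print(f"grounds for appeal: {revised:.0f} now beats the benchmark of {BENCHMARK:.0f}")
```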

such appeals may come from members of civil society (or government) that would like to see the program brought back to life (or to see it go away). these may also come from donors themselves wanting to look at their decision-making over time and implement changes in line with the changing context.

Daniels is careful to note, and we emphasize, that the power and purpose of this criterion is not that citizens will always overturn prior decisions.* decisions on limits are requisite, as needs generally outstrip resources. rather, the revisibility criterion allows for reconsideration and reflection on those decisions by those knowledgeable about the topic and empowered to alter decisions, if seen fit and feasible. this can, Daniels notes, bring further legitimacy to decision-making processes and, again, improved decision-making over time.

we want to stress that these deliberations over decision-making and their ‘revisibility’ have to be situated in a rational and ethical decision-making framework, predicated on meeting needs fairly when not all can be met (distinct from, say, a legal framework). appeals will have to be judged on the original merits of the arguments as well as with recognition that aid resources have limits (although, obviously, a different argument can be made that aid budgets should simply be bigger). moreover, appeals need to be judged by people who understand the original decision and have the power to change it, if that is the decision taken. when decision-making criteria are set, they lay the roadmap for a possible appeals process and should be discussed and agreed upon accordingly.

we started this series of posts by admitting the limited role evidence plays in decision-making — even when those commissioning evidence intend specifically to inform that decision. we have considered how planning for decision-making can help in the production of more useful evidence and also how decisions can be made fairly: through the delineation of relevant reasons, the publicity of the decision criteria ultimately used, and now, the possibility of revisiting and revising decisions through those criteria.

our thoughts in this series of posts should not make fair decision-making seem like an impossible task. not all aspects of each of these considerations can be taken into account – the constraints of the real world are not lost on us and A4R remains an ideal, though we think one that can be approached. in our final post of this series, we therefore attempt to close the loop by looking at enforcement – asking how these ideas can be enforced and decision-makers held accountable.

*see, e.g., Richard Horton’s recent slide about limit-breaking decisions by courts and their effects on health care systems, as in cases like Colombia. experiments with health courts may be instructive. picture via @fanvictoria, citing @richardhorton1.


i’m not sure that means what you think it means (gold standard)

some thoughts from peter byass, here, for the next time you want to refer to a technique as the ‘gold standard’ – and what may be behind such a guarantee:

The verbal autopsy literature has extensively used and abused the concept of “gold standards” for validating cause of death determination. Metallurgists would say that 100% pure gold is an impossibility; the highest possible quality is normally certified as being 99.9% gold, while most of the quality-assured gold we encounter on an everyday basis ranges from 37% to 75% purity. It is perhaps also worth reflecting that 99% pure gold is an extremely soft and somewhat impractical material. Cause of death, on the spectrum of measurable biomedical phenomena, is also a somewhat soft commodity. For that reason, any approach to assessing cause of death involves alloying professional expertise with the best evidence in order to generate robust outcomes.

h/t jq

have evidence, will… um, erm? (4 of 6, going public)

this is a joint post with suvojit. it is also posted on people, spaces, deliberation.

in our last post, we discussed how establishing “relevant reasons” for decision-making ex ante may enhance the legitimacy and fairness of deliberations on resource allocation. we also highlighted that setting relevant decision-making criteria can inform evaluation design by clarifying what evidence needs to be collected.

we specifically focus on the scenario of an agency deciding whether to sustain, scale or shut down a given program after piloting it with an accompanying evaluation — commissioned explicitly to inform that decision. our key foci are how to make evidence useful in informing decisions and how, recognizing that evidence plays a minor role in decision-making, to ensure decision-making is done fairly.

for such assurance, we primarily rely on Daniels’ framework for promoting “accountability for reasonableness” (A4R) among decision-makers. if the four included criteria are met, Daniels argues, it will bring legitimacy to deliberations and, he further argues, consequent fairness to the decision.

in this post, we continue with the second criterion to ensure A4R: the publicity of decisions taken, drawing on the first criterion (relevant reasons). we consider why transparency – that is, making decision criteria public – enhances the fairness and coherence of those decisions. we also consider what ‘going public’ means for learning.

disclaimer: logistical uncertainties / room for conversation and experimentation

from the outset, we acknowledge the many unanswered questions about how much publicity or transparency suffices for fairness and how to carry it out.

  • should all deliberations be opened to the public? made available ex post via transcripts or recordings? or is semi-transparency — explicitly and publicly announcing ex post the criteria deemed necessary and sufficient to take the final decision — acceptable, while the deliberation remains behind closed doors?
  • who is the relevant public?
  • can transparency be passive – making the information available to those who seek it out – or does fairness require a more active approach?
  • what does ‘available’ or ‘public’ mean in contexts of low-literacy and limited media access?

we do not address these questions — which are logistical and empirical as well as moral — here. as the first-order concern, we consider why this criterion matters.


fairness in specific decisions

any decision about resource allocation and limit-setting will be contrary to the preferences of some stakeholders – both those at and not at the decision table. in our scenario, for example, some implementers will have invested blood, sweat and tears in piloting a program and may, as a result, have opinions on whether it should continue; others who were comfortable in their inaction (as a result of a lack of directives or funds, or just plain neglect) will now have to participate in a scale-up. there will be participants who benefited during the pilot – and those who would have done so if the program were scaled – who may prefer to see the program maintained.

these types of unmet preferences shape Daniels’s central concern: what can an agency* say to those people whose preferences are not met by a decision to convince them that, indeed, the decision “seems reasonable and based on considerations that take… [their] welfare into account?”** being able to give acceptable explanations to stakeholders about a decision is central to fairness.


coherence across decisions

the acceptability of criteria for a given decision contributes to the fairness of that decision. but the long-run legitimacy of decision-makers benefits from consistency and coherence in organizational policy. transparency, and the explicitness it requires, can foster this.

once reasons for a decision are made public, it becomes more difficult to treat similar cases differently – the use of ‘precedent’ in judicial cases aptly illustrates this phenomenon. treating like as like is an important requirement of fairness. Daniels envisions that a series of explicated decisions can function as an organizational counterpart of ‘case law’. future decision-makers can draw on past deliberations to establish relevant reasons; deviations from past decisions would need to be justified by relevant reasons.
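
as a sketch of what such organizational ‘case law’ might look like in practice, consider a public decision log that future deliberations can query for precedent. the record format, field names and entries below are entirely hypothetical, offered only to show how published grounds become searchable precedent:

```python
# hypothetical sketch: a public 'decision log' used as organizational case law.
# the record format and entries are illustrative, not a real registry.
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    program: str
    decision: str            # e.g. "scale", "sustain", "shut down"
    relevant_reasons: list   # the criteria agreed ex ante (criterion 1)
    grounds: str             # the published justification (criterion 2)

log = [
    DecisionRecord(
        program="pilot_cash_transfer",
        decision="scale",
        relevant_reasons=["effect above pre-set threshold", "agency capacity confirmed"],
        grounds="both pre-registered criteria met; costs within the agreed budget",
    ),
]

def precedents(records: list, keyword: str) -> list:
    """past decisions that invoked a relevant reason containing the keyword."""
    return [r for r in records if any(keyword in reason for reason in r.relevant_reasons)]

# a future deliberation can check how 'capacity' figured in past decisions;
# any deviation from that precedent would itself need relevant reasons
for record in precedents(log, "capacity"):
    print(record.program, "->", record.decision, "|", record.grounds)
```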


implications for learning, decision-making and evaluations

if all decision-makers acknowledge that, at least, the final reasons for their decisions will be publicly accessible, how might that change the way they commission an evaluation and set about using the evidence from it?

first, it should encourage a review of past deliberations to help determine currently relevant reasons. second, it might encourage decision-makers and evaluators to treat as relevant only reasons and measures that will be explainable and understandable to the public(s) when justifying their decisions.

  • in planning evaluations, decision-makers and researchers will have to consider the clarity of methods of data collection and analysis — effectively, will it pass a ‘grandmother test’? moreover, does it pass such a test when that granny is someone affected by your allocative decision? remember the central question that makes this criterion necessary: what can an agency say to those whose preferences are not met by a decision to convince them that, indeed, the decision “seems reasonable and based on considerations that take… [their] welfare into account?”
  • there are reasons that decision-makers might shy away from transparency. in his work on health plans, Daniels notes that such organizations speculatively feared media and litigious attacks. in our pilot-and-evaluate scenario, some implementers may not be comfortable with publicizing pilots that may fail, or with raising the expectations of beneficiaries who are part of pilots.
  • the fear of failure may also influence implementers, leading to low-risk/low-innovation pilots. again, this relates to an important consideration raised above, in the questions we did not answer: when and how much transparency suffices for fairness?


in our last post, we stressed the importance of engaging stakeholders in setting ‘relevant reasons’ before a project begins, as a key step towards fair deliberative processes as well as a way of shaping evaluations to be useful for decision-making. ensuring publicity and transparency of the decision-making criteria strengthens the perception of a fair and reasonable process in individual cases and over time.

this also sets the stage for an appeals process, where stakeholders can use evidence available to them to advocate a certain way forward; it also allows for stakeholders to revisit the decision-making criteria and the decisions they fostered – the subject of our next post in this series.

***

*we note that donors often don’t actually have to answer directly to implementers and participants for their decisions. we do not, however, think such answerability would be a terrible idea.

**we are explicitly not saying ‘broader’ welfare because we are not endorsing a strictly utilitarian view that the needs of some can be sacrificed if the greater good is enhanced, no matter where or how that good is concentrated.

further thoughts on phase-in/pipeline designs for causal inference

not long back, i put down my thoughts (here) about pipeline or phase-in designs. my basic premise is that while they may allow for causal inference, it is not clear that they are usually designed to allow generated evidence to be used where it is most relevant — to that program itself. that seems bad from an evidence-informed decision-making point of view and potentially questionable from an ethical point of view.

i raised this issue during a recent conversation on the development impact blog about the ethics of randomization. i reproduce my comment and berk ozler‘s kind reply, below.


me

usually, the appealing premise of a phase-in design is that there is some resource constraint that would prevent simultaneous scale-up in any case. in this scenario, no matter how heavy the burden of waiting, there will have to be some rationing. in which case, why not randomization rather than something else, like patronage?

then things get odd. the suggestion seems to be that we may know, ex ante, that at least some types of people (elderly, immune-compromised) will benefit greatly from immediate receipt of the treatment. in which case, we are not in equipoise, and it is questionable whether an RCT (or at least unconditional randomization) is appropriate at all. things, of course, get trickier when a resource constraint does not rule out simultaneous scale-up.

second, i feel we should reflect on the purpose and ethics of a phase-in design, especially one with full information. again, a resource constraint may make it politically acceptable for a governor to say that she will roll in health insurance randomly across the state, which can allow an opportunity to learn something about the impact of health insurance. so, she stands up and says everyone will get (this) health insurance at some point and here’s the roll-out schedule.

but the reason for making use of this randomization is to learn if something works (because we genuinely aren’t sure if it will, hence needing the experiment) and maybe to have ‘policy impact’. so what if what is learnt from comparing the Phase I and Phase II groups is that there is no impact, the program is rubbish or even harmful? or, at a minimum, it doesn’t meet some pre-defined criterion of success. is the governor in a position to renege on rolling out the treatment/policy because of these findings? does the fine print for everyone other than those in Phase I say “you’ll either get health insurance, or, if the findings are null, a subscription to a jelly-of-the-month club”? in some ways, a full-disclosure phased roll-in seems to pre-empt and prevent policy learning and impact *in the case under study* because of the pre-commitment of the governor.

i find phased roll-in designs without a plan to pause, analyse, reassess and at least tweak the design between Phases I and II ethically troubling. i’d be interested in your thoughts.


berk

in economics, unlike in medicine, many times the programs we have involve transferring something to individuals, households, or communities (assets, information, money, etc.). without negative spillovers, we don’t think of these as ever not increasing individual welfare, at least temporarily: if i give you a cow, this is great for you. if you don’t like it, sell it: your individual welfare will increase (would have been even higher if i just gave you the cash).

but, what if my program’s goal is not a temporary jump in your welfare, but you escaping poverty as close to permanently as possible? the program could be deemed unsuccessful even though it raised welfare of its beneficiaries for a short period.

the point is, it does seem wrong to break your promise to give something (something people would like to have) to people who drew Phase II in the lottery because you deemed your program unsuccessful in reaching its goals. you promised people that you’d give them the treatment at the outset, so i’d argue that if you’ll break your promise you have to give them something at least as good if not better. if you can come up with this (and the phase II group is happy with your decision), perhaps they can even become your phase I group in a new experiment — in a process where you experiment, tweak, experiment again, … kind of like what Pritchett et al. argue we should do: a lot more experiments, not less…

thinking of your examples. with the Oregon healthcare reform, it would be hard to push a stop or pause button with legislation. government action takes time and there is the credibility of your policymakers at stake. i don’t think you could really argue for a stop/pause because those impacts (even if unequivocal) are considered too small to treat the lottery losers.

in the case of a project that is giving cows, i am more optimistic: it might be possible for the project to find an alternative treatment that is of equal or higher value, that is acceptable to the phase II group, and that is feasible to roll out quickly. in such cases, i could see a tweak of the intervention between the two phases.
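
stepping back from the exchange: to make the ‘pause, analyse, reassess’ idea concrete, here is a minimal sketch of a phased roll-in with an explicit interim checkpoint between phases. the assignment rule, threshold and decision options are hypothetical illustrations, not anyone’s actual protocol:

```python
# hypothetical sketch of a phased (pipeline) roll-in with an explicit
# pause-analyse-reassess step between phases. thresholds are illustrative.
import random

def assign_phases(units: list, n_phase1: int, seed: int = 42):
    """randomly split units into phase I (treated now) and phase II (waitlist)."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    return shuffled[:n_phase1], shuffled[n_phase1:]

def interim_decision(effect_estimate: float, success_threshold: float = 0.1) -> str:
    """a pre-registered rule applied between phases I and II."""
    if effect_estimate >= success_threshold:
        return "continue: roll out to phase II as designed"
    if effect_estimate > 0:
        return "tweak: revise the intervention before phase II"
    return "stop: offer phase II an alternative of at least equal value"

villages = [f"village_{i:03d}" for i in range(100)]
phase1, phase2 = assign_phases(villages, n_phase1=50)

# ... implement in phase I, then estimate impact from the phase I vs phase II comparison ...
estimated_effect = 0.04  # placeholder, not a real estimate
print(interim_decision(estimated_effect))
```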

have evidence, will… um, erm? (3 of 6, relevant reasons)

this is a joint post with suvojit. it is also posted on people, spaces, deliberation.


in our last post, we wrote about the factors – evidence and otherwise – influencing decision-making about development programmes. to do so, we have considered the premise of an agency deciding whether to continue or scale a given program after piloting it with an accompanying evaluation commissioned explicitly to inform that decision. this is a potential ‘ideal case’ of evidence-informed decision-making. yet, the role of evidence in informing decisions is often unclear in practice.

what is clear is that transparent parameters for making decisions about how to allocate resources following a pilot may improve the legitimacy of those decisions. we have started, and continue in this post, to explore whether decision-making deliberations can be shaped ex ante so that, regardless of the outcome, stakeholders feel the decision was arrived at fairly. such pre-commitment to the process of deliberation could carve out a specific role for evidence in decision-making. clarifying the role of evidence would inform what types of questions decision-makers need answered and with what kinds of data, as we discussed here.

in considering deliberative processes, we are guided by Daniels’s normative “accountability for reasonableness” framework (A4R). Daniels proposes four criteria to bring legitimacy to deliberations and, he argues, consequent fairness to the decision.


relevant reasons

this post focuses on the first A4R criterion: the “relevant reasons” that, when considered, allow “the minority [to] at least assure itself that the preference of the majority rests on the kinds of reason that even the minority must acknowledge appropriately plays a role in the deliberation.” our goal is not to assert which reasons, including evidence, provide legitimate grounds for deliberation.

rather, we outline possible categories of reasons that may, ex ante, be placed on or off the table for deliberation. we then briefly consider the role of stakeholders, arguing that their involvement is most critical for setting and vetting relevant reasons. finally, we briefly consider the implications of ‘relevant reasons’ for planning evaluations useful for decision-making.


efficacy and effectiveness

one set of reasons relates to the proven efficacy and safety of a program’s/policy’s content. have the materials or technologies used in a program received national regulatory approval? are they safe or appropriate for all sub-groups?

will the piloted program or portfolio of programs be judged solely on its absolute effectiveness, or will some threshold of “success” in effect size be pre-set? in addition, will only average effects be considered, or will effectiveness among certain sub-groups (e.g. the historically disadvantaged) be made a separate reason?

will cost-effectiveness and affordability be considered and, if so, will benchmarks be set in advance? will decisions be taken on the relative effectiveness (opportunity costs) of an intervention? if yes, relative to which other interventions (for example, other programs in the same sector or portfolio, or programs from any sector addressing similar outcomes)? (1)


feasibility and logistics

are resources for scaling (financial, material, human) allocated and ring-fenced, in case a positive decision is reached? if needed, can decision-makers mobilize the resources needed for scaling? what types of information about the resources needed for scaling, and the likelihood of their being mobilized, will be brought to the decision table?

given what was learned about effort and costs in piloting the project/program/policy, is the relevant implementing agency capable of running it? at scale? if only some state or provincial agencies are capable or have the requisite infrastructure, how should this information be used in decision-making? do decision-makers want to pre-commit to an everywhere-or-nowhere decision?

will (and if so, how will) decision-makers distinguish genuine implementation capacity from isomorphic mimicry? will the potential to build capacity be considered in the deliberation, or do decision-makers want to restrict themselves to considering what can be done with resources currently in existence?

what political considerations will be part of the deliberative process, including political realities of constituencies and lobbies in both the donor country and the country in which the pilot took place?


involvement of stakeholders: setting reasons and making decisions

stakeholders should have an important voice in which reasons are deemed relevant for decision-making, though the extent of “voice” is not clear (e.g.). there are two distinct roles for stakeholders in decision-making: those designated (elected, appointed, or otherwise empowered) to take certain decisions and those with a stake in what decision is taken (street-level implementers (and unions thereof) and intended beneficiaries). the reasons set as relevant may gain legitimacy if they are the product of negotiations between multiple types of stakeholders.

nevertheless, we stress that it is stakeholder involvement in setting and vetting reasons relevant to deliberation – rather than direct participation in the deliberation – that can foster fairness and legitimacy as well as feasibility and efficiency. representativeness of the decision-makers is neither necessary nor sufficient for a legitimate deliberative process that leads to a fair outcome. (2)


considerations in designing an evaluation for decision-making

to close, we circle back to a running theme in these posts: for decisions to be informed by evidence, evidence needs to be useful to decision-making. what, then, does the relevance of some reasons tell us about what kinds of data and evaluation questions are relevant?

we have come across numerous evaluations attached to pilot programs, designed to allow decision-makers to choose a way forward, including from among several evaluated options. has it been agreed in advance that one of those options must be chosen? putting off such questions until an evaluation’s results are analyzed — as we have seen done in practice — sets up the unhelpful cycle of not discussing what types of evidence are desired for decision-making and, therefore, not setting up the evaluation to collect that information.

when relevant reasons for deliberation are laid out in advance, they provide guidance on what types of data need to be collected and what types of questions need to be evaluated to inform a decision. while we don’t advocate for any particular reason to be deemed “relevant,” we believe the above discussion not only informs how fair decisions on resource allocation can be taken but also highlights, again, that evidence – whether generated quantitatively or qualitatively – needs deep consideration in order to be deemed ‘rigorous’ from the point of view of its usability in decision-making.

in our next post, we take up the second criterion in the A4R framework: publicity and transparency in decision-making, again reflecting on what it means for the legitimacy of deliberations as well as the implications for planning for and using evidence.


(1) here, Brock’s work on separate spheres and indirect benefits is of interest.

(2) more on ‘democracy’ as an unsolved rationing problem can be found here.

west african pirates

i’ve been slow on pirate news. sometimes my google alert for ‘pirate’ brings me good things; most of the time it is about sports teams.

in any case, something good came in today. a lieutenant commander in the french navy noted that west africa and the gulf of guinea

is a good place to be a pirate.

in my head, he says it just like mel brooks.

more importantly, the piracy is partially attributable to the state (in this case, nigeria) not distributing the rents it earns from resources.

many of the pirates targeting ships on the high seas come from the niger delta in southern nigeria, where indigenous groups are demanding a greater share of the region’s oil wealth.