thoughts from #evalcon on evidence uptake, capacity building

i attended a great panel today, hosted by the think tank initiative (tti) and idrc and featuring representatives from three of tti’s cohort of think tanks. this is part of the broader global evaluation week (#evalcon) happening in kathmandu and focused on building bridges: use of evaluation for decision making and policy influence. the notes on evidence uptake largely come from the session, while the notes on capacity building are my own musings inspired by the event.


one point made early on was to contrast evidence-informed decision-making with opinion-informed decision-making. i’ve usually heard the contrast painted as evidence-based versus faith-based decision-making and think the opinion framing is useful. it also comes in handy for one of the key takeaways from the session: maybe the point (and the feasible goal) isn’t to do away with opinion-based decision-making but rather to make sure that opinions are increasingly shaped by rigorous evaluative evidence. or, to be more bayesian about it, we want decision-makers to continuously update their priors about different issues, drawing on evidence.
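as a toy sketch of that bayesian framing (my own illustration, not from the panel; the likelihood numbers are made up), here is how a decision-maker’s belief that a program “works” might update as evaluation findings arrive:

```python
# toy illustration: a decision-maker's opinion as P(works), updated
# with each new evaluation via bayes' rule.
# assumed (made-up) likelihoods: a rigorous evaluation reports a
# positive finding with probability 0.8 if the program works,
# and 0.3 if it does not.

def update(prior, positive, p_pos_if_works=0.8, p_pos_if_not=0.3):
    """one bayesian update of P(works) given one evaluation finding."""
    like_works = p_pos_if_works if positive else 1 - p_pos_if_works
    like_not = p_pos_if_not if positive else 1 - p_pos_if_not
    return (prior * like_works) / (prior * like_works + (1 - prior) * like_not)

belief = 0.5  # opinion before any evidence
for finding in [True, True, False, True]:  # a stream of evaluation results
    belief = update(belief, finding)
    print(round(belief, 3))
```

the point of the sketch is simply that no single finding settles the question; each credible evaluation nudges the opinion, which is the modest uptake goal described above.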


this leads to a second point. in focusing on policy influence, we may become too focused on influencing very specific decision-makers for very specific decisions. this may lead us to lose sight of the broader goal of (re-)shaping the opinions of a wide variety of stakeholders and decision-makers, even if not linked to the immediate policy or program under evaluation. so, again, the frame of shaping opinions and aiming for decision-maker/power-center rather than policy-specific influence may lead to altered approaches, goals, and benchmarks.


a third point that echoed throughout the panel is that policy influence takes time. new ideas need time to sink in and percolate before opinions are re-shaped. secretary suman prasad sharma of nepal noted that from a decision-maker point of view, evaluations are better and more digestible when they aim to build bit by bit. participants invoked a building blocks metaphor several times and contrasted it with “big bang” results. a related and familiar point about the time and timing required for evaluation to change opinions and shape decisions is that planning for the next phase of the program cycle generally begins midway through current programming. if evaluation is to inform this next stage of planning, it requires the communication of interim results — or a more thoughtful shift of the program planning cycle relative to monitoring and evaluation funding cycles in general.


a general point that came up repeatedly was what constitutes a good versus a bad evaluation. this leads to a key capacity-building point: we need more “capacity-building” to help decision-makers recognize credible, rigorous evidence and to mediate between conflicting findings. way too often, in my view, capacity-building ends up being about how particular methods are carried out, rather than about the central task of identifying credible methodologies and weighting the findings accordingly (or about broader principles of causal inference). that is, capacity-building among decision-makers needs to (a) seek to understand how they currently assess credibility (on the radical premise that capacity-building exercises might generate capacity on both sides) and (b) help them become better consumers, not producers, of evidence.


a point that surfaced continuously about how decision-makers assess evidence was about objectivity and neutrality. ‘bad evaluations’ are biased and opinionated; ‘good evaluations’ are objective. there is probably a much larger conversation to be had about parsing objectivity from independence and engagement as well as further assessment of how decision-makers assess neutrality and how evaluators might establish and signal their objectivity. as a musing: a particular method doesn’t guarantee neutrality, which can also be violated in shaping the questions, selecting the site and sample, and so on.


other characteristics of ‘good evaluation’ that came out included not confusing being critical with being only negative: findings about what is working are also appreciated. ‘bad evaluation’ assigns blame and accountability to particular stakeholders without taking a nuanced view of the context and events (internal and external) during the evaluation. ‘good evaluation’ involves setting evaluation objectives up front. ‘good evaluation’ also places the findings in the context of other evidence on the same topic; this literature/evidence review work, especially when it does not focus on a single methodology or discipline (and, yes, i am particularly alluding to RCT authors who tend to cite only other RCTs, at the expense of sectoral evidence and of other methodologies), is very helpful to a decision-making audience, as is helping to make sense of conflicting findings.


a final set of issues related to timing and transaction costs. a clear refrain throughout the panel was the importance of the timing of sharing the findings. this means paying attention to the budget-making cycle and sharing results at just the right moment. it means spotting windows of receptivity to evidence on particular topics, reframing the evidence accordingly, and sharing it with decision-makers and the media. it probably means learning a lot more from effective lobbyists. staying in tune with policy and media cycles in a given evaluation context is hugely time consuming. a point was made, and is well-taken, that the transaction costs of this kind of staying-in-tune for policy influence are quite high for researchers. perhaps goals for influence by the immediate researchers and evaluators should be more modest, at least when shaping a specific decision was not the explicit purpose of the evaluation.


this suggests at least two more indirect routes to influence. one is to communicate the findings clearly to, and to do the necessary capacity-building with, naturally sympathetic decision-makers (say, parliamentarians or bureaucrats with an expressed interest in x issue) so that they become champions who keep the discussion going within decision-making bodies. to reiterate, my view is that capacity-building efforts should prioritize helping decision-makers become evidence champions and good communicators of specific evaluation and research findings. this is an indirect road to influence but an important one, leveraging the credibility of decision-makers with one another. the second, also indirect, is to communicate the findings clearly to, and to do the necessary capacity-building with, the types of (advocacy? think tank?) organizations whose job is to track the timing of budget meetings, shifting political priorities, and the local events to which the evidence can be brought to bear.


the happy closing point was that a little bit of passion in evaluation, even while trying to remain neutral and objective, does not hurt.

Thinking About Stakeholder Risk and Accountability in Pilot Experiments

This post is also cross-posted here in slightly modified form.

Since I keep circling around issues related to my dissertation in this blog, I decided it was time to start writing about some of that work. As anyone who has stood or sat near me for more than 5 minutes over the past 4.25 years will know, in my thesis I examine the political economy of adopting and implementing a large global health program (the affordable medicines facility – malaria, or “AMFm”). This program was designed at the global level (meaning largely in D.C. and Geneva, with tweaking workshops in assorted African capitals). Global actors invited select Sub-Saharan African countries to apply to pilot the AMFm for two years before any decision was made to continue, modify, scale-up, or terminate it. It should also be noted from the outset that it was not fully clear what role the evidence would play in the board’s decision or how the evidence would be interpreted. As I highlight below, this lack of clarity helped to foster feelings of risk as well as resistance among some of the national-level stakeholders about participating in the pilot.

To push the semantics a bit: several critics have noted that the scale and scope and the requisite new systems and relationships involved in the AMFm disqualify it from being considered a ‘pilot,’ though I use that term for continuity with most other AMFm-related writing.

In my research, my focus is on the national and sub-national processes of deciding to participate in the initial pilot (‘phase I’) stage, focusing specifically on Ghana. Besides the project scale and resources mobilized, one thing that stood out about this project is that there was a reasonable amount of resistance to piloting this program among stakeholders in several of the invited countries. I have been very fortunate that my wonderful committee and outside supporters like Owen Barder have continued to push me over the years (and years) to try to explain this resistance to an ostensibly ‘good’ program. Moreover, I have been lucky and grateful that a set of key informants in Ghana have been willing to converse openly with me over several years as I have tried to untangle the reasons behind the support and resistance and to try to get the story ‘right.’

From the global perspective, the set-up of this global health pilot experiment was a paragon of planning for evidence-informed decision-making: pilot first; develop benchmarks for success and commission an independent evaluation (a well-monitored before-and-after comparison); and make decisions later.

In my work, through a grounded qualitative analysis, I distil the variety of reasons for supporting and resisting Ghana’s participation in the AMFm pilot into three main types: those related to direct policy goals (in this case, increasing access to malaria medication and lowering malaria mortality), those related to indirect policy goals (indirect insofar as they are not the explicit goals of the policy in question, such as employment and economic growth), and finally those related to risk and reputation (individual, organizational, and national). I take the latter as my main focus for the rest of this post.

A key question on which I have been pushed is the extent to which resistance to participation (which meant resisting an unprecedented volume of highly subsidized, high-quality anti-malarial treatments entering both the public and the private sector) emerged from the idea of the AMFm itself versus the idea of piloting the AMFm with uncertain follow-up plans.

Some sources of support and resistance, such as threats to both direct and indirect policy goals, related to the AMFm mechanism itself, including debates over focusing on malaria prevention rather than treatment as well as broader goals related to national pride and the support of local businesses. The idea of the AMFm itself, as well as its role as a harbinger of approaches (such as market-based approaches) to global health, provoked both support and resistance.

But some sources of resistance stemmed more directly from the piloting process itself. By evidence-informed design, the Global Fund gave “no assurance to continue [AMFm] in the long-term,” so that the evaluation of the pilot would shape its decision. This presented limited risks at the global level. At the national level, this uncertainty proved troubling, as many local stakeholders felt it posed national, organizational, and personal risks to policy goals and reputations. Words like ‘vilification‘ and ‘chastisement‘ and ‘bitter‘ came up during key informant interviews. In a point of opposing objectives (if not a full catch-22, a phrase stricken from my thesis), some stakeholders might have supported the pilot had they known the program would not be terminated (even if modified), whereas global actors wanted the pilot precisely to see if the evidence suggested the program should (not) be terminated. Pilot-specific concerns related to uncertainties around the sunk investments of time in setting up the needed systems and relationships, which had an uncertain life expectancy. Also, for a stakeholder trying to decide whether to support or resist a pilot, it doesn’t help when the reputational and other pay-offs from supporting are uncertain and may only materialize should the pilot prove successful and be carried to the next stage.

A final but absolutely key set of concerns for anyone considering working with policy champions is what, precisely, the decision to continue would hinge upon. Would failure to meet benchmarks be taken as a failure of the mechanism and concept? As a failure of national implementation capacity and managerial efforts in Ghana (in the face of a key donor)? As a failure of individual efforts and initiatives in Ghana?

Without clarity on these questions about how accountability and blame would be distributed, national stakeholders were understandably nervous and sometimes resistant (passively or actively) to Ghana’s applying to be a phase I pilot country. To paraphrase one key informant’s articulation of a common view, phase I of the AMFm should have been an experiment on how to continue, not whether to continue, the initiative.

How does this fit in with our ideas of ideal evidence-informed decision-making about programs and policies? The experience recorded here raises some important questions when we talk about wanting policy champions and wanting to generate rigorous evidence about those policies. Assuming that the policies and programs under study adhere to one of the definitions of equipoise, the results from a rigorous evaluation could go either way.

What risks do the local champions of a policy face in visibly supporting it?

Is clear accountability established for evaluation outcomes?

Are there built-in buffers for the personal and political reputation of champions and supporters in the evaluation design?

The more we talk about early stakeholder buy-in to evaluation and the desire for research uptake on the basis of evaluation results, the more we need to think about the political economy of pilots and of those stepping up to support policies and the (impact) evaluation of them. Do they exist in a learning environment where glitches and null results are considered part of the process? Can evaluations help to elucidate design and implementation failures in a way that has clear lines of accountability among the ‘ideas’ people, the champions, the managers, and the implementers? These questions need to be taken seriously if we expect government officials to engage in pilot research to help decide the best way to move a program or policy forward (including not moving it forward at all).

Buffet of Champions: What Kind Do We Need for Impact Evaluations and Policy?

This post is also cross-posted here and here.

I realize that the thesis of “we may need a new kind of champion” sounds like a rather anemic pitch for Guardians of the Galaxy. Moreover, it may lead to inflated hopes that I am going to propose that dance-offs be used more often to decide policy questions. While I don’t necessarily deny that this is a fantastic idea (and one that would certainly boost C-SPAN viewership), I want to quickly dash hopes that this is the main premise of this post.

Rather, I am curious why “we” believe that policy champions will be keen on promoting and using impact evaluation (and subsequent evidence syntheses), and I want to suggest that another range of actors, whom I call “evidence” and “issue” champions, may be more natural allies. There has been a recurring storyline in recent literature and musings on (impact) evaluation and policy- or decision-making:

  • First, the aspiration: the general desire of researchers (and others) to see more evidence used in decision-making (let’s say both judgment and learning) related to aid and development, so that scarce resources are allocated more wisely and/or so that more resources are brought to bear on the problem.
  • Second, the dashed hopes: the realization that data and evidence currently play a limited role in decision-making (see, for example, the report on the evidence on evidence-informed policy-making as well as here).
  • Third, the new hope: the recognition that “policy champions” (also “policy entrepreneurs” and “policy opportunists”) may be a bridge between the two.
  • Fourth, the new plan of attack: bring “policy champions” and other stakeholders into the research process much earlier in order to get uptake of evaluation results into the debates and decisions. This even includes bringing policy champions (say, bureaucrats) on as research PIs.

There seems to be a sleight of hand at work in the above formulation, and it is somewhat worrying in terms of equipoise and the possible use of the range of results that can emerge from an impact evaluation. Said another way, it seems potentially at odds with the idea that the answer to an evaluation is unknown at the start of the evaluation.

While I am not sure that “policy champion” has been precisely defined (and, indeed, this may be part of the problem), this has been done for the policy entrepreneur concept. So far as I can tell, the first articulation of the entrepreneurial (brokering, middle-man, risk-taking) role in policy-making comes from David E. Price in 1971. The idea was repeated and refined in the 1980s and then became more commonplace in 1990s discussions of public policy, in part through the work of John Kingdon. (There is also a formative and informative 1991 piece by Nancy Roberts and Paula King.)

Much of the initial discussion, it seems, came out of studying US national and state-level congressional politics but the ideas have been repeatedly shown to have merit in other deliberative settings. Much of the initial work also focused on agenda-setting — which problems and solutions gain attention — but similar functions are also important in the adoption and implementation of policy solutions. Kingdon is fairly precise about the qualities of a policy entrepreneur — someone who has, as Kingdon calls it, a pet policy that they nurture over years, waiting for good moments of opportunity to suggest their policy as the solution to a pressing problem.

  • First, such a person must have a “claim to a hearing” — that is, at least behind the scenes, people must respect and be willing to listen to this person on this topic (especially if this person is not directly in a position with decision-making power).
  • Second, such a person must have networks and connections as well as an ability to bargain and negotiate within them. This is a person who can broker ideas across diverse groups of people, can “soften up” people to the entrepreneur’s preferred policy solution, etc.
  • Third, such a person must have tenacity, persistence and a willingness to risk personal reputation and resources for a policy idea.

In Kingdon’s and others’ conception, a policy entrepreneur has to work at selling their idea over a long period of time (which is presumably why Weissert (1991) also introduced the idea of policy opportunists, who only start to champion ideas once those ideas make it to the deliberating table and seem likely to move forward). In short, policy entrepreneurs (and, through the sloppy use of near-synonyms, policy champions) believe strongly in a policy solution for some reason and have put time, effort, and reputation into moving the idea forward. Note the nebulous use of “some reason” — I have not found a definition that specifies that policy entrepreneurs must come to promote a policy through a particular impetus. Glory, gold, God, goodness, and (g’)evidence all seem to be viable motivators that fit the definition.

My question is: is this what we need to support the use of research (and, specifically, impact evaluations and syntheses thereof) in decision-making? It is not clear to me that it is. Policy entrepreneurs are people already sold on a particular policy solution, whereas the question behind much evaluation work is ‘is this the best policy solution for this context?’ (recognizing the importance of contextual and policy, if not clinical, uncertainty about the answer in order for an evaluation to be worthwhile). It seems to me, then, that what we (researchers and evaluators) actually need are people deeply committed to one of two things:

(1) The use of data and evidence, in general, (“evidence champions” or, at least loosely, technocrats) as an important tool in sound decision-making and/or

(2) a particular issue or problem (“issue champions” — no doubt a sexier phrase is available). I’ll spend more time on the second.

An “issue champion,” for example, may be someone who has qualities similar to those of a policy entrepreneur but, rather than using claims to a hearing, a network, and tenacity to bring forward a policy solution, s/he uses these tools to bring attention to a problem — say, malaria mortality. This person feels that malaria is a problem that must be solved — and is open to finding the most (cost-)effective solution to the problem (or the means to do a good job of implementing that solution).

S/He is not, by contrast, someone already committed to believing that prevention, diagnostics, or treatment in any particular form or at any particular price is the best way forward until s/he has seen evidence of this in a relevant context. This is different from a “policy champion” who has, for example, been pushing for universal bednet coverage for the past 20 years. This is not to say that you don’t want the bednet champion to be well aware of your study and even to have input into defining the research questions and approving the research design (in fact, this seems vital in lending credibility and usefulness to the results). But the way the study is structured will be important to whether the bednet champion is open to taking up the range of possible results from your study.

If your question is: does approach A or approach B result in more efficient distribution of bednets, then yes, both sets of results will be interesting to the bednet champion.

But if the question is more of the type ‘are bednets the most cost-effective approach to addressing malaria mortality in our country?’, then the bednet champion is likely to be interested in trumpeting only one set of results: those that are significantly in favor of bednets as a solution to the malaria problem.

The malaria/issue champion (or general evidence enthusiast), on the other hand, may be more open to thinking about how to interpret and use the range of possible results from the study, which may also be mixed, inconclusive, or even negative. (Throughout this discussion, I recognize that malaria, like all problems in human and economic development, doesn’t have silver-bullet answers and that, therefore, “A or not-A”-type evaluation questions will only get us so far in getting the right mix of tools in the right place at the right time. That is, the answer is likely neither that bednets do no good nor that they are the only thing needed to tackle malaria.)

The worry, then, with policy champions is that they are already committed to a policy solution. Will they change their mind on the basis of one study? Probably not (nor, necessarily, should they; a meta-analysis may not sway them either). But insofar as “we” want decision-makers to learn about our evidence and to consider it in their deliberations, it may be issue, rather than policy, champions that are particularly important. They may make use of the results regardless of what they are. We cannot necessarily expect the same of the policy champion. Of course, a small army of evidence champions is also helpful. I do want to stress that it is critical to have policy champions and other stakeholders involved early in the research-design process, so that the right questions can be asked and the politically and contextually salient outcomes and magnitudes considered. But as an ally in the evaluation process and, say, a potential PI on an evaluation, it seems that the issue champions are the folks likely to stick with it.

And, yes, issue champions should probably have some moves ready, in case of a dance-off (as there will always be factors beyond evidence and data influencing decisions).

further thoughts on phase-in/pipeline designs for causal inference

not long back, i put down my thoughts (here) about pipeline or phase-in designs. my basic premise is that while they may allow for causal inference, it is not clear that they are usually designed to allow generated evidence to be used where it is most relevant — to that program itself. that seems bad from an evidence-informed decision-making point of view and potentially questionable from an ethical point of view.

i raised this issue during a recent conversation on the development impact blog about the ethics of randomization. i reproduce my comment and berk ozler‘s kind reply, below.



usually, the appealing premise of a phased-in design is that there is some resource constraint that would prevent simultaneous scale-up in any case. in this scenario, no matter how heavy the burden of waiting, there will have to be some rationing. in which case, why not ration by randomization rather than by something else, like patronage?

then things get odd. the suggestion seems to be that we may know, ex ante, that at least some types of people (elderly, immune-compromised) will benefit greatly from immediate receipt of the treatment. in that case, we are not in equipoise, and it is questionable whether an RCT (or at least unconditional randomization) is appropriate. things, of course, get trickier when a resource constraint is not what prevents simultaneous scale-up.

second, i feel we should reflect on the purpose and ethics of a phased-in design, especially one with full information. again, a resource constraint may make it politically acceptable for a governor to say that she will roll in health insurance randomly across the state, which can allow an opportunity to learn something about the impact of health insurance. so, she stands up and says everyone will get (this) health insurance at some point and here’s the roll-out schedule.

but the reason for making use of this randomization is to learn if something works (because we genuinely aren’t sure if it will, hence needing the experiment) and maybe to have ‘policy impact’. so what if what is learnt from comparing the Phase I and Phase II groups is that there is no impact, the program is rubbish or even harmful? or, at a minimum, it doesn’t meet some pre-defined criterion of success. is the governor in a position to renege on rolling out the treatment/policy because of these findings? does the fine print for everyone other than those in Phase I say “you’ll either get health insurance, or, if the findings are null, a subscription to a jelly-of-the-month club”? in some ways, a full-disclosure phased roll-in seems to pre-empt and prevent policy learning and impact *in the case under study* because of the pre-commitment of the governor.

i find that phased roll-in designs without a plan to pause, analyse, reassess and at least tweak the design between Phases I and II to be ethically troubling. i’d be interested in your thoughts.
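as a concrete sketch of the kind of randomized roll-out schedule the hypothetical governor above might publish (my own illustration; the district names and per-phase budget are invented):

```python
# sketch: randomly ordering units (e.g. districts) into roll-out phases
# under a budget constraint of `per_phase` units per phase.
import random

def phase_in_schedule(units, per_phase, seed=42):
    """randomly assign units to successive roll-out phases."""
    rng = random.Random(seed)  # fixed seed so the published schedule is auditable
    order = units[:]
    rng.shuffle(order)
    return [order[i:i + per_phase] for i in range(0, len(order), per_phase)]

districts = [f"district_{i}" for i in range(1, 9)]
schedule = phase_in_schedule(districts, per_phase=4)
# schedule[0] is the phase I (interim treatment) group;
# schedule[1] serves as the comparison group until its turn comes
```

pre-committing to a published schedule like this is exactly the tension discussed above: the randomization supports causal inference, but the public promise makes it hard to pause, reassess, or tweak between phases.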



in economics, unlike in medicine, many times the programs we have involve transferring something to individuals, households, or communities (assets, information, money, etc.). without negative spillovers, we don’t think of these as ever not increasing individual welfare, at least temporarily: if i give you a cow, this is great for you. if you don’t like it, sell it: your individual welfare will increase (would have been even higher if i just gave you the cash).

but, what if my program’s goal is not a temporary jump in your welfare, but you escaping poverty as close to permanently as possible? the program could be deemed unsuccessful even though it raised welfare of its beneficiaries for a short period.

the point is, it does seem wrong to break your promise to give something (something people would like to have) to people who drew Phase II in the lottery because you deemed your program unsuccessful in reaching its goals. you promised people that you’d give them the treatment at the outset, so i’d argue that if you’ll break your promise you have to give them something at least as good if not better. if you can come up with this (and the phase II group is happy with your decision), perhaps they can even become your phase I group in a new experiment — in a process where you experiment, tweak, experiment again, … kind of like what Pritchett et al. argue we should do: a lot more experiments, not fewer…

thinking of your examples: with the Oregon healthcare reform, it would be hard to push a stop or pause button with legislation. government action takes time and the credibility of your policymakers is at stake. i don’t think you could really argue for a stop/pause on the grounds that the impacts (even if unequivocal) are considered too small to justify treating the lottery losers.

in the case of a project that is giving cows, i am more optimistic: it might be possible for the project to find an alternative treatment that is of equal or higher value, that is acceptable to the phase II group, and that is feasible to roll out quickly. in such cases, i could see a tweak of the intervention between the two phases.

have evidence, will… um, erm (2 of 2)

this is the second in a series of posts with suvojit (and cross-posted here), initially planned as a series of two but growing to six…

reminder: the scenario 
in our last post, we set up a scenario that we* have both seen several times: a donor or large implementing agency (our focus, though we think our arguments apply to governmental ministries) commissions an evaluation, with explicit (or implicit) commitments to ‘use’ the evidence generated to drive their own decisions about continuing/scaling/modifying/scrapping a policy/program/project.

and yet. the role of evidence in decision-making of this kind is unclear.

in response, we argued for something akin to patton’s utilisation-focused evaluation. such an approach assesses the “quality” or “rigor” of evidence by considering how well it addresses the questions and purposes needed for decision-making, with the most appropriate tools and timings to facilitate decision-making in a particular political-economic moment, including the capacity of decision-makers to act on evidence.

(this parallels woolcock’s definition of rigor, here. to the extent that we focus on a scenario in which donors and the M(e)E team design an evaluation intended to inform scale-up decisions (of effectiveness (not efficacy) across the relevant geographies and with likely implementers, say), we sidestep some – though not all – of pritchett (and sandefur’s) critiques of rigor vis-a-vis the multiple dimensions of context.)

in this post, we continue to explore this scenario, which sets up a seemingly ideal case of evidence-informed decision-making (donor-commissioned, directly instrumental evaluations (rather than symbolic or conceptual ones)). we move beyond designing usable/useful evaluations to ask what might facilitate donors making decisions that are, if not necessarily “right,” at least informed, reasoned and justifiable given available evidence. to this end, we present a potential set of criteria to (begin a conversation on how to) set up a process that can yield thoughtful, reasoned and ‘fair’ decisions that take evidence into account.

to begin, we ask what does influence decision-making at present.

what does drive decision-making?

the recent semantic shift from “evidence-based” to “evidence-informed” decision-making reflects a brewing recognition among evidence nerds that decision-making isn’t — can’t be (?), shouldn’t be (??)** — made in a strictly technocratic way. most political scientists and policymakers — and certainly politicians — have known this for a very long time.

politics are constitutive of policymaking. full stop. it is naive to proceed under any illusions about evidence replacing politics or, more generally, the need to think politically. researchers and M&Eers can learn more about the processes, party platforms and ideologies, and election cycles and decision timetables — and potentially understand how to leverage them — but these don’t go away no matter how hard we wish for technocratic decision-making.

participants at a 2012 conference on evidence-based policy-making generally agreed that “evidence is a relatively minor factor in most policy maker’s decision making” and that “many other factors” influence the decisions made. additional factors in policy decision-making include:

  • inertia, path-dependence, habit
  • administrative feasibility to implement
  • decision-maker and public values, ideologies and perceptions about the way things are and ought to be
  • political benefit/cost of adding or removing a visible program
  • alignment of program’s expected impact trajectory with political cycles, opportunity windows
  • political manipulation
  • personal & professional ambition, the interests of powerful advocates and lobbyists
  • justifying past budgets and decisions
  • personal and expert experience, gut feelings

given that all this (and more) is usually part of any decision-making reality, we try to lay out, below, an approach to guide decision-making.

deliberative process
our proposal draws heavily on norman daniels’s work on “accountability for reasonableness” (A4R), a rawlsian-influenced approach to procedural justice with regard to distributing resources scarcer than the needs that require fulfilling.***** daniels asks whether, in the absence of clearly fair outcomes or principles, a fair process could be established in a particular context.

to this end, A4R pursues “pure (if imperfect) procedural justice” – a process by which, in the absence of clear principles**** of decision-making (for example, strictly following the results of a cost-effectiveness analysis** or giving complete priority to the worst-off), ex ante agreement on the process of decision-making will lead to outcomes that can be accepted as “fair.”***

in this case, we ask how we could shape the decision-making deliberation process ex ante so that, regardless of the decision taken by designated decision-makers, all stakeholders feel the decision is ‘fair’ because the process was deemed fair, even if the outcome was not the one they favored. daniels proposes four criteria to guide the formation of such a process.

below, we introduce the basic criteria. we will look at each of these in greater detail in a set of future posts. (get excited!)

1. relevant reasons

what types of reasons will be considered “relevant,” and therefore permissible, in decision-making? these reasons, once agreed, could also influence the types of data collected in the evaluation itself. we are not proposing that each criterion be given an ex ante weight so that there is a precise algorithm for decision-making, only that what is on and off the table be agreed in advance.

another key consideration, of course, is who will be involved in setting the relevant reasons and who will be involved in the actual decision-making. would there, for example, be a mechanism for public participation or comment?

2. transparency

how transparent should the decision-making process be, including the reasons deemed relevant for decision-making? should everything be made public, or does that make it too difficult to speak honestly? some stakeholders will need ‘cover’ and to not have their full views publicized. might a commitment to transparency scare away implementing organisations from trying out innovative ideas for fear of failure – especially if that failure might be publicly known?

a commitment to transparency includes deciding the extent to which each of the following will be made public, and at what point in time: the relevant reasons as determined, the process of deliberation, and the full transcript or just a summary of the deliberation.

3. revisibility

in the initial A4R framework, based on health insurance, the revisibility criterion related to appeals made given new evidence. for donor programmes that employ a particular technology that renders them prohibitively expensive to scale, we can imagine that a breakthrough that lowers the price of the technology should also lead the donor to revisit its decision not to scale.

another twist on revisibility in the case of development programmes could be an appeals process for the public / civil society who were part of the programme pilot, to argue for (or against) the programme being continued.

4. enforceability

the enforceability criterion requires that someone have the institutional authority to make sure that the other conditions are met.

summary / conclusion

we recognise there are many unresolved questions in operationalising the considerations we offer – relevant reasons, transparency, revisibility and enforceability. but this is an attempt to lay out considerations that matter and to begin a much-needed conversation. thinking along the lines we propose, and discussing how to expand on them, is, we feel, the first step towards supporting decisions that are not only evidence-informed but also reasonable.

in sum, we want to call for better preparing evidence for decision-making (previous post) but also remind ourselves to recognise the actual – and desirable – place of evidence in these decisions (this post).

  • * we would like to acknowledge the sounding-board excellence of arjun, payal, sameer and urmy, as representatives of MNDC.
  • ** as a case of strictly following CEA and why evidence perhaps shouldn’t be – setting aside whether it can be – the only driving reason for a decision, consider the 1990s’ Oregon effort at strictly technocratic priority-setting for medicaid.
  • *** daniels notes that the validity of the approach is premised on the supposition that we can better agree on a fair process than on principles – this premise needs to be empirically documented and tested to move the conversation forward.
  • **** see daniels on the ‘four unsolved rationing problems’ with regard to health decision-making at a population level.
  • ***** daniels’ ideas have yet to be tested empirically.

have evidence, will… um, erm? (1 of 2)

this is a joint post with suvojit chattopadhyay, also cross-posted here.

commissioning evidence

among those who talk about development & welfare policy/programs/projects, it is très chic to talk about evidence-informed decision-making (including the evidence on evidence-informed decision-making and the evidence on the evidence on…[insert infinite recursion]).

this concept — formerly best-known as evidence-based policy-making — is contrasted with faith-based or we-thought-really-really-hard-about-this-and-mean-well-based decision-making. it is also contrasted with the (sneaky) strategy of policy-based evidence-making. using these approaches may lead to not-optimal decision-making, adoption of not-optimal policies and subsequent not-optimal outcomes.

in contrast, proponents of the evidence-informed decision-making approach believe that, through this approach, decision-makers are able to make sounder judgments about which policies will provide the best way forward, which may not, and which should perhaps be repealed or revised. this may lead them to make decisions on policies according to these judgments, which, if properly implemented or rolled back, may, in turn, improve development and welfare outcomes. it is also important to bear in mind, however, that it is not evidence alone that drives policymaking. we discuss this idea in more detail in our next post.

in this post, we work with a scenario where evidence is accepted as an important determinant of decision-making, and this is acknowledged at least broadly by stakeholders who make explicit (or implicit) commitments to ‘use’ the evidence generated to drive their decisions. as good as this may sound, there are barriers to making decisions informed by evidence. one is the limited stock of accessible, well-considered data and rigorous analyses, including the stock in readable-yet-appropriately-nuanced, relevant, timely forms. several organizations’ raison d’être is to increase this supply of ‘much needed’ evidence. another barrier is lack of demand among decision-makers for (certain types of rigorous) evidence (not just for the per diems that come with listening to talk about evidence) – including demand for evidence regardless of whether the findings turn out positive or negative.

we don’t disagree that both supply and demand for high-quality evidence are important issues. but these two posts are not about those scenarios. rather, we focus on a scenario in which there is, at least, the demand for commissioning evidence.

key examples are donor agencies, big (I)NGOs (BINGOs, if we must) or even government ministries that engage in evidence-generating activities, particularly when the stated goal is to make decisions about piloted programs (continue funding, scale-up, scrap, etc) or make significant tweaks to on-going programs. this should be the ‘easiest’ case of using evidence to inform a decision, where demand for evidence leads to the generation of a supply of by-definition-relevant evidence.

and yet, from what we have seen and experienced, even agencies that have made it to this seemingly enlightened precipice of evidence-informed decision-making don’t know, at a practical level, what to do with that evidence once they’ve got it. we are not suggesting that those inside such agencies are not skilled at reading and interpreting evidence. rather, we suggest that so much attention has been given to supplying and demanding evidence that use has been overlooked.

absent attention on use, how generated evidence informs decision-making, if it does at all, is something of a mystery. absent a plan for use, it can also be mysterious (or, at least, not transparent) as to why the agency bothered to commission the evidence-generation at all. we suspect that better considered evidence and better plans for use can improve the use of evidence. our hunches drive these two blog posts.

in this post, we make two main points.

one, we hold that a careful formative stage during which stakeholders are engaged to help develop research questions while remaining mindful of the policy process can help generate evidence that those stakeholders will know how to use. there is overlap and complementarity between our suggestions and the recent ideas of Monitoring, Structured experiential Learning & Evaluation (MeE; Pritchett, Samji & Hammer) and Problem-Driven Iterative Adaptation (PDIA; Andrews, Pritchett & Woolcock). however, here, we remain focused on planning for evaluation and setting the questions.

two, and relatedly, we advocate for more careful planning of how the generated evidence will be used in decision-making, regardless of the outcomes. in our next post, we take seriously that evidence is far from the only decision-making criterion. we discuss how evidence might be fit into a fair, deliberative process of decision-making by agencies and what such a process might entail.

at the outset, we recognize that there is a poor one-to-one mapping of the results of a single rigorous study or paper with policy changes (e.g. and also fun). in these two posts, however, we stay focused on studies that are set up specifically to guide future decisions and thus *should*, by definition, be immediately relevant to policy/programmatic funding/scaling decisions.

formative work: assessing needs and interests of decision-makers and other stakeholders

an early and wise step, we think, in planning evaluation that is not only policy-associated (we looked at a real, live policy!) but explicitly policy-relevant in terms of decision-making is to identify what kinds of decisions may be made at the end of the evaluation (i.e. what will be informed) and who may be involved. ‘involved’ includes elite decision-makers and possible policy champions and heroes; it also includes middle- and street-level bureaucrats who will implement the policy/program if that is the decision taken (see, e.g. here and here on getting buy-in beyond visible leaders).

among those who talk about demand-generation for evidence, there’s increasing recognition that stakeholder buy-in for the process of using evidence (not just for the policy under investigation) is required early on. but there seems to be less talk on actually asking stakeholders what they want to know to make decisions. we don’t suggest that what stakeholders deem most interesting should define the limits of what will be collected, analyzed and presented. many decision-makers won’t spontaneously crave rigorous impact evaluation.

there is plenty of evidence that decision-makers are heavily influenced by stories, images, even immersive experiences. this is not categorically bad and it certainly should not be ignored or discounted. rather, in addition to the types of data and analyses readily labelled as rigorous in the impact evaluation arena, we can be creative about collecting and analyzing additional types of data in more rigorous ways, positioned within a counterfactual framework. in the end, incorporating stakeholder preferences about the kinds of evidence they need to drive policy change would enhance the quality of the evidence-generation process.

another consideration relates to asking what magnitude of impacts decision-makers feel they need to see to be confident in making their decisions. we don’t suggest this is an easy question to ask — nor to answer. we only suggest that it could be a useful exercise to undertake (as with all our suggestions, empirical evidence from process data about decision-making would be very helpful).

a related exercise is to honestly assess reasoned expectations for the elapsed time between introducing an intervention and the potential expression of relevant impacts. the evaluation should be planned accordingly, as a shorter evaluation period may not capture the outcomes of interest.

planning to use evidence

it often seems that commissioners of evidence (and even those who generate the evidence) don’t actively consider how the evidence will actually be used in design or funding or whatever decisions will be made. there seems to be even less consideration of how the evidence will be used regardless of what the outcome is – positive, negative, mixed, null (a point made by, among others in other fora, Jeannie Annan, here). this may be one reason null and negative results seem to go unaddressed.

if there is a (potentially imposed) desire to commission rigorous evidence, one might assume there is genuine equipoise (or uncertainty, also here) about the efficacy, effectiveness or cost-effectiveness of a policy/program. yet much of the talk about early buy-in is actually about the program and the potential to validate a flagship programme and justify related spending through evaluation — not about the value of the evaluation process itself for learning. we don’t think this represents the best use of evaluation resources.

an exercise early in the formative phase, during which decision-makers consider how the evidence will help them make a decision, may be useful – particularly if they are asked to consider scenarios in which the evidence is clearly positive, clearly negative, null, mixed, fuzzy or indeterminate. this might also help to clarify the research questions that should be asked as part of an evaluation.

in a recent blog post, dr. ian goldman suggests getting decision-maker buy-in by asking “departments to submit proposals for evaluations so that they will want to use the findings.” this is an important step. but it does not mean that proposal-submitters have considered how they will use the evidence if it comes back anything but unequivocally positive for the policy/program/project in question.

dr. goldman also proposes asking departments to design “improvement plans” after their evaluations are complete. we’d like to hear more about this process. but we suspect that drafting such a plan early in the formative stage might actually inform some of the research questions, thus better linking the evaluation to action plans for improvement. for example, sophie at oxfam has written about IE results that left them with an “evidence puzzle” rather than a clear idea of how to improve the program. we don’t know if an early exercise in drafting an “improvement plan” would have yielded less puzzling outcomes — but that is an empirical question.

we hope that agencies doing such formative work will document and share the processes and their experiences.

be honest about the full theory of change for using evidence

in a good evaluation, positive validation is not the only possible outcome. therefore, the commissioning agency should honestly consider whether, if the results come back null or negative, it would actually be willing to pull or roll back the policy. in many cases, programs have political cachet and entitlement value regardless of the objective welfare benefits delivered. rolling back will not be a politically viable option in such cases. while it is important to build the general evidence base about policy/program cost/effectiveness, when an agency asks for evidence towards a particular decision that it isn’t actually willing to make, we are not sure the evaluation should go forward.

or, at least, we are uncertain if it should go forward as a yes/no question, where a negative result implies stopping the program. we suspect that evaluation will start to be more appreciated by decision-makers if designed to compare the effectiveness of option A versus option B in delivering the favored program, rather than only examining whether option A works (and why). the former set-up provides ways forward regardless of the outcome; the latter may, in the political sense, not.

moving forward

in sum, we think that careful formative and needs-assessment work on what decision-makers (and potential implementers) want to see to be convinced and what types of evidence will inform decision-making may lead to the generation of evidence that is not only policy-related but genuinely policy-relevant.  when an agency or ministry specifically commissions an evaluation with the stated goal of using it in decision-making, this seems particularly important. doing this work well will require collaboration between commissioners, implementers and evaluators.

in the next post, we (humbly) consider the overall role evidence plays in decision-making and consider how it might fit into an overall fair and deliberative process.

back (and forward) from ‘the big push forward’ – thoughts on why evidence is political and what to do about it

i spent the beginning of the week in brighton at the ‘big push forward‘ conference on the politics of evidence (#evpolitics), which mixed the need for venting and catharsis (about the “results agenda” and “results-based management” and “impact evaluation”) with some productive conversation, though with no immediate concreteness on how the evidence from the conference would itself be used.

in the meantime, i offer some of my take-aways from the conference – based on some great back-and-forths with some great folks (thanks!), below.

for me, the two most useful catchphrases were trying to get to “relevant rigor” (being relevantly rigorous and rigorously relevant) and to pay attention to both “glossy policy and dusty implementation.” lots of other turns-of-phrase and key terms were offered, not all of them – to my mind – terribly useful.

there was general agreement that evidence could be political along multiple dimensions. these included:

  • what questions are asked (and in skepticism of whose ideas they are directed), by whom, of whom, with whom in mind (who needs to be convinced), for whom – and why
  • the way questions are asked and how evidence is collected
  • how evidence is used and shared – by whom, where and why
  • how impact is attributed – to interventions or to organizations (and whether this fuels competitiveness for funds and recognition)
  • whether the originators of the idea (those who already ‘knew’ something was working in some way deemed insufficiently rigorous) or the folks who analyze evidence receive credit for the idea

questions and design. a big part of the ‘push back’ relates to what evidence is collected and what questions are asked – and whether those questions help governments and organizations improve their practice. this requires getting input from many stakeholders on what questions are important to ask. in addition, it requires planning for how the evidence will be used, including what will be done if results are (a) null, (b) mixed, confused or inconclusive, or (c) negative. more generally, this requires recognizing that policy-makers aren’t making decisions about ‘average’ situations but rather decisions for specific situations. as such, impact evaluations and systematic reviews need to help them figure out what evidence applies to their situation. the sooner expectations are dispelled that an impact evaluation or a systematic review will provide a clear answer on what should be done next, the better.

my sense, which was certainly not consensus, is that to be useful and to avoid being blocked by egos, impact questions need to shift away from “does X work?” to “does X work better than Y?” and/or “how can X be made to work better?” this also highlights the importance of monitoring and of feeding information back into learning and decision-making.

two more points on results for learning and decision-making. first, faced with the assertion that ‘impact evaluation doesn’t reveal *why* something works,’ it is unsatisfactory to say something along the lines of ‘we look for heterogeneous treatment effects.’ it absolutely also requires asking front-line workers and program recipients why they think something is and is not working — not as the final word on the matter but as a very important source of information. second, as has been pointed out in many places (e.g.), designing a good impact evaluation requires explication of a clear “Theory of Change” (still not my favorite term but apparently one that is here to stay). further, it is important to recognize that articulating a ToC (or LogFrame or use of any similar tool) should never be one person’s all-nighter for a funding proposal. rather, the tool is useful as a way of collectively building consensus around mission and why & how a certain idea is meant to work. as such, time and money need to be allocated for a ToC to be developed.

collection. as for the actual collection of data, there was a reasonable amount of conversation about whether the method is extractive or empowering, though probably not enough on how to shift towards empowerment and the fact that extractive/empowering are not synonymous with quant/qual. an issue that received less attention than it should have was that data collection needs to align with an understanding of how long a program should take to work (and funding cycles should be realigned accordingly).

use. again, the conversation on the use of evidence was not as robust as i had hoped. however, it was pointed out early on (by duncan green) that organizations that have been commissioning systematic reviews in fact have no plan to use that evidence systematically. moreover, there was a reasonable amount of skepticism around whether such evidence would actually be used to make decisions to allocate resources to specific organizations or projects (for example, to kill or radically alter ineffective programs). rather, there is a sense that much impact evaluation is actually policy-based evidence-making, used to justify decisions already taken. alternatively, though, there was concern that the more such evidence was used to make specific funding decisions, the more organizations would be incentivized to make ‘sausage‘ numbers that serve no one. thus, the learning, feedback and improving aspects of data need emphasis.

empowerment in the use of data (as opposed to its collection) was not as much a part of the conversation as i would have hoped, though certainly people raised issues of how monitoring and evaluation data were fed back to and used by front-line workers, implementers, and ‘recipients.’ a few people stressed the importance of near-automated feedback mechanisms from monitoring data to generate ‘dashboards’ or other means of accessible data display, including alternatives to written reports.

a big concern on use of evidence was ownership and transparency of data (and results), including how this leads to the duplication/multiplication of data collection. surprisingly, with regards to transparency of data and analysis, no one mentioned the recent reinhart & rogoff mess, nor anything about mechanisms for improving data accessibility (e.g.).

finally, there was a sense that data collected needs to be useful – that the pendulum has swung too far from a dearth of data about development programs and processes to an unused glut, such that the collection of evidence feels like ‘feeding the beast.’ again, this loops back to planning how data will be broadly used and useful before it is collected.

here’s an idea — if they are trying to tell you something, make it easy for them to do so.

there’s been a good deal of press around the unfortunately insignificant results of a major HIV prevention trial with products for women in south africa, uganda and zimbabwe. the results had little to do with efficacy of the products (a pill and a gel) but rather with the fact that most of the participating women did not use the treatments as recommended – or at all.

one potential response is to improve our behavioral interventions to support adherence to treatment regimens (and prevention regimens) and integrate these methods more directly into medication trials. adherence and persistence with medication are global problems and we are just beginning to learn – with the help of health psychology and behavioral economics – how to tackle the challenge. efforts so far include high- and low-tech solutions, though not all the promises of the former, in terms of mhealth to facilitate behavior change and adherence, have yet been borne out.

another, not mutually exclusive, response would be to actually ask the women what they would like to see and use in the way of HIV prevention – a tool which should be empowering for them. the press seems full of comments like “the women are trying to tell us something!” why does it seem that, then, for a product made for them, they have to work so hard to tell us those things? why are we not hearing more sentences that start “the women told us…” that is, why, after such a big trial, am i not hearing anything about on-going qualitative and observational follow-up efforts to learn more about what exactly didn’t work about the methods offered to women?

there’s often a lot to learn from null results (that’s science, right?) but it doesn’t just come from brainstorming what went wrong. asking helps.

i don’t suggest that people are perfectly prescient about what they need or want. often, the innovations that we can’t live without now – smartphones, for example – weren’t a need or even desire that most people could have articulated 20 years ago. as such, directly asking people what it would take to get them to engage in X desirable behavior can’t determine the research agenda. but it should certainly be part of figuring it out.