Oops, Got Long-Winded About ‘Median Impact Narratives’

*A revised version of this post is also available here.*

I finally got around to reading a post that had been flagged to me a while ago, written by Bruce Wydick. While I don't think the general idea of taking sampling and representativeness seriously is a new one, the spin of a ‘median narrative’ may be quite helpful in making qualitative and mixed work more mainstream and rigorous in (impact) evaluation.

Anyway, I got a bit long-winded in my comment on the devimpact blog site, so I am sticking it below as well, with some slight additions:

First, great that both Bruce and Bill (in the comments) have pointed out (again) that narrative has real value in (impact) evaluation. This is true not just as a sales hook or a way of helping the audience understand a concept — but because it is critical to getting beyond ‘did it work?’ to ‘why/not?’

I feel Bill’s point (“telling stories doesn’t have to be antithetical to good evaluation”) should be sharper — it’s not just that narrative is not antithetical to good evaluation but, rather, that it is constitutive of good evaluation and of any learning and evidence-informed decision-making agenda. And Bill’s right, part of the problem is convincing a reader that it is a median story that’s being told when an individual is used as a case study — especially when we’ve been fed outlier success stories for so long. This is why it is important to take sampling seriously for qualitative work and to report on the care that went into it. I take this to be one of Bruce’s key points and why his post is important.

I’d also like to push the idea of a median impact narrative a bit further. The basic underlying point, so far as I understand it, is a solid and important one: sampling strategy matters to qualitative work and for understanding and explaining what a range of people experienced as the result of some shock or intervention. It is not a new point but the re-branding has some important sex appeal for quantitative social scientists.

One consideration for sampling is that the same observables (independent vars) that drive sub-group analyses can also be used to help determine a qualitative sub-sample (capturing medians, outliers in both directions, etc.). To the extent that theory drives what sub-groups are examined via any kind of data collection method, all the better. Arthur Kleinman once pointed out that theory is what helps separate ethnography from journalism — an idea worth keeping in mind.
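
To make that first consideration concrete, here is a minimal sketch (Python, with entirely hypothetical file and variable names) of how the same observables used for quantitative sub-group analysis might drive selection of a qualitative sub-sample: a handful of cases near the median of a key covariate plus outliers in both tails.

```python
import pandas as pd

# Hypothetical dataset: one row per respondent, with an ID column and a key
# observable (e.g., baseline household income) already used in sub-group analysis.
df = pd.read_csv("endline_survey.csv")  # assumed file name

def select_qual_subsample(df, covariate, n_median=5, n_tail=3):
    """Pick cases near the median of `covariate` plus outliers in both tails."""
    ranked = df.sort_values(covariate).reset_index(drop=True)
    mid = len(ranked) // 2
    median_cases = ranked.iloc[mid - n_median // 2 : mid + n_median // 2 + 1]
    low_outliers = ranked.head(n_tail)
    high_outliers = ranked.tail(n_tail)
    return pd.concat([low_outliers, median_cases, high_outliers])

qual_sample = select_qual_subsample(df, covariate="baseline_income")
print(qual_sample[["respondent_id", "baseline_income"]])
```

The point of writing the rule down is only that the selection is explicit and reportable, which is part of taking qualitative sampling seriously.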

A second consideration is in the spirit of Lieberman’s call for nested analyses (or other forms of linked and sequential qual-quant work), using quantitative outcomes (the dependent variable) to drive case selection, iterated down to the micro-level. The results of quantitative work can be used to inform the sampling of later qualitative work, targeting those who represent the range of outcome values (on/off ‘the line’).
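
In the same spirit, here is a minimal sketch of ‘on/off the line’ case selection (again with hypothetical names, and assuming statsmodels for the regression): fit the quantitative model, then pick cases the model explains well (small residuals) and cases it explains poorly (large residuals) for qualitative follow-up.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset with a quantitative outcome, treatment status, and covariates.
df = pd.read_csv("endline_survey.csv")  # assumed file name

# Fit the quantitative model whose results will anchor qualitative case selection.
model = smf.ols("outcome ~ treated + baseline_income", data=df).fit()
df["residual"] = model.resid

# 'On the line' cases: smallest absolute residuals (well explained by the model).
on_line = df.loc[df["residual"].abs().nsmallest(4).index]

# 'Off the line' cases: largest absolute residuals (poorly explained, worth probing).
off_line = df.loc[df["residual"].abs().nlargest(4).index]

qual_cases = pd.concat([on_line, off_line])
print(qual_cases[["respondent_id", "outcome", "residual"]])
```

Nothing about the regression itself changes; the residuals simply become a transparent sampling frame for the qualitative round.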

Both of these considerations should fit within a framework that recognizes that qualitative work has its own versions of representativeness (credibility) as well as power (saturation) (which I ramble about here).

Finally, in all of this talk about appropriate sampling for credible qualitative work, we also need to be talking about credible analysis and, definitely, about moving beyond cherry-picked quotes as the grand offering from qualitative work. Qualitative researchers in many fields have done a lot of good work on synthesizing across stories. This needs to be reflected in ‘rigorous’ evaluation practice. Qualitative work is not just for pop-out boxes (I go so far as to pitch the idea of a qualitative pre-analysis plan).

Thanks to both Bruce and Bill for bringing attention to an important topic in improving evaluation practice as a whole — both for programmatic learning and for understanding theoretical mechanisms (as Levy-Paluck points out in her paper). I hope this is a discussion that keeps getting better and more focused on rigor and learning as a whole in evaluation, rather than quant v qual.

What Does It Mean To Do Policy Relevant Evaluation?

A different version of this post appears here.

For several months, I have intended to write a post about what it actually means to do research that is ‘policy relevant,’ as it seems to be a term that researchers can self-ascribe* to their work without stating clearly what this entails or if it is an ex ante goal that can be pursued. I committed to writing about it here, alluded to writing about it here, and nearly stood up to the chicken of Bristol in the interim. Now, here goes a first pass. To frame this discussion, I should point out that I exist squarely in the applied space of impact evaluation (work) and political economy and stakeholder analysis (dissertation), so my comments may only apply in those spheres.

The main thrust of the discussion is this: we (researchers, donors, folks generally bought into the evidence-informed decision-making enterprise) should parse what passes for ‘policy relevant’ into ‘policy adjacent’ (or ‘policy examining?’) and ‘decision relevant’ (or ‘policymaker-relevant’) so that it is clear what we are all trying to say and do. Just because research is conducted on policy does not automatically make it ‘policy relevant’ — or, more specifically, decision-relevant. It is, indeed, ‘policy adjacent,’ by walking and working alongside a real, live policy to do empirical work and answer interesting questions about whether and why that policy brought about the intended results. But this does not necessarily make it relevant to policymakers and stakeholders trying to make prioritization, programmatic, or policy decisions. In fact, by this point, it may be politically and operationally hard to make major changes to the program or policy, regardless of the evaluation outcome.

This is where more clarity (and perhaps humility) is needed.

I think this distinction was, in part, what Tom Pepinsky wrestled with when he said that it was the murky and quirky (delightful!) questions “that actually influence how they [policymakers / stakeholders] make decisions” in each of their own murky and quirky settings. These questions may be narrow, operational, and linked to a middle-range or program theory (of change) when compared to grander, paradigmatic questions and big ideas. (Interestingly, and to be thought through carefully, this seems to be the opposite of Marc Bellemare’s advice on making research in agricultural economics more policy-relevant, in which he suggests pursuing bigger questions, partially linked to agricultural economists often being housed in ‘hard’ or ‘life’ science departments and thus dealing with different standards and expectations.)

I am less familiar with how Tom discusses what is labelled as highly policy-relevant (the TRIP policymaker survey and seeing whether policymakers are aware of a given big-thinking researcher’s big idea) and much more familiar with researchers simply getting to declare that their work is relevant to policy because it is in some way adjacent to a real! live! policy. Jeff Hammer has pointed out that even though researchers in some form of applied work on development are increasingly doing work on ‘real’ policies and programs, they are not necessarily in a better position to help high-level policymakers choose the best way forward. This needs to be taken seriously, though it is not surprising that a chief minister is asking over-arching allocative questions (invest in transport or infrastructure?), whereas researchers may work with lower-level bureaucrats and NGO managers or even street-level/front-line workers, who have more modest goals of improving the workings and (cost-)effectiveness of an existing program or trying something new.

What is decision-relevant in a particular case will depend very much on the position of the stakeholder with whom the researcher-evaluator is designing the research questions and evaluation (an early engagement and co-creation of the research questions and plan for how the evidence will be used that I consider a pre-req to doing decision-relevant work — see, e.g., the beginning of Suvojit‘s and my discussion of actually planning to use evidence to make decisions). Intention matters in being decision-relevant, to my way of thinking, and so, therefore, does deciding whose decision you are trying to inform.

I should briefly say that I think plenty of policy-adjacent work is immensely valuable and useful in informing thinking and future planning and approaches. One of my favorite works, for example, The Anti-Politics Machine, offers a careful vivisection (as Ferguson calls it) of a program without actually guiding officials deciding what to do next. Learning what is and isn’t working (and why) is critically important. His book is a profound, policy-adjacent work (by being about a real program), but it did not set out to be directly decision-relevant, nor is it. The book still adds tremendous value in thinking about how we should approach and think about development, but it is unlikely that a given bureaucrat can use it to make a programmatic decision.

But here is where I get stuck and muddled, which is one of the reasons I put off writing this for so long. At some stage of my thinking, I felt that being decision-relevant, like being policy-adjacent, required working on real, live policies and programs. In fact, in a July 2014 attempt at writing this post, I was quite sympathetic to Howard White’s argument in a seminar that a good way to avoid doing ‘silly IE’ (sillIE©?) is to evaluate real programs and policies, even though being about a real program is not an automatic buffer against being silly.

But I increasingly wonder if I am wrong about decision-relevance. Instead, the main criterion is working with a decision-maker to sort out what decision needs to be made. One possible outcome of such a decision is that a particular way forward is deemed definitely not worth pursuing, meaning that there is a serious and insurmountable design failure (~in-efficacy) rather than an implementation failure (~in-effectiveness). A clear-cut design failure firmly closes a door on a way forward, which is important in decision-making processes (if stakeholders are willing to have a closed door be a possible result of an evaluation). For example, one might (artificially) test a program or policy idea in a crucial or Sinatra case setting — that is, if the idea can’t make it there, it can’t make it anywhere (Gerring, attributed to Yates). Door closed, decision option removed. One might also want to deliver an intervention in what H.L. Mencken called a ‘horse-doctor’s dose‘ (as noted here). Again, if that whopping strong version of the program or policy doesn’t do it, it certainly won’t do it at the more likely level of administration. A similar view is expressed in Running Randomized Evaluations, noting that ‘proof-of-concept evaluations’ can show that even “a gold-plated, best-case-scenario version of the program is not effective.” Door closed, decision option removed.

Even more mind-bending, Ludwig, Kling, and Mullainathan suggest laying out how researchers may approximate the ‘look’ of a policy to test the underlying mechanism (rather than the entirety of the policy’s causal chain and potential for implementation snafus) and, again, directly inform a prioritization, programmatic, or policy decision. As they note, “in a world of limited resources, mechanism experiments concentrate resources on estimating the parameters that are most decision relevant,” serving as a ‘first screen’ as to whether a policy is even worth trying. Again, this offers an opportunity to close a door and remove a decision option. It is hard to argue that this is not decision-relevant and would not inform policy, even if the experimental evaluation is not of a real policy, carried out by the people who would take the policy to scale, and so on. Done well, the suggestion is (controversially) that a policy mechanism could be dismissed on the basis of a single study: a mechanism experiment showing that, even under ideal or hyper-ideal conditions (and taking the appropriate time trajectory into account), the mechanism does not bring about the desired change.

But, the key criterion of early involvement of stakeholders and clarifying the question that needs to be answered remains central to this approach to decision-relevance. And, again, having an identified set of stakeholders intended to be the immediate users of evidence seems to be important to being decision-relevant. And, finally, the role of middle-range or programmatic theory (of change) and clearly identified mechanisms of how a program/policy is meant to lead to an outcome is critical in being decision-relevant.

To return to the opening premise, it does not seem helpful to label all evaluation research associated with a real-world policy or program as ‘policy relevant.’ It is often seen as desirable to be policy relevant in the current state of (impact) evaluation work but this doesn’t mean that all policy-adjacent research projects should self-label as being policy relevant. This is easy to do when it is not entirely clear what ‘policy relevance’ means and it spreads the term too thin. To gain clarity, it helps to parse studies that are policy adjacent from those that are decision-relevant. Being relevant to decisions or policymakers demands not just stakeholder engagement (another loose term) but stakeholder identification of the questions they need answered in order to make a prioritization, programmatic, or policy decision.

There must, therefore, be clear and tangible decision-makers who intend to make use of the generated evidence to work towards a pre-stated decision goal — including a decision to shut the door on a particular policy/program option. While being policy-adjacent requires working alongside a real-world policy, being decision-relevant may not have to meet this requirement, though it does require an ex ante intention to inform a specific policy/program decision and to engage appropriately with stakeholders to this end.

This is far from a complete set of thoughts — I have more reading to do on mechanisms and more thinking to do about when murky and quirky decisions can be reasonably made for a single setting based on a single study in that murky and quirky setting. Nevertheless, the argument that there should be some clear standards for when the term ‘policy relevant’ can be applied and what it means holds.

*In the same somewhat horrifying way that a person might self-ascribe connoisseur status or a bar might self-label as being a dive. No no no, vomit.

Pipeline Designs and Equipoise: How Can They Go Together?

I am writing about phase-in / pipeline designs. Again. I’ve already done it here. And more here. But.

The premise of a pipeline or phase-in design is that groups will be randomized or otherwise experimentally allocated to receive a given intervention earlier or later. The ‘later’ group can then serve as the comparison for the ‘early’ group, allowing for a causal claim about impact to be made. I am specifically talking about phase-in designs premised on the idea that the ‘later’ group is planned (and has perhaps been promised) to receive the intervention later. I take this to be a ‘standard’ approach to phase-in designs.
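
As a minimal illustration of those mechanics (simulated data and hypothetical names only, not any particular study), the sketch below randomizes communities into an ‘early’ and a ‘later’ wave and then compares the two at midline, when only the early wave has received the intervention.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

# Hypothetical list of communities eligible for the program.
communities = pd.DataFrame({"community_id": range(1, 101)})

# Randomly assign half to the early phase and half to the later phase.
communities["wave"] = rng.permutation(["early"] * 50 + ["later"] * 50)

# Simulated midline outcomes, purely for illustration: only the early wave
# has been treated by midline, so the later wave serves as the comparison group.
communities["midline_outcome"] = rng.normal(
    loc=np.where(communities["wave"] == "early", 0.3, 0.0), scale=1.0
)

impact_estimate = (
    communities.loc[communities["wave"] == "early", "midline_outcome"].mean()
    - communities.loc[communities["wave"] == "later", "midline_outcome"].mean()
)
print(f"Early-vs-later difference in means at midline: {impact_estimate:.2f}")
```

The comparison only identifies impact up to the point at which the later wave is phased in, which is part of why the equipoise question below matters.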

I’d like to revisit the issue of phase-in designs from the angle of equipoise, which implies some sense of uncertainty about the causal impact of a given intervention. This uncertainty provides the justification for making use of an ex ante impact evaluation. Equipoise literally translates to equal weight / force / interest. Here, the force in question is the force of argument about the impact of an intervention and which direction it will go (or whether there will be one at all).

There have already been some great conversations, if not decisive answers, as to whether, in social science research, the justification for using experimental allocation of an intervention needs to meet the standards of clinical equipoise or policy equipoise.* The key difference is the contrast between ‘a good impact’ (clinical equipoise) and ‘the best impact achievable with the available resources’ (policy equipoise). In either case, it is clear that some variant of equipoise is considered a necessary justification. For theoretical and/or empirical reasons, it just isn’t clear whether an intervention is (a) good (investment).

Whichever definition of equipoise you pursue, the underlying premise is one of a genuine uncertainty and an operational knowledge gap about how well a certain intervention will work in a certain setting at a certain point in time and at what degree of relative resource efficiency. This uncertainty is what lends credibility to an ex ante impact evaluation (IE) and the ethical justification for a leave-out (‘business as usual’ or perhaps ‘minimal/basic package’) comparison group. Hence, no RCTs on parachutes.

Uncertainty implies that the impact results could plausibly, if not with fully equal likelihood, come back positive, negative, null or mixed. At least some of those outcomes imply that a program is not a good use of resources, if not actually generating adverse effects. Such a program, we might assume, should be stopped or swapped for some alternative intervention (see Berk’s comments here).

To move forward from the idea of uncertainty, the following two statements simply do not go together despite often being implicitly paired:

  1. We are uncertain about the impact our intervention will bring about / cause, so we are doing an (any type of ex ante) IE.
  2. We plan to scale this intervention for everyone (implicitly, at least, because we believe it works – that is, the impacts are largely in the desired direction). Because of resource constraints, we will have to phase it in over time to the population.

Yes, the second point could be, and often is, carried on to say, ‘this offers a good opportunity to have a clean identification strategy and therefore to do IE.’ But this doesn’t actually square the circle between the two statements. It still requires the type of sleight of hand around the issue of uncertainty that I raised here about policy champions.

Unless there are some built-in plans to modify (or even cancel) the program along the phase-in process, the ethics of statement 2 rests solely on the resource constraint (relative to actual or planned demand), not on any variant of equipoise. This is an important point when justifying the ethics of ex ante IE. And it is worth noting how few development programs have been halted because of IE results. It would be a helpful global public good if someone would start compiling a list of interventions that have been stopped, plausibly, because of IE outcomes, perhaps making note of the specific research design used. Please and thank you.

Moreover, unless there is some built-in planning about improving, tweaking or even scrapping the program along the way, it is not clear that the ex ante IE based on a phase-in design can fully claim to be policy relevant. This is a point I plan to elaborate in a future post but, for now, suffice it to say that I am increasingly skeptical that being about a policy (being ‘policy adjacent’ by situating a study in a policy) is the same as informing decisions about that policy (being ‘decision relevant’).

To me, the latter has stronger claims on being truly policy relevant and helping make wise and informed decisions about the use of scarce resources – which I think is the crux of this whole IE game anyway. IEs of phase-in designs without clear potential for mid-course corrections (i.e. genuine decision points) seem destined for policy adjacency, at best. Again, the underlying premise of a phase-in design is that it is a resource constraint, not an evidence constraint, which is dictating the roll-out of the program. But the intention to make a decision at least partly based on the evidence generated by an IE again rests on the premise of ex ante uncertainty about the potential for (the most cost-efficient) impact.

To come back to the issue of equipoise and phase-in designs: if the ethics of much of the work we do rests on a commitment to equipoise, then more needs to be done to clarify how we assess it and whether IRB/ethics review committees take it seriously when considering research designs. What information does a review board need to make that assessment?

Moreover, it requires giving a good think to what types of research designs align with the agreed concept of equipoise (whichever that may be). My sense is that phase-in designs can only be commensurate with the idea of equipoise if they are well-conceived, with well-conceived indicating that uncertainty about impact is indeed recognized and contingencies planned for in a meaningful way – that is, that the intervention can be stopped or altered during the phase-in process.

* I don’t propose to settle this debate between clinical and policy equipoise here, though I am sympathetic to the policy equipoise argument (and would be more so if more ex ante IEs tended towards explicitly testing two variants of an intervention against one another to see which proves the better use of resources moving forward – because forward is the general direction people intend to move in development).

Buffet of Champions: What Kind Do We Need for Impact Evaluations and Policy?

This post is also cross-posted here and here.

I realize that the thesis of “we may need a new kind of champion” sounds like a rather anemic pitch for Guardians of the Galaxy. Moreover, it may lead to inflated hopes that I am going to propose that dance-offs be used more often to decide policy questions. While I don’t necessarily deny that this is a fantastic idea (and would certainly boost C-SPAN viewership), I want to quickly dash hopes that this is the main premise of this post.

Rather, I am curious why “we” believe that policy champions will be keen on promoting and using impact evaluation (and subsequent evidence syntheses of these), and I want to suggest that another range of actors, which I call “evidence” and “issue” champions, may be more natural allies. There has been a recurring storyline in recent literature and musings on (impact) evaluation and policy- or decision-making:

  • First, the aspiration: the general desire of researchers (and others) to see more evidence used in decision-making (let’s say both judgment and learning) related to aid and development so that scarce resources are allocated more wisely and/or so that more resources are brought to bear on the problem.
  • Second, the dashed hopes: the realization that data and evidence currently play a limited role in decision-making (see, for example, the report on the evidence on evidence-informed policy-making as well as here).
  • Third, the new hope: the recognition that “policy champions” (also “policy entrepreneurs” and “policy opportunists”) may be a bridge between the two.
  • Fourth, the new plan of attack: bring “policy champions” and other stakeholders into the research process much earlier in order to get up-take of evaluation results into the debates and decisions. This even includes bringing policy champions (say, bureaucrats) on as research PIs.

There seems to be a sleight of hand at work in the above formulation, and it is somewhat worrying in terms of equipoise and the possible use of the range of results that can emerge from an impact evaluation study. Said another way, it seems potentially at odds with the idea that the answer to an evaluation is unknown at the start of the evaluation.

While I am not sure that “policy champion” has been precisely defined (and, indeed, this may be part of the problem), this has been done for the policy entrepreneur concept. So far as I can tell, the first articulation of the entrepreneurial (brokering, middle-man, risk-taking) role in policy-making comes from David E. Price in 1971. The idea was repeated and refined in the 1980s and then became more commonplace in 1990s’ discussions of public policy, in part through the work of John Kingdon. (There is also a formative and informative 1991 piece by Nancy Roberts and Paula King.)

Much of the initial discussion, it seems, came out of studying US national and state-level congressional politics but the ideas have been repeatedly shown to have merit in other deliberative settings. Much of the initial work also focused on agenda-setting — which problems and solutions gain attention — but similar functions are also important in the adoption and implementation of policy solutions. Kingdon is fairly precise about the qualities of a policy entrepreneur — someone who has, as Kingdon calls it, a pet policy that they nurture over years, waiting for good moments of opportunity to suggest their policy as the solution to a pressing problem.

  • First, such a person must have a “claim to a hearing” — that is, at least behind-the-scenes, people must respect and be willing to listen to this person on this topic (especially if this person is not directly in a position with decision-making power).
  • Second, such a person must have networks and connections as well as an ability to bargain and negotiate within them. This is a person who can broker ideas across diverse groups of people, can “soften up” people to the entrepreneur’s preferred policy solution, etc.
  • Third, such a person must have tenacity, persistence and a willingness to risk personal reputation and resources for a policy idea.

In Kingdon’s and others’ conception, a policy entrepreneur has to work at selling their idea over a long period of time (which is presumably why Weissert (1991) also introduced the idea of policy opportunists, who only start to champion ideas once they make it to the deliberating table and seem likely to move forward). In short, policy entrepreneurs (and, through the sloppy use of near-synonyms, policy champions) believe strongly in a policy solution for some reason and have put time, effort, and reputation into moving the idea forward. Note the nebulous use of “some reason” — I have not found a definition that specifies that policy entrepreneurs must come to promote a policy through a particular impetus. Glory, gold, God, goodness, and (g’)evidence all seem to be viable motivators that fit the definition.

My question is: is this what we need to support the use of research (and, specifically, impact evaluations and syntheses thereof) in decision-making? It is not clear to me that we do. Policy entrepreneurs are people already sold on a particular policy solution, whereas the question behind much evaluation work is ‘is this the best policy solution for this context?’ (recognizing the importance of contextual and policy, if not clinical, uncertainty about the answer in order for an evaluation to be worthwhile). It seems to me that what we (researchers and evaluators) actually need are people deeply committed to one of two things:

(1) The use of data and evidence in general (“evidence champions” or, at least loosely, technocrats) as an important tool in sound decision-making, and/or

(2) A particular issue or problem (“issue champions” — no doubt a sexier phrase is available). I’ll spend more time on the second.

An “issue champion,” for example, may be someone who has similar qualities of a policy entrepreneur but, rather than using claims to a hearing, a network, and tenacity to bring forward a policy solution, s/he uses these tools to bring attention to a problem — say, malaria mortality. This person feels that malaria is a problem that must be solved — and is open to finding the most (cost-) effective solution to the problem (or means to do a good job with implementing that solution).

S/He is not, by contrast, someone already committed to believing that prevention, diagnostics, or treatment in any particular form or at any particular price is the best way forward until s/he has seen evidence of this in a relevant context. This is different from a “policy champion” who has, for example, been pushing for universal bednet coverage for the past 20 years. This is not to say that you don’t want the bednet champion to be well aware of your study and to even have input into defining the research questions and approving the research design (in fact, this seems vital in lending credibility and usefulness to the results). But the way the study is structured will be important to whether the bednet champion is open to taking up the range of possible results from your study.

If your question is: does approach A or approach B result in more efficient distribution of bednets, then yes, both sets of results will be interesting to the bednet champion.

But if the question is more of the type ‘are bednets the most cost-effective approach to addressing malaria mortality in our country?’, then the bednet champion is likely to be particularly interested in trumpeting only one set of results: those that are significantly in favor of bednets as a solution to the malaria problem.

The malaria/issue champion (or general evidence enthusiast), on the other hand, may be more open to thinking about how to interpret and use the range of possible results from the study, which may also be mixed, inconclusive, or even negative. (Throughout this discussion, I recognize that malaria, like all problems in human and economic development, doesn’t have silver bullet answers and that, therefore, “A or not-A”-type evaluation questions will only get us so far in getting the right mix of tools in the right place at the right time. I.e., the answer is likely neither that bednets do no good nor that they are the only thing needed to tackle malaria.)

The worry, then, with the policy champion is that they are already committed to a policy solution. Will they change their mind on the basis of one study? Probably not (nor, necessarily, should they, though a meta-analysis may not sway them either). But insofar as “we” want decision-makers to learn about our evidence and to consider it in their deliberations, it may be issue, rather than policy, champions that are particularly important. They may make use of the results regardless of what they are. We cannot necessarily expect the same of the policy champion. Of course, a small army of evidence champions is also helpful. I do want to stress that it is critical to have policy champions and other stakeholders involved early in the research-design process, so that the right questions can be asked and the politically and contextually salient outcomes and magnitudes considered. But as an ally in the evaluation process and, say, a potential PI on an evaluation, it seems that the issue champions are the folks likely to stick with it.

And, yes, issue champions should probably have some moves ready, in case of a dance-off (as there will always be factors beyond evidence and data influencing decisions).

implementation / a ban on new ideas in India

not a lot of time to write on my own blog but here’s one that i wrote for 3ie based on the recent #3ievalconf in Delhi, Measuring Results.

back (and forward) from ‘the big push forward’ – thoughts on why evidence is political and what to do about it

i spent the beginning of the week in brighton at the ‘big push forward’ conference, on the politics of evidence (#evpolitics), which mixed the need for venting and catharsis (about the “results agenda” and “results-based management” and “impact evaluation”) with some productive conversation, though no immediate concreteness on how the evidence from the conference would itself be used.

in the meantime, i offer some of my take-aways from the conference – based on some great back-and-forths with some great folks (thanks!), below.

for me, the two most useful catchphrases were about trying to get to “relevant rigor” (being relevantly rigorous and rigorously relevant) and paying attention to both “glossy policy and dusty implementation.” lots of other turns-of-phrase and key terms were offered, not all of them – to my mind – terribly useful.

there was general agreement that evidence could be political in multiple dimensions. these included in:

  • what questions are asked (and in skepticism of whose ideas they are directed), by whom, of whom, with whom in mind (who needs to be convinced), for whom – and why
  • the way questions are asked and how evidence is collected
  • how evidence is used and shared – by whom, where and why
  • how impact is attributed – to interventions or to organizations (and whether this fuels competitiveness for funds and recognition)
  • whether the originators of the idea (those who already ‘knew’ something was working in some way deemed insufficiently rigorous) or the folks who analyze evidence receive credit for the idea

questions and design. in terms of what evidence is collected and what questions are asked, a big part of the ‘push back’ relates to whether the questions asked help governments and organizations improve their practice. this requires getting input from many stakeholders on what questions are important to ask. in addition, it requires planning for how the evidence will be used, including what will be done if results are (a) null, (b) mixed, confused or inconclusive, and (c) negative. more generally, this requires recognizing that policy-makers aren’t making decisions about ‘average’ situations but rather decisions for specific situations. as such, impact evaluation and systematic reviews need to help them figure out what evidence applies to their situation. the sooner expectations are dispelled that an impact evaluation or a systematic review will provide a clear answer on what should be done next, the better.

my sense, which was certainly not consensus, is that to be useful and to avoid being blocked by egos, impact questions need to shift away from “does X work?” to “does X work better than Y?” and/or “how can X be made to work better?” this also highlights the importance of monitoring and feedback of information into learning and decision-making (i.e.).

two more points on results for learning and decision-making. first, faced with the assertion that ‘impact evaluation doesn’t reveal *why* something works,’ it is unsatisfactory to say something along the lines of ‘we look for heterogeneous treatment effects.’ it absolutely also requires asking front-line workers and program recipients why they think something is and is not working — not as the final word on the matter but as a very important source of information. second, as has been pointed out in many places (e.g.), designing a good impact evaluation requires explication of a clear “Theory of Change” (still not my favorite term but apparently one that is here to stay). further, it is important to recognize that articulating a ToC (or LogFrame or use of any similar tool) should never be one person’s all-nighter for a funding proposal. rather, the tool is useful as a way of collectively building consensus around mission and why & how a certain idea is meant to work. as such, time and money need to be allocated for a ToC to be developed.

collection. as for the actual collection of data, there was a reasonable amount of conversation about whether the method is extractive or empowering, though probably not enough on how to shift towards empowerment and the fact that extractive/empowering are not synonymous with quant/qual. an issue that received less attention than it should have was that data collection needs to align with an understanding of how long a program should take to work (and funding cycles should be realigned accordingly).

use. again, the conversation on the use of evidence was not as robust as i had hoped. however, it was pointed out early on (by duncan green) that organizations that have been commissioning systematic reviews in fact have no plan to use that evidence systematically. moreover, there was a reasonable amount of skepticism around whether such evidence would actually be used to make decisions to allocate resources to specific organizations or projects (for example, to kill or radically alter ineffective programs). rather, there is a sense that much impact evaluation is actually policy-based evidence-making, used to justify decisions already taken. alternatively, though, there was concern that the more such evidence was used to make specific funding decisions, the more organizations would be incentivized to make ‘sausage‘ numbers that serve no one. thus, the learning, feedback and improving aspects of data need emphasis.

empowerment in the use of data (as opposed to its collection) was not as much a part of the conversation as i would have hoped, though certainly people raised issues of how monitoring and evaluation data were fed back to and used by front-line workers, implementers, and ‘recipients.’ a few people stressed the importance of near-automated feedback mechanisms from monitoring data to generate ‘dashboards’ or other means of accessible data display, including alternatives to written reports.

a big concern on use of evidence was ownership and transparency of data (and results), including how this leads to the duplication/multiplication of data collection. surprisingly, with regards to transparency of data and analysis, no one mentioned the recent reinhart & rogoff mess, nor anything about mechanisms for improving data accessibility (e.g.).

finally, there was a sense that data collected needs to be useful – that the pendulum has swung too far from a dearth of data about development programs and processes to an unused glut, such that the collection of evidence feels like ‘feeding the beast.’ again, this loops back to planning how data will be broadly used and useful before it is collected.