i spent the beginning of the week in brighton at the ‘big push forward’ conference on the politics of evidence (#evpolitics), which mixed the need for venting and catharsis (about the “results agenda,” “results-based management” and “impact evaluation”) with some productive conversation, though with no immediate concreteness on how the evidence from the conference would itself be used.
in the meantime, i offer below some of my take-aways from the conference, based on some great back-and-forths with some great folks (thanks!).
for me, the two most useful catchphrases were getting to “relevant rigor” (being relevantly rigorous and rigorously relevant) and paying attention to both “glossy policy and dusty implementation.” lots of other turns-of-phrase and key terms were offered, not all of them – to my mind – terribly useful.
there was general agreement that evidence could be political in multiple dimensions. these included:
- what questions are asked (and whose ideas they treat with skepticism), by whom, of whom, with whom in mind (who needs to be convinced), for whom – and why
- the way questions are asked and how evidence is collected
- how evidence is used and shared – by whom, where and why
- how impact is attributed – to interventions or to organizations (and whether this fuels competitiveness for funds and recognition)
- whether the originators of the idea (those who already ‘knew’ something was working in some way deemed insufficiently rigorous) or the folks who analyze evidence receive credit for the idea
questions and design. in terms of what evidence is collected, a big part of the ‘push back’ relates to what questions are asked and whether they help governments and organizations improve their practice. this requires getting input from many stakeholders on what questions are important to ask. in addition, it requires planning for how the evidence will be used, including what will be done if results are (a) null, (b) mixed, confused or inconclusive, and (c) negative. more generally, this requires recognizing that policy-makers aren’t making decisions about ‘average’ situations but rather decisions for specific situations. as such, impact evaluation and systematic reviews need to help them figure out what evidence applies to their situation. the sooner expectations are dispelled that an impact evaluation or a systematic review will provide a clear answer on what should be done next, the better.
my sense, which was certainly not consensus, is that to be useful and to avoid being blocked by egos, impact questions need to shift away from “does X work?” to “does X work better than Y?” and/or “how can X be made to work better?” this also highlights the importance of monitoring and of feeding information back into learning and decision-making.
two more points on results for learning and decision-making. first, faced with the assertion that ‘impact evaluation doesn’t reveal *why* something works,’ it is unsatisfactory to say something along the lines of ‘we look for heterogeneous treatment effects.’ it absolutely also requires asking front-line workers and program recipients why they think something is and is not working – not as the final word on the matter but as a very important source of information. second, as has been pointed out in many places, designing a good impact evaluation requires explication of a clear “Theory of Change” (still not my favorite term but apparently one that is here to stay). further, it is important to recognize that articulating a ToC (or LogFrame or use of any similar tool) should never be one person’s all-nighter for a funding proposal. rather, the tool is useful as a way of collectively building consensus around mission and why & how a certain idea is meant to work. as such, time and money need to be allocated for a ToC to be developed.
collection. as for the actual collection of data, there was a reasonable amount of conversation about whether the method is extractive or empowering, though probably not enough on how to shift towards empowerment and the fact that extractive/empowering are not synonymous with quant/qual. an issue that received less attention than it should have was that data collection needs to align with an understanding of how long a program should take to work (and funding cycles should be realigned accordingly).
use. again, the conversation on the use of evidence was not as robust as i had hoped. however, it was pointed out early on (by duncan green) that organizations that have been commissioning systematic reviews in fact have no plan to use that evidence systematically. moreover, there was a reasonable amount of skepticism around whether such evidence would actually be used to make decisions to allocate resources to specific organizations or projects (for example, to kill or radically alter ineffective programs). rather, there is a sense that much impact evaluation is actually policy-based evidence-making, used to justify decisions already taken. conversely, though, there was concern that the more such evidence was used to make specific funding decisions, the more organizations would be incentivized to make ‘sausage’ numbers that serve no one. thus, the learning, feedback and improving aspects of data need emphasis.
empowerment in the use of data (as opposed to its collection) was not as much a part of the conversation as i would have hoped, though certainly people raised issues of how monitoring and evaluation data were fed back to and used by front-line workers, implementers, and ‘recipients.’ a few people stressed the importance of near-automated feedback mechanisms from monitoring data to generate ‘dashboards’ or other means of accessible data display, including alternatives to written reports.
a big concern on use of evidence was ownership and transparency of data (and results), including how this leads to the duplication/multiplication of data collection. surprisingly, with regard to transparency of data and analysis, no one mentioned the recent reinhart & rogoff mess, nor anything about mechanisms for improving data accessibility.
finally, there was a sense that data collected needs to be useful – that the pendulum has swung too far from a dearth of data about development programs and processes to an unused glut, such that the collection of evidence feels like ‘feeding the beast.’ again, this loops back to planning how data will be broadly used and useful before it is collected.