for theories of change and study planning, assumptions don’t make an ass of u & me

two nice posts came out yesterday that relate directly or tangentially to building theories of change. david evans wrote about, inter alia, what lies behind a null finding, here. marcus jenal wrote here about how complexity shouldn’t stop us from building theories of change up front, so long as we stand ready to adapt (parts*) of them. these two posts sort of collided in my head as ways of thinking about how tocs link to study design (as well as program planning) — thanks for inspiring me to write on a saturday morning!

.

plenty has been written on the relevance of tocs and a good place to start if you’re catching up is with craig’s work, such as here. he, marcus, and others highlight how central assumptions about why things may not work are to a good theory of change. one reason to spend time carefully hashing out the assumptions is that they help strengthen both research design and program planning, hence the blog title (the received wisdom is that when you assume, you do indeed make an ass out of u + me).

.

what excites me about building theories of change (or conceptual models, as i originally learned about them) is getting to draw simultaneously on more formal theory, on practical lessons from the empirical literature (which is why it helps when empirical papers offer basic information and lessons about mundane (read: fascinating and useful) implementation details), and on a healthy dose of common sense, where ‘common’ includes drawing on a variety of perspectives.

.

getting a good start on a toc (even if planning to adapt and iterate throughout) is important if you, like me (with vegard, as we try to lay out in our work-in-progress here), see almost every other aspect of program planning and evaluation design as flowing from it: the content of questionnaires, the timing of data collection, which methods are most appropriate for answering which types of questions (links in the toc), what monitoring plans are needed, the enabling factors on which program design can draw and enhance, the contextual constraints a program can try to loosen, and even the way that the final report may look and the story it will try to tell about what met expectations, what didn’t, and why.

.

marcus’s post has some useful ideas about building tocs, including the (new-to-me) cynefin framework and accommodating competing hypotheses about how change might occur in early toc iterations. i have also written a bit about ways to try to enhance the toc-building process, here (and also some lessons from putting this into practice, here). these and other exercises will (perhaps implicitly) shed light on the ways that a program may not work as expected.

.

another set of useful ideas, especially in light of david’s post (and the paper that inspired it), can be drawn from cartwright and hardie. the book is worth checking out in full, since i skip over many of their ideas related to toc-building (not what they call it) to focus on one they call the ‘pre-mortem.’

.

less morbidly and more relevantly, we might call it a pre-null exercise. this type of thought experiment is useful because well-considered theories of change incorporate, through assumptions, hypotheses/theories of no change — that is, all the reasons why the expected pathways to change may get blocked or washed out or never be walked at all, culminating in a null result. the existence and tractability of such roadblocks and breakdowns are important lessons to learn from any research project. this is why thinking, early on, through the assumptions component of a toc is so critical, so that research designs can build in ways to catch potential reasons for no change.

.

the basic pre-null thought exercise is, at the beginning of project and study design, to imagine yourself at the end of an analysis, facing a lack of significant results. think through and list the possible reasons for this (beyond a true null) and then incorporate them into both program and study design (especially as these two seem to be (again?) moving closer together, see here and also samii on deep engagement).

.

david’s post gives some nice broad categories to consider: (1) lack of implementation fidelity or other implementation snafus (making it particularly important for a toc to include implementer incentives as well as the beneficiary viewpoint, as advocated by pritchett et al, among others), (2) altered behavior among ‘beneficiaries’ over time, (3) general equilibrium (ge) effects, or (4) that the intervention may work differently for various sub-groups (yes, sub-group assumptions should appear in a toc!).
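
.

purely as an illustration (not from david’s post or any actual study), here is a minimal python sketch of how those categories could be turned into a pre-null checklist, mapping each possible reason for a null onto a toc assumption and the data a study would need in order to check it; the category labels, field names, and wording are all made up for the example:

```python
# hypothetical sketch: turning pre-null reasons into toc assumptions and the
# data needed to check them. names and wording are illustrative, not a template.
from dataclasses import dataclass

@dataclass
class PreNullReason:
    category: str         # broad class of 'why we might see a null'
    toc_assumption: str   # how it would appear as an assumption in the toc
    data_to_collect: str  # what the study design should capture to check it

PRE_NULL_CHECKLIST = [
    PreNullReason(
        category="implementation fidelity",
        toc_assumption="implementers have the incentives and resources to deliver as planned",
        data_to_collect="monitoring data on delivery (coverage, dosage, timing)",
    ),
    PreNullReason(
        category="beneficiary behavior over time",
        toc_assumption="'beneficiaries' take up and keep using the intervention",
        data_to_collect="take-up and usage questions in each survey round",
    ),
    PreNullReason(
        category="general equilibrium effects",
        toc_assumption="spillovers do not wash out treatment-comparison differences",
        data_to_collect="outcomes for non-participants and neighboring units",
    ),
    PreNullReason(
        category="sub-group heterogeneity",
        toc_assumption="the intervention works similarly across pre-specified sub-groups",
        data_to_collect="baseline covariates needed to define sub-groups ex ante",
    ),
]

if __name__ == "__main__":
    for reason in PRE_NULL_CHECKLIST:
        print(f"{reason.category}: collect {reason.data_to_collect}")
```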

.

trying to anticipate these different ways we might end up with null results means they can be better represented as toc assumptions and, accordingly, incorporated into study and questionnaire design — and we all end up wiser as a result.

.

i think it is fair to say, broadly, that this and many other thought exercises go undone during the study design phase of rigorous evaluation and research (i like, for example, this posner et al paper for its effort to do some of this work ex post but of course wish it had — and think much of it could have — happened before the study). these efforts certainly go unreported and perhaps even untracked by researchers themselves, not just in the academic literature but, perhaps more upsettingly, in study reports that face fewer word-count restrictions.

.

i am hoping that exercises like a pre-null thought experiment will be useful to researchers planning studies. what i am struggling to figure out is why they aren’t happening much now.

.

here are some of my working hypotheses:

  • lack of time during program and/or study planning stages.
  • lack of clarity about toc-building (or conceptual modelling or whatever term you fancy) as being a key goal of formative work and deep stakeholder engagement (or more general lack of formative work and meaningful stakeholder engagement).
  • lack of funding for this kind of toc-building work and engagement.
  • lack of clarity about what constitutes a good theory of change and how it links to broader study and program design.
  • lack of (sociological) imagination or a sense of not needing to employ this during study design.
  • limited discussion of implementation lessons-learned (including during the pilot phase) in the empirical literature and little value (or actual disincentives) placed on sharing implementation details — good, bad, and ugly — that can inform future tocs.
  • under-valuing of theory-of-change-building (along with needs assessment and diagnostics?) as part of formal research education (these are things that can be taught; you don’t need to learn them only during your first research project, though certainly some of the lessons may only hit home then).

.

the follow-up question is, of course, how we can start to try to do better, such that inexplicable nulls become a bit more endangered.

.

*i note ‘parts’ because, while from a learning perspective we might want to be able to scrap all our initial conceptions, from an accountability (and, actually, learning) perspective we probably want to hold some things, such as goals, fixed.

i feel like an #oddeven party pooper (reducing and working are not the same)

there are two nice, evidence-informed op-ed pieces out today on delhi’s odd-even scheme to try to reduce air pollution (here and here). the results are heartening because i didn’t have a good sense of whether a two-week window of implementing a policy — to which there were many exceptions — was long enough to potentially detect a statistically significant change in meaningful measures of pollution. nor, admittedly, did i feel that i was breathing cleaner air the past two weeks. as one of the articles points out, much of the anecdotal chatter has been about clearer roads, not about clearer skies.

.

since i live in delhi, am certainly affected by the air quality, and am worried about my health accordingly (plume tells me every day that the situation is dire), i was pretty pleased to wake up to the headline “yes delhi, it worked.” and what has indeed happened is that good evidence (rigorously obtained, as laid out by suvojit) now shows a statistically significant 18% reduction in nasty particulate matter (pm 2.5) during the hours the intervention was in effect.

.

this was a policy that i wanted to see work, so i am pleased that the evidence shows a reduction in the particulate matter that is driving many of my good friends out of the city (alongside many other woes). but we must be careful — whether something “worked” is more subjective than the evidence of a reduction itself, which greenstone and colleagues have nicely and rapidly documented.
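
.

as a rough, made-up illustration of the kind of comparison behind that estimate (nothing here is the authors’ actual data or code), the logic is essentially a difference-in-differences: compare the change in delhi’s pm 2.5 during rule hours with the change in the surrounding region over the same period. a toy python sketch, with invented numbers chosen only to mimic the pattern described (pollution rose everywhere, but less in delhi):

```python
# illustrative only: hypothetical average pm 2.5 (ug/m3) during rule hours,
# before and during the odd-even window, in delhi and the surrounding region.
# the actual analysis uses monitor-level data and controls; this just shows
# the difference-in-differences logic with invented numbers.
delhi_before, delhi_during = 250.0, 270.0
outside_before, outside_during = 230.0, 290.0

delhi_change = (delhi_during - delhi_before) / delhi_before          # pollution rose in delhi...
outside_change = (outside_during - outside_before) / outside_before  # ...but rose more outside

relative_change = delhi_change - outside_change  # delhi's change relative to the region
print(f"delhi: {delhi_change:+.1%}, outside: {outside_change:+.1%}, "
      f"relative change in delhi: {relative_change:+.1%}")
```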

.

if models had predicted a 50% reduction, we wouldn’t have been so thrilled about 18%. if the government had said that every little bit counts and that even a 5% reduction would be counted by them as a success and a reason to commit to continuing the program, then indeed, 18% is quite impressive.

.

moving forward, as delhi tries to clean up its act and hopefully become a model for the rest of the country, clarifying up-front decision-points and definitions of success will be important. for the next pilots — because delhi desperately needs such measures — how will we declare, in a rigorous and defensible way, that a policy effort ‘worked’ well enough to be scaled and continued?  those of us interested in promoting the use of rigorous evidence and evaluation to inform decision-making need to be slightly cautious in our interpretations and celebrations of victory when we haven’t said up front what we’ll count as a triumph.

.

*as an addendum (31 jan 2016), it is not clear that the researchers themselves penned the title ‘yes delhi, it worked.’ to give them the benefit of the doubt, i am hoping that the researchers submitted something more along the lines of ‘yes delhi, odd-even reduced pollution’ and that the newspaper itself opted to change it. but the point holds that success is subjective and therefore requires a definition, preferably ex ante.


delhi’s #oddeven plan had a significant effect on pollution

agree that this is a potentially good sign about individual citizens being willing to engage in collective action – note also that many were most excited about lessened traffic, which is a good reason to think about inspiring collective action in ways that bring about both a public good and a private gain, allowing the pursuit of direct and indirect policy goals. my sense is there is still a long way to go in convincing people that the pollution is a problem and is getting worse.

Suvojit Chattopadhyay

Researchers Michael Greenstone, Santosh Harish and Anant Sudarshan have some news for us. Hard data that shows that the Odd-Even plan reduced pollution by significant levels in Delhi. The headline: this study finds there was an 18% reduction in PM 2.5 due to the pilot during the hours that the rule was in effect. The effect size is truly staggering, and is quite unusual for studies that use such rigorous methodology to look at the impact of policy interventions.

Starting January 1, while absolute pollution levels increased both inside and outside Delhi (for atmospheric reasons, as noted by other commentators), the increase in fine particle levels in Delhi was significantly less than in the surrounding region. Overall, there was a 10-13 per cent relative decline in Delhi.

Around 8 am, the gap between Delhi’s pollution and that in neighbouring regions begins to form and steadily increases until mid afternoon. As temperatures…


gratitude.

though it feels far less monumental than perhaps it should, i have completed the electronic submission of my thesis, which is a big milestone in calling the thing done or, more accurately, me degree-ed, regardless of how much more work there is to do.

.

i am sure i have forgotten many people but just in case some people don’t actually get around to checking out the thesis itself — a profound but simple ‘thank you’. here are my acknowledgments:

.

a thesis seems like a lone and lonely process, with only data and tea (or stronger) to keep you company, right up until you realize how many people you have to thank. no matter how i’ve tried to keep tabs, i am sure i have forgotten people – if you know you played a role, please give yourself a pat on the back.

 .

this is an empirical dissertation based almost entirely on primary data, which would not exist without willing respondents. in tamale, this includes many private-sector retailers who gave their time to answer a lot of tiresome questions. these answers, in turn, would not have materialized without the long-standing support of a core survey team, with special thanks to abass adam yidana, damba mohammed majeed, and alidu osman tuunteya. n tuma. in accra, many people not only consented to be interviewed but have been patient guides and kept in touch and helped this thesis over its long trajectory. these include: george amofah, kwabena asante, dennis sena awitty, frank boateng, samuel boateng, alex dodoo, keziah malm, yuniwo nfor, louis nortey, daniel norgbedzie, elianne oei, ellen sam, sylvester segbaya. alex dodoo and daniel norgbedzie have gone above and beyond. there would literally be no words (or numbers) without you.

.

i would not have been in ghana without the trust and support of günther fink and julia raifman, and i would not have survived ghana without the moral, emotional, intellectual, and nutritional support and levity of becky antwi, slawa rokicki, mollie barnathan, liz schultz venable, pace phillips, suvojit chattopadhyay, usamatu salifu, salifu amidu, abubakari bukari, lindsey o’shaughnessy, lolo dessein, aqil esmail, michael polansky, sam polley, emmanuel okyere, and rachel strohm. innovations for poverty action-ghana provided much needed infrastructural support and connections; jeff mosenkis has egged me on from headquarters. nathan blanchet has been a guide on ghana and to this whole process.

 .

this thesis as a completed product would not exist without michael reich. from inspiring the ideas that went in, to providing (mostly) patient guidance, to forcing me to articulate my own ideas beyond, ahem, “a fucking mess” to something that is hopefully readable and possibly even, with time, enjoyable: thank you. you’ve pulled me back from the brink more than once and words don’t suffice. i know sometimes your papers take up to thirty drafts; this has taken many more and you’ve been there throughout.

.

günther fink, jessica cohen, and barbara heil: thank you for keeping me in line and inspired. günther, your enthusiasm, and barbara (mom #2), your persistence, have made a huge difference.

 .

to the swapportive team of shahira ahmed, corrina moucheraud, pamela scorza, and elif yavuz: thanks for keeping me going on so many levels. corrina moucheraud, in particular, has listened to and read many ideas and drafts that constitute what follows, though with far less brevity than her counsel. elif, you’ve been there, reminding me that they don’t teach kingdon in europe and that anything i do with it better be good.

 .

to an assortment of men in cambridge — thank you. john quattrochi, who helped with everything from surviving a wide variety of the perils of working abroad, to early engagement with ideas (“is that what you’re trying to say?”), to getting my defense in place, to making sure the final touches were set. peter rockers, for your early skepticism and patience. jeremy barofsky, for encouragement, even sometimes by example. guy harling, for answering every stupid question i could think of while only occasionally reminding me that there are no stupid questions, only stupid people. zubin shroff, for listening and read-throughs.

 .

victoria fan, livia montana, rifat hasan, and jen manne-goehler have been sounding boards of one sort or another at various times.

 .

to the team at the center for geographic analysis, in particular jeff blossom (near and far!) and sumeeta srinivasan: i would have been lost without you.

 .

jesse bump and ashley fox have constituted a political economy crisis unit and have pulled me together and pushed me forward on more than one occasion. thanks for being key stakeholders.

 .

thank you to an intellectually and emotionally supportive community in delhi, with particular thanks to payal hathi, james pickett, and suvojit chattopadhyay for suffering through chapter drafts. bhuvana anand, shreya ray, sangita vyas, urmy shukla, jessica pickett, diane coffey, dean spears, shagun sabarwal, and markus olapade have all engaged with these ideas and the ideas are better for it. subha ganguly shahi and avi kishore have come in with key moral support.

 .

michael schulman, ian reiley, and liz richardson contributed to this being readable. nikolaos zahariadis and owen barder strengthened ideas. catherine goodman, sarah tougher, melisse murray, prashant yadav, and nora petty have been stand-by and stand-up amfm resources. marcia inhorn and norm daniels have been important mentors and models.

 .

several coffeeshops and restaurants have provided clean, well-lighted places over the years: trident and render in boston; andala and voltage in cambridge; mike’s and swad in tamale; loulou’s beignets in the woodlands; and maison des desserts, coast café, and latitude in delhi. thank you for the tea refills and unhurried surface area. and seventh heaven in rishikesh for an extended stay and support.

.

for my family, thanks for understanding this whole ‘abroad’ thing as best as possible and, in particular, to aunt janet for patient engagement with early drafts of the manuscript.

 .

finally, a huge thank you to my parents for absolutely everything from the mundane to the massive, from the decision to travel to details to debates to disasters (real and imagined) to deadlines to drafts-upon-drafts to the defense — even though you almost certainly never wanted to know a thing about malaria policy in ghana. tusen takk.

.

chapter I, for those curious about this thing we’ve built (all mistakes my own).

strategy testing: a start

thanks to craig valters, i was recently pointed towards a new case study in the asia foundation’s working politically in practice series, focused on a ‘new’ approach called strategy testing. overall, i am sympathetic to much of the approach, though since i believe it has much in common with prototyping, product design and refinement, reasonable service delivery, etc, i am not sure it is a wildly innovative departure from what i think many people would already see as good practice (as is also acknowledged on p. 14 of the paper). it is, nevertheless, on its way to being practical.

.

the approach and what i like

as i understand it, the approach has three key features.

  1. a commitment to a theory of change as truly a set of hypotheses or best-guesses at a strategy, and therefore a living product. embedded in this is a greater commitment to humility.
  2. better individual tracking (daily? weekly?) of external events, challenges faced, information received, and decisions taken.
  3. regular meetings (quarterly) of ‘program staff’ to review the theory of change and program approach and to refine as needed.

.

my sense is that the authors feel that the third point is the most radical of the suggestions they put forward. i disagree. i think it is point 2: having people take time out of their daily (“good”) work to document and reflect would represent a much bigger and more helpful change in the way development is practiced, and will probably require more intensive skill development. future work that documents this more subtle but fundamental shift and makes suggestions to improve practice would be very useful. it shouldn’t be ignored just because it is more mundane than the quarterly meetings at which an overhaul might happen.

.

overall, the approach represents an important commitment to continual learning as well as to accountability for doing work that gets better and closer to success over time. it also makes a theory of change approach much more central to practice, taking it down off the dusty shelf. the approach also raises important questions about funding cycles and the power of the program team to make adjustments (see p. 14, but this should be explored more). one of the most difficult things about adaptive programming, which i do not take up in this post, will be how to make adaptive budgeting possible.

.

what needs refinement

  • no matter how flexible-iterative-adaptive-dynamic-intractable-complex-unpredictable-otherbuzzwords are the problem and the program and the management approach, there seems to be nothing in this paper to suggest that, say, these strategy testing meetings could not happen on a regular, (gasp) planned basis. let’s push the anti-planning reaction only as far as it needs to go (more on this below).

.

  • be clear about what is flexible; not everything is or should be. with an approach like strategy testing, it will be important to not make it too easy to redefine successful results (talked about as ‘ultimate outcomes’ in the paper). this matters not just from an accountability perspective (achieving what you said you were going to achieve, even if by a different route or on a different timeline) but also because, presumably, there was some real conviction and merit behind the goals in the first place vis-a-vis development and world-a-better-place-ness (if there wasn’t, then it is an entirely different type of problem with which we are dealing).

.

this is a key concern i have with the ‘adaptation’ movement in general: indicators, pathways, strategies, understandings of the problems, and the goals are often problematized and discounted in one breath, which glosses over too much. if all goalposts are movable, it will be quite difficult to deem any program or strategy simply unworthy of a large resource outlay and let it go extinct.

.

in different parts of the paper, the authors say that “it is not possible to identify the outcomes and indicators at the outset of the program,” that “programs start with a broad articulation of the ultimate outcome,” and that “a precise plan of activities that will achieve results cannot be defined from the beginning.” i am more sympathetic to the framing of the second and third of these statements. the first statement seems to confuse humility with tabula rasa ignorance, which i don’t think helps move the conversation forward about how to do program planning better while also putting (structured) adaptation into practice.

.

  • define “program teams.” this term is used throughout the paper but it is hard to figure out who it includes, which has implications for how i feel about the approach, as it has implications for whose evidence and insight is deemed important. does it include front-line workers? office-based staff in the capital? if only the latter, the approach currently does not suggest how roadblocks and experiences and suggestions and feedback will be collected from the street level, yet surely this is critical to a holistic picture of events, roadblocks, and accomplishments — and therefore to choosing the path forward. the absence of semi-systematic feedback from front-line implementers, from intended beneficiaries, and from other stakeholders is problematic (distinct from saying all these people need to be physically in the room during strategy testing meetings).

.

  • the timeline and the ‘new information,’ ‘external changes,’ and ‘accomplishments and roadblocks’ seem out of sync. if the timeline is to be the key tool for daily or weekly reflection, it needs to move far beyond the sample provided in table 2 (acknowledging the potential for burdening program staff), which focuses on big-P political and high-level events. one question is who (and how) will be put in charge of documenting such changes, through more regular interaction with stakeholders or more careful monitoring of the news as part of a monitoring strategy. a second and possibly more important question is how a timeline-type tool can be better aligned with the theory of change and require staff to engage with the assumptions therein on a more regular basis (a rough sketch of what such a tool might record follows this list). can some of the burden on program staff be relieved if m&e (or mel or merl or whatever) teams do regular debriefing interviews with staff? drilling in on these practical, small details of how one might put strategy testing into practice would be hugely useful.

.

  • at times, ‘traditional monitoring’ (which itself could be better defined so it is even clearer what strategy testing is being contrasted with or appended onto) is painted as anachronistic; yet it must still be used in a strategy testing approach. for example, on page 11, the authors note that “by taking multiple small bets and continuously monitoring results, program teams are able to adjust and refine” (emphasis added). this suggests to me that a core set of indicators measuring progress/results towards some ultimate outcome (traditional monitoring?) is likely in place for much of the project, a reality that sometimes gets lost in the thrust to position strategy testing as an alternative approach to monitoring. it seems like response-to-monitoring rather than monitoring itself is the bigger contribution of strategy testing and, again, sometimes this gets lost in the paper and buzzword barrage.

 

  • a key challenge raised on page 11 is not adequately addressed; the authors note: “whether a program strategy is worthy of continued investment may not be easy to decide.” more in-depth, ex ante discussion of just such decision points (see my series of blogs with suvojit, starting here), and of what information will be needed to take such decisions, is needed. this would need to be built into any monitoring plan, as part of the information needs for successful strategy testing. as is acknowledged in the paper, “it may be difficult for a team to accept that their strategy is not working and move on to something new, especially when they have invested heavily in that strategy.” this will make it all the more important to have up-front discussions about how to determine when something is not working (which relates to having clear, somewhat steady definitions of success).
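
.

to make the timeline point above a bit more concrete, here is a hypothetical sketch (python, with invented field names and an invented example entry, not anything proposed in the paper) of the kind of record a timeline tool could ask staff to log, with each observation explicitly tagged to the toc assumption(s) it supports or challenges:

```python
# hypothetical sketch of a timeline entry that links day-to-day observations
# back to specific theory-of-change assumptions. all names are invented.
from dataclasses import dataclass, field
from datetime import date
from typing import List

@dataclass
class TimelineEntry:
    when: date
    entry_type: str             # e.g. 'new information', 'external change', 'roadblock', 'accomplishment'
    description: str
    toc_assumptions: List[str]  # the toc assumptions this observation speaks to
    supports_assumption: bool   # does it hold the assumption up or call it into question?
    decision_taken: str = ""    # any decision made in response, however small
    follow_up: List[str] = field(default_factory=list)

# an example entry a field officer might log in a few minutes
entry = TimelineEntry(
    when=date(2016, 1, 20),
    entry_type="roadblock",
    description="district counterpart reassigned; handover notes incomplete",
    toc_assumptions=["A3: local officials stay engaged between quarterly meetings"],
    supports_assumption=False,
    decision_taken="scheduled a briefing with the incoming counterpart",
    follow_up=["raise staffing continuity at the next strategy testing meeting"],
)
print(entry.entry_type, "->", entry.toc_assumptions)
```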

.

i take away from this paper that being flexible requires planning and commitments, even though the paper at times paints planning in a negative, out-of-sync light. flexibility requires more managerial planning and commitment: to finding time and tools and skills for reflection, to agreeing early on how strategic decisions will be made on the basis of the evidence gathered, on who will weigh in on them, and on how success will be defined even if different strategic approaches to achieving it are adopted. this is acknowledged at the end of the paper, in discussing the need for structure and discipline within (and to promote) flexibility. but it should be made much more central to marketing, refining, and disseminating the approach.

.

more generally, in the movement towards adaptive and flexible development work, we need to be careful about noting where the changes really need to happen (e.g. on monitoring itself, or on better tailoring monitoring to fit with decision-making needs, or on allowing time and scope to respond to monitoring findings) and where structure and planning are needed, making flexibility/planning and structure/planning complementary rather than contrasting ideas.

more from #evalcon: program planning

disclaimer: i always get quite frustrated when people seem to be reinventing the wheel, especially when at least the contours of the wheel could be found with a reasonable literature review that was somewhat cross-disciplinary (i am pretty sure this is still a reasonable expectation… perhaps part of the problem is that literature is insufficiently open-access?)

.

i’ll be blunt: everyone should read just a little more before they speak, realize that they are not necessarily entering uncharted territory (including the realms of program planning, product design, & evaluation), recognize that they are not great pioneers until they have checked that claim against the existing literature, and cite their sources.

.

program planning

from a lot of different corners, it seems that people involved in evaluation are suddenly ‘discovering’ that they may have a role to play in program planning and design, whether facilitating it or doing it more directly. this ranges from frequent topics of conversation at the recent #evalcon in kathmandu to smart policy design by ben olken and others.

.

it is a natural enough ‘discovery’ when involved in an evaluation that it may have been helpful if the evaluation team had been involved earlier — say, before the program was designed. that makes sense: folks doing an evaluation tend to get hung up on details that turn out to matter, like operationalizing key concepts and goalposts, clarifying who will do what, what that will look like and how long it will take, and so on. a lot of these details would show up in a well-done theory of change.

.

not only do people planning an evaluation ask these types of questions, they also fill a useful role as outsiders, clarifying language and ideas that insiders may take for granted (which raises interesting questions about the promises and pitfalls of internal evaluators, even well trained, especially those taking on a learning as well as an accountability function).

.

it’s just that this link and role to planning is not a new discovery. i’ll give the example of the precede-proceed model, because i am familiar with it, but there are assuredly lots of models linking planning and evaluation in useful ways. i admittedly like some of the older illustrations of the precede-proceed model but respect that larry green has updated his figures and that i should move on (but if you’re curious, you can see the old ones if you search for images ‘green precede proceed’).

.

precede-proceed starts as too few programs and evaluations do: with a needs assessment, based on objective indicators (wealth, disease, etc) as well as subjective indicators and interests. this helps to form both a statement of the problem and targets for the evaluation to assess. this is an excellent time for those interested in participatory methods to employ them (rather than just employing the term ‘participatory’ whenever it makes you feel good), because this is when it really counts, for the evaluation and for program design itself: getting the focus right.

.

from here, a series of diagnostics can be carried out to look for the factors (facilitating and blocking) that perpetuate the current, unsatisfactory state of the world but also allow for positive deviance. this can be a process of asking ‘why’ five times or of using other tools to look for the points at which a program or policy might intervene.

.

this can then be followed by a process of assessing the landscape of extant programs and policies and designing a new one, taking cues from product design, including the use of personae.

.

the evaluation may be broader than tracing these points backwards — the elements of the program or policy, the points of intervention, the different types of need identified — but these are effectively the building blocks for a well-aligned monitoring and evaluation strategy.

.

two points before moving on from the basic point that merging planning, design, and evaluation is charted territory:

  1. all of this suggests that people wanting to do good evaluation need to be better trained in the kinds of facilitating, mediating, needs assessing, and creative tasks implicated above.
  2. recognizing that design, implementation, & evaluation can be all part of the same processes is not somehow the same as saying that it is magically/conveniently unimportant to report on implementation details in an evaluation. if anyone outside the core implementation team of a project (a government agency, say, or an NGO) assists in planning, training, facilitating, framing, or any component of implementation, this needs to be reported for the sake of transparency, proper interpretation, and potential reproducibility.


.

questions about independence

one of the major points echoed in the #evalcon session that i covered in my last post is that the independence and unbiasedness of evaluations are hugely important in enhancing an evaluative effort’s credibility among policy makers. a key challenge for anyone involved in the shifts considered in the first bit of this blog — evaluative folks thinking about getting involved early on in program design — is going to be how to instill and project the integrity and trustworthiness of an evaluation while letting go a bit on strict independence, in the sense of remaining at arm’s length from the evaluation subject. to the extent that decision-makers and other stakeholders are a key audience, evaluators will be well-served by taking the time to understand what they see as credible and convincing evidence.

.

thoughts from #evalcon on evidence uptake, capacity building

i attended a great panel today, hosted by the think tank initiative and idrc and featuring representatives from three of tti’s cohort of think tanks. this is part of the broader global evaluation week (#evalcon) happening in kathmandu and focused on building bridges: the use of evaluation for decision making and policy influence. the notes on evidence uptake largely come from the session while the notes on capacity building are my own musings inspired by the event.

.

one point early on was to contrast evidence-informed decision-making with opinion-informed decision-making. i’ve usually heard the contrast drawn with faith-based decision-making and think the opinion framing is useful. it also comes in handy for one of the key takeaways from the session, which is that maybe the point (and feasible goal) isn’t to do away with opinion-based decision-making but rather to make sure that opinions are increasingly shaped by rigorous evaluative evidence. or, to be more bayesian about it, we want decision-makers to continuously update their priors about different issues, drawing on evidence.
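
.

to make the bayesian framing concrete, here is a toy python sketch (all numbers invented) of a decision-maker updating a prior belief about whether a program ‘works’ as evaluations come in, one building block at a time rather than in a single big bang:

```python
# toy illustration of updating a prior with evaluation evidence.
# the numbers are invented; the point is the updating, not the values.
def update(prior_prob_effective, p_evidence_if_effective, p_evidence_if_not):
    """posterior probability that the program is effective, given one new piece of evidence."""
    numerator = prior_prob_effective * p_evidence_if_effective
    denominator = numerator + (1 - prior_prob_effective) * p_evidence_if_not
    return numerator / denominator

# a skeptical decision-maker (prior 0.2) sees a positive, rigorous evaluation
# that is assumed to be four times more likely if the program actually works.
prior = 0.2
posterior = update(prior, p_evidence_if_effective=0.8, p_evidence_if_not=0.2)
print(f"prior: {prior:.2f} -> after one evaluation: {posterior:.2f}")  # 0.20 -> 0.50

# a second, similar study nudges the belief further rather than settling things at once
posterior_2 = update(posterior, 0.8, 0.2)
print(f"after a second study: {posterior_2:.2f}")  # -> 0.80
```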

.

this leads to a second point. in focusing on policy influence, we may become too focused on influencing very specific decision-makers for very specific decisions. this may lead us to lose sight of the broader goal of (re-)shaping the opinions of a wide variety of stakeholders and decision-makers, even if not linked to the immediate policy or program under evaluation. so, again, the frame of shaping opinions and aiming for decision-maker/power-center rather than policy-specific influence may lead to altered approaches, goals, and benchmarks.

.

a third point that echoed throughout the panel is that policy influence takes time. new ideas need time to sink in and percolate before opinions are re-shaped. secretary suman prasad sharma of nepal noted that from a decision-maker point of view, evaluations are better and more digestible when they aim to build bit by bit. participants invoked a building blocks metaphor several times and contrasted it with “big bang” results. a related and familiar point about the time and timing required for evaluation to change opinions and shape decisions is that planning for the next phase of the program cycle generally begins midway through current programming. if evaluation is to inform this next stage of planning, it requires the communication of interim results — or a more thoughtful shift of the program planning cycle relative to monitoring and evaluation funding cycles in general.

.

a general point that came up repeatedly was what constitutes a good versus a bad evaluation. this leads to a key capacity-building point: we need more “capacity-building” to help decision-makers recognize credible, rigorous evidence and to mediate between conflicting findings. way too often, in my view, capacity-building ends up being about how particular methods are carried out, rather than about the central task of identifying credible methodologies and weighting the findings accordingly (or about broader principles of causal inference). that is, capacity-building among decision-makers needs to (a) start from an understanding of how they currently assess credibility (on the radical premise that capacity-building exercises might generate capacity on both sides) and (b) help them become better consumers, not producers, of evidence.

.

a point that surfaced continuously about how decision-makers assess evidence was about objectivity and neutrality. ‘bad evaluations’ are biased and opinionated; ‘good evaluations’ are objective. there is probably a much larger conversation to be had about parsing objectivity from independence and engagement as well as further assessment of how decision-makers assess neutrality and how evaluators might establish and signal their objectivity. as a musing: a particular method doesn’t guarantee neutrality, which can also be violated in shaping the questions, selecting the site and sample, and so on.

.

other characteristics of ‘good evaluation’ that came out included not confusing being critical with only being negative; findings about what is working are also appreciated. ‘bad evaluation’ assigns blame and accountability to particular stakeholders without taking a nuanced view of the context and events (internal and external) during the evaluation. ‘good evaluation’ involves setting evaluation objectives up front. ‘good evaluation’ also places the findings in the context of other evidence on the same topic; this literature/evidence review work, especially when it does not focus on a single methodology or discipline (and, yes, i am particularly alluding to RCT authors who tend to cite only other RCTs, at the expense of sectoral evidence and simply other methodologies), is very helpful to a decision-making audience, as is helping to make sense of conflicting findings.

.

a final set of issues related to timing and transaction costs. a clear refrain throughout the panel was the importance of the timing of sharing findings. this means paying attention to the budget-making cycle and sharing results at just the right moment. it means spotting windows of receptivity to evidence on particular topics, reframing the evidence accordingly, and sharing it with decision-makers and the media. it probably means learning a lot more from effective lobbyists. staying in tune with policy and media cycles in a given evaluation context is hugely time consuming. a point was made, and is well-taken, that the transaction costs of this kind of staying-in-tune for policy influence are quite high for researchers. perhaps goals for influence by the immediate researchers and evaluators should be more modest, at least when shaping a specific decision was not the explicit purpose of the evaluation.

.

this suggests at least two more modest, indirect routes. one is to communicate the findings clearly to, and to do the necessary capacity-building with, naturally sympathetic decision-makers (say, parliamentarians or bureaucrats with an expressed interest in x issue) so that they become champions who keep the discussion going within decision-making bodies. to reiterate, my view is that a priority for capacity-building efforts should be helping decision-makers become evidence champions and good communicators of specific evaluation and research findings. this is an indirect road to influence but an important one, leveraging the credibility of decision-makers with one another. the second, also indirect, is to communicate the findings clearly to, and to do the necessary capacity-building with, the types of (advocacy? think tank?) organizations whose job it is to track the timing of budget meetings and the shifting political priorities and local events to which the evidence can be brought to bear.

.

the happy closing point was that a little bit of passion in evaluation, even while trying to remain neutral and objective, does not hurt.