for theories of change and study planning, assumptions don’t make an ass of u & me

two nice posts came out yesterday that relate directly or tangentially to building theories of change. david evans wrote about, inter alia, what lies behind a null finding, here. marcus jenal wrote here about how complexity shouldn’t stop us from building theories of change, up front, so long as we stand ready to adapt (parts*) of them. these two posts sort of collided in my head as ways of thinking about how tocs link to study design (as well as program planning). thanks for inspiring me to write on a saturday morning!

.

plenty has been written on the relevance of tocs and a good place to start if you’re catching up is with craig’s work, such as here. he, marcus, and others highlight the centrality of assumptions about why things may not work to a good theory of change. one reason to spend time on carefully hashing out the assumptions is that they help strengthen both research design and program planning, hence the blog title (the received wisdom is that when you assume, you do indeed make an ass out of u + me).

.

what excites me about building theories of change (or conceptual models, as i originally learned about them) is getting to draw simultaneously on more formal theory, on practical lessons from the empirical literature (making it useful when empirical papers offer basic information and lessons about mundane (read: fascinating and useful) implementation details), and a healthy dose of common sense, where ‘common’ includes drawing on a variety of perspectives.

.

getting a good start on a toc (even if planning to adapt and iterate throughout) is important if you, like me (with vegard, as we try to lay out in our work-in-progress here), see almost every other aspect of program planning and evaluation design as flowing from it: the content of questionnaires, the timing of data collection, which methods are most appropriate for answering which types of questions (links in the toc), what monitoring plans are needed, the enabling factors on which program design can draw and enhance, the contextual constraints a program can try to loosen, and even the way that the final report may look and the story it will try to tell about what met expectations, what didn’t, and why.

.

marcus’s post has some useful ideas about building tocs, including new-to-me cynefin and accommodating competing hypotheses in early toc iterations on how change might occur. i have also written a bit about ways to try to enhance the toc-building process, here (and also some lessons from putting this into practice, here). these and other exercises will (perhaps implicitly) shed light on the ways that a program may not work as expected.

.

another set of useful ideas, especially in light of david’s post (and the paper that inspired it), can be drawn from cartwright and hardie. the book is worth checking out, as i skip over many of their ideas related to toc-building (not what they call it) to focus on one they call the ‘pre-mortem.’

.

less morbidly and more relevantly, we might call it a pre-null exercise. this type of thought experiment is useful because well-considered theories of change incorporate, through assumptions, hypotheses/theories of no change: that is, all the reasons why the expected pathways to change may get blocked or washed out or never be walked at all, culminating in a null result. the existence and tractability of such roadblocks and breakdowns are important lessons to learn from any research project. this is why thinking, early on, through the assumptions component of a toc is so critical, so that research designs can build in ways to catch potential reasons for no change.

.

the basic pre-null thought exercise is, at the beginning of project and study design, to imagine yourself at the end of an analysis, with a lack of significant results. think through and list the possible reasons for this (beyond a true null) and then incorporate them both into program and study design (especially as these two seem to be (again?) moving closer together, see here and also samii on deep engagement).

.

david’s post gives some nice broad categories to consider: (1) lack of implementation fidelity or other implementation snafus (making it particularly important for a toc to include implementer incentives as well as considering the beneficiary viewpoint, as advocated by pritchett et al, among others), (2) altered behavior among ‘beneficiaries’ over time, (3) general equilibrium (ge) effects, or (4) that the intervention may work differently for various sub-groups (yes, sub-group assumptions should appear in a toc!).

.

trying to anticipate these different ways we might end up with null results means they can be better represented as toc assumptions and, accordingly, incorporated into study and questionnaire design — and we all end up wiser as a result.

.

i think it is fair to say broadly that this and many other thought exercises go un-done during the study design phase of rigorous evaluation and research (i like, for example, this posner et al paper for its effort to do some of this work ex post but of course wish it had — and think much of it could have — happened before the study). these efforts certainly go unreported and perhaps even untracked by researchers themselves, not just in the academic literature but, perhaps more upsettingly, in study reports that have fewer restrictions on words.

.

i am hoping that exercises like a pre-null thought experiment will be useful to researchers planning studies. what i am struggling to figure out is why they aren’t happening much now.

.

here are some of my working hypotheses:

  • lack of time during program and/or study planning stages.
  • lack of clarity about toc-building (or conceptual modelling or whatever term you fancy) as being a key goal of formative work and deep stakeholder engagement (or more general lack of formative work and meaningful stakeholder engagement).
  • lack of funding for this kind of toc-building work and engagement.
  • lack of clarity about what constitutes a good theory of change and how it links to broader study and program design.
  • lack of (sociological) imagination or a sense of not needing to employ this during study design.
  • limited discussion of implementation lessons-learned (including during the pilot phase) in the empirical literature and little value (or actual disincentives) placed on sharing implementation details — good, bad, and ugly — that can inform future tocs.
  • under-valuing of theory-of-change-building (along with needs assessment and diagnostics?) as part of formal research education (these are things that can be taught; you don’t need to learn them only during your first research project, though certainly some of the lessons may only hit home then).

.

the follow-up question is, of course, how we can start to try to do better, such that inexplicable nulls become a bit more endangered.

.

*i note ‘parts’ because while, from a learning perspective, we might want to be able to scrap all our initial conceptions, from an accountability (and, actually, learning) perspective we probably want to hold some things, such as goals, as fixed.


i feel like an #oddeven party pooper (reducing and working are not the same)

there are two nice, evidence-informed op-ed pieces out today on delhi’s odd-even scheme to try to reduce air pollution (here and here). the results are heartening because i didn’t have a good sense of whether a two-week window of implementing a policy, to which there were many exceptions, was long enough to potentially detect a statistically significant change in meaningful measures of pollution. nor, admittedly, did i feel that i was breathing cleaner air the past two weeks. as one of the articles points out, much of the anecdotal chatter has been about clearer roads, not about clearer skies.

.

since i live in delhi, am certainly affected by the air quality, and am worried about my health accordingly (plume tells me every day that the situation is dire), i was pretty pleased to wake up to the headline “yes delhi, it worked.” and what has indeed happened is that good evidence (rigorously obtained, as laid out by suvojit) has been generated of a statistically significant 18% reduction in nasty particulate matter (pm 2.5) during the hours the intervention was in effect.

.

this was a policy that i wanted to see work, so i am pleased that the evidence shows a reduction in the particulate matter that is driving many of my good friends out of the city (alongside many other woes). but we must be careful — whether something “worked” is more subjective than is the evidence of a reduction, which greenstone and colleagues have nicely and rapidly documented.

.

if models had predicted a 50% reduction, we wouldn’t have been so thrilled about 18%. if the government had said that every little bit counts and that even a 5% reduction would be counted by them as a success and a reason to commit to continuing the program, then indeed, 18% is quite impressive.
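to make the point concrete, here is a minimal sketch in python of what committing to a definition of success up front might look like. the function name and the two thresholds are hypothetical, lifted from the scenarios in the paragraph above rather than from any actual government statement or model; only the 18% figure comes from the study.

```python
def verdict(observed_reduction, success_threshold):
    """compare an observed effect to a threshold that was fixed before the pilot."""
    return "worked" if observed_reduction >= success_threshold else "fell short"


observed = 0.18  # the 18% pm 2.5 reduction reported for the pilot hours

# hypothetical ex ante thresholds (assumptions for illustration only)
scenarios = [
    ("models predicted a 50% reduction", 0.50),
    ("government committed to counting 5% as success", 0.05),
]

for label, threshold in scenarios:
    print(label, "->", verdict(observed, threshold))
```

the same 18% gets two different verdicts depending on what was promised beforehand, which is the whole argument for writing the threshold down before the data come in.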

.

moving forward, as delhi tries to clean up its act and hopefully become a model for the rest of the country, clarifying up-front decision-points and definitions of success will be important. for the next pilots (because delhi desperately needs such measures), how will we declare, in a rigorous and defensible way, that a policy effort ‘worked’ well enough to be scaled and continued? those of us interested in promoting the use of rigorous evidence and evaluation to inform decision-making need to be slightly cautious in our interpretations and celebrations of victory when we haven’t said up front what we’ll count as a triumph.

.

*as an addendum (31 jan 2016), it is not clear that the researchers themselves penned the title ‘yes delhi, it worked.’ for the benefit of the doubt, i am hoping that the researchers submitted something more along the lines of ‘yes delhi, odd-even reduced pollution’ and that the newspaper itself opted to change it. but the point holds that success is subjective and therefore requires a definition, preferably ex ante.


delhi’s #oddeven plan had a significant effect on pollution

agree that this is a potentially good sign about individual citizens being willing to engage in collective action – note also that many were most excited about lessened traffic, which is a good reason to think about inspiring collective action in ways that bring about both a public good and a private gain, allowing the pursuit of direct and indirect policy goals. my sense is there is still a long way to go in convincing people that the pollution is a problem and getting worse.

Suvojit Chattopadhyay

Researchers Michael Greenstone, Santosh Harish and Anant Sudarshan have some news for us. Hard data that shows that the Odd-Even plan reduced pollution by significant levels in Delhi. The headline: this study finds there was an 18% reduction in PM 2.5 due to the pilot during the hours that the rule was in effect. The effect size is truly staggering, and is quite unusual for studies that use such rigorous methodology to look at the impact of policy interventions.

Starting January 1, while absolute pollution levels increased both inside and outside Delhi (for atmospheric reasons, as noted by other commentators), the increase in fine particle levels in Delhi was significantly less than in the surrounding region. Overall, there was a 10-13 per cent relative decline in Delhi.

Around 8 am, the gap between Delhi’s pollution and that in neighbouring regions begins to form and steadily increases until mid afternoon. As temperatures…


gratitude.

though it feels far less monumental than perhaps it should, i have done the electronic submission of my thesis, which is a big milestone in calling the thing done or, more accurately, me degree-ed, regardless of how much more work there is to do.

.

i am sure i have forgotten many people but just in case some people don’t actually get around to checking out the thesis itself — a profound but simple ‘thank you’. here are my acknowledgments:

.

a thesis seems like a lone and lonely process, with only data and tea (or stronger) to keep you company, right up until you realize how many people you have to thank. no matter how i’ve tried to keep tabs, i am sure i have forgotten people – if you know you played a role, please give yourself a pat on the back.

 .

this is an empirical dissertation based almost entirely on primary data, which would not exist without willing respondents. in tamale, this includes many private-sector retailers who gave their time to answer a lot of tiresome questions. these answers, in turn, would not have materialized without the long-standing support of a core survey team, with special thanks to abass adam yidana, damba mohammed majeed, and alidu osman tuunteya. n tuma. in accra, many people not only consented to be interviewed but have been patient guides and kept in touch and helped this thesis over its long trajectory. these include: george amofah, kwabena asante, dennis sena awitty, frank boateng, samuel boateng, alex dodoo, keziah malm, yuniwo nfor, louis nortey, daniel norgbedzie, elianne oei, ellen sam, sylvester segbaya. alex dodoo, and daniel norgbedzie have gone above and beyond. there would literally be no words (or numbers) without you.

.

i would not have been in ghana without the trust and support of günther fink and julia raifman and i would not have survived ghana without the moral, emotional, intellectual, and nutritional support and levity of becky antwi, slawa rokicki, mollie barnathan, liz schultz venable, pace phillips, suvojit chattopadhyay, usamatu salifu, salifu amidu, abubakari bukari, lindsey o’shaughnessy, lolo dessein, aqil esmail, michael polansky, sam polley, emmanuel okyere, and rachel strohm. innovations for poverty action-ghana provided much needed infrastructural support and connections; jeff mosenkis has egged me on from headquarters. nathan blanchet has been a guide on ghana and to this whole process.

 .

this thesis as a completed product would not exist without michael reich. from inspiring the ideas that went in to providing a (mostly) patient guide and forcing me to articulate my own ideas beyond, ahem, “a fucking mess” to something that is hopefully readable and possibly even, with time, enjoyable: thank you. you’ve pulled me back from the brink more than once and words don’t suffice. i know sometimes your papers take up to thirty drafts; this has taken many more and you’ve been there throughout.

.

günther fink, jessica cohen, and barbara heil: thank you for keeping me in line and inspired. günther, your enthusiasm, and barbara (mom #2), your persistence, have made a huge difference.

 .

to the swapportive team of shahira ahmed, corrina moucheraud, pamela scorza, and elif yavuz: thanks for keeping me going on so many levels. corrina moucheraud, in particular, has listened to and read many ideas and drafts that constitute what follows, though with far less brevity than her counsel. elif, you’ve been there, reminding me that they don’t teach kingdon in europe and that anything i do with it better be good.

 .

to an assortment of men in cambridge — thank you. john quattrochi, who helped me survive a wide variety of the perils of working abroad to early engagement with ideas (“is that what you’re trying to say?”), to getting my defense in place, to making sure the final touches were set. peter rockers, for your early skepticism and patience. jeremy barofsky, for encouragement, even sometimes by example. guy harling, for answering every stupid question i could think of while only occasionally reminding me that there are no stupid questions, only stupid people. zubin shroff, for listening and read-throughs.

 .

victoria fan, livia montana, rifat hasan, and jen manne-goehler have been sounding boards of one sort or another at various times.

 .

to the team at the center for geographic analysis, in particular jeff blossom (near and far!) and sumeeta srinivasan: i would have been lost without you.

 .

jesse bump and ashley fox have constituted a political economy crisis unit and have pulled me together and pushed me forward on more than one occasion. thanks for being key stakeholders.

 .

thank you to an intellectually and emotionally supportive community in delhi, with particular thanks to payal hathi, james pickett, and suvojit chattopadhyay for suffering through chapter drafts. bhuvana anand, shreya ray, sangita vyas, urmy shukla, jessica pickett, diane coffey, dean spears, shagun sabarwal, and markus olapade have all engaged with these ideas and the ideas are better for it. subha ganguly shahi and avi kishore have come in with key moral support.

 .

michael schulman, ian reiley, and liz richardson contributed to this being readable. nikolaos zahariadis and owen barder strengthened ideas. catherine goodman, sarah tougher, melisse murray, prashant yadav, and nora petty have been stand-by and stand-up amfm resources. marcia inhorn and norm daniels have been important mentors and models.

 .

several coffeeshops and restaurants have provided clean, well-lighted places over the years: trident and render in boston; andala and voltage in cambridge; mike’s and swad in tamale; loulou’s beignets in the woodlands; and maison des desserts, coast café, and latitude in delhi. thank you for the tea refills and unhurried surface area. and seventh heaven in rishikesh for an extended stay and support.

 .

for my family, thanks for understanding this whole ‘abroad’ thing as best as possible and, in particular, to aunt janet for patient engagement with early drafts of the manuscript.

 .

finally, a huge thank you to my parents for absolutely everything from the mundane to the massive, from the decision to travel to details to debates to disasters (real and imagined) to deadlines to drafts-upon-drafts to the defense, even though you almost certainly never wanted to know a thing about malaria policy in ghana. tusen takk.

.

chapter I, for the curious about this thing we’ve built (all mistakes my own).

strategy testing: a start

thanks to craig valters, i was recently pointed towards a new case study in the asia foundation’s working politically in practice series, focused on a ‘new’ approach called strategy testing. overall, i am sympathetic to much of the approach, though since i believe it has much in common with prototyping, product design and refinement, reasonable service delivery, etc, i am not sure if it is a wildly innovative new way of what i think many people would already see as good practice (as also acknowledged on p. 14 of the paper). it is, nevertheless, on its way to being practical.

.

the approach and what i like

as i understand it, the approach has three key features.

  1. a commitment to a theory of change as truly a set of hypotheses or best-guesses at a strategy, and therefore a living product. embedded in this is a greater commitment to humility.
  2. better individual tracking (daily? weekly?) of external events, challenges faced, information received, and decisions taken.
  3. regular meetings (quarterly) of ‘program staff’ to review the theory of change and program approach and to refine as needed.

.

my sense is that the authors feel that the third point is the most radical of the suggestions they put forward. i disagree. i think it is point 2, having people take time out of their daily (“good”) work to document and reflect, that would represent a much bigger and more helpful change in the way development is practiced and will probably require more intensive skill development. future work that documents this more subtle but fundamental shift and makes suggestions to improve practice would be very useful. it shouldn’t be ignored because it is more mundane than the quarterly meetings at which an overhaul might happen.

.

overall, the approach represents an important commitment to continual learning as well as accountability in doing work that gets better and closer to success over time. it also moves a theory of change approach much more central to practice, taking it down off the dusty shelf. the approach also raises important questions about funding cycles and the power of the program team to make adjustments (see p. 14 but this should be explored more). one of the most difficult things about adaptive programming, which i do not take up in this post, will be how to make adaptive budgeting available.

.

what needs refinement

  • no matter how flexible-iterative-adaptive-dynamic-intractable-complex-unpredictable-otherbuzzwords the problem and the program and the management approach are, there seems to be nothing in this paper to suggest that, say, these strategy testing meetings could not happen on a regular, (gasp) planned basis. let’s push the anti-planning reaction only as far as it needs to go (more on this below).

.

  • be clear about what is flexible; not everything is or should be. with an approach like strategy testing, it will be important to not make it too easy to redefine successful results (talked about as ‘ultimate outcomes’ in the paper). this matters not just from an accountability perspective (achieving what you said you were going to achieve, even if by a different route or on a different timeline) but also because, presumably, there was some real conviction and merit behind the goals in the first place vis-a-vis development and world-a-better-place-ness (if there wasn’t, then it is an entirely different type of problem with which we are dealing).

.

this is a key concern i have with the ‘adaptation’ movement in general: indicators, pathways, strategies, understandings of the problems, and the goals are often problematized and discounted in one breath, which glosses over too much. if all goalposts are movable, it will be quite difficult to deem any program or strategy as simply unworthy of large resource outlay and let it go extinct.

.

in different parts of the paper, the authors say that “it is not possible to identify the outcomes and indicators at the outset of the program,” that “programs start with a broad articulation of the ultimate outcome,” and that “a precise plan of activities that will achieve results cannot be defined from the beginning.” i am more sympathetic to the framing of the second and third of these statements. the first statement seems to confuse humility with tabula rasa ignorance, which i don’t think helps move the conversation forward about how to do program planning better while also putting (structured) adaptation into practice.

.

  • define “program teams.” this term is used throughout the paper but it is hard to figure out who it includes, which has implications for how i feel about the approach, as it has implications for whose evidence and insight is deemed important. does it include front-line workers? office-based staff in the capital? if only the latter, the approach currently does not suggest how roadblocks and experiences and suggestions and feedback will be collected from the street-level. yet surely this is critical to a holistic picture of events, roadblocks, and accomplishments, and therefore to choosing the path forward. the absence of semi-systematic feedback from front-line implementers, from intended beneficiaries, and from other stakeholders is problematic (distinct from saying all these people need to be physically in the room during strategy testing meetings).

.

  • the timeline and the ‘new information,’ ‘external changes,’ and ‘accomplishments and roadblocks’ seem out of sync. if the timeline is to be the key tool for daily or weekly reflection, it needs to move far beyond the sample provided in table 2 (acknowledging the potential for burdening program staff), which focuses on big-P political and high-level events. one question is who (and how) will be put in charge of documenting such changes, through more regular interaction with stakeholders or more careful monitoring of the news as part of a monitoring strategy. a second and possibly more important question is how a timeline-type tool can be better aligned with the theory of change and require staff to engage with the assumptions therein on a more regular basis. can some of the burden on program staff be relieved if m&e (or mel or merl or whatever) teams do regular debriefing interviews with staff? drilling in on these practical, small details of how one might put strategy testing into practice would be hugely useful.

.

  • at times, ‘traditional monitoring’ (which itself could be better defined so it is even clearer what strategy testing is being contrasted with or being appended onto) is painted as anachronistic; yet it must still be used in a strategy testing approach. for example, on page 11, the authors note that “by taking multiple small bets and continuously monitoring results, program teams are able to adjust and refine” (emphasis added). this suggests to me that a core set of indicators that measure progress/results towards some ultimate outcome (traditional monitoring?) are likely in place for much of the project, a reality that sometimes gets lost in the thrust to position strategy testing as an alternative approach to monitoring. it seems like response-to-monitoring rather than monitoring itself is the bigger contribution of strategy testing and, again, sometimes this gets lost in the paper and buzzword barrage.

 

  • a key challenge raised on page 11 is not adequately addressed; the authors note: “whether a program strategy is worthy of continued investment may not be easy to decide.” more in-depth, ex ante discussion of just such decision points (see my series of blogs with suvojit, starting here), and of what information will be needed to take such decisions, is needed. these would need to be built into any monitoring plan, as part of the information needs for successful strategy testing. as is acknowledged in the paper, “it may be difficult for a team to accept that their strategy is not working and move on to something new, especially when they have invested heavily in that strategy.” this will make it all the more important to have up-front discussions about how to determine when something is not working (which relates to having clear, somewhat steady definitions of success).

    .

i take away from this paper that being flexible requires planning and commitments, even though at times these are painted in a negative and out-of-sync tone. it requires more managerial planning and commitment to finding time and tools and skills for reflection, to agreeing early on as to how strategic decisions will be made on the basis of evidence gathered, who will weigh in on them, on how success will be defined even if different strategic approaches to achieving it are adopted. this is acknowledged at the end of the paper, in discussing the need for structure and discipline within (and to promote) flexibility. but it should be made much more central to marketing, refining, and disseminating the approach.

.

more generally, in the movement towards adaptive and flexible development work, we need to be careful about noting where the changes really need to happen (e.g. on monitoring itself, or on better tailoring monitoring to fit with decision-making needs, or on allowing time and scope to respond to monitoring findings) and where structure and planning are needed, making flexibility/planning and structure/planning complementary rather than contrasting ideas.