i feel like an #oddeven party pooper (reducing and working are not the same)

there are two nice, evidence-informed op-ed pieces out today on delhi’s odd-even scheme to try to reduce air pollution (here and here). the results are heartening because i didn’t have a good sense of whether a two-week window of implementing a policy — to which there were many exceptions — was long enough to detect a statistically significant change in meaningful measures of pollution. nor, admittedly, did i feel that i was breathing cleaner air the past two weeks. as one of the articles points out, much of the anecdotal chatter has been about clearer roads, not about clearer skies.

.

since i live in delhi, am certainly affected by the air quality, and am worried about my health accordingly (plume tells me every day that the situation is dire), i was pretty pleased to wake up to the headline “yes delhi, it worked.” and what has indeed happened is that good evidence (rigorously obtained, as laid out by suvojit) has been generated of a statistically significant reduction (of 18%) in nasty particulate matter (pm 2.5) during the hours the intervention was in effect.

.

this was a policy that i wanted to see work, so i am pleased that the evidence shows a reduction in the particulate matter that is driving many of my good friends out of the city (alongside many other woes). but we must be careful — whether something “worked” is more subjective than is the evidence of a reduction, which greenstone and colleagues have nicely and rapidly documented.

.

if models had predicted a 50% reduction, we wouldn’t have been so thrilled about 18%. if the government had said that every little bit counts and that even a 5% reduction would be counted by them as a success and a reason to commit to continuing the program, then indeed, 18% is quite impressive.
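
to make that point concrete, here is a toy sketch of how the very same estimate reads differently against different pre-stated benchmarks. only the 18% figure comes from the op-eds; the two thresholds are invented for illustration.

```python
# a toy illustration (thresholds invented) of why the same estimate can be a
# triumph or a disappointment depending on the success benchmark declared up front.
observed_reduction = 0.18   # estimated reduction in pm 2.5 during intervention hours

for label, threshold in [("modest, 'every bit counts' benchmark", 0.05),
                         ("ambitious, model-predicted benchmark", 0.50)]:
    verdict = "success" if observed_reduction >= threshold else "falls short"
    print(f"against the {label} ({threshold:.0%}): {verdict}")
```

the code is trivial on purpose: the work is in agreeing on the threshold before the estimate arrives, not after.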

.

moving forward, as delhi tries to clean up its act and hopefully become a model for the rest of the country, clarifying up-front decision-points and definitions of success will be important. for the next pilots — because delhi desperately needs such measures — how will we declare, in a rigorous and defensible way, that a policy effort ‘worked’ well enough to be scaled and continued?  those of us interested in promoting the use of rigorous evidence and evaluation to inform decision-making need to be slightly cautious in our interpretations and celebrations of victory when we haven’t said up front what we’ll count as a triumph.

.

*as an addendum (31 jan 2016), it is not clear that the researchers themselves penned the title ‘yes delhi, it worked.’ giving the benefit of the doubt, i am hoping that the researchers submitted something more along the lines of ‘yes delhi, odd-even reduced pollution’ and that the newspaper itself opted to change it. but the point holds that success is subjective and therefore requires a definition, preferably ex ante.

thoughts from #evalcon on evidence uptake, capacity building

i attended a great panel today, hosted by the think tank initiative and idrc and featuring representatives from three of tti’s cohort of think tanks. this is part of the broader global evaluation week (#evalcon) happening in kathmandu, focused on building bridges: the use of evaluation for decision making and policy influence. the notes on evidence uptake largely come from the session, while the notes on capacity building are my own musings inspired by the event.

.

one point made early on was the contrast between evidence-informed decision-making and opinion-informed decision-making. i’ve usually heard the contrast drawn with faith-based decision-making, and i think the opinion framing is useful. it also comes in handy for one of the key takeaways from the session, which is that maybe the point (and the feasible goal) isn’t to do away with opinion-based decision-making but rather to make sure that opinions are increasingly shaped by rigorous evaluative evidence. or, to be more bayesian about it, we want decision-makers to continuously update their priors about different issues, drawing on evidence.
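
to make that bayesian framing a little more concrete, here is a toy sketch of what ‘updating a prior’ looks like for a decision-maker weighing one new evaluation result. every number is invented for illustration; nothing here comes from the session.

```python
# a purely illustrative bayesian update (all probabilities are made up):
# a decision-maker starts with a prior belief that a program "works"
# and updates it after seeing one positive evaluation finding.

prior_works = 0.30          # prior probability the program works
p_pos_if_works = 0.80       # chance an evaluation finds a positive effect if it truly works
p_pos_if_not = 0.20         # chance of a (spurious) positive finding if it doesn't

# bayes' rule: p(works | positive) = p(positive | works) * p(works) / p(positive)
p_pos = p_pos_if_works * prior_works + p_pos_if_not * (1 - prior_works)
posterior_works = p_pos_if_works * prior_works / p_pos

print(f"prior: {prior_works:.2f} -> posterior after one positive study: {posterior_works:.2f}")
# roughly 0.63: the opinion shifts, but isn't (and shouldn't be) flipped by a single study
```

the point of the sketch is simply that evidence moves opinions incrementally rather than replacing them — which is exactly the more modest goal described above.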

.

this leads to a second point. in focusing on policy influence, we may become too focused on influencing very specific decision-makers for very specific decisions. this may lead us to lose sight of the broader goal of (re-)shaping the opinions of a wide variety of stakeholders and decision-makers, even if not linked to the immediate policy or program under evaluation. so, again, the frame of shaping opinions and aiming for decision-maker/power-center influence rather than policy-specific influence may lead to altered approaches, goals, and benchmarks.

.

a third point that echoed throughout the panel is that policy influence takes time. new ideas need time to sink in and percolate before opinions are re-shaped. secretary suman prasad sharma of nepal noted that, from a decision-maker’s point of view, evaluations are better and more digestible when they aim to build bit by bit. participants invoked a building-blocks metaphor several times and contrasted it with “big bang” results. a related and familiar point about the time and timing required for evaluation to change opinions and shape decisions is that planning for the next phase of the program cycle generally begins midway through current programming. if evaluation is to inform this next stage of planning, it requires the communication of interim results — or a more thoughtful shift of the program planning cycle relative to monitoring and evaluation funding cycles in general.

.

a general point that came up repeatedly was what constitutes a good versus a bad evaluation. this leads to a key capacity-building point: we need more “capacity-building” to help decision-makers recognize credible, rigorous evidence and to mediate between conflicting findings. way too often, in my view, capacity-building ends up focusing on how particular methods are carried out, rather than on the central task of identifying credible methodologies and weighting the findings accordingly (or on broader principles of causal inference). that is, capacity-building efforts among decision-makers need to (a) understand how they currently assess credibility (on the radical premise that capacity-building exercises might generate capacity on both sides) and (b) help them become better consumers, not producers, of evidence.

.

a point that surfaced continuously about how decision-makers assess evidence concerned objectivity and neutrality. ‘bad evaluations’ are biased and opinionated; ‘good evaluations’ are objective. there is probably a much larger conversation to be had about parsing objectivity from independence and engagement, as well as about how decision-makers judge neutrality and how evaluators might establish and signal their objectivity. as a musing: a particular method doesn’t guarantee neutrality, which can also be violated in shaping the questions, selecting the site and sample, and so on.

.

other characteristics of ‘good evaluation’ that came out included not confusing being critical with being only negative: findings about what is working are also appreciated. ‘bad evaluation’ assigns blame and accountability to particular stakeholders without taking a nuanced view of the context and events (internal and external) during the evaluation. ‘good evaluation’ involves setting evaluation objectives up front. ‘good evaluation’ also places the findings in the context of other evidence on the same topic; this literature/evidence review work, especially when it does not focus on a single methodology or discipline (and, yes, i am particularly alluding to RCT authors who tend to cite only other RCTs, at the expense of sectoral evidence and other methodologies), is very helpful to a decision-making audience, as is helping to make sense of conflicting findings.

.

a final set of issues related to timing and transaction costs. a clear refrain throughout the panel was the importance of the timing of sharing findings. this means paying attention to the budget-making cycle and sharing results at just the right moment. it means spotting windows of receptivity to evidence on particular topics, reframing the evidence accordingly, and sharing it with decision-makers and the media. it probably means learning a lot more from effective lobbyists. staying in tune with policy and media cycles in a given evaluation context is hugely time consuming, and the point was made, and is well taken, that the transaction costs of this kind of staying-in-tune for policy influence are quite high for researchers. perhaps goals for influence by the immediate researchers and evaluators should be more modest, at least when shaping a specific decision was not the explicit purpose of the evaluation.

.

if the immediate researchers step back, two indirect routes to influence stand out. one is to communicate the findings clearly to, and do the necessary capacity-building with, naturally sympathetic decision-makers (say, parliamentarians or bureaucrats with an expressed interest in x issue) so that they become champions who keep the discussion going within decision-making bodies. to reiterate, my view is that capacity-building efforts should prioritize helping decision-makers become evidence champions and good communicators of specific evaluation and research findings. this is an indirect road to influence but an important one, leveraging the credibility decision-makers have with one another. the second, also indirect, is to communicate the findings clearly to, and do the necessary capacity-building with, the types of (advocacy? think tank?) organizations whose job it is to track the timing of budget meetings, shifting political priorities, and the local events to which the evidence can be brought to bear.

.

the happy closing point was that a little bit of passion in evaluation, even while trying to remain neutral and objective, does not hurt.

Thinking About Building Evaluation Ownership, Theories of Change — Back From Canadian Evaluation Society

This week I had the pleasure of attending the Canadian Evaluation Society (#EvalC2015) meeting in Montreal, which brought together a genuinely nice group of people thinking hard not just a-boot evaluation strategies and methodologies but also about how evaluation can contribute to better and more transparent governance, improving our experience as global and national citizens — that is, evaluation as a political (and even social justice) act as much as a technical one.

Similarly, there was some good conversation around the balance between the evaluation function being about accountability versus learning and improvement and concern about when the pendulum swings too far to an auditing rather than an elucidating and improving role.

For now, I want to zoom in on two important themes and my own nascent reflections on them. I’ll be delighted to get feedback on these thoughts, as I am continuing to firm them up myself. My thoughts are in italics, below.

  1. Collaboration, neutrality and transparency
    1. There were several important calls relating to transparency, including a commitment to making evaluation results public (and taking steps to make sure citizens see these results (without influencing their interpretation of them or otherwise playing an advocacy role)) and for decision-makers claiming to have made use of evidence to inform their decisions to be more open about how and which evidence played this role. This is quite an important point and it echoes some of the points Suvojit and I made about thinking about the use of evaluative evidence ex ante. We’re continuing to write about this, so stay tuned.
    2. There was quite a bit of push back about whether evaluation should be ‘neutral’ or ‘arm’s length’ from the program — apparently this is the current standard practice in Canada (with government evaluations). This push back seems to echo several conversations in impact evaluation about beginning stakeholder engagement and collaboration far earlier in the evaluation process, including Howard White’s consideration of evaluative independence.
    3. Part of the push back on ‘arm’s length neutrality’ came from J. Bradley Cousins, who will have a paper and ‘stimulus document’ coming out in the near future on collaborative evaluation that seems likely to be quite interesting. In another session, it was noted that ‘collaboration has more integrity than arm’s length approaches.’ I particularly liked the idea of thinking about how engagement between researchers and program/implementation folks could improve a culture of evaluative thinking and organizational learning — a type of ‘capacity building’ we don’t talk about all that often. Overall, I am on board with the idea of collaborative evaluation, with the major caveat that evaluators need to report honestly about the role they play vis-a-vis refining program theory, refining the program contents, assisting with implementing the program, monitoring, etc.
  2. Building a theory of change and fostering ownership in an evaluation.
    1. There was a nice amount of discussion around making sure that program staff, implementers, and a variety of stakeholders could “see themselves” in the theory of change and logic model/results chain. This meant not only that they could locate their roles but also that these planning and communication tools reflected the language with which they were used to talking about their work. Ideally, program staff can also understand their roles and contributions in light of their spheres of direct and indirect influence.
    2. John Mayne and Steve Montague made some very interesting points about building a theory of change, which I will have to continue to process over the upcoming weeks. They include:
      1. Making sure to think about ‘who’ in addition to ‘what’ and ‘why’ — this includes, I believe, who is doing what (different types and levels of implementers) as well as defining intended reach, recognizing that some sub-groups may require different strategies and assumptions in order for an intervention to reach them.
      2. As was noted “frameworks that don’t consider reach conspire against equity and fairness” because “risks live on the margin.” I haven’t fully wrapped my head around the idea of ‘theories of reach’ embedded or nested within the theory of change but am absolutely on-board with considering distributional expectations and challenges from the beginning and articulating assumptions about when and why we might expect heterogeneous treatment effects — and deploying quantitative and qualitative measurement strategies accordingly.
    3. John Mayne shared his early thinking that, for each assumption in a theory of change, the builders should articulate a justification for its:
      1. Necessity — why is this assumption needed?
      2. Realization — why is this assumption likely to be realized in this context?
      3. This sounds like an interesting way to plan exercises towards collaborative building of theories of change (a rough sketch of how these justifications might be recorded appears after this list).
    4. A productive discussion developed (fostered by John Mayne, Steve Montague and Kaireen Chaytor, among others) around how to get program staff involved in articulating the theory of change. A few key points recurred — with strong implications for how long a lead time is needed to set up an evaluation properly (which will have longer-term benefits even if it seems slightly inefficient upfront):
      1. Making a theory of change and its assumptions explicit is part of a reflective practice of operations and implementation.
      2. Don’t try to start tabula rasa in articulating the theory of change (‘the arrows’) with the implementing and program staff. Start with the program documents, including their articulation of the logic model or results chain (the ‘boxes’ in a diagrammatic theory of change) and use this draft as the starting point for dialogue.
      3. It may help to start with one-on-ones with some key program informants, trying to unpack what lies in the arrows connecting the results boxes. This means digging into the ‘nitty gritty’ micro-steps and assumptions, avoiding magical leaps and miraculous interventions. Starting with one-on-ones, rather than gathering the whole group to consider the results chain, can help to manage some conflict and confusion and build a reasonable starting point — despite the fact that:
      4. Several commentators pointed out that it is unimportant whether the initial results chain is validated or correct — or was even set up as a straw-person. Rather, what is important is having something common and tangible that can serve as a touchstone or boundary object in bringing the evaluators and implementers together around a tough conversation. In fact, having some flaws in the evaluators’ initial depiction of the results chain and theory of change allows opportunities for program staff to be the experts and correct these misunderstandings, helping to ensure that program staff are not usurped in the evaluation design process.
      5. Initial disagreement around the assumptions (all the stuff behind the arrows) in the theory of change can be productive if it is allowed to lead to dialogue and consensus-building. Keep in mind that the theory of change can be a collaborative force. As Steve Montague noted, “building a theory of change is a team sport,” and it needs to be an iterative process between multiple stakeholders, all on a ‘collective learning journey.’
        1. One speaker suggested setting up a working group within the implementing agency to work on building the theory of change and, moreover, to make sure that everyone internally understands the program in the same way.
    5. This early engagement work is the time to get construct validity right.
    6. The data collection tools developed must, must, must align with the theory of change developed collectively. This is also a point Shagun and I made in our own presentation at the conference, where we discussed our working paper on meaningfully mixing methods in impact evaluation. Stay tuned!
    7. The onus is on the evaluator to make sure that the theory of change is relevant to many stakeholders and that the language used is familiar to them.
    8. There was also a nice discussion about making sure to get leadership buy-in and cooperation early in the process on what the results reporting will look like. Ideally the reporting will also reflect the theory of change.
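
As a purely illustrative aside on John Mayne’s suggestion above (justifying each assumption’s necessity and realization), here is a rough sketch of how those justifications might be recorded alongside the results chain. The field names and the example content are my own invention, not anything presented at the conference.

```python
# a minimal, hypothetical way to record theory-of-change assumptions with
# the two justifications suggested above (necessity and realization).
# the example content is invented for illustration only.
from dataclasses import dataclass

@dataclass
class Assumption:
    link: str          # which arrow in the results chain this sits behind
    statement: str     # the assumption itself
    necessity: str     # why the assumption is needed for the link to hold
    realization: str   # why we expect it to be realized in this context

training_to_practice = Assumption(
    link="training delivered -> providers change practice",
    statement="providers have the time and supplies to apply what they learn",
    necessity="without time and supplies, new knowledge cannot change practice",
    realization="facility assessments suggest stock-outs are rare in the pilot districts",
)

# a simple review exercise: flag assumptions whose justifications are still blank
for a in [training_to_practice]:
    missing = [f for f in ("necessity", "realization") if not getattr(a, f)]
    print(a.link, "-> missing justifications:", missing or "none")
```

Nothing about the structure is clever; the value is that it forces the ‘why needed’ and ‘why plausible here’ conversations for every arrow, which is exactly the collaborative exercise described above.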

Overall, much to think about and points that I will definitely be coming back to in later work. Thanks again for a great conference.

What Does It Mean To Do Policy Relevant Evaluation?

A different version of this post appears here.

For several months, I have intended to write a post about what it actually means to do research that is ‘policy relevant,’ as it seems to be a term that researchers can self-ascribe* to their work without stating clearly what this entails or if it is an ex ante goal that can be pursued. I committed to writing about it here, alluded to writing about it here, and nearly stood up to the chicken of Bristol in the interim. Now, here goes a first pass. To frame this discussion, I should point out that I exist squarely in the applied space of impact evaluation (work) and political economy and stakeholder analysis (dissertation), so my comments may only apply in those spheres.

The main thrust of the discussion is this: we (researchers, donors, folks generally bought into the evidence-informed decision-making enterprise) should parse what passes for ‘policy relevant’ into ‘policy adjacent’ (or ‘policy examining?’) and ‘decision relevant’ (or ‘policymaker-relevant’) so that it is clear what we are all trying to say and do. Just because research is conducted on policy does not automatically make it ‘policy relevant’ — or, more specifically, decision-relevant. It is, indeed, ‘policy adjacent,’ walking and working alongside a real, live policy to do empirical work and answer interesting questions about whether and why that policy brought about the intended results. But this does not necessarily make it relevant to policymakers and stakeholders trying to make prioritization, programmatic, or policy decisions. In fact, by this point, it may be politically and operationally hard to make major changes to the program or policy, regardless of the evaluation outcome.

This is where more clarity (and perhaps humility) is needed.

I think this distinction was, in part, what Tom Pepinsky wrestled with when he said that it was the murky and quirky (delightful!) questions “that actually influence how they [policymakers / stakeholders] make decisions” in each of their own murky and quirky settings. These questions may be narrow, operational, and linked to a middle-range or program theory (of change) when compared to grander, paradigmatic questions and big ideas. (Interestingly, and to be thought through carefully, this seems to be the opposite of Marc Bellemare’s advice on making research in agricultural economics more policy-relevant, in which he suggests pursuing bigger questions, partially because agricultural economists are often housed in ‘hard’ or ‘life’ science departments and thus deal with different standards and expectations.)

I am less familiar with how Tom discusses what is labelled as highly policy-relevant (the TRIP policymaker survey and seeing whether policymakers are aware of a given big-thinking researcher’s big idea) and much more familiar with researchers simply getting to declare that their work is relevant to policy because it is in some way adjacent to a real! live! policy. Jeff Hammer has pointed out that even though researchers in some form of applied work on development are increasingly doing work on ‘real’ policies and programs, they are not necessarily in a better position to help high-level policymakers choose the best way forward. This needs to be taken seriously, though it is not surprising that a chief minister is asking over-arching allocative questions (invest in transport or infrastructure?), whereas researchers may work with lower-level bureaucrats, NGO managers, or even street-level/front-line workers, who have the more modest goals of improving the workings and (cost-)effectiveness of an existing program or trying something new.

What is decision-relevant in a particular case will depend very much on the position of the stakeholder with whom the researcher-evaluator is designing the research questions and evaluation (an early engagement and co-creation of the research questions and a plan for how the evidence will be used that I consider a prerequisite to doing decision-relevant work — see, e.g., the beginning of Suvojit‘s and my discussion of actually planning to use evidence to make decisions). Intention matters in being decision-relevant, to my way of thinking, and so, therefore, does deciding whose decision you are trying to inform.

I should briefly say that I think plenty of policy-adjacent work is immensely valuable and useful in informing thinking and future planning and approaches. One of my favorite works, for example, The Anti-Politics Machine, offers a careful vivisection (as Ferguson calls it) of a program without actually guiding the officials deciding what to do next. Learning what is and isn’t working (and why) is critically important. His book is a profound, policy-adjacent work (by being about a real program), but it did not set out to be directly decision-relevant, nor is it. The book still adds tremendous value in thinking about how we should approach and think about development, but it is unlikely that a given bureaucrat can use it to make a programmatic decision.

But here is where I get stuck and muddled, which is one of the reasons I put off writing this for so long. At some stage of my thinking, I felt that being decision-relevant, like being policy-adjacent, required working on real, live policies and programs. In fact, in a July 2014 attempt at writing this post, I was quite sympathetic to Howard White’s argument in a seminar that a good way to avoid doing ‘silly IE’ (sillIE©?) is to evaluate real programs and policies, even though being about a real program is not an automatic buffer against being silly.

But I increasingly wonder if I am wrong about decision-relevance. Instead, the main criterion is working with a decision-maker to sort out what decision needs to be made. One possible outcome of the resulting evaluation is a finding that a particular way forward is definitely not worth pursuing — that there is a serious and insurmountable design failure (~in-efficacy) rather than an implementation failure (~in-effectiveness). A clear-cut design failure firmly closes a door on a way forward, which is important in decision-making processes (if stakeholders are willing to have a closed door be a possible result of an evaluation). For example, one might (artificially) test a program or policy idea in a crucial or Sinatra case setting — that is, if the idea can’t make it there, it can’t make it anywhere (Gerring, attributed to Yates). Door closed, decision option removed. One might also want to deliver an intervention in what H.L. Mencken called a ‘horse-doctor’s dose‘ (as noted here). Again, if that whopping strong version of the program or policy doesn’t do it, it certainly won’t do it at the more likely level of administration. A similar view is expressed in Running Randomized Evaluations, which notes that ‘proof-of-concept evaluations’ can show that even “a gold-plated, best-case-scenario version of the program is not effective.” Door closed, decision option removed.

Even more mind-bending, Ludwig, Kling, and Mullainathan lay out how researchers may approximate the ‘look’ of a policy to test the underlying mechanism (rather than the entirety of the policy’s causal chain and potential for implementation snafus) and, again, directly inform a prioritization, programmatic, or policy decision. As they note, “in a world of limited resources, mechanism experiments concentrate resources on estimating the parameters that are most decision relevant,” serving as a ‘first screen’ as to whether a policy is even worth trying. Again, this offers an opportunity to close a door and remove a decision option. It is hard to argue that this is not decision-relevant and would not inform policy, even if the experimental evaluation is not of a real policy, is not carried out by the people who would take the policy to scale, and so on. Done well, the suggestion is (controversially) that a policy could be dismissed on the basis of a single mechanism experiment showing that, even under ideal or hyper-ideal conditions (and taking the appropriate time trajectory into account), the policy’s mechanism does not bring about the desired change.

But, the key criterion of early involvement of stakeholders and clarifying the question that needs to be answered remains central to this approach to decision-relevance. And, again, having an identified set of stakeholders intended to be the immediate users of evidence seems to be important to being decision-relevant. And, finally, the role of middle-range or programmatic theory (of change) and clearly identified mechanisms of how a program/policy is meant to lead to an outcome is critical in being decision-relevant.

To return to the opening premise, it does not seem helpful to label all evaluation research associated with a real-world policy or program as ‘policy relevant.’ It is often seen as desirable to be policy relevant in the current state of (impact) evaluation work but this doesn’t mean that all policy-adjacent research projects should self-label as being policy relevant. This is easy to do when it is not entirely clear what ‘policy relevance’ means and it spreads the term too thin. To gain clarity, it helps to parse studies that are policy adjacent from those that are decision-relevant. Being relevant to decisions or policymakers demands not just stakeholder engagement (another loose term) but stakeholder identification of the questions they need answered in order to make a prioritization, programmatic, or policy decision.

There must, therefore, be clear and tangible decision-makers who intend to make use of the generated evidence to work towards a pre-stated decision goal — including a decision to shut the door on a particular policy/program option. While being policy-adjacent requires working alongside a real-world policy, being decision-relevant may not have to meet this requirement, though it does need, ex ante, to intend to inform a specific policy/program decision and to engage appropriately with stakeholders to this end.

This is far from a complete set of thoughts — I have more reading to do on mechanisms and more thinking to do about when murky and quirky decisions can be reasonably made for a single setting based on a single study in that murky and quirky setting. Nevertheless, the argument that there should be some clear standards for when the term ‘policy relevant’ can be applied and what it means holds.

*In the same somewhat horrifying way that a person might self-ascribe connoisseur status or a bar might self-label as being a dive. no no no, vomit.

Thinking About Stakeholder Risk and Accountability in Pilot Experiments

This post is also cross-posted here in slightly modified form.

Since I keep circling around issues related to my dissertation in this blog, I decided it was time to start writing about some of that work. As anyone who has stood or sat near me for more than 5 minutes over the past 4.25 years will know, in my thesis I examine the political economy of adopting and implementing a large global health program (the affordable medicines facility – malaria, or “AMFm”). This program was designed at the global level (meaning largely in D.C. and Geneva, with tweaking workshops in assorted African capitals). Global actors invited select Sub-Saharan African countries to apply to pilot the AMFm for two years before any decision was made to continue, modify, scale up, or terminate it. It should also be noted from the outset that it was not fully clear what role the evidence would play in the (Global Fund) board’s decision or how the evidence would be interpreted. As I highlight below, this lack of clarity helped to foster feelings of risk as well as resistance among some of the national-level stakeholders about participating in the pilot.

To push the semantics a bit: several critics have noted (e.g.) that the scale and scope of the AMFm, and the requisite new systems and relationships involved, disqualify it from being considered a ‘pilot,’ though I use that term for continuity with most other AMFm-related writing.

In my research, I focus on the national and sub-national processes of deciding to participate in the initial pilot (‘phase I’) stage, specifically in Ghana. Besides the project scale and resources mobilized, one thing that stood out about this project is that there was a reasonable amount of resistance to piloting this program among stakeholders in several of the invited countries. I have been very fortunate that my wonderful committee and outside supporters like Owen Barder have continued to push me over the years (and years) to try to explain this resistance to an ostensibly ‘good’ program. Moreover, I have been lucky and grateful that a set of key informants in Ghana have been willing to converse openly with me over several years as I have tried to untangle the reasons behind the support and resistance and to get the story ‘right’.

From the global perspective, the set-up of this global health pilot experiment was a paragon of planning for evidence-informed decision-making: pilot first, develop benchmarks for success, commission an independent evaluation (a well-monitored before-and-after comparison) — and make decisions later.

In my work, through a grounded qualitative analysis, I distil the variety of reasons for supporting and resisting Ghana’s participation in the AMFm pilot into three main types: those related to direct policy goals (in this case, increasing access to malaria medication and lowering malaria mortality), those related to indirect policy goals (indirect insofar as they are not the explicit goals of the policy in question, such as employment and economic growth), and finally those related to risk and reputation (individual, organizational, and national). I take the last of these as my main focus for the rest of this post.

A key question on which I have been pushed is the extent to which resistance to participation (which meant resisting an unprecedented volume of highly subsidized, high-quality anti-malarial treatments entering both the public and the private sector) emerges from the idea of the AMFm versus the idea of piloting the AMFm with uncertain follow-up plans.

Some issues, such as threats to both direct and indirect policy goals, often related to the AMFm mechanism itself — including the focus on malaria prevention rather than treatment, as well as broader goals related to national pride and the support of local businesses. The idea of the AMFm itself, as well as its role as a harbinger of (for instance, market-based) approaches to global health, provoked both support and resistance.

But some sources of resistance stemmed more directly from the piloting process itself. By evidence-informed design, the Global Fund gave “no assurance to continue [AMFm] in the long-term,” so that the evaluation of the pilot would shape its decision. This presented limited risks to the global actors. At the national level, this uncertainty proved troubling, as many local stakeholders felt it posed national, organizational, and personal risks for policy goals and reputations. Words like ‘vilification‘ and ‘chastisement‘ and ‘bitter‘ came up during key informant interviews. In a point of opposing objectives (if not a full catch-22, a phrase stricken from my thesis), some stakeholders might have supported the pilot if they knew the program would not be terminated (even if modified), whereas global actors wanted the pilot precisely to see whether the evidence suggested the program should (not) be terminated. Pilot-specific concerns related to uncertainties around the sunk investments of time in setting up the needed systems and relationships, which have an uncertain life expectancy. Also, for a stakeholder trying to decide whether to support or resist a pilot, it doesn’t help when the reputational and other pay-offs from supporting are uncertain and may only materialize should the pilot prove successful and be carried to the next stage.

A final but absolutely key set of concerns for anyone considering working with policy champions is what, precisely, the decision to continue would hinge upon. Would failure to meet benchmarks be taken as a failure of the mechanism and concept? A failure of national implementation capacity and managerial efforts in Ghana (in the face of a key donor)? A failure of individual efforts and initiatives in Ghana?

Without clarity on these questions about how accountability and blame would be distributed, national stakeholders were understandably nervous and sometimes resistant (passively or actively) to Ghana’s applying to be a phase I pilot country. To paraphrase one key informant’s articulation of a common view, phase I of the AMFm should have been an experiment on how to continue, not whether to continue, the initiative.

How does this fit in with our ideas of ideal evidence-informed decision-making about programs and policies? The experience recorded here raises some important questions when we talk about wanting policy champions and wanting to generate rigorous evidence about those policies. Assuming that the policies and programs under study adhere to one of the definitions of equipoise, the results from a rigorous evaluation could go either way.

What risks do the local champion(s) of a policy face in visibly supporting it?

Is clear accountability established for evaluation outcomes?

Are there built-in buffers for the personal and political reputation of champions and supporters in the evaluation design?

The more we talk about early stakeholder buy-in to evaluation and the desire for research uptake on the basis of evaluation results, the more we need to think about the political economy of pilots and of those stepping up to support policies and the (impact) evaluation of them. Do they exist in a learning environment where glitches and null results are considered part of the process? Can evaluations help to elucidate design and implementation failures in a way that has clear lines of accountability among the ‘ideas’ people, the champions, the managers, and the implementers? These questions need to be taken seriously if we expect government officials to engage in pilot research to help decide the best way to move a program or policy forward (including not moving it forward at all).

Center and Periphery in Doing Development Differently

I have spent almost three weeks back in TX, which was supposed to be, in part, a time of immense productivity in front of our fireplace (yes, it is chilly here. Probably not enough to warrant a fire but still. I am sitting in front of the fireplace and paying for carbon credits to mitigate the guilt.) I brought home big batches of reading but am taking back far more of it with me to Delhi than I had planned.

Nevertheless, I did finally make it through Duncan Green’s post on his immediate thoughts on Doing Development Differently from Matt Andrews and team. So, that’s only three months behind schedule.

Many things are, of course, striking and exciting about this movement, including the idea of rapid iterations to promote (experiential) learning and tweaks, the importance of morale and relationships, and the time horizon.

But the most striking thing had to do with immersion, deep study and deep play*.

deep study of the system, based on continuous observation and listening. In Nicaragua, UNICEF sent public officials out to try and access the public services they were administering, and even made the men carry 30lb backpacks to experience what it’s like being pregnant! This is all about immersion, rather than the traditional ‘fly in, fly out’ consultant culture.

The idea is, it seems, to strike a blow at the ‘consultant culture’ of folks from D.C., London and Geneva parachuting in to solve problems (there’s probably an interesting discussion to be had about the relevance of area studies in this approach). But that is for another time. What is most immediately striking is that Duncan doesn’t report on UNICEF folks making consultants visiting Nicaragua from NYC head out to remote areas and try to access services with pregnant-backpacks.

If I read the anecdote correctly (is there more written about this somewhere?), the target was public officials, which I take to mean Nicaraguan civil servants and politicians based in the capital or another metropolis. Which is an important (re-)lesson. Being from X country doesn’t automatically make you knowledgeable about all areas and details of X country (duh). Probably many of us have sat with civil servants who talk about ‘the hinterlands’ and ‘backwards’ areas and who seem quite surprised at what they find there, if they visit at all. There is a vast difference between the high-level and the street-level, between big decisions about adopting and championing a policy and the many small decisions involved in implementing that idea. Implementation is, as always, profoundly local. (This idea, incidentally, also applies to study design and the relationships between PIs, their research assistants and the field teams.)

This all suggests that, maybe, doing development differently (and probably doing evaluation differently) also has to do with shifting ideas about center and periphery (globally as well as nationally), about who has relevant knowledge, and about immersion for program designers and decision-makers of a variety of types, whether from the country in question or not. This, in part, raises questions about who is doing the iteration and learning and how lessons are passed up as well as down different hierarchies (and spread horizontally). Looking forward to hearing and thinking more.

*It’s hard to resist a Geertz reference, since ‘continual observation and listening’ sounds an awful lot like ‘participant-observation,’ a study technique that almost *never* comes up in ‘mixed-methods’ evaluation proposals.

theories of change, stakeholders, imagined beneficiaries, & stealing from product design. that is, meet ‘mary.’

this post is also available, lightly edited, here.

i have been thinking a lot about ‘theories of change’ this week (just did some presenting on them here!). actually, i have been thinking more about ‘conceptual models,’ which was the term by which i was first introduced to the general idea (via vic strecher in conceptual models 101) and the term i still prefer because it implies more uncertainty and greater scope for tinkering than does ‘theory.’ (i accept that ‘theory of change‘ has been branded and that i have to live with it but i don’t have to like it. when people start calling them “tocks,” it’ll be a really, really bad day. i can deal with the acronym “ToCs” but please, world, don’t pronounce it “tocks” or switch to writing “tox” or something else dreadful.)

regardless of the term, the approach of thinking seriously about how behavioral, social and economic change will happen is really important — and often overlooked during the planning stages of both projects/programs/policies and evaluations. (too often, the intricacies of how change actually happened (or didn’t) are left to academic speculation in the discussion section of an evaluation paper — and certainly not informed by talking systematically to the people who were intended to benefit from the program.)

i think there is growing recognition that building a theory of change is something that should happen, at least in part, backwards (this is discussed, among other places, in ‘evidence-based policy‘ with the ideas of a ‘pre-mortem‘ and ‘thinking step-by-step and thinking backwards‘). that is, you start with the end goal (usually some variant of ‘peace,’ ‘satisfaction,’ ‘wellbeing,’ ‘capabilities,’* etc.) in mind and work backwards as to how you are going to get there. actually, it’s a bit more like the transcontinental railroad, where you start from both ends (where you are and where you want to get) and build backwards and forwards until the ideas meet in the middle and you have a sense of what needs to be done and what assumptions underlie one step translating to the next.

in teaching us about not only conceptual models but grant writing, vic used the analogy of an island. the island was where you wanted to get — the state of the world as things would be once your intervention was rolled out, fully operational, and lasting change effected. it wasn’t enough to just say that people would have more money or would be healthier. you had to describe how the state of the world would look, feel, and operate. how would someone’s day look in the new state of the world? what would be different about the way they undertook their daily activities, or indeed what their daily activities would be? then, once you had the new state of the world/island in mind, you could make sense of where you were currently (through one of those ex ante ‘needs assessment‘ things i so rarely hear about in planning development projects or building theories of change) and what needed to be done to build a bridge from where you are to the island.

some of this work in understanding where people are and where ‘they,’ and therefore ‘we,’ want to get is meant to be generated through the nebulous terms “stakeholder engagement” and “formative work.” i think we discuss much less how formative engagement and stakeholder work (probably not a great sign of specificity that all the words can be mixed up so easily) actually translates into a robust theory of change. in this regard, i have learnt quite a bit from product and engineering books like the inmates are running the asylum. these are books about product and service design and the ‘user experience’ — far-out concepts we probably (almost certainly) don’t spend enough time thinking about in ‘development’ and something that would probably really benefit our theories of change, in detailing and ‘best-fitting’ them to a particular situation… not to mention, you know, benefit the beneficiaries.

one of the tools i like best is, effectively, the imaginary prospective user — in cooper‘s terminology, ‘personas.’ here’s the idea, as i see it translating to development and theories of change. we know stakeholders are important, but they cannot (realistically or effectively) all be in the same room, at the same table, at the same time. nor can they all be called up each time we make a small tweak in program design or the underlying assumptions. and it is likely the intended beneficiaries who are hardest to call up and the most likely not to be at the table. but we can use personas to bring them to the table, so that what happened in ‘the field’ most certainly does not stay there.

let’s say that for a given project and evaluation, widowed women are a key sub-group of interest.

forget widowed women.

start thinking about “mary.”

mary is a widowed woman.

her husband had been a carpenter and died of c cause. she lives in x place while her n children live in z other places and provide her with s amount of support. mary can be a composite of the widowed women you did meet in the field during deep, household-level needs assessments and formative in-depth interviews with intended beneficiaries. that’s how you might have a picture of mary and know that she lives in h type of house, with e regular access to electricity, and has g goats and l other livestock. it’s how you know she’s illiterate and has a mobile phone onto which she never adds credit. it’s how you know what time she wakes up, what her morning chores are, who she talks to, when and whether she has time to go to the market, how she gets her information, what aspects of her environment will enable change and which will hinder it, and so on.
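
as an aside, and only as a sketch: if your team keeps its formative-work notes digitally, a persona can literally be a small record that travels with the design documents. the fields below mirror the placeholders above (house type, electricity, livestock, literacy, phone credit); every value here is hypothetical, standing in for what real formative work would fill in.

```python
# a hypothetical persona record, sketched as a small data structure.
# none of the values are real data; they echo the placeholders in the text.
from dataclasses import dataclass, field

@dataclass
class Persona:
    name: str
    subgroup: str                                       # the heterogeneity of interest represented
    household: dict = field(default_factory=dict)       # house type, electricity, livestock, etc.
    constraints: list = field(default_factory=list)     # obstacles to engaging with the program
    info_channels: list = field(default_factory=list)   # how she actually gets information

mary = Persona(
    name="mary",
    subgroup="widowed women",
    household={"house_type": None, "electricity": None, "goats": None},  # fill from formative work
    constraints=["illiterate", "phone never has credit", "limited time to reach the market"],
    info_channels=["neighbors", "radio"],  # hypothetical; learned during formative interviews
)

# in design meetings, 'mary' can then be queried like any other requirement:
for obstacle in mary.constraints:
    print(f"does the program design account for: {obstacle}?")
```

the format matters far less than the habit: the persona gets opened and consulted every time the design or the theory of change is tweaked.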

so, all potential beneficiaries can’t be at the table but personas of key subgroups and heterogeneities of interest can be. if everyone in the room for the design (intervention and evaluation) process is introduced to the personas, then they can speak up for mary. she still gets a voice and the ability to ask, ‘what’s in all this for me?’ will she be able to deal with an extra goat if she gets one as part of a livestock program? does she have the means of transport to collect cash as part of a transfer program? is her neighborhood safe for walking so she can follow up on the health information you provide? is mary going to give a hoot about the sanitation information you provide her?

mary’s obstacles need to be dealt with in your program design, and the places where mary might have trouble engaging with the program need to be put into your theory of change and monitored as part of your M&E (& e) plan. will mary help you think of everything? no, of course not — she’s good but she’s not that good. but the design will probably be nearer to something that can actually work (and don’t forget that street-level workers, other implementers and high-level stakeholders should have personas too!).

please invite mary to the table when you’re designing your intervention and constructing your theory of change. it doesn’t replace the need for actual monitoring and actually asking for beneficiary, implementer and stakeholder feedback.

but have mary describe to you how her life will be different (better!) with your program in place, how the actual structure of her day and decision-making have changed now that she’s on the aforementioned goal island. you’ll be a little closer to making it so.

this post is massively indebted to danielle giuseffi, who introduced me to some of the books above and with whom i have discussed building models more than anyone else! still one of my favorite business-partners-in-waiting, d-funk, and i still like our behavioral bridge.

*yes, i know that ‘capabilities’ were initially from amartya sen and that i should have linked to this. but for planning approaches, i find the 10 laid out by nussbaum more accessible.