something to ponder: cataloging the evaluations undertaken in a country

working my way through “demand for and supply of evaluations in selected sub-saharan african countries,” which is a good read. there are several great points to note and consider but just one that i want to highlight here:

in no country was there a comprehensive library of evaluations that had been undertaken [there]

this seems like something that should be changed, as such a library would be an important public good and a source of information for national planning departments. i wonder if the ethics review / institutional review bodies that work to register and approve studies may be able to take on some of this function.

gem from the anti-politics machine: they only seek the kind of advice they can take

i am starting to re-read the anti-politics machine after some time… and, of course, started with the epilogue — the closest ferguson comes to giving advice from his vivisection. here’s a gem that remains relevant ten-plus years later, in spite of major political changes in southern africa:

certainly, national and international ‘development’ agencies do constitute a large and ready market for advice and prescriptions, and it is the promise of real ‘input’ that makes the ‘development’ form of engagement such a tempting one for many intellectuals. these agencies seem hungry for good advice, and ready to act on it. why not give it?

.

but as i have tried to show, they only seek the kind of advice they can take. one ‘developer’ asked my advice on what his country could do to ‘help these people.’ when i suggested that his government might contemplate sanctions against apartheid, he replied, with predictable irritation, ‘no, no! i mean development!’

.

the only ‘advice’ that is in question here is advice about how to ‘do development’ better. there is a ready ear for criticisms of ‘bad development projects,’ so long as these are followed up with calls for ‘good development projects.’

what does it mean to do policy relevant evaluation?

for several months, i have intended to write a post about what it actually means to do research that is ‘policy relevant,’ as it seems to be a term that researchers can self-ascribe* to their work without stating clearly what this entails or whether it is an ex ante goal that can be pursued. i committed to writing about it here, alluded to writing about it here, and nearly stood up to the chicken of bristol in the interim. now, here goes a first pass. to frame this discussion, i should point out that i exist squarely in the applied space of impact evaluation (work) and political economy and stakeholder analysis (dissertation), so my comments may only apply in those spheres.

.

the main thrust of the discussion is this: we (researchers, donors, folks generally bought into the evidence-informed decision-making enterprise) should parse what passes for ‘policy relevant’ into ‘policy adjacent’ (or ‘policy examining?’) and ‘decision relevant’ (or ‘policymaker relevant’) so that it is clear what we are all trying to say and do. just because research is conducted on a policy does not automatically make it ‘policy relevant’ — or, more specifically, decision-relevant. it is, indeed, ‘policy adjacent,’ by walking and working alongside a real, live policy to do empirical work and answer interesting questions about whether and why that policy brought about the intended results. but this does not necessarily make it relevant to policymakers and stakeholders trying to make prioritization, programmatic, or policy decisions. in fact, by this point, it may be politically and operationally hard to make major changes to the program or policy, regardless of the evaluation outcome.

.

this is where more clarity (and perhaps humility) is needed.

.

i think this distinction was, in part, what tom pepinsky wrestled with when he said that it was the murky and quirky (delightful!) questions “that actually influence how they [policymakers / stakeholders] make decisions” in each of their own murky and quirky settings. these questions may be narrow, operational, and linked to a middle-range or program theory (of change) when compared to grander, paradigmatic questions and big ideas. (interestingly, and to be thought through carefully, this seems to be the opposite of marc bellemare’s advice on making research in agricultural economics more policy-relevant, in which he suggests pursuing bigger questions, partly linked to ag econ often being housed in ‘hard’ or ‘life’ science departments and thus dealing with different standards and expectations.)

.

i am less familiar with how tom discusses what is labelled as highly policy-relevant (the TRIP policymaker survey and seeing whether policymakers are aware of a given big-thinking researcher’s big idea) and much more familiar with researchers simply getting to declare that their work is relevant to policy because it is in some way adjacent to a real! live! policy. jeff hammer has pointed out that even though researchers in some form of applied work on development are increasingly doing work on ‘real’ policies and programs, they are not necessarily in a better position to help high-level policymakers choose the best way forward. this needs to be taken seriously, though it is not surprising that a chief minister is asking over-arching allocative questions (invest in transport or infrastructure?) whereas researchers may work with lower-level bureaucrats and NGO managers or even street-level/front-line workers, who have more modest goals of improving the workings and (cost-)effectiveness of an existing program or trying something new.

.

what is decision-relevant in a particular case will depend very much on the position of the stakeholder with whom the researcher-evaluator is designing the research questions and evaluation (early engagement and co-creation of the research questions and of a plan for how the evidence will be used, which i consider a prerequisite to doing decision-relevant work — see, e.g., the beginning of suvojit's and my discussion of actually planning to use evidence to make decisions). intention matters in being decision-relevant, to my way of thinking, and so, therefore, does deciding whose decision you are trying to inform.

.

i should briefly say that i think plenty of policy-adjacent work is immensely valuable and useful in informing thinking and future planning and approaches. one of my favorite works, for example, the anti-politics machine, offers a careful vivisection (as ferguson calls it) of a program without actually guiding officials deciding what to do next. learning what is and isn’t working (and why) is critically important. his book is a profound, policy-adjacent work (by being about a real program), but it did not set out to be directly decision-relevant, nor is it. the book still adds tremendous value in shaping how we approach and think about development, but it is unlikely that a given bureaucrat can use it to make a programmatic decision.

.

but here is where i get stuck and muddled, which is one of the reasons i put off writing this for so long. at some stage of my thinking, i felt that being decision-relevant, like being policy-adjacent, required working on real, live policies and programs. in fact, in a july 2014 attempt at writing this post, i was quite sympathetic to howard white’s argument in a seminar that a good way to avoid doing ‘silly IE’ (sillIE©?) is to evaluate real programs and policies, even though being about a real program is not an automatic buffer against being silly.

.

but i increasingly wonder if i am wrong about decision-relevance. instead, the main criterion is working with a decision-maker to sort out what decision needs to be made. one possible outcome of such a decision process is learning that a particular way forward is definitely not worth pursuing, meaning that there is a serious and insurmountable design failure (~in-efficacy) rather than an implementation failure (~in-effectiveness). a clear-cut design failure firmly closes a door on a way forward, which is important in decision-making processes (if stakeholders are willing to have a closed door be a possible result of an evaluation). for example, one might (artificially) test a program or policy idea in a crucial or sinatra case setting — that is, if the idea can’t make it there, it can’t make it anywhere (gerring, attributed to yates). door closed, decision option removed. one might also want to deliver an intervention in what h.l. mencken called a ‘horse-doctor’s dose’ (as noted here). again, if that whoppingly strong version of the program or policy doesn’t do it, it certainly won’t do it at the more likely level of administration. a similar view is expressed in running randomized evaluations, which notes that ‘proof-of-concept evaluations’ can show that even “a gold-plated, best-case-scenario version of the program is not effective.” door closed, decision option removed.

.

even more mind-bending, ludwig, kling, and mullainathan lay out how researchers may approximate the ‘look’ of a policy to test the underlying mechanism (rather than the entirety of the policy’s causal chain and potential for implementation snafus) and, again, directly inform a prioritization, programmatic, or policy decision. as they note, “in a world of limited resources, mechanism experiments concentrate resources on estimating the parameters that are most decision relevant,” serving as a ‘first screen’ as to whether a policy is even worth trying. again, this offers an opportunity to close a door and remove a decision option. it is hard to argue that this is not decision-relevant and would not inform policy, even if the experimental evaluation is not of a real policy, is not carried out by the people who would take the policy to scale, and so on. done well, the suggestion is (controversially) that if a mechanism experiment shows that, even under ideal or hyper-ideal conditions (and taking the appropriate time trajectory into account), a policy mechanism does not bring about the desired change, then the policy could be dismissed on the basis of a single study.

.

but, the key criterion of early involvement of stakeholders and clarifying the question that needs to be answered remains central to this approach to decision-relevance. and, again, having an identified set of stakeholders intended to be the immediate users of evidence seems to be important to being decision-relevant. and, finally, the role of middle-range or programmatic theory (of change) and of clearly identified mechanisms of how a program/policy is meant to lead to an outcome is critical to being decision-relevant.


to return to the opening premise, it does not seem helpful to label all evaluation research associated with a real-world policy or program as ‘policy relevant.’ being policy relevant is often seen as desirable in the current state of (impact) evaluation work, but this doesn’t mean that all policy-adjacent research projects should self-label as policy relevant. such self-labeling is easy when it is not entirely clear what ‘policy relevance’ means, and it spreads the term too thin. to gain clarity, it helps to parse studies that are policy adjacent from those that are decision-relevant. being relevant to decisions or policymakers demands not just stakeholder engagement (another loose term) but stakeholder identification of the questions they need answered in order to make a prioritization, programmatic, or policy decision.

.

there must, therefore, be clear and tangible decision-makers who intend to make use of the generated evidence to work towards a pre-stated decision goal — including a decision to shut the door on a particular policy/program option. while being policy-adjacent requires working alongside a real-world policy, being decision-relevant may not have to meet this requirement, though it does require the ex ante intention to inform a specific policy/program decision and appropriate engagement with stakeholders to this end.

.

this is far from a complete set of thoughts — i have more reading to do on mechanisms and more thinking to do about when murky and quirky decisions can be reasonably made for a single setting based on a single study in that murky and quirky setting. nevertheless, the argument that there should be some clear standards for when the term ‘policy relevant’ can be applied and what it means holds.

.

*in the same somewhat horrifying way that a person might self-ascribe connoisseur status or a bar might self-label as being a dive. no no no, vomit.

some nuggets from the amber: keshavjee’s ‘blind spot’

just finished salmaan's blind spot: how neoliberalism infiltrated global health. below, a few tidbits of his weaving of the local and personal with the global, ideological, and ideal. everything below is a direct quote, with page numbers marked in parentheses.

this book is intended to add to the conversation about how to more effectively bring the fruits of technology and innovation to those for whom it is a matter of life and death; how social, political, and economic forces have shaped practices in global health; and how ideological blind spots are traps along the path of achieving some of our most humane and important societal goals (xxxii).

.

in truth, i was confused. i was in my midtwenties and had been convinced by seasoned development experts that this approach [a revolving drug fund in badakhshan, tajikistan] was an integral part of the transition from “humanitarian assistance” to “long term development.” of course, as a graduate student somewhat versed in development theory, i knew that something was not right about planning a strategy with people from france and switzerland, whose medicines were paid for by health insurance, to convince the canadians, whose medicines were paid for by health insurance, that the badakhshanis, whose entire life-world had collapsed, who had no access to pooled-risk insurance schemes, and who were now living in poverty, should finance medicines from their destitute communities; that they should “participate in their own health” by joining committees that defined what doctors could prescribe for them; and that their system of curative care – with hospitals, clinics, and experts trained over 70 years of soviet rule – should move towards a more “preventative” medical system in the midst of epidemic disease and hunger. but i participated in moving the strategy forward because i saw it as a means of protecting the most vulnerable (82).

.

the rise of NGOs as recipients of aid dollars closely followed the ascendancy of neoliberal thinking in washington and london. in their studies of the ngo sector, david hulme and michael edwards described the 1980s and 1990s as a period dominated by a “new policy agenda,” which placed a premium on the untested belief that the private sector was a better mechanism for delivering services and that ngos are more efficient than the government, especially in providing services to poor people… ngos became a powerful global political force explicitly identified as a meeting place for economic and political aid (106).

.

as ahmed and i were to see, these assumptions did not stop with the revolving drug fund. instead, it appeared to us that the idea that privatization was the only logical path forward – an assumption made without critical assessment of the potential consequences – was manifesting itself in other ways. it was as if the aspirations of hayek and his colleagues at the mont pèlerin society were being realized in real time: ideology was operating as common sense. as to other important outcomes – reduced mortality or morbidity, or ensuring justice and dignity – they fell into what i refer to as realms of neoliberal programmatic blindness: areas of programs that are eclipsed by ideological aims (114).

.

i have come to the conclusion that the way out of the amber is both complex and simple. at its most complex, it will require the recalibration of our goals as a society and the type of world we wish to create… at its most simple, this will require recalibrating our focus in global health and development from “sustainability” and “local ownership” to an approach that puts equity and patient outcomes first… a recalibration of focus towards equity of access and equity of outcome will allow ngos to again take up the mantle as a vanguard of a moral order that finds poverty and structural violence unacceptable (142).

“politically robust” experimental design in democracies and a plea for more experience sharing

sometimes i re-read a paper and remember how nice a sentence or paragraph was (especially when thinking that a benevolent or benign dictator might make research so much easier, as though ease were the main goal of research).

so it is with the paper by gary king and colleagues (2007) on “a ‘politically robust’ experimental design for public policy evaluation, with application to the mexican universal health insurance program”:

scholars need to remember that responsive political behavior by political elites is an integral and essential feature of democratic political systems and should not be treated with disdain or as an inconvenience. instead, the reality of democratic politics needs to be built into evaluation designs from the start — or else researchers risk their plans being doomed to an unpleasant demise. thus, although not always fully recognized, all public policy evaluations are projects in both policy analysis and political science.

it would be nice if researchers shared more of their experiences and lessons learned, not just in robust research design (though this is critical) but also in working (and failing) to persuade local political leaders to go along with randomization schemes and to hold off on any kind of scale-up until the results are in… and only if they are promising!

thinking about stakeholder risk and accountability in pilot experiments

this post is also cross-posted here in slightly modified form.

since i keep circling around issues related to my dissertation in this blog, i decided it was time to start writing about some of that work. as anyone who has stood or sat near me for more than 5 minutes over the past 4.25 years will know, in my thesis i examine the political economy of adopting and implementing a large global health program (the affordable medicines facility – malaria, or “AMFm”). this program was designed at the global level (meaning largely in dc and geneva, with tweaking workshops in assorted african capitals). global actors invited select sub-saharan african countries to apply to pilot the AMFm for two years before any decision was made to continue, modify, scale-up, or terminate it. it should also be noted from the outset that it was not fully clear what role the evidence would play in the board’s decision or how the evidence would be interpreted. as i highlight below, this lack of clarity helped to foster feelings of risk as well as resistance among some of the national-level stakeholders about participating in the pilot.

.

to push the semantics a bit, several critics have noted (e.g.) that the scale and scope and the requisite new systems and relationships involved in the AMFm disqualify it from being considered a ‘pilot,’ though i use that term for continuity with most other AMFm-related writing.

.

in my research, my focus is on the national and sub-national processes of deciding to participate in the initial pilot (‘phase I’) stage, focusing specifically on ghana. besides the project scale and resources mobilized, one thing that stood out about this project is that there was a reasonable amount of resistance to piloting this program among stakeholders in several of the invited countries. i have been very fortunate that my wonderful committee and outside supporters like owen barder have continued to push me over the years (and years) to try to explain this resistance to an ostensibly ‘good’ program. moreover, i have been lucky and grateful that a set of key informants in ghana have been willing to converse openly with me over several years as i have tried to untangle the reasons behind the support and resistance and to try to get the story ‘right’.

.

from the global perspective, the set-up of the global health pilot experiment was a paragon of planning for evidence-informed decision-making: pilot first, develop benchmarks for success, commission an independent evaluation (a well-monitored before-and-after comparison) — and make decisions later.

.

in my work, through a grounded qualitative analysis, i distil the variety of reasons for supporting and resisting ghana’s participation in the AMFm pilot into three main types: those related to direct policy goals (in this case, increasing access to malaria medication and lowering malaria mortality), those related to indirect policy goals (indirect insofar as they are not the explicit goals of the policy in question, such as employment and economic growth), and finally those related to risk and reputation (individual, organizational, and national). i take the latter as my main focus for the rest of this post.

.

a key question on which i have been pushed is the extent to which resistance to participation (which meant resisting an unprecedented volume of highly subsidized, high-quality anti-malarial treatments entering both the public and the private sector) emerged from the idea of the AMFm itself versus the idea of piloting the AMFm with uncertain follow-up plans.

some issues, such as threats to both direct and indirect policy goals, often related to the AMFm mechanism itself, including the focus on malaria prevention rather than treatment as well as broader goals related to national pride and the support of local businesses. the idea of the AMFm itself, as well as its role as a harbinger of approaches (such as market-based approaches) to global health, provoked both support and resistance.

.

but some sources of resistance stemmed more directly from the piloting process itself. by evidence-informed design, the global fund gave “no assurance to continue [AMFm] in the long-term,” so that the evaluation of the pilot would shape their decision. this presented limited risks to them. at the national level, however, this uncertainty proved troubling, as many local stakeholders felt it posed national, organizational, and personal risks for policy goals and reputations. words like ‘vilification‘ and ‘chastisement‘ and ‘bitter‘ came up during key informant interviews. in a point of opposing objectives (if not a full catch-22, a phrase stricken from my thesis), some stakeholders might have supported the pilot if they knew the program would not be terminated (even if modified), whereas global actors wanted the pilot precisely to see whether the evidence suggested the program should (not) be terminated. pilot-specific concerns related to uncertainties around the sunk investments of time in setting up the needed systems and relationships, which had an uncertain life expectancy. also, for a stakeholder trying to decide whether to support or resist a pilot, it doesn’t help when the reputational and other pay-offs from supporting are uncertain and may only materialize should the pilot prove successful and be carried to the next stage.

.

a final but absolutely key set of concerns for anyone considering working with policy champions is what, precisely, the decision to continue would hinge upon. would failure to meet benchmarks be taken as a failure of the mechanism and concept? a failure of national implementation capacity and managerial efforts in ghana (in the face of a key donor)? a failure of individual efforts and initiatives in ghana?

.

without clarity on these questions about how accountability and blame would be distributed, national stakeholders were understandably nervous and sometimes resistant (passively or actively) to ghana’s applying to be a phase I pilot country. to paraphrase one key informant’s articulation of a common view, phase I of the AMFm should have been an experiment on how to continue, not whether to continue, the initiative.

.

how does this fit in with our ideas of ideal evidence-informed decision-making about programs and policies? the experience recorded here raises some important questions when we talk about wanting policy champions and wanting to generate rigorous evidence about those policies. assuming that the policies and programs under study adhere to one of the definitions of equipoise, the results from a rigorous evaluation could go either way.

what risks do the local champion(s) of a policy face in visibly supporting that policy?

is clear accountability established for evaluation outcomes?

are there built-in buffers for the personal and political reputation of champions and supporters in the evaluation design?

the more we talk about early stakeholder buy-in to evaluation and the desire for research uptake on the basis of evaluation results, the more we need to think about the political economy of pilots and of those stepping up to support policies and the (impact) evaluation of them. do they exist in a learning environment where glitches and null results are considered part of the process? can evaluations help to elucidate design and implementation failures in a way that has clear lines of accountability among the ‘ideas’ people, the champions, the managers, and the implementers? these questions need to be taken seriously if we expect government officials to engage in pilot research to help decide the best way to move a program or policy forward (including not moving it forward at all).

my day today: dry cleaning

[on phone, while in auto on the way to work]

me: “hi dry cleaner. i had to leave home for office, so i left one dress hanging on my door. can you please pick it up?”

dry cleaner: “sure, i will send a boy to pick it up.”

.

- 10 minutes later -

.

dry cleaner: “the boy is standing outside your door and says no one is answering.”

me: “yes, i had to leave for office, so i left one dress hanging on my door. can he please take it?”

dry cleaner: “yes madam.”

.

- 3 minutes later -

.

dry cleaner: “the boy says there is only one dress.”

me: “yes, there is one dress hanging on the door. can he please take it to clean?”

dry cleaner: “yes madam.”

.

service and perseverance and convenience, yes.

straightforward, no.