i had the pleasure of giving two talks through the 3ie seminar series over the past few weeks, in london (chaired by the wonderful daniel philips) and in delhi (chaired by the great stuti tripathi and ably discussed by the always-exciting colin bangay). i was also able to present at the london school of hygiene and tropical medicine — big thanks to catherine goodman and clare chandler for arranging and for great conversation. many thanks to everyone who participated (and to everyone who has listened to me talk about my thesis along the way). [slides for the interested.]
.
the presentations were on aspects of my thesis research, which centers on the experience of ‘doing’ the affordable medicines facility-malaria (amfm) in ghana. as often happens, the global narrative around the amfm and the decision to pilot-and-evaluate-and-decide was intriguing and became an important point of discussion. indeed, the pull of the story itself was a major drag on my getting on with aspects of my thesis (with apologies to my committee). what i present below is less about my actual work (which relates more to national stakeholders opting in to the pilot and its implementation) and more about the global narrative, which is abbreviated and stylized in my telling.
.
i set aside the precise definition of equipoise that is most relevant in social science and development evaluations – whether we should be talking about clinical equipoise (or efficacy equipoise?), some flavor of policy equipoise (or (relative) cost-efficiency uncertainty), or even operational equipoise (uncertainty around whether this thing can actually be implemented in this context, let alone produce the intended results).
.
rather, i begin with the assumption that meaningful uncertainty is a good starting point for commissioning an evaluation and also possibly a key part of the ethical justification for particular approaches to evaluation, such as random assignment. the former is of more interest to me here.
.
an important question following from this interest – one that the amfm raises pointedly – is what obligation, if any, follows from the establishment of equipoise (that a community of thinkers and/or implementers has meaningful uncertainty about a proposed program on a theoretical or practical level).
.
let’s look at the amfm. the goal was to make use of existing global and national public and private sector supply chains (from pharmaceutical manufacturers to small pharmaceutical sellers) to dramatically increase access to high-quality antimalarial treatment and, in turn, improve the (appropriate) use of such treatment and reduce the malaria burden.
.
this generated a situation of ‘active equipoise’ (read: sometimes heated controversy). some of it was more ideological: should the private sector be used to deliver public health, for example. i set this aside here. some of it was practical: if we use this specific mechanism to deliver a health commodity, will the subsidies (‘co-payments’) involved be captured along the supply chain or passed on to the end-user? will people not only obtain the high-quality, recommended anti-malarial treatments once they are made more accessible but also use them (appropriately) and, ultimately, reduce the malaria burden?
.
given this degree of uncertainty about putting a theoretically ‘elegant’ (an oft-used epithet for the amfm) mechanism into practice, a decision was taken to pilot (at national scale, in 7 countries, for 1.5 years, so the application of the term ‘pilot’ is debatable) and to commission an independent evaluation that would inform the decision to continue, modify, scale, or terminate the initiative. specifically, the global fund agreed to host the initiative for this pilot period and the evaluation was intended to inform the fate of the amfm, at least in the global fund’s portfolio. i am not going to wade into the confusion about how the decision was ultimately made (also here) because i want to focus earlier in time than that, on the design of the evaluation itself given its intended decision-informing function.
.
note that there were three key points of (less-ideological) debate at the global level that prompted the pilot-evaluate-decide approach, which can be plotted along a theory of change and also along a supply chain:
- implementation feasibility and the possibility of supply-chain capture (and drug adulteration)
- the translation of access into (appropriate) use
- the translation of use into reduced malaria burden
.
before going on, please note that i am not arguing that all evaluation or research needs to lead to a decision or even have this as a goal. rather, i am asking, once it is determined that we will commission an evaluation to inform our decision – a pinnacle of evidence-informed decision-making – what are our (researchers’, evaluators’, funders’, decision-makers’) obligations (ethical or otherwise)?
.
for a variety of reasons, the global fund decided that they wanted the pilot to run for 1.5 years and, following from this decision, set four benchmarks deemed achievable (through modelling work) within that timeframe that would define success. these related to gains in availability, price, market share (gained against less effective but cheaper and more familiar anti-malarial treatments), and household use. even though the link between use and malaria burden was a key point of uncertainty, this was determined to be beyond the scope of the evaluation from the outset (which people might agree or disagree with). at some point in the process, household surveys were also dropped from the evaluation plan, cutting off the potential to make rigorous (or, really, any) statements about whether access translated into use.
.
one result of this, it seems, is that many global stakeholders have been able to use the results of the independent evaluation (which suggest that, in at least 5 of the 7 pilot countries, moderate to high success in access was achieved) to support whatever position they held initially. (the story at the national level seems a bit different: whether because of experiential learning or the evaluation results or path dependency or other factors, many national-level stakeholders seem to have wound up more supportive of the initiative than they were initially – something which warrants further investigation.)
.
a key question is how we should feel about the evaluation – again, explicitly intended to inform a decision – not being set up to address the key points of controversy. disappointed? angry? ethically outraged (note, to the extent that money and not just principle matters, that this evaluation had a $10 million price tag and that the overall piloting process rang in at around $460 million)? this issue of appropriateness and outrage was a key point of discussion, particularly in the delhi seminar.
.
i certainly don’t have an answer but the question merits further debate. if an evaluation is commissioned to address specific points of controversy (uncertainty, equipoise) and explicitly to inform a decision, what are the obligations and responsibilities (whether practical or moral):
- of the evaluation design to address the controversy (in a way meaningful for those identified as key stakeholders or decision-makers)?
- to use the evidence generated to make the decision? (and to put in place processes to help make this so)
.
for those of us who push for evidence to play a role in decision-making, these seem important questions to debate. i hope we start to.
Hi Heather – great blog!
I’m a bit more cynical than you, and think that we should not be talking about obligations and responsibilities, but rather incentives and fear. You say “global stakeholders have been able to use the results of the independent evaluation… to support whatever position they had initially”. I think that statement could be true of almost all independent evaluations of large-scale global programs. I’m struggling to think of a good example of the opposite situation (and would love it if commentators have a good example). I think we need to look closely at incentives for the evaluator (in how they write final evaluation reports and executive summaries).
Evaluators have limited incentives to address controversy, due to (i) lack of clarity on how the evaluation will be used, and (ii) professional courtesy. On the first point, I think we don’t know what the end result of an evaluation will be. Taking the AMFm evaluation as an example, if it was found to be very critical of the feasibility of the AMFm, we don’t know whether funds would be repurposed into another malaria venture, into another health program, or into something else entirely. Once a program has started to be rolled out at a national scale*, no-one wants to be responsible for ending something that is providing anti-malarial treatments to hundreds of thousands of people. On the second point, we all know that development is a small world. I think it’s a very normal human behaviour to stay polite when talking about programs that are implemented by your friends, colleagues, or people within your professional network. Victoria Fan talks about this in terms of “personal biases” (http://www.cgdev.org/blog/global-health-mystery-what’s-behind-us-government-position-amfm). This isn’t to excuse personal biases, but just to understand where they come from.
As a result of these two pressures, evaluators are incentivised to identify both positives and negatives from an evaluation. This false search for balance can lead to language such as “this program was good, but could be improved in the following ways”, rather than addressing controversies head-on.
On top of this, implementors obviously have incentives to present the results of an evaluation in a positive light.
Not sure what the answer is here. It’s easy to say that we should push for more small-scale pilots with a rigorous evaluation before launching national programs. The disincentives against a negative result are lower, and it means that we never start a national program from a position of equipoise. It’s much easier said than done, and we obviously struggle to do that, even in a UK / US context.
* It’s an aside, but it’s really hard to call a national-scale program a ‘pilot’
agree with much of what you’ve said and will try to elaborate more shortly. broadly, i think it may help to distinguish between commissioners of evidence and evaluators themselves in parsing incentives (and fear). in the talk (i’ve added a link to the slides), i make a point of reminding us that ‘stakeholders’ isn’t just a warm-fuzzy word but gets very much at ‘interests’: whether one stands to gain or lose something as a result of piloting and/or being part of an evaluation (in which ‘termination’ is a possible outcome). we should be mindful that stakeholders include commissioners of evidence, evaluators, etc.