p4p, habits, context, process

Priscilla Magrath and Mark Nichter (hereafter, M&N) have a nice new paper on pay-for-performance (P4P), stating that “understanding the processes by which P4P targets are reached demands a reorientation… towards an understanding of motivation as a component within a complex adaptive social system.” (Complex systems are so hot right now!)

the paper is worth reading in full. the authors also raise important points that i have wanted to address for some time and hopefully will do so in more depth in the future. these relate to process; motivation & context; and experiments.

1. a key question that is not always sufficiently asked or answered in health systems – or other – research is ‘how/why did X intervention work (or not) in Y setting?’ this is an issue of both internal validity/credibility and external validity. internally, answering these questions means looking at the intervention from the perspective of multiple stakeholders to get a nuanced, full view of how or why something worked. externally, of course, the better the context and processes are understood, the easier it may be to predict when and where else the intervention may be successful.

answering these types of questions involves qualitative as well as quantitative work, and looking at the implementation process as well as impacts. hopefully i will have a write-up on studying implementation in the near future with @jonathan_payne [yes, Jon, I am putting it in writing as a commitment device]. calls for this type of work in the study of interventions – including but not limited to explicitly experimental approaches – have come from Paluck, Mills, and Ssengooba, as well as the M&N paper discussed here.

i want to quickly make amends for the past times i have denigrated process indicators. it’s true that process indicators can become meaningless when they are reduced to check-boxes of routines (such as the number of posters hung or meetings held). also, process indicators without impact indicators don’t get us as far as we need to go in saying whether something worked or should be done again. but recording and evaluating process is necessary, if not sufficient, for saying how something worked and whether (and where) it could work again.

2. M&N draw on Bourdieu to provide a frame for studying the implementation and impacts of P4P – but the point is more widely applicable. M&N draw on the concept of ‘habits’ – attitudes, dispositions, actions – of individuals and social groups to help explain how interventions are received differently in different places. this has interesting parallels with Stein’s work on ‘habits,’ for which he draws on the work of Veblen; these parallels and lessons deserve more exploration than i give them here.

both roughly aim at the idea of shaping interventions to a given contextual reality, including political, economic, and administrative structures but also social processes and group, sub-group, and individual dispositions and ‘habits of thought.’

both also suggest the need to consider how an intervention’s positive & negative effects will ripple out in both time and space, as well as how positive effects can be sustained. Stein highlights that “relations among institutional constructs, habits and the transformation of behavior are at the core of development.” of course, M&N also point out that changing behavior is not everything. behavior relates to what actors ‘will do,’ but what they ‘can do’ may still be constrained by a lack of material, human, and time resources.

M&N also draw on Bourdieu’s multiple forms of capital to consider the ways that interventions may change dispositions and behavior; these include economic but also cultural, social, and symbolic capital. the importance of non-economic forms of capital in changing motivations and incentivizing behavior is increasingly recognized, including in the many talks of Rory Sutherland and Ashraf’s recent paper on pro-social benefits. the ways an intervention will change the distribution of all these types of capital should be considered in the design and captured over the course of implementation.

the ways that previous interventions have altered the distribution of these forms of capital are also an important consideration; tabulae rasae non existunt.

3. finally, for quite some time (with a little e-input from Owen Barder – thanks!), i have been working up to saying that all social interventions – explicitly experimental or not – are experiments. they change the context in ways that are not adequately considered, including the ethical ramifications of implementing an intervention and of stopping that intervention. at the very least, the presence of an experiment/intervention changes expectations. while i’ve been tiptoeing up to this point for months, M&N go right ahead and say, “short-term interventions can have long-term impacts on expectations.” an experiment or intervention might be billed as discrete, pilot, or otherwise, but if it is taking place outside a controlled laboratory setting, it is bringing about some change, from anchors and reference prices to much larger shifts in thinking on ‘how things work.’ experiments have social and political ramifications beyond the intended effects, at levels above the individuals directly involved.

to this end, M&N and others promote a ‘cyclical formative reformative research approach’ as a way of moving forward with (health) systems research, and i strongly back this idea as a way of experimenting and promoting development more generally. this sort of long-term research agenda does not always fit with the present structure of grants & funding, but hopefully that will begin to change.

wait, we’re experimenting, right?

many of the descriptions of the ideal next World Bank president – at least the ones with which I agree – have called for a little more humility about how much we actually know about economic & human development and poverty reduction.

so it’s frustrating to see articles like this, which imply a low level of humility about the work we are doing and an unclear commitment to learning what actually does and does not work (regardless of felt commitment to poverty reduction & development).

a large part of the reason that experiments and impact evaluations in development have become popular is that we weren’t getting as far as we needed with theory, intuition, or observation alone. money and other resources were being put into programs when we didn’t know whether they were effective (even if things seemed to be changing in the presence of the program), let alone how they compared to other programs in terms of efficacy or cost-efficiency. process and implementation evaluations that could have improved subsequent program interventions were not being conducted and/or shared.

it seems like we need to pause and think about how and why we are experimenting.

  • we experiment because we don’t know what works – or whether something that works in one location will work in another. if we knew what worked, we would potentially be under some ethical obligation to do that thing for all people in all places we thought it would work. when we don’t know what works, or when there is at least genuine disagreement about the best approaches, an experimental design is justified. in short, we need to bring equipoise into social science research. in part, this means that we should be testing our new (experimental) idea against the best known or available intervention with a similar goal. new drugs are usually tested against a placebo and a regularly used treatment.
  • because we are experimenting, we should encourage the publication of null findings and laud these as equally important learning experiences. this requires funders to recognize such reporting as essential to the accountability of studies and program implementations. it also requires changing the strong bias of journal editors and reviewers toward publishing only significant findings. confidence intervals aside, null findings may be just as “significant” for our understanding of what works and doesn’t work in development as statistically significant results (see the sketch after this list).
  • evaluations probably need to start to look more like programs that could be scaled up. there are good experimental reasons for manipulating only one or two key variables at a time and trying to limit all other contamination, but there has to be increasing movement toward learning what works in situ, even if that means there is more than one moving part. and if it is really unclear how the findings from an experiment would be scaled up in a program or policy, then the experiment likely needs to be re-thought.
  • also, we need to think more about the ethics of doing social science experiments in low- and middle-income countries. in clinical research, there are increasing obligations for large pharmaceutical or academic institutions, if the drug proves effective, to make the drug available – at a minimum – to the host community. this is because the host community bore some risk in participating in an experimental intervention — but more generally because any intervention alters biological and social patterns that will remain changed after the trial ends and the researchers leave the community to publish their results in scientific journals.
  • experimenting is good in a context in which we aren’t sure what works. NGO- and state-run programs need to be linked with evaluation efforts. there are roughly a bajillion graduate students interested in program evaluation, development economics, and so on, and there are a large number of programs being run by governments or NGOs without any rigorous evaluation or clear delineation of ‘lessons learned’ – or at least without evaluations that get talked about in the technocratic space. none of these programs will offer a perfect experimental design but, hey, that’s where the complex statistics come in. all we need is a yenta to link grad students to programs (and evaluation funding) and we’re set.
  • experiments, programs, policies, etc. need to allow us to learn about the implementation process as well as the outcomes. deviations from initial design and unexpected hurdles along the way should be reported so that everyone can learn from them. yes, the reality of actually running these programs may make it more difficult to make causal inferences with certainty – but these aren’t just aberrations in an experimental design, they’re part of the reality into which any scaled-up effort would be plugged. this is similar to the distinction between “efficacy” and “effectiveness” in clinical research: knowing how an intervention performs under ideal experimental conditions (efficacy) may not tell us how the same intervention performs under real-world circumstances or when scaled up to other communities (effectiveness).
  • replication is central to the natural sciences but still largely under-utilized in the social sciences and development research. we need to recognize the importance of replication in confirming or dis-confirming the results from program implementation studies and encourage greater publication of replication studies.
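as a quick, hypothetical aside on the null-findings point above: here is a minimal sketch (the effect sizes and standard errors are invented, and the normal-approximation confidence interval is just for illustration) of why a precisely estimated null tells us far more than a merely noisy one.

```python
# hypothetical illustration -- the effects and standard errors below are
# invented, not drawn from any real evaluation.

def ci95(effect, se):
    """95% confidence interval for an estimated treatment effect
    (normal approximation)."""
    return effect - 1.96 * se, effect + 1.96 * se

# study A: a precisely estimated null -- the point estimate is near zero AND
# the interval rules out anything much larger than ~1.7 points either way.
print("study A:", ci95(effect=0.5, se=0.6))   # approx (-0.68, 1.68)

# study B: a 'null' only because the study is noisy -- the interval is
# consistent with effects ranging from harmful to substantially beneficial.
print("study B:", ci95(effect=0.5, se=4.0))   # approx (-7.34, 8.34)
```

both studies would be reported as ‘not statistically significant,’ but only study A supports a claim about what doesn’t work; study B mostly tells us the evaluation was under-powered.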

*see, for example, “Moral standards for research in developing countries: from ‘reasonable availability’ to ‘fair benefits’” or “What makes clinical research in developing countries ethical? The benchmarks of ethical research”

*other inspiration

**big thanks to Mike for helping me sort through a lot of the ideas in this post