many of the descriptions of the ideal next World Bank president – at least the ones with which I agree – have called for a little more humility about how much we actually know about economic & human development and poverty reduction.
so it’s frustrating to see articles like this, which imply a low level of humility about the work we are doing and an unclear commitment to learning what actually does and does not work (regardless of felt commitment to poverty reduction & development).
a large part of the reason that experiments and impact evaluations in development have become popular is that we weren’t getting as far as we needed with theory, intuition or observation alone. money and other resources were being put into programs when we didn’t know whether they were effective (even if things seemed to be changing in the presence of the program), let alone how they compared to other programs in terms of efficacy or cost-efficiency. process and implementation evaluations that could have improved subsequent program interventions were not being conducted and/or shared.
it seems like we need to pause and think about how and why we are experimenting.
- we experiment because we don’t know what works – or whether something that works in one location will work in another. if we knew what worked, we would potentially be under some ethical obligation to do that thing for all people in all places we thought it would work. when we don’t know what works, or when there is at least genuine disagreement about the best approaches, an experimental design is justified. in short, we need to bring equipoise – genuine uncertainty about which option is better – into social science research. in part, this means that we should be testing our new (experimental) idea against the best known or available intervention with a similar goal. in clinical research, by comparison, new drugs are often tested against a placebo and/or a regularly used treatment.
- because we are experimenting, we should encourage the publication of null findings and laud these as equally important learning experiences. this requires funders to recognize such reporting as essential to the accountability of studies and program implementations. it also requires changing the strong bias of journal editors and reviewers toward publishing only statistically significant findings. confidence intervals aside, null findings may be just as “significant” for our understanding of what works and doesn’t work in development as statistically significant results.
- evaluations probably need to start to look more like programs that could be scaled up. there are good experimental reasons for manipulating only one or two key variables at a time and trying to limit all other contamination, but there has to be increasing movement toward learning what works in situ, even if that means there is more than one moving part. and if it is really unclear how the findings from an experiment would be scaled up in a program or policy, then the experiment likely needs to be re-thought.
- also, we need to think more about the ethics of doing social science experiments in low- and middle-income countries. in clinical research, large pharmaceutical or academic institutions face increasing obligations, if the drug proves effective, to make it available, at a minimum, to the host community.* this is because the host community bore some risk in participating in an experimental intervention – but, more generally, because any intervention alters biological and social patterns that will remain changed after the trial ends and the researchers leave the community to publish their results in scientific journals.
- experimenting is good in a context in which we aren’t sure what works. NGO- and state-run programs need to be linked with evaluation efforts. there are roughly a bajillion graduate students interested in program evaluation, development economics, and so on, and there are a large number of programs being run by governments or NGOs without any rigorous evaluation or clear delineation of ‘lessons learned’ – or at least without evaluations that get talked about in the technocratic space. none of these programs will offer a perfect experimental design but, hey, that’s where the complex statistics come in. all we need is a yenta to link grad students to programs (and evaluation funding) and we’re set.
- experiments, programs, policies, etc., need to allow us to learn about the implementation process as well as the outcomes. deviations from initial design and unexpected hurdles along the way should be reported so that everyone can learn from them. yes, the reality of actually running these programs may make it more difficult to make causal inferences with certainty – but these aren’t just aberrations in an experimental design, they’re part of the reality into which any scaled-up effort would be plugged. this is similar to the distinction between “efficacy” and “effectiveness” in clinical research. knowing how an intervention performs under ideal experimental conditions (efficacy) may not tell us how the same intervention performs when applied under real-world circumstances or scaled up to other communities (effectiveness).
- replication is central to the natural sciences, but still largely under-utilized in the social sciences and development research. we need to recognize the importance of replication in confirming or dis-confirming the results from program implementation studies and encourage greater publication of replication studies.
*see, for example, “Moral standards for research in developing countries: from ‘reasonable availability’ to ‘fair benefits’” or “What makes clinical research in developing countries ethical? The benchmarks of ethical research”
**big thanks to Mike for helping me sort through a lot of the ideas in this post