bringing in the state for experiments and development efforts — when? how?

there have been a lot of mentions of governments, experiments, ownership, & development in the past two weeks that sparked a few ideas in my head. the underlying theme is that we need to start considering not just the political economy of the contexts in which we work but how to actually bring political and economic considerations – and interests in long-term sustainability, accountability, and ‘ownership’ – into program design and implementation, as well as into the experiments to test those programs. i first consider lessons from a totally hypothetical RCT. then, two quick reviews of new programs related to public sector involvement in development efforts.

first, the political economy of experimentation: lessons stemming from a totally hypothetical nation-wide RCT run in conjunction with the government of a totally hypothetical state.

as suggested before, experiments should often be designed with the (likely) ultimate implementer in mind – ideally, in consultation with them. because much of development deals with public goods and market failures, there is a good chance the state would ultimately be involved in an experiment-born or experiment-tested program/policy in order to bring it to scale, institutionalize it, & sustain it.

this requires experimental design to start to look more like program design. research design cannot substitute for program or policy design. program & policy design requires inputs from the local context and from the field of study & practice relevant to the content of the intervention.

these designs must also account for politics. working with the government – at a large and visible scale – means that what once may have been a more ‘neutral’ or ‘local’ experiment is now political. who gets access to the intervention, when, and how (Lasswell, 1936) becomes critical. this will often be inherently at odds with randomized evaluation (King et al., 2007; Fox & Reich, forthcoming). it’s not impossible; it just takes a good deal of planning & savvy in both program and research design & implementation.

even in the face of political, technical, and social challenges to implementation of the program & the study, there will always be important lessons to be learned — but only if solid monitoring is taking place and process evals are reported. all stakeholders need to be on board with that up front (tough!).

  • when does the government come in?
    • option 1: the experiment starts because of a program/policy the government wants to try or tweak. thus, it is government- or implementer-initiated. these experiments are, i believe, along the lines of what David Brooks suggests here.
    • option 2: researchers/technocrats have an idea that they want to try at a large scale, and therefore approach the government. the gov/implementer should be on board before the design is finalized. they will have a much clearer idea than almost any researcher of what is politically feasible and what incentives will need to be offered to get the buy-in of on-the-ground implementers and bureaucratic structures. that is, the program design needs to be rigorous, with technical and theoretical aspects adjusted to the local political economy and to an understanding of different stakeholders’ interests and capacity.
    • in either case: if the implementer seems unwilling to scale up an idea in the design phase, the experiment should be re-considered and re-designed. if the government lacks capacity or is stretched too thin to scale up and sustain the idea, then the extent to which capacity-building can and should be built into the experiment needs to be strongly considered (as well as whether an alternative approach is needed).
  • how can the state & other stakeholders be brought in?
    • a wide variety of stakeholders should be involved in deciding how to measure an acceptable level of progress and ‘success.’ all stakeholders (including politicians, bureaucrats) should articulate what will make the experiment worthwhile to them and what they would need to see in order to ‘be convinced’ that the program/policy is worth continuing to pursue. treatment effect will be far from the only thing that matters.
    • responsibility, political risk, & political timelines need to be discussed explicitly. having technocrats and researchers involved may shield politicians from some blame if the program does not work as intended – but a plan to laud politicians for engaging in evidence-building policy, and to allow them to take some credit when things go well, will be important (Fox & Reich, forthcoming).
  • other lessons that could be learned from this hypothetical study:
    • for a variety of appearance & funding reasons – because of a commitment to public sector involvement – we may want it to appear that the government is running the experimental program. but if that’s not what’s happening on the ground, re-assess and either build capacity or adapt the program to reality.
    • proof-of-concept experiments will still have a role in helping certain ideas seem less risky to stakeholders – because large-scale policy changes are risky, both in terms of domestic politics and global politics. these small-scale experiments are also the best testing ground for multiple treatment arms, rather than proliferating treatment arms when working at national scale. however, even in these experiments, we should consider what outcomes will be politically relevant and what technical kinks we encounter, so that: (1) the results are more easily ‘sellable’ to politicians and other implementers down the line and (2) it is clearer what capacities and resources are necessary for the program/policy in question.

moving on from our hypothetical example (whew!)… on to two non-research-based examples of designing and implementing development efforts in ways that encourage state involvement and increase accountability & sustainability. what lessons about incentives & partnering could the experimental world – with an increasing eye toward scale-up and sustainability – learn from these?

the Gates Foundation has an interesting (and uncharacteristic) initiative on the table to get state governments in Nigeria more involved in vaccination and MDG efforts (h/t @KarenGrepin). this effort recognizes the key role of governmental leadership in the implementation of development programs (though these grants will go to the ‘implementing partner’; states will be able to demonstrate ownership in part through co-funding awarded projects (?)).

there are 11 process & outcome criteria on which programs will be judged, with awards going to the highest-performing state (and to the most-improved state) in each of Nigeria’s geo-political zones. ultimately, it is not clear to me exactly how this will reward political effort or encourage the design of programs that can be institutionalized (vaccines – you have to keep giving them year after year!) – but it’s kind of a cool idea to have all those government- & partner-designed pilots running simultaneously and to reward some outcomes. it would be great to know what other sorts of evaluation will be going on outside of the 11 award criteria, so that we may all learn from challenges and successes.
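
as i read it, the selection rule (reward the top performer and the most-improved state in each geo-political zone) would look roughly like the toy sketch below; the states, scores, and composite are entirely invented for illustration, since the 11 actual criteria aren’t spelled out here.

```python
# toy sketch of the award rule as i read it: within each geo-political zone,
# reward the highest-scoring state and the most-improved state on some
# composite of the criteria. all names and numbers below are invented.

states = [
    # (state, zone, baseline composite score, endline composite score)
    ("State A", "North West", 40, 55),
    ("State B", "North West", 62, 64),
    ("State C", "South East", 30, 50),
    ("State D", "South East", 45, 48),
]

zones = {z for _, z, _, _ in states}
for zone in sorted(zones):
    in_zone = [s for s in states if s[1] == zone]
    top = max(in_zone, key=lambda s: s[3])               # highest endline score
    improved = max(in_zone, key=lambda s: s[3] - s[2])   # biggest gain over baseline
    print(zone, "-> top performer:", top[0], "| most improved:", improved[0])
```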

meanwhile, Water for People (@NedBreslin) has an interesting commentary (h/t How Matters) on sustainability and ownership. NB suggests that the fiscal and operational capacity and discipline of the implementers need to be central considerations in program design and implementation, and that programs and capital investments need to be designed and tailored to the demonstrated ability-to-pay of the community in question. that is, co-financing between a development agency (including researchers?), communities, and local government is required before new infrastructure is installed. the general idea seems to be this: ‘we (researchers(?)/philanthropists/etc) want to work towards achieving X outcome in your [geographic area]; achieving X requires sustained effort and investment. we have a mix-and-match menu of ways to put together a program to work towards X; we should determine which option will be best for this community based, in part, on your ability to invest some money up front, given that your payments will be required to continue to achieve X over the long term.’ NB goes on to say, “free projects facilitate corruption… [gov] funding starts to be allocated by governments if NGOs (researchers?) use their finances as leverage, in financial partnership with host-country governments, rather than absolving them of their financial and developmental responsibilities.”
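
to make that mix-and-match idea a bit more concrete, here is a minimal sketch of the matching logic as i understand it – this is my own toy model, not Water for People’s actual one, and every option name, cost, and threshold is invented for illustration:

```python
# toy sketch: match a community to the infrastructure options it can
# realistically co-finance up front AND keep paying for over the long term.
# all options, costs, and thresholds are made up for illustration.

from dataclasses import dataclass

@dataclass
class Option:
    name: str
    upfront_cost: float       # capital investment required at installation
    annual_upkeep: float      # recurring cost to keep the service running

def feasible_options(options, upfront_capacity, annual_capacity,
                     community_share=0.2):
    """keep options where the community can cover its share of the upfront
    cost and the full recurring upkeep (the 'sustained over time' test)."""
    return [o for o in options
            if o.upfront_cost * community_share <= upfront_capacity
            and o.annual_upkeep <= annual_capacity]

menu = [
    Option("borehole + handpump", upfront_cost=5_000, annual_upkeep=300),
    Option("piped scheme with kiosks", upfront_cost=40_000, annual_upkeep=2_500),
]

# a community that can put up $1,500 now and ~$400/year qualifies only for the
# cheaper option under a 20% co-financing rule
print([o.name for o in feasible_options(menu, upfront_capacity=1_500,
                                        annual_capacity=400)])
```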

as experiments become more like programs (and programs more like experiments), and take place at larger scales and in greater collaboration with the public sector, we need to think carefully about what is required for sound program design and implementation, such that we can still have impact after an experiment ends (though, of course, long-term measurement of process & outcomes should be supported).


somali pirates switching back to smaller boats

to remain less conspicuous, Somali pirates have been switching back to using smaller, more traditional boats (i think) as motherships, which blend in with fishing boats. this will probably keep them closer to shore.

more here. a few points of note:

  • “a study published in February by U.S. non-governmental organisation One Earth Future Foundation showed Somali piracy cost the world economy some $7 billion last year. The total paid in ransoms reached $160 million, with an average ransom for a ship rising to $5 million, from around $4 million in 2010.”
  • turns out, the indian ocean is real big – and therefore hard to monitor. presumably, not much of a revelation?

happy malaria day!

turns out, control means you are supposed to keep working at something (that includes funding it): http://www.bmj.com/content/344/bmj.e2935


global sea piracy down 28% in first quarter

less in somalia, more in west africa & indonesia.

more here.

demography: two things i remember from class and one thing i managed to remember from a book

sometimes memory devices and catch phrases work for me, like remembering how to set a table, how to spell encyclopedia, and how many ounces are in a pint (and a pound, incidentally). sometimes, they do not, as evidenced by the fact that i only remember the demonic halves of the demonic mnemonics for the cranial nerves and the bones of the wrist.

this might explain why i remember approximately two things from my first demography class (with Dr. Ron Lesthaeghe, who did everything in his power to teach me more than two things and deserves no blame for what i say here! (he has, however, approved it!)). since it seems like population & fertility management are back in the conversation (Rio+20, Gates, Sachs, & Blattman, among others), perhaps phrases stuck in my brain will lodge in someone else’s as well and be of use.

  • thanks to some intense demography study sessions and the need to consolidate a lot of information, a few friends and i were finally able to develop and remember the following: “pope, no pants, no progeny.” this helped us to remember that, in France – where we saw the first large-scale fertility declines in the world, around the time of the French Revolution and its reverberations in the 1800s – it was the landless farmers working on Church lands (that is, pope; Protestants were more likely to be smallholders) who moved to the cities (‘no pants’ is a very literal interpretation of sans culottes – ‘no fancy pants’ may be slightly more accurate) and tended to have fewer children. thus, the fertility transition was not driven so much by mortality decline or development per se but by a variety of SES conditions, often following religious, cultural, and linguistic lines.
  • Prof. Lesthaeghe, building on the work of Coale, hammered home that the adoption of new (fertility) behavior requires being simultaneously ready, willing, and able to do so (a toy sketch after this list spells out that ‘simultaneously’).
    • readiness broadly refers to economic and normative factors, such as ideas about family size; the balance of the costs and benefits of having (a certain number of) children in light of the child mortality rate; schooling laws (among other things); and issues related to women’s schooling and their opportunities for employment with reasonable returns to education.
    • willingness broadly refers to norms about using contraception and interfering with ‘nature.’
    • ability refers to the knowledge and availability of means to control fertility, including accessibility, acceptability, affordability – gracious, I do like alliteration!
  • occasionally I manage to read a book. thanks to Matthew Connelly, among others, for pointing out that the former Indian minister of health and family planning Karan Singh not only said that “development is the best contraceptive” at the Bucharest conference in 1974 but has also publicly mused as to whether the maxim should have been “contraception is the best development.” my guess is that he is right/wrong either way, since causality almost certainly runs both ways.
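
to make the ‘simultaneously’ above concrete, here is a tiny sketch (indicators and cutoffs invented purely for illustration) of the point that the framework is a conjunction: strength on two dimensions cannot compensate for a missing third.

```python
# minimal sketch of the Coale/Lesthaeghe ready-willing-able condition:
# adoption of new fertility behavior is predicted only where ALL THREE hold.
# the indicators and cutoffs here are invented purely for illustration.

def ready(expected_benefit, expected_cost):
    # economic & normative calculus: smaller families have to 'pay off'
    return expected_benefit > expected_cost

def willing(norms_permit_contraception):
    # legitimacy of deliberately interfering with 'nature'
    return norms_permit_contraception

def able(knows_methods, method_accessible, method_affordable):
    # knowledge plus accessible, acceptable, affordable means
    return knows_methods and method_accessible and method_affordable

def adopts(r, w, a):
    # a conjunction: high marks on two dimensions cannot offset a missing third
    return r and w and a

print(adopts(ready(3, 2), willing(True), able(True, True, False)))  # False: not able
```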

my own rough take is that development may well be the best motivator for changing desires about completed family size, the right time to begin a family, and contraception use… but the contraception needs to be available and accessible for that to work. so, in each context, we need to assess the readiness, willingness, & ability factors that are preventing fertility decline (or pushing it down too far! looking at you, italy) to determine what needs to be done to help women and families both want and achieve more control over their fertility, and tailor the response accordingly. implying that one approach will work everywhere is almost certainly too simplistic. and child mortality and female empowerment are critical parts of this conversation and potential points for intervention.

*thanks to john for spurring me to write this. he has learned my weaknesses. obviously, a big thanks to professor lesthaeghe for putting complicated ideas into phrases that even i can remember. and thanks to ellen, molly, & emily for trying to teach me osteology and for studying demography with me, respectively.

we’re experimenting! also, clarifying types of replications

a nice article from chris said, discussing how we might alter publication rules (and the granting requirements of donor organizations) in a way that moves us closer to good, useful research – specifically, looking more toward the importance of the question and the rigor of the method used to answer it. i am, of course, fully in favor of focusing on important (in this case, policy-relevant) questions, rigorous design & implementation (in this case, with an eye toward scale-up potential), and solid data collection (no really, good regressions don’t fix bad data) — as well as publishing results that aren’t necessarily the sexiest but will ultimately move our understanding of what works forward in important ways.

Granting agencies should reward scientists who publish in journals that have acceptance criteria that are aligned with good science. In particular, the agencies should favor journals that devote special sections to replications, including failures to replicate. More directly, the agencies should devote more grant money to submissions that specifically propose replications. Moreover — and this is a fairly radical step that many good scientists I know would disagree with — I would like to see some preference given to fully “outcome-unbiased” journals that make decisions based on the quality of the experimental design and the importance of the scientific question, not the outcome of the experiment. This type of policy naturally eliminates the temptation to manipulate data towards desired outcomes.

(addition 30.04.2012: http://www.overcomingbias.com/2012/04/who-wants-unbiased-journals.html)

if we start taking replications more seriously in social science experiments, we may need to start being more precise with terms. there are a few possible variants/meanings of ‘replication,’ potentially making it difficult for experimenters, donors, consumers of research, and other stakeholders to speak clearly with one another and set expectations.

  • one potential meaning is a program/experiment conducted in one location with one set of implementers, repeated in the same place with different implementers (say, the government versus an NGO). call this internal replication (?).
  • another type of replication would be transplanting the program/experiment to a different context, making either minor adjustments (such as language) or more substantive adjustments based on lessons learned from the first pass and a local stakeholder analysis. some range of this is external replication; it’s hard to know at what degree of modification we should stop calling it a replication and instead call it a new experiment (or an extension) inspired by another, rather than selling it as a replication.
  • (of course, an internal replication, depending on the number of lessons learned on the first go-round and the modifications required for the second set of implementers to have a go, might itself actually be a new or extended experiment, rather than a replication. again, the line would be fuzzy, but presumably some simple criteria/framework could be delineated; a toy sketch of one such labelling scheme follows below.)
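
purely as a thinking aid, a labelling scheme along these lines might look like the toy sketch below; the category names and the crude ‘degree of modification’ heuristic are my own invention, not an established taxonomy.

```python
# toy labelling scheme for the replication vocabulary above -- the categories
# and the crude modification heuristic are invented for illustration only.

def classify(same_context, same_implementer, modification_level):
    """modification_level: 'none', 'minor' (e.g., language), or 'substantive'."""
    if modification_level == "substantive":
        # past some (fuzzy!) degree of redesign, call it a new/extension study
        return "extension study inspired by the original"
    if same_context and not same_implementer:
        return "internal replication (same place, different implementer)"
    if not same_context:
        return "external replication (new context, minor adjustments)"
    return "repeat of the original study"

print(classify(same_context=True,  same_implementer=False, modification_level="none"))
print(classify(same_context=False, same_implementer=False, modification_level="minor"))
print(classify(same_context=False, same_implementer=False, modification_level="substantive"))
```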

h/t marginal revolution & rachel strohm

experimenting with intention

this post revisits some issues i have touched on before.

first of all, good find by roving bandit. the gist is that an experimental program that showed effects under ‘ideal’ (NGO-run) conditions did not show any effect when the same program was run by the government. oops.

i think this raises several possible questions related to carrying out experiments (i am sure there are more than i cover below):

  • before undertaking (getting funding for) an experimental intervention, how clear should we be on who would be sustaining the effort and/or taking it to scale? what kind of agreement would need to be in place? would we set some effect-size threshold above which we would aim for scale and sustainability, and below which an idea is scrapped?
  • how do we distinguish between proof-of-concept studies and if-this-works-it’s-going-to-scale studies? how many replications of the former would we want before we did the latter?
  • how involved should the putative implementer be in the design & conduct of the experiment?
  • how much training and capacity building with the future implementer should be built into the experimental process? would we start to consider ethical requirements in this regard (i.e. experimenters have some obligation to train as well, as needed)?
  • if something doesn’t work, what responsibility do we have to help enhance the public sector’s (or other implementer’s) capacity? i.e. is the response to a null finding a scrapping of the idea or a re-tooling of the implementer? or something else?
  • how much more process evaluation & monitoring should be put in place in ‘in situ’ experiments so that we can learn more about precisely what went right and wrong in implementation? how can we encourage the publication and sharing of these results, not just the treatment effect? (i swear i have an ‘in praise of process evaluation’ post coming soon. i have to atone for all the times i have denigrated it.)
  • even when a program doesn’t work, how do we make sure that the public sector (or other) implementer doesn’t get blamed for the effort, and how do we reward honesty instead of only exciting results?