Brief Thought on Commitment-To-Analysis Plans

First, I am starting a small campaign to push towards calling ‘pre-analysis plans’ something else before the train gets too far from the station. Something like ‘commitment to analysis plans’ or ‘commitment to analysis and reporting plans.’ I have two reasons for this.

  1. PAP just isn’t a super acronym; it’s kind of already taken.
  2. I think the name change moves the concept a step back from indicating that the researcher needs to pre-specify the entirety of the analysis plan and, rather, indicates the core intended data cleaning and coding procedures and the central analysis — and the commitment to completing and reporting those results, whether significant or not. This shift, towards a commitment rather than a straitjacket, seems like it would go some way towards addressing concerns expressed by Olken and others that the task of pre-specifying all possible analyses ex ante is both herculean and blinkered, in the sense of not incorporating learnings from the field to guide parts of the analysis. The commitment, it seems to me, should be partly about making clear to the reader of a study which analyses were ‘on plan’ and which came later, rather than claiming perfect foresight.

Second, speaking of those learnings from the field that may be incorporated into analysis… I had a moment today to think a bit about the possible views from the field that come from surveyors (as I am working on some of my dissertation analysis and already starting to form a list of questions to write back to the survey team with which I worked!). Among the decisions laid out by folks like Humphreys and McKenzie in their lists of what should be specified in a commitment to analysis plan (doesn’t a ‘CAP’ sound nice?) about data cleaning, surveyors play very little role.

Yet a survey (or discussion) among survey staff about their experience with the questionnaire can yield information on whether there were any questions with which they systematically felt uncomfortable or uncertain, or that respondents rarely seemed to understand. Yes, many of these kinks should be worked out during piloting but, no, they aren’t always. Sometimes surveyors don’t get up the gumption to tell you a question is terrible until the research is underway and sometimes they themselves don’t realize it.

For example, in one field experiment with which I was involved, surveyors only admitted at the end (we conducted an end-of-survey among them) how uncomfortable they were with a short-term memory test module (which involved asking respondents to repeat strings of numbers) and how embarrassing it was to ask these questions of their elders, to the point that some of them breezed through these questions pretty quickly during interviews and considered some of the answers they reported suspect. Some wrote fairly agonizing short essays to me in the end-of-survey questionnaire (it’s a good thing to make them anonymous!), asking me to “Imagine that you have to ask this question to an elder…” and proceeding to explain the extreme horror of this.* As the short-term memory module was not part of the central research question or main outcomes of interest, it was not subjected to any of the audit, back-check, or other standard data-quality procedures in place, and so the problem was not caught earlier.

I can imagine a commitment-to-analysis plan that committed to collecting and incorporating surveyor feedback. For example, a CAP that stated that if >90% of surveyors reported being uncertain about the data generated by a specific question, those data would be discarded or treated with extreme caution (and that caution passed on to the consumers of the research). Maybe this could be one important step to valuing, in some systematic way, the experience and insights of a survey team.
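As a concrete illustration, here is a minimal sketch of how such a surveyor-feedback rule could be operationalized. Everything here is hypothetical: the `feedback` table, its column names, and the `flag_questions` helper are made up for illustration, and the 90% cut-off is just the example figure from the paragraph above.

```python
import pandas as pd

# Hypothetical anonymous end-of-survey feedback from surveyors:
# one row per surveyor per questionnaire item, with a yes/no flag for
# "I was uncertain about the data this question generated."
feedback = pd.DataFrame({
    "surveyor_id": [1, 1, 2, 2, 3, 3],
    "question_id": ["memory_recall", "income", "memory_recall", "income",
                    "memory_recall", "income"],
    "uncertain":   [True, False, True, False, True, False],
})

UNCERTAINTY_THRESHOLD = 0.90  # the ">90% of surveyors" rule written into the CAP

def flag_questions(feedback: pd.DataFrame, threshold: float) -> pd.Series:
    """Return the share of surveyors reporting uncertainty, per question,
    keeping only the items above the pre-committed threshold."""
    share_uncertain = feedback.groupby("question_id")["uncertain"].mean()
    return share_uncertain[share_uncertain > threshold]

flagged = flag_questions(feedback, UNCERTAINTY_THRESHOLD)
print("Items to discard or treat with extreme caution:")
print(flagged)
```

The point is less the code than the commitment: the threshold and the consequence (discard, down-weight, or caveat) would be written down before the feedback is collected, and the flagged items reported to readers either way.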

*For the record, I can somewhat imagine this, having once worked in a call center conducting interviews with older women following up on their pelvic floor disorder surgery and whether they were experiencing any urinary symptoms. In that case, however, most of the discomfort was on my side, as they were well versed in — and fairly keen on — talking about their health issues and experiences! Note to self: aim not to have pelvic floor disorder.


Thinking More About Using Personas/Personae In Developing Theories of Change

I have previously advocated, here (and here), for taking a ‘persona’ or character-based approach to fleshing out a theory of change. This is a way of involving a variety of stakeholders (especially those closer to the ground, such as intended beneficiaries and street-level implementers) in discussions about program and theory of change development — even when they are not physically at the table, which is not always possible (though encouraged, of course).

This week, I had a new chance to put some of these ideas into action. A few lessons learned for future efforts:

  • This activity worked well in small groups. However, it may be too much to ask groups to fully develop their own personae, especially given possible time limits within the confines of a workshop.
    • It may be better to have some partially developed characters in mind (for example, characters that represent differing initial levels of the key outcomes of interest and variation on some of the hypothesized sub-groups of interest (explanatory variables)). Groups can then take a shorter amount of time to elaborate — rather than develop — these dossiers and give a name to each of their creations (Mary, Bob, Fatima, etc.). Alternatively, developing dossiers (and therefore articulating sub-groups of interest) could be a separate, opening activity.
  • Introducing any language about “role-playing” can lead to only one person in a group assuming the role of a given character and the group sort of playing ’20 Questions’ with that character, rather than everyone trying to consider and take on the thoughts, intentions, decisions, and steps a given character might take when confronted with a given intervention (as either a targeted beneficiary or an implementer). The idea is to get the team thinking about the potential barriers and enablers at multiple levels of influence (i.e. assumptions) that may be encountered on the path towards the outcomes of interest.
  • Speaking in “I” statements helps people try to think like the different adopted personae. I really had to nag people on this in the beginning but I think it was ultimately useful to get people speaking in this way. Relatedly, there may be important lessons from cognitive interviewing practice (how-to here) for getting activity participants to think out loud about the chain of small decisions and actions they would need to take when confronted with a new program or policy.
  • I noted a marked tendency this time around for men to speak only for male characters and for women, the same! There may be some creative ways to discourage this (thoughts welcome).
  • There are two potential key goals of an activity like this, which should be kept distinct (and be articulated early and often during the activity) even though they are iterative.
    • A first relates to Elaborating Activities, that is, to develop a robust intervention, so that nuance to activities and ‘wrap-around’ support structures (to use Cartwright and Hardie’s terminology) can be developed. This can lead to a laundry or wish list of activities — so if the group is at the brainstorming stage, this can be articulated as an ‘ok’ outcome or even an explicit goal.
    • A second relates to Explicating and Elaborating assumptions, filling in all the intermediate steps between the big milestones in a results chain. This second goal is bound up in the process of moving from a log-frame to a robust theory of change (as noted by John Mayne at the Canadian Evaluation Society, this is adding all the arrows to the results chain boxes) as well as a more robust and nuanced set of indicators to measure progress towards goals and uncover mechanisms leading to change.
      • A nice wrap-up activity here could include sorting out the assumptions for which evidence is already available and which should be collected and measured as part of research work.
  • It remains an important activity to elaborate and verbally illustrate how X character’s routines and surroundings will be different if the end-goals are reached — given that social, environmental, infrastructural and institutional change is often the goal of ‘development’ efforts. This last step of actually describing how settings and institutions may operate differently, and the implications for quotidian life, is an important wrap-up and time needs to be made for it.

Of course, the use of personae (or an agent-based perspective) is only one part of elaborating a theory of change. But it can play an important role in guiding the other efforts to provide nuance and evidence, including highlighting where to fill in ideas from theoretical and empirical work to end up with a robust theory of change that can guide the development of research methods and instruments.

Would be great to hear further ideas and inputs!

Back to Basics — Trusting Whether and How The Data are Collected and Coded

This is a tangential response to the LaCour and #LaCourGate hubbub (with hats off to the summaries and views given here and here). While he is not implicated in all of the comments below, I am most certainly indebted to Mike Frick for planting the seed of some of the ideas presented below, particularly on member-checking (hopefully our under-review paper on the same will be out sometime in the future…). Salifu Amidu and Abubakari Bukari are similarly motivational-but-not-implicated, as are Corrina Moucheraud, Shagun Sabarwal and Urmy Shukla.

To a large extent, the LaCour response is bringing a new angle to an increasingly familiar concern: trusting the analysis. This means additional (and important) calls for replication and other forms of post-publication peer review (as Broockman calls for) as a guard against significance-hungry, nefarious researchers. Pre-analysis plans, analytic/internal replications, and so on are all important steps towards research transparency. But they miss the fundamental tendency to treat data as ‘true’ once they make it into the familiar, rectangular format of a spreadsheet.

Given LaCour, it seems clear that we may need to take an additional step back to get to the heart of research: the data. We place a lot of trust in the data themselves — between advisers and advisees, between research collaborators, between producers and users of large, public data sets and, in turn, between PIs, research assistants, and the actual team collecting the data. This trust concerns, of course, whether the data exist at all and whether they measure what they purport to measure. (Green seems to have had a hunch about this?)

We should be clear about the foundations of this trust and what we might do to strengthen it. Ultimately, the LaCour story is a story about the production of data, not its analysis. The transparency agenda needs to expand accordingly, to address the fundamental constant that ‘shit in leads to shit out.’

Here are a few thoughts:

  • Start to teach data collection like it matters. Survey design and data collection are weirdly absent from many graduate programs — even those oriented towards research. You may pick these up in electives but they are rarely required, to my knowledge. Learning about construct validity, validating test instruments in new contexts, questionnaire design, the potential for interviewer effects, some of the murky and inchoate contents of the activity labelled as ‘formative work*,’ etc., need not be re-discovered by each new graduate student or research assistant who takes on field work. If a course-work model won’t work, then a much more explicit apprenticeship model should be sought for those pursuing primary empirical work. In terms of teaching, one occasionally might be forgiven for thinking that impact evaluators had discovered data collection and that there aren’t mounds of resources on household surveys, psychometrics, and questionnaire design that can be used to better ensure the quality and truthfulness of the data being collected. Interdisciplinary work needs to start with what, and by what means and measures, data are collected to answer a particular question.
  • Report on data quality practices. Lots of survey firms and researchers employ strategies such as data audits and back-checks. Good on you. Report it. This almost never makes it into actual publications, but these are not just internal operations processes. Researchers do need to put forth some effort to make their readers trust their data as well as their analysis, but so much less work seems to go into this. With the rise of empirical work in economics and other fields, this needs to be given more documented attention. If you threw out 5% of your data because of failed back-checks, tell me about it; I’d believe the remaining 95% of your data a lot more. The onus is on the researchers to make the reader trust their data. (A rough sketch of what reporting back-check results might look like follows this list.)
  • Treat surveyors as a valuable source of information. It is increasingly common to at least have surveyors fill in a question at the end of the questionnaire about whether the respondent was cooperative (usually a Likert scale item) or some other brief reflection on how the interview went. I have no idea what happens to the data so produced — if they are used to throw out or differentially weight responses, do please tell the reader about it. Moreover, you can systematically ask your surveyors questions (including anonymously) about question items that they don’t trust. For example, I asked (in written form) this question of surveyors and most reported that it was incredibly embarrassing for them to ask their elders to play certain memory games related to short-term recall. This might be a good sign to tread lightly with those data, if not discount them completely (whether or not the surveyors faithfully asked the embarrassing question, it still suggests that it created a tense social interaction that may not have generated trustworthy data, even if it didn’t fall in the traditional space of ‘sensitive questions’). If nothing else, the surveyors’ assessments may be given as part of the ‘methods’ or ‘results’ attended to in publications. And, in general, remembering that surveys are human interactions, not matrix populators, is important.
  • Member-check. Member-checking is a process described by Lincoln and Guba (and others) that involves taking results and interpretations back to those surveyed to test interpretative hypotheses, etc. If some results really fly in the face of expectations, this process could generate some ‘red flags’ about which results and interpretations should be treated with care. And these can be reported to readers.
  • Coding. As with ‘formative work,’ the nuances of ‘we coded the open-ended data’ are often opaque, though this is where a lot of the interpretive magic happens. This is an important reason for the internal replication agenda to start with the raw data. In plenty of fields, it would be standard practice to use two independent coders and to report on inter-rater reliability; this does not seem to be standard practice in much of impact evaluation. This should change. (A minimal example of computing inter-rater agreement appears after this list.)
  • Check against other data-sets. It would not take much time for researchers to put their own findings into context by comparing (as part of a publication) the distribution of results on key questions to the distribution from large data-sets (especially when some questionnaire items are designed to mimic the DHS, LSMS, or other large public data-sets for precisely this reason). This is not reported often enough. This does not mean that the large, population-linked data-set will always trump your project-linked data-set, but it seems only fair to alert your readers to key differences, for the purposes of internal believability as well as external validity. (A sketch of such a comparison also appears below.)
  • Compare findings with findings from studies on similar topics (in similar contexts) — across disciplines. Topics and findings do not end with the boundaries of a particular method of inquiry. Placing the unexpectedness of your findings within this wider array of literature would help.
  • Treat all types of data with similar rigor and respect. (Cue broken record.) If researchers are going to take such care with quantitative data and then stick in a random quote as anec-data in the analysis without giving any sense of where it came from or whether it should be taken as representative of the entire sample or some sub-group… well, it’s just a bit odd. However you want to label these different types of data — quant and qual or data-set-observations and causal-process observations — they are empirical data and should be treated with the highest standards known in each field of inquiry.
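On the back-check point above, here is a rough sketch of the kind of tally that could be reported in a paper or appendix. The data frame, the number of re-interview items, and the 70% match threshold are all invented for illustration; actual back-check protocols vary.

```python
import pandas as pd

# Hypothetical back-check data: a subset of respondents are re-interviewed
# on a handful of verifiable items, and answers are compared to the original survey.
backchecks = pd.DataFrame({
    "respondent_id": [101, 102, 103, 104, 105],
    "surveyor_id":   [1, 1, 2, 2, 3],
    "items_checked": [8, 8, 8, 8, 8],
    "items_matched": [8, 7, 3, 8, 6],
})

backchecks["match_rate"] = backchecks["items_matched"] / backchecks["items_checked"]
FAIL_THRESHOLD = 0.70  # illustrative cut-off for a 'failed' back-check

failed = backchecks[backchecks["match_rate"] < FAIL_THRESHOLD]
print(f"Back-checked {len(backchecks)} interviews; "
      f"{len(failed)} ({len(failed) / len(backchecks):.0%}) failed and were dropped.")
print(failed.groupby("surveyor_id").size().rename("failed_backchecks"))
```

Even a short printed summary like this, included in the write-up, tells readers how much data was discarded and why.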
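For the coding bullet, here is a minimal example of reporting inter-rater reliability between two independent coders of the same open-ended responses. The codes below are made up, and the calculation assumes scikit-learn is available for `cohen_kappa_score`.

```python
from sklearn.metrics import cohen_kappa_score

# Codes assigned independently by two coders to the same ten open-ended responses.
coder_1 = ["cost", "distance", "cost", "quality", "cost",
           "distance", "quality", "cost", "distance", "cost"]
coder_2 = ["cost", "distance", "quality", "quality", "cost",
           "distance", "quality", "cost", "cost", "cost"]

kappa = cohen_kappa_score(coder_1, coder_2)
agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
print(f"Raw agreement: {agreement:.0%}; Cohen's kappa: {kappa:.2f}")
# Disagreements would then be reconciled (e.g., by discussion or a third coder)
# and both the kappa and the reconciliation rule reported.
```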
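And for the comparison against public data-sets, a sketch of checking whether a key variable in the project sample is distributed like the corresponding variable in a large public survey. The file names, the `household_size` column, and the choice of a two-sample Kolmogorov-Smirnov test are illustrative assumptions, not a recommendation for any particular data-set.

```python
import pandas as pd
from scipy import stats

# Hypothetical files: the project data-set and an extract of the same indicator
# (say, household size) from a large public survey covering the same region.
project = pd.read_csv("project_survey.csv")  # assumed column: household_size
public = pd.read_csv("dhs_extract.csv")      # assumed column: household_size

ks_stat, p_value = stats.ks_2samp(project["household_size"].dropna(),
                                  public["household_size"].dropna())

print(f"Project mean: {project['household_size'].mean():.2f}, "
      f"public-data mean: {public['household_size'].mean():.2f}")
print(f"KS statistic: {ks_stat:.3f} (p = {p_value:.3f})")
# A large divergence is not automatically damning, but it is something
# to report and explain to readers.
```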

I can’t assess whether any of these measures, singly or together, would have made a major difference in the LaCour case — especially since it remains nebulous how the data were generated, let alone with what care. But the LaCour case reveals that we need to be more careful. A big-name researcher was willing to trust that the data themselves were real and collected to the best of another researcher’s ability — and focused on getting the analysis right. In turn, other researchers bought into both the analysis and the underlying data because of the big-name researcher. This suggests we need to do a bit more to establish trust in the data themselves — and that the onus for this is on the researchers — big names or no — claiming to have led the data collection and cleaning processes. This is especially true given the unclear role for young researchers as potential replicators and debunkers, highlighted here. I hope the transparency agenda steps up accordingly.

*If on occasion a researcher reported on what happened during the ‘formative phase’ and about how the ‘questionnaire was changed in response,’ that would be really interesting learning for all of us. Report it. Also, if you are planning to do ‘qualitative formative work’ to improve your questionnaire, it would be good if you built in time in your research timeline to actually analyze the data produced by that work, report on that analysis, and explain how the analysis led to changing certain questionnaire items…

Thinking About Building Evaluation Ownership, Theories of Change — Back From Canadian Evaluation Society

This week I had the pleasure of attending the Canadian Evaluation Society (#EvalC2015) meeting in Montreal, which brought together a genuinely nice group of people thinking not just hard a-boot evaluation strategies and methodologies but also how evaluation can contribute to better and more transparent governance, improving our experience as global and national citizens — that is, evaluation as a political (and even social justice) as much as a technical act.

Similarly, there was some good conversation around the balance between the evaluation function being about accountability versus learning and improvement and concern about when the pendulum swings too far to an auditing rather than an elucidating and improving role.

For now, I want to zoom in on two important themes and my own nascent reflections on them. I’ll be delighted to get feedback on these thoughts, as I am continuing to firm them up myself. My thoughts are in italics, below.

  1. Collaboration, neutrality and transparency
    1. There were several important calls relating to transparency, including a commitment to making evaluation results public (and taking steps to make sure citizens see these results (without influencing their interpretation of them or otherwise playing an advocacy role)) and for decision-makers claiming to have made use of evidence to inform their decisions to be more open about how and which evidence played this role. This is quite an important point and it echoes some of the points Suvojit and I made about thinking about the use of evaluative evidence ex ante. We’re continuing to write about this, so stay tuned.
    2. There was quite a bit of push back about whether evaluation should be ‘neutral’ or ‘arm’s length’ from the program — apparently this is the current standard practice in Canada (with government evaluations). This push back seems to echo several conversations in impact evaluation about beginning stakeholder engagement and collaboration far earlier in the evaluation process, including Howard White’s consideration of evaluative independence.
    3. Part of the push back on ‘arm’s length neutrality’ came from J. Bradley Cousins, who will have a paper and ‘stimulus document’ coming out in the near future on collaborative evaluation that seems likely to be quite interesting. In another session, it was noted that ‘collaboration has more integrity than arm’s length approaches.’ I particularly liked the idea of thinking about how engagement between researchers and program/implementation folks could improve a culture of evaluative thinking and organizational learning — a type of ‘capacity building’ we don’t talk about all that often. Overall, I am on board with the idea of collaborative evaluation, with the major caveat that evaluators need to report honestly about the role they play vis-a-vis refining program theory, refining the program contents, assisting with implementing the program, monitoring, etc.
  2. Building a theory of change and fostering ownership in an evaluation.
    1. There was a nice amount of discussion around making sure that program staff, implementers, and a variety of stakeholders could “see themselves” in the theory of change and logic model/results chain. This not only included that they could locate their roles but also that these planning and communication tools reflected the language with which they were used to talking about their work. Ideally, program staff can also understand their roles and contributions in light of their spheres of direct and indirect influence.
    2. John Mayne and Steve Montague made some very interesting points about building a theory of change, which I will have to continue to process over the upcoming weeks. They include:
      1. Making sure to think about ‘who’ in addition to ‘what’ and ‘why’ — this includes, I believe, who is doing what (different types and levels of implementers) as well as defining intended reach, recognizing that some sub-groups may require different strategies and assumptions in order for an intervention to reach them.
      2. As was noted, “frameworks that don’t consider reach conspire against equity and fairness” because “risks live on the margin.” I haven’t fully wrapped my head around the idea of ‘theories of reach’ embedded or nested within the theory of change but am absolutely on-board with considering distributional expectations and challenges from the beginning and articulating assumptions about when and why we might expect heterogeneous treatment effects — and deploying quantitative and qualitative measurement strategies accordingly.
    3. John Mayne advocated his early thoughts that for each assumption in a theory of change, the builders should articulate a justification for its:
      1. Necessity — why is this assumption needed?
      2. Realization — why is this assumption likely to be realized in this context?
      3. This sounds like an interesting way to plan exercises towards collaborative building of theories of change.
    4. A productive discussion developed (fostered by John Mayne, Steve Montague and Kaireen Chaytor, among others) around how to get program staff involved in articulating the theory of change. A few key points were recurring — with strong implications for how long a lead time is needed to set up an evaluation properly (which will have longer-term benefits even if it seems slightly inefficient upfront):
      1. Making a theory of change and its assumptions explicit is part of a reflective practice of operations and implementation.
      2. Don’t try to start tabula rasa in articulating the theory of change (‘the arrows’) with the implementing and program staff. Start with the program documents, including their articulation of the logic model or results chain (the ‘boxes’ in a diagrammatic theory of change) and use this draft as the starting point for dialogue.
      3. It may help to start with one-on-ones with some key program informants, trying to unpack what lies in the arrows connecting the results boxes. This means digging into the ‘nitty-gritty’ micro-steps and assumptions, avoiding magical leaps and miraculous interventions. Starting with one-on-ones, rather than gathering the whole group to consider the results chain, can help to manage some conflict and confusion and build a reasonable starting point — despite the fact that:
      4. Several commentators pointed out that it is unimportant whether the initial results chain was validated or correct — or was even set up as a straw-person. Rather, what was important was having something common and tangible that could serve as a touchstone or boundary object in bringing together the evaluators and implementers around a tough conversation. In fact, having some flaws in the initial evaluators’ depiction of the results chain and theory of change allows opportunities for program staff to be the experts and correct these misunderstandings, helping to ensure that program staff are not usurped in the evaluation design process.
      5. Initial disagreements around the assumptions (all the stuff behind the arrows) in the theory of change can be productive if they are allowed to lead to dialogue and consensus-building. Keep in mind that the theory of change can be a collaborative force. As Steve Montague noted, “building a theory of change is a team sport,” and needs to be an iterative process between multiple stakeholders all on a ‘collective learning journey.’
        1. One speaker suggested setting up a working group within the implementing agency to work on building the theory of change and, moreover, to make sure that everyone internally understands the program in the same way.
    5. This early engagement work is the time to get construct validity right.
    6. The data collection tools developed must, must, must align with the theory of change developed collectively. This is also a point Shagun and I made in our own presentation at the conference, where we discussed our working paper on meaningfully mixing methods in impact evaluation. Stay tuned!
    7. The onus is on the evaluator to make sure that the theory of change is relevant to many stakeholders and that the language used is familiar to them.
    8. There was also a nice discussion about making sure to get leadership buy-in and cooperation early in the process on what the results reporting will look like. Ideally the reporting will also reflect the theory of change.

Overall, much to think about and points that I will definitely be coming back to in later work. Thanks again for a great conference.


“Politically Robust” Experimental Design in Democracies and a Plea For More Experience Sharing

Sometimes I re-read a paper and remember how nice a sentence or paragraph was (especially when thinking that a benevolent or benign dictator might make research so much easier, as though easy was the main goal of research).

So it is with the paper by Gary King and colleagues (2007) on “A ‘Politically Robust’ Experimental Design for Public Policy Evaluation, with Application to the Mexican Universal Health Insurance Program”.

Scholars need to remember that responsive political behavior by political elites is an integral and essential feature of democratic political systems and should not be treated with disdain or as an inconvenience. Instead, the reality of democratic politics needs to be built into evaluation designs from the start — or else researchers risk their plans being doomed to an unpleasant demise. Thus, although not always fully recognized, all public policy evaluations are projects in both political science and policy analysis.

What would be nice is if researchers would share more of their experiences and lessons learned not just in robust research design (though this is critical) but also in working to (and failing to) persuade local political leaders to go along with randomization schemes and to potentially hold off any kind of scale-up until the results are in… and only if they are promising!

Thinking About Stakeholder Risk and Accountability in Pilot Experiments

This post is also cross-posted here in slightly modified form.

Since I keep circling around issues related to my dissertation in this blog, I decided it was time to start writing about some of that work. As anyone who has stood or sat near me for more than 5 minutes over the past 4.25 years will know, in my thesis I examine the political economy of adopting and implementing a large global health program (the Affordable Medicines Facility – malaria, or “AMFm”). This program was designed at the global level (meaning largely in D.C. and Geneva, with tweaking workshops in assorted African capitals). Global actors invited select Sub-Saharan African countries to apply to pilot the AMFm for two years before any decision was made to continue, modify, scale up, or terminate it. It should also be noted from the outset that it was not fully clear what role the evidence would play in the board’s decision and how the evidence would be interpreted. As I highlight below, this lack of clarity helped to foster feelings of risk, as well as resistance among some of the national-level stakeholders to participating in the pilot.

To push the semantics a bit, several critics have noted (e.g.) that the scale and scope and requisite new systems and relationships involved in the AMFm disqualify it from being considered a ‘pilot,’ though I use that term for continuity with most other AMFm-related writing.

In my research, I focus on the national and sub-national processes of deciding to participate in the initial pilot (‘phase I’) stage, specifically in Ghana. Besides the project’s scale and the resources mobilized, one thing that stood out about this project is that there was a reasonable amount of resistance to piloting this program among stakeholders in several of the invited countries. I have been very fortunate that my wonderful committee and outside supporters like Owen Barder have continued to push me over the years (and years) to try to explain this resistance to an ostensibly ‘good’ program. Moreover, I have been lucky and grateful that a set of key informants in Ghana have been willing to converse openly with me over several years as I have tried to untangle the reasons behind the support and resistance and to get the story ‘right.’

From the global perspective, the set-up of this global health pilot experiment was a paragon of planning for evidence-informed decision-making: pilot first, develop benchmarks for success, commission an independent evaluation (a well-monitored before-and-after comparison) — and make decisions later.

In my work, through a grounded qualitative analysis, I distil the variety of reasons for supporting and resisting Ghana’s participation in the AMFm pilot into three main types: those related to direct policy goals (in this case, increasing access to malaria medication and lowering malaria mortality), those related to indirect policy goals (indirect insofar as they are not the explicit goals of the policy in question, such as employment and economic growth), and finally those related to risk and reputation (individual, organizational, and national). I take the latter as my main focus for the rest of this post.

A key question on which I have been pushed is the extent to which resistance to participation (which meant resisting an unprecedented volume of highly subsidized, high-quality anti-malarial treatments entering both the public and the private sector) emerged from the idea of the AMFm itself versus the idea of piloting the AMFm with uncertain follow-up plans.

Some issues, such as threats to both direct and indirect policy goals, often related to the AMFm mechanism itself, including its focus on malaria treatment rather than prevention, as well as to broader goals related to national pride and the support of local businesses. The idea of the AMFm itself, as well as its role as a harbinger of (for example, market-based) approaches to global health, provoked both support and resistance.

But some sources of resistance stemmed more directly from the piloting process itself. By evidence-informed design, the Global Fund gave “no assurance to continue [AMFm] in the long-term,” so that the evaluation of the pilot would shape their decision. This presented limited risks to them. At the national level, this uncertainty proved troubling, as many local stakeholders felt it posed national, organizational, and personal risks for policy goals and reputations. Words like ‘vilification’ and ‘chastisement’ and ‘bitter’ came up during key informant interviews. In a point of opposing objectives (if not a full catch-22, a phrase stricken from my thesis), some stakeholders might have supported the pilot if they knew the program would not be terminated (even if modified), whereas global actors wanted the pilot precisely to see whether the evidence suggested the program should (or should not) be terminated. Pilot-specific concerns related to uncertainties around the sunk investments of time in setting up the needed systems and relationships, which have an uncertain life expectancy. Also, for a stakeholder trying to decide whether to support or resist a pilot, it doesn’t help when the reputational and other pay-offs from supporting are uncertain and may only materialize should the pilot prove successful and be carried to the next stage.

A final but absolutely key set of concerns for anyone considering working with policy champions is what, precisely, the decision to continue would hinge upon. Would failure to meet benchmarks be taken as a failure of the mechanism and concept? A failure of national implementation capacity and managerial efforts in Ghana (in the face of a key donor)? A failure of individual efforts and initiatives in Ghana?

Without clarity on these questions about how accountability and blame would be distributed, national stakeholders were understandably nervous and sometimes resistant (passively or actively) to Ghana’s applying to be a phase I pilot country. To paraphrase one key informant’s articulation of a common view, phase I of the AMFm should have been an experiment on how to continue, not whether to continue, the initiative.

How does this fit in with our ideas of ideal evidence-informed decision-making about programs and policies? The experience recorded here raises some important questions when we talk about wanting policy champions and wanting to generate rigorous evidence about those policies. Assuming that the policies and programs under study adhere to one of the definitions of equipoise, the results from a rigorous evaluation could go either way.

What risks do the local champion(s) of a policy face in visibly supporting it?

Is clear accountability established for evaluation outcomes?

Are there built-in buffers for the personal and political reputation of champions and supporters in the evaluation design?

The more we talk about early stakeholder buy-in to evaluation and the desire for research uptake on the basis of evaluation results, the more we need to think about the political economy of pilots and of those stepping up to support policies and the (impact) evaluation of them. Do they exist in a learning environment where glitches and null results are considered part of the process? Can evaluations help to elucidate design and implementation failures in a way that has clear lines of accountability among the ‘ideas’ people, the champions, the managers, and the implementers? These questions need to be taken seriously if we expect government officials to engage in pilot research to help decide the best way to move a program or policy forward (including not moving it forward at all).

hey, you’re sick? can we know where you live? (simple experiments, complicated designs)

i have been working on a blog for 3ie (*now posted, here!), based on a delhi seminar talk i gave recently, in turn based on this paper and this blog on using text messages to increase completion of anti-malarial treatment. not all of the material i initially wrote will appear in the final blog — which is good, it is more focused now. nevertheless, i thought i would share the excised part that i wrote about the actual experimental set-up. part of the point i made in the seminar — and which i intend to make in the blog — is that a ‘simple’ and ‘hands-off’ intervention (meaning minimal input from either the surveyors or our hosts, medical vendors, in encouraging or facilitating take-up) can require an all-the-more complicated study design.


[photo: rainy season!]

the study was meant to be a proof of concept — it is the first randomized evaluation of a direct-to-patient (rather than to health workers) text messaging program for malaria in sub-saharan africa. as guided by our funder, CHAI (this was an operational research project for the Affordable Medicines Facility – malaria (AMFm)), we wanted to learn about the minimum of supportive moving parts required to get patients enrolled into a text messaging system of reminders to complete their treatment. a relatively simple mHealth intervention – much of the intervention work was done by a computer, a modem, some lines of code, and, occasionally, an electric generator (it was the rainy season, after all) – required a fairly intricate experimental set-up.


[photo: diligently organizing follow-up]

it was a deviation for ipa-ghana in terms of not-household-or-firm-based surveying and it took a lot of logistical coordination (as shown in the photo), attention to timetables, and a willingness on the part of the surveyors to set out without perfectly clear instructions.


things worked roughly like this: with participating anti-malarial vendors, we (the research team) worked to maintain a façade of a separate flyer system to invite people to enroll in a mobile health information system, distinct from the surveyor talking to people obtaining an anti-malarial medication. during that interview about the medicines purchased and the price paid, if the respondent proved willing and eligible, the surveyor asked how we could find them at home. and for their phone number. and for directions to a neighbor’s house. and the neighbor’s phone. (i have written previously about the fun of trying to track people down based on the directions they give.) just in case we decided to follow up to see how they were doing. and the surveyors were purposefully cagey about whether and when this would be happening and to what end.


[i am still amazed that so many people agreed to this (thank you, participants!) — if someone approached me in a drug store while i was sick and asked me to chat and then how to find me at home, even my deepest beliefs in survey karma would probably be abandoned in favor of some fairly unbecoming language.]


but, in fact, we had a very precise purpose and timing in mind: to find the respondent between ~58 and 72 hours after we met them in a drug shop. with some basic assumptions about when they would start their antimalarial medication, we wanted to land up at people’s homes just after they had completed their medication. respondents often didn’t give very precise directions to their homes, resulting in a lot of motorcycle-mounted wild goose chases – but we also achieved a follow-up rate of 99% within our specified time window. phew! this is really a tribute to the commitment and perseverance of the survey team and our field managers.
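to make the timing logic concrete, here is a rough sketch of how the home-visit window could be computed from the time of the drug-shop interview. the 58–72 hour window comes from the paragraph above; the function, the sample timestamps, and the assumption of a roughly three-day medication course are mine, for illustration only.

```python
from datetime import datetime, timedelta

# illustrative enrollment timestamps recorded at the drug shop
enrollments = [
    ("R-001", datetime(2012, 9, 3, 10, 30)),
    ("R-002", datetime(2012, 9, 3, 16, 45)),
]

def follow_up_window(enrolled_at: datetime,
                     earliest_hours: int = 58,
                     latest_hours: int = 72) -> tuple[datetime, datetime]:
    """return the earliest and latest times to visit a respondent at home,
    aiming to arrive just after a roughly three-day course should be finished."""
    return (enrolled_at + timedelta(hours=earliest_hours),
            enrolled_at + timedelta(hours=latest_hours))

for respondent_id, enrolled_at in enrollments:
    start, end = follow_up_window(enrolled_at)
    print(f"{respondent_id}: visit between {start:%a %H:%M} and {end:%a %H:%M}")
```

in practice, of course, these windows then had to be squared with travel times and team schedules, which is where the logistical coordination mentioned above came in.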