Oops, Got Long-Winded About ‘Median Impact Narratives’

*A revised version of this post is also available here.

I finally got around to reading a post that had been flagged to me a while ago, written by Bruce Wydick. While I don’t think the general idea of taking sampling and representativeness seriously is a new one, the spin of a ‘median narrative’ may be quite helpful in making qualitative and mixed work more mainstream and rigorous in (impact) evaluation.

Anyway, I got a bit long-winded in my comment on the devimpact blog site, so I am sticking it below as well, with some slight additions:

First, great that both Bruce and Bill (in the comments) have pointed out (again) that narrative has a useful value in (impact) evaluation. This is true not just for a sales hook or for helping the audience understand a concept — but because it is critical to getting beyond ‘did it work?’ to ‘why/not?’

I feel Bill’s point (“telling stories doesn’t have to be antithetical to good evaluation“) should be sharper — it’s not just that narrative is not antithetical to good evaluation but, rather, it is constitutive of good evaluation and any learning and evidence-informed decision-making agenda. And Bill’s right, part of the problem is convincing a reader that it is a median story that’s being told when an individual is used as a case study — especially when we’ve been fed outlier success stories for so long. This is why it is important to take sampling seriously for qualitative work and to report on the care that went into it. I take this to be one of Bruce’s key points and why his post is important.

I’d also like to push the idea of a median impact narrative a bit further. The basic underlying point, so far as I understand it, is a solid and important one: sampling strategy matters to qualitative work and for understanding and explaining what a range of people experienced as the result of some shock or intervention. It is not a new point but the re-branding has some important sex appeal for quantitative social scientists.

One consideration for sampling is that the same observables (independent variables) that drive sub-group analyses can also be used to help determine a qualitative sub-sample (capturing medians, outliers in both directions, etc.). To the extent that theory drives what sub-groups are examined via any kind of data collection method, all the better. Arthur Kleinman once pointed out that theory is what helps separate ethnography from journalism — an idea worth keeping in mind.

A second consideration is in the spirit of Lieberman’s call for nested analyses (or other forms of linked and sequential qual-quant work), using quantitative outcomes for the dependent variable to drive case selection, iterated down to the micro-level. The results of quantitative work can be used to inform sampling of later qualitative work, targeting those representing the range of outcome values (on/off ‘the line’).
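As a concrete illustration of one way this could look in practice, here is a minimal Python sketch; the file, column names, and the three-cases-per-arm cut-off are hypothetical, not a recipe:

```python
# Minimal sketch: pick 'typical' and 'deviant' cases for qualitative follow-up
# based on residuals from a quantitative model. File and column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("endline_survey.csv")  # hypothetical endline data

X = sm.add_constant(df[["treatment", "age", "assets"]])
fit = sm.OLS(df["outcome"], X).fit()
df["abs_resid"] = fit.resid.abs()

# Cases close to the fitted line ('typical') and far from it ('deviant'), within each arm
typical = df.sort_values("abs_resid").groupby("treatment").head(3)
deviant = df.sort_values("abs_resid", ascending=False).groupby("treatment").head(3)

qual_cases = pd.concat([typical, deviant]).drop_duplicates()
print(qual_cases[["treatment", "outcome", "abs_resid"]])
```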

Both these considerations should be fit into a framework that recognizes that qualitative work has its own versions of representativeness (credibility) as well as power (saturation) (which I ramble about here).

Finally, in all of this talk about appropriate sampling for credible qualitative work, we need to also be talking about credible analysis and definitely moving beyond cherry-picked quotes as the grand offering from qualitative work. Qualitative researchers in many fields have done a lot of good work on synthesizing across stories. This needs to be reflected in ‘rigorous’ evaluation practice. Qualitative work is not just for pop-out boxes (I go so far as to pitch the idea of a qualitative pre-analysis plan).

Thanks to both Bruce and Bill for bringing attention to an important topic in improving evaluation practice as a whole — both for programmatic learning and for understanding theoretical mechanisms (as Levy Paluck points out in her paper). I hope this is a discussion that keeps getting better and more focused on rigor and learning as a whole in evaluation, rather than quant v qual.


planning for qualitative data collection and analysis

this blog reflects conversations and on-going work with mike frick (@mwfrick), shagun sabarwal (@shagunsabarwal), and urmy shukla (@urmy_shukla) — they should receive no blame if this blog is wacky and plenty of credit if it is not.

a recent post by monkey cage contributors on the washington post, then summarized by BITSS, asked/suggested whether “exploratory, qualitative, historical, and case-based research is much harder to present in a results-free manner, and perhaps impossible to pre-register.” this was just a brief point in their larger argument, which i agree with. but it seems worth pausing to consider whether it is likely to be true. below, i discuss research design for qualitative work, sampling considerations, and analysis itself.

throughout, i take a ‘pre-analysis plan’ to be a commitment on what the research will analyze and report on but not a constraint on doing analyses that are ‘off plan.’ rather, the researcher just needs to be explicit about what analyses were ‘on plan’ and which were not and there is a commitment to report everything that was ‘on plan’ – or why such reporting is infeasible.

my conclusion: a conversation on pre-analysis plans needs to distinguish whether planning is possible from whether planning is currently done. in general, i feel planning for analysis is possible when you plan to do analysis.

disclaimer: in this post, my reference to ‘qualitative research’ is to the slice of social science research that has to do with talking to and/or observing and/or participating with living people. i claim no knowledge on the analyses of historical manuscripts and a wide array of other qualitative research. by extension, i am mostly talking about planning for the collection and analysis of data from in-depth interviews, focus group discussion, and forms of (participant-) observation.

designing research: working with and asking questions of living people implies an ethics-review process, for which the researcher will have to lay out at least the ‘domains’ (aka themes, topics, categories) of information s/he hopes to observe and ask people about. usually, one does not get away with saying “i am just going to head over here, hang out, and see what i find.” this requires planning. like a pre-analysis plan, the domains for qualitative work can set up some bounds for the minimum of what will be collected and reported (“we will collect and analyze data on the following topics: x, y, z“), even if the final report is that a particular domain ended up being a flop because no one wanted to talk about it or it proved uninteresting for some reason.

some of the most famous ethnographies (say, nisa and tuhami) focus on a single person, often to try to give perspective on a larger culture — which may not be what the ethnographer initially set out to study. but the ethnographer can still tell you that (“i went to look at x, and here’s what i found — but i also found this really interesting person and that’s what the rest of the book is about”). so this does not seem inconsistent with the underlying logic of a plan, with the understanding that such a plan does not dictate everything that follows but does mandate that one reports why things changed.

which brings us to the nature of qualitative work: it is often iterative and the researcher often conducts data collection, entry, and analysis in parallel. analysis from an early set of interviews informs questions that are asked and observations attended to later on. this is one of the exciting (to me) elements of qualitative research, that you get to keep incorporating new learnings as you go along. this approach need not be inconsistent with having a set of domains that you intend to explore. within each, maybe the questions get sharper, deeper, or more elaborate over time. or maybe one planned domain turns out to be way too sensitive or way off-base. again, the researcher can report, relative to the initial plan, that this is what happened between designing the research and actually doing it.

sampling: certain aspects of qualitative research can be planned in advance. usually the aim is to be in some way representative. one way to aim for representation is to consider sub-groups of interest. in large-n analysis, the researcher may be able to hope that sufficient numbers within each sub-group will appear in the sample by default. in smaller-n analysis, more purposive sampling plans may be needed to be sure that different sub-groups are engaged in conversation. but, specifying sub-groups of interest can be done in advance — hence, plannable. but, at least some branches of qualitative research suggest that representativeness is about outputs rather than inputs — and that what the researcher is seeking is saturation (i am thinking of lincoln and guba here), which has implications for planning.

‘saturation’ relates to whether the researcher is starting to hear the same answer over and over. in some cases, inputs are the determinant of representation — similar to the approach that can be taken in large-n work. let’s say that you want to get the input of the members of an elite government entity — a particular committee with 20 people on it. fine, plan to talk to all of them. representativeness is here achieved by talking to all of the relevant people (the whole population of interest) – and then finding some way of summarizing and analyzing the viewpoints of all of them, even if it’s 20 different viewpoints. there’s your sampling plan. (this may or may not be part of a pre-analysis plan, depending on how that is defined and at what stage of the research process it is required. i take these to be open questions.)

for less clearly bounded groups that nevertheless have clear characteristics and may be expected to think or behave differently — let’s say men versus women, older versus younger people, different economic quantiles, different ethnic groups, whatever — then planning for saturation may look more like: plan to talk to men until we start getting repeated answers on key questions of interest, or to conduct focus groups that are internally homogeneous with respect to ethnicity until we start to hear similar answers within each ethnicity (because it may be different numbers within each). that is, if representativeness is focused on output, then it is insufficient to plan at the beginning “we will do two focus groups in each village in which we collect data.” the researcher can specify the sub-groups of interest but probably not the number of interviews, focus groups, or hours of observation required.
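as a toy illustration of what planning for saturation (rather than a fixed n) could look like operationally, here is a small python sketch; the stopping rule and all names are my own assumptions, not a standard:

```python
# toy sketch of a saturation check: within a sub-group, keep interviewing until
# several consecutive interviews add no new codes. names and rule are invented.
def saturated(coded_interviews, stop_after=3):
    """coded_interviews: list of code-sets, in interview order.
    'saturated' here means the last `stop_after` interviews added nothing new."""
    seen, quiet_run = set(), 0
    for codes in coded_interviews:
        if set(codes) - seen:   # this interview raised something new
            quiet_run = 0
        else:
            quiet_run += 1
        seen |= set(codes)
    return quiet_run >= stop_after

# example: codes raised in successive interviews with women in one village
women = [{"price", "distance"}, {"price", "stigma"}, {"stigma"}, {"price"}, {"distance"}]
print(saturated(women))  # True: the last three interviews added nothing new
```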

i make this point for two reasons. first, a pre-analysis plan for qualitative work should plan for iteration between considering what has been collected and whether more questions are necessary to make sense of the phenomena of interest. this makes it different in practice from a quantitative plan, but the underlying principle holds. second, a pre-analysis plan, if it covers sampling, probably cannot plan for specific numbers of inputs unless the population is clearly bounded (like the committee members). rather, the plan is to aim for saturation within each sub-group of interest.

analysis: finally! in general, i feel more emphasis in a moving-towards-mixed-methods world needs to be put on analysis of qualitative inputs (and incorporation of those inputs into the larger whole). my hunch is that part of why people think planning for analysis of qualitative work may be difficult is because, often, people don’t plan to ‘analyze’ qualitative data. instead, perhaps, the extent of the plan is to collect data. and then they plan to find a good quote or story (“anecdata”) — which may raise some questions about whether social science research is being done. not planning for analysis can limit one’s ability to set out a plan for analysis. this is different from saying that planning is not possible — there are plenty of books on qualitative data analysis (e.g. here, here, here, and many others). here are some things that can be planned in advance:

  1. how will interviews and field notes be transcribed? verbatim?
  2. how will notes or transcripts be stored and analyzed? handwritten? in excel (my tendency)? using software such as NVivo?
  3. are you planning to be guided by theory or be guided by the data in your analysis? for example, maybe the domains you initially planned to study were themselves from some theory or framework about how the world works. this suggests that your analysis may be theory-driven and you may at least start with (promise to) closed-code your data, looking for instances and examples of particular theoretical constructs in the transcripts and field notes and “code” (label) them as such (a toy sketch follows below).
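
as a toy illustration of what such a closed-coding plan could look like, here is a small python sketch; the codebook, patterns, and snippets are all invented for illustration:

```python
# toy closed-coding pass: tag segments that match planned constructs, and set aside
# whatever falls outside the framework for open coding. everything here is invented.
import re

codebook = {  # theoretical constructs -> keyword patterns, specified in advance
    "cost": r"\bprice|\bcost|\bafford",
    "social_norms": r"\bneighbou?rs?\b|\beveryone else\b",
    "trust": r"\btrust|\breliab",
}

segments = [
    "i could not afford the gel every month.",
    "my neighbours would talk if they saw the pills.",
    "honestly the clinic staff were kind to me.",
]

coded, uncoded = [], []
for seg in segments:
    hits = [label for label, pattern in codebook.items() if re.search(pattern, seg, re.I)]
    (coded if hits else uncoded).append((seg, hits))

print(coded)    # segments matching the planned framework
print(uncoded)  # everything else: candidates for open coding or revising the plan
```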

or, maybe the work is far more exploratory and you set out to learn what people think and do, in general, about a particular topic. it’s more likely that you’ll be open-coding your data — looking for patterns that emerge (ideas that are repeatedly raised). and it’s likely you’ll have some idea in advance that that is what you intend to do. even if you start out closed-coding, it may turn out that a whole lot of your data end up falling outside the initially planned theoretical framework. fine. that doesn’t mean that you can’t report on what did fit in the framework (=plan) and then analyze all that interesting stuff that happened outside it as well. which, i think, is why we are talking about pre-analysis plans rather than pre-analysis straitjackets.

to close, in discussing whether pre-analysis plans for qualitative research — in the sense of talking to and watching living people, perhaps as part of a mixed-methods research agenda — are feasible, i hope the conversation is guided by whether planning is indeed possible in the social sciences as opposed to whether such planning currently takes place.

Anecdotes and simple observations are dangerous; words and narratives are not.

*this blog post was also cross-posted on people, spaces, deliberation, including as one of the top 10 posts of 2014.

In a recent blog post on stories, and following some themes from an earlier talk by Tyler Cowen, David Evans ends by suggesting: “Vivid and touching tales move us more than statistics. So let’s listen to some stories… then let’s look at some hard data and rigorous analysis before we make any big decisions.” Stories, in this sense, are potentially idiosyncratic and over-simplified and, therefore, may be misleading as well as moving. I acknowledge that this is a dangerous situation. However, there are a couple things that are frustrating about the above quote, intentional or not.


  • First, it equates ‘hard data’ with ‘statistics,’ as though qualitative (text/word) data cannot be hard (or, by implication, rigorously analysed). Qualitative work – even when producing ‘stories’ – should move beyond mere anecdote (or even journalistic inquiry).
  • Second, it suggests that the main role of stories (words) is to dress up and humanize statistics – or, at best, to generate hypotheses for future research. This seems both unfair and out-of-step with increasing calls for mixed-methods to take our understanding beyond ‘what works’ (average treatment effects) to ‘why’ (causal mechanisms) – with ‘why’ probably being fairly crucial to ‘decision-making’ (Paluck’s piece worth checking out in this regard).


In this post, I try to make the case that there are important potential distinctions between anecdotes and stories/narratives that are too often overlooked when discussing qualitative data, focusing on representativeness and the counterfactual. Second, I suggest that just because many researchers do not collect or analyse qualitative work rigorously does not mean it cannot (or should not) be done this way. Third, I make a few remarks about numbers.


As a small soapbox and aside, even calls for mixed-methods give unnecessary priority to quantitative data and statistical analysis for making causal claims, in my opinion. A randomized-control trial – randomizing who gets a treatment and who will remain in the comparison group – is a method of assigning treatment. It doesn’t *necessarily* imply what kind of data will be collected and analyzed within that framework.


Anecdotes, narratives and stories

As to the danger of stories, what Evans, Cowen, and others (partly) caution against is believing, using or being seduced by anecdotes – stories from a single point of view. Here I agree – development decisions (and legislative and policy decisions more generally) have too often been taken on the basis of a compelling anecdote. But not all stories are mere anecdotes, though this is what is implied when ‘hard data’ are equated with ‘statistics’ (an equation that becomes all the more odd when, say, the ‘rigorous’ bit of the analysis is referred to as the ‘quantitative narrative’).


Single stories from single points in time in single places – anecdotes – are indeed potentially dangerous and misleading. Anecdotes lack both representativeness and a counterfactual – both of which are important for making credible (causal) claims and both of which are feasible to accomplish with qualitative work. As revealed with the phrase ‘quantitative narrative,’ humans respond well to narratives – they help us make sense of things – and the trick is to tell them from as many perspectives as possible, so as not to un-mess the messiness too far.


Representativeness: It is clear from the growing buzz about external validity that we need to be cautious of even the most objective and verifiable data analysed in the very most rigorous and internally valid way because it simply may not apply elsewhere (e.g. here and here). Putting this concern aside for a moment, both qualitative and quantitative data can be collected to be as representative of a particular time and place and circumstance as possible. I say more about this below.


Counterfactuals: Cowen notes that many stories can be summed up as ‘a stranger came to town.’ True, to understand something causal about this (which is where anecdotes and tales following particular plot-lines can lead us astray), we would like to consider what would have happened if the stranger had not come to town and/or what happened in the town next door that the stranger by-passed. But those are still stories and they can be collected in multiple places, at multiple time points. Instead of dismissing it or using it only as window-dressing, we can demand more of qualitative data so that it can tell a multi-faceted, multi-perspectival, representative story.


That rigor thing

Perhaps it seems that we have a clearer idea of how to be rigorous with collecting and analysing quantitative data. I don’t think this is necessarily true — but it does seem that many quant-focused researchers trying out mixing their methods for the first time don’t even bother to consider how to make the qualitative data more rigorous by applying similar criteria as they might to the quant part. This strikes me as very odd. We need to start holding qualitative data collection and analysis to higher standards, not be tempted to scrap it just because some people do it poorly. An excellent piece on this (though there are plenty of manuals on qualitative data collection and analysis) is by Lincoln and Guba.


They suggest that ‘conventional’ rigor addresses internal validity (which they take as ‘truth value’), external validity, consistency/replicability and neutrality. (The extent to which quantitative research in the social sciences fulfils all these criteria is another debate for another time.) They highlight the concept of ‘trustworthiness’ – capturing credibility, transferability, dependability and confirmability – as a counterpart to rigor in the quantitative social sciences. It’s a paper worth reading.


Regardless of what types of data are being collected, representativeness is important to being able to accommodate messiness and heterogeneity. If a research team uses stratification along several characteristics to select its sample for quantitative data collection (or intends to look at specific heterogeneities/sub-groups for the analysis), it boggles my mind why those same criteria are not used to select participants for qualitative data. Why does representativeness so often get reduced to four focus groups among men and four among women?
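As a small illustration, here is a pandas sketch of drawing the qualitative sub-sample from the same strata used in the quantitative design, in both arms; the file, column names, and the two-per-cell rule are hypothetical:

```python
# Sketch: reuse the quantitative strata to draw qualitative participants in both arms.
# File, column names, and the two-interviews-per-cell rule are hypothetical.
import pandas as pd

frame = pd.read_csv("survey_sample.csv")  # hypothetical sampling frame

n_per_cell = 2
qual_sample = (
    frame.groupby(["arm", "gender", "wealth_quintile"], group_keys=False)
         .apply(lambda g: g.sample(min(len(g), n_per_cell), random_state=1))
)
print(qual_sample[["arm", "gender", "wealth_quintile"]].value_counts())
```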


Equally puzzling, qualitative data are too often collected only in the ‘treated’ groups. Why does the counterfactual go out the window when we are discussing open-ended interview or textual data? Similarly, qualitative work has a counterpart to statistical power and sample size considerations: saturation. Generally, when the researcher starts hearing the same answers over and over, saturation is ‘reached.’ A predetermined number of interviews or focus groups does not guarantee saturation. Research budgets and timetables that take qualitative work seriously should start to accommodate that reality.  In addition, Lincoln and Guba suggest that length of engagement – with observations over time also enhancing representativeness – is critical to credibility.


The nature of qualitative work, with more emphasis on simultaneous and iterative data collection and analysis, can make use of that time to follow up on leads and insights revealed over the study period. Also bizarre to me is that quant-focused researchers tend to spend much more time discussing data analysis than data collection and coding for quantitative stuff but then put absolutely all the focus (of the limited attention-slice qualitative gets) on collecting qualitative data and none into how those data are analysed or will be used. Too often, the report tells me that a focus group discussion was done and, if convenient, it is pointed out that the findings corroborate or ‘explain’ the numeric findings. Huh? If I am given no idea of the range of answers given (let’s say the counterpart of a minimum and a maximum) or how ‘common themes’ were determined, that thing that one person said in a focus group just becomes an anecdote with no real ‘place’ in the reporting of the results except as a useful aside.


One more thing on credibility – the equivalent of internal validity. Lincoln and Guba say that credibility *requires* using member-checks (stay tuned for a paper on this), which means sharing the results of the analysis back with those who provided the raw data so that interpretations can, at least in part, be co-constructed. This helps prevent off-the-mark speculation and situation analyses but also helps to break down the need to ‘represent’ people who ‘cannot represent themselves’ – as Said quotes from Marx. I’ve said a few things about this sort of shared interpretation here, recognizing that respondents’ perceptions will reflect the stories they tell themselves. That said, as development researchers increasingly look at nudging behavior, the stories (not-always-rational) actors tell themselves are potentially all the more important. We need to collect and present them well.


One key hurdle I see with enhancing the perceived rigor and non-anecdotal-ness of qualitative work is that it is hard to display the equivalent of descriptive statistics for textual/interview data. That doesn’t mean we shouldn’t try. In addition, it is more difficult and unwieldy to share (even ‘cleaned’) qualitative data than the quantitative equivalent, as increasingly happens to allow for replication. Still, if this would enhance some of the credibility of the multifaceted stories revealed by these data, it is worth pushing this frontier.
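On that descriptive-statistics point, here is one small sketch of what a counterpart could look like for coded interview data; the codes and counts are invented for illustration:

```python
# Sketch of a 'descriptive statistics' counterpart for coded qualitative data:
# how often each code was raised, by study arm. All data here are invented.
import pandas as pd

coded = pd.DataFrame({
    "arm":  ["treated", "treated", "comparison", "comparison", "treated"],
    "code": ["cost", "stigma", "cost", "distance", "stigma"],
})

# One row per code, one column per arm: a crude counterpart of a summary table
summary = coded.groupby(["code", "arm"]).size().unstack(fill_value=0)
print(summary)
```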


Numbers aren’t always clean

In terms of stories we tell ourselves, one is that data are no longer messy (and, often by implication, are clean, hard, ‘true’) because they fit in a spreadsheet. Everything that happened in the field, all the surveyors’ concerns about certain questions or reports of outright lying, often seem to fade from view as soon as the data make it into a spreadsheet. If you ask a farmer how many chickens he has and he gives you a story about how he had 5, 2 got stolen yesterday but his brother will give him 4 tomorrow, then regardless of what number the enumerator records, the messiness has been contained for the analyst but not in the reality of the farmer that is meant to be represented.


In general, if we want to talk about creating credible, causal narratives that can be believed and that can inform decision-making at one or more levels, we need to talk about (a) careful collection of all types of data and (b) getting better at rigorously analysing and incorporating qualitative data into the overall ‘narrative’ for triangulating towards the ‘hard’ truth, not equating qualitative data with anecdotes.

here’s an idea — if they are trying to tell you something, make it easy for them to do so.

there’s been a good deal of press around the unfortunately insignificant results of a major HIV prevention trial with products for women in south africa, uganda and zimbabwe. the results had little to do with efficacy of the products (a pill and a gel) but rather with the fact that most of the participating women did not use the treatments as recommended – or at all.

one potential response is to improve our behavioral interventions to support adherence to treatment regimens (and prevention regimens) and integrate these methods more directly into medication trials. adherence and persistence with medication are global problems and we are just beginning to learn – with the help of health psychology and behavioral economics – how to tackle the challenge. efforts so far include high- and low-tech solutions, though not all the promises of the former, in terms of mhealth to facilitate behavior change and adherence, have yet been borne out.

another, not mutually exclusive, response would be to actually ask the women what they would like to see and use in the way of HIV prevention – a tool which should be empowering for them. the press seems full of comments like “the women are trying to tell us something!” why does it seem, then, that for a product made for them, they have to work so hard to tell us those things? why are we not hearing more sentences that start with “the women told us…”? that is, why, after such a big trial, am i not hearing anything about on-going qualitative and observational follow-up efforts to learn more about what exactly didn’t work about the methods offered to women?

there’s often a lot to learn from null results (that’s science, right?) but it doesn’t just come from brainstorming what went wrong. asking helps.

i don’t suggest that people are perfectly prescient about what they need or want. often, the innovations that we can’t live without now – smartphones, for example – weren’t a need or even desire that most people could have articulated 20 years ago. as such, directly asking people what it would take to get them to engage in X desirable behavior can’t determine the research agenda. but it should certainly be part of figuring it out.

i know – but couldn’t we have asked, too?

things i know & understand:

  • this is cool research from @poverty_action
  • it’s fun and important to show and know that ‘we’ (development types) have poor assumptions about what will work
  • time and budgets are always constrained
  • this comment is super-predictable coming from me. but…

when i read this:

our results showed that although the consulting intervention caused short-term changes in business practices, these impacts dissipated within a year after the consulting ended. on average, we found no long-term benefit from the consulting, and actually lower short-term profits. we believe some business people hoped the advice would work and thus took it. but better bookkeeping and other business practices potentially took time away from the physical act of sewing clothes. once profits took a hit, enterprise owners likely abandoned the practices and reverted to their previous methods. [emphasis added]

i think to myself, ‘heather, why is it that you so rarely read:’

  • “the respondents believed this program did/not have the intended effect because…” (this would be based on more qualitative research during or at the end of the study)
  • or, something along the lines of, “the respondents felt that future efforts could be improved by…” (more of that open-ended goodness)
  • or, “from our observations during the implementation of the intervention, we believe what happened is…”
  • or, “the implementers feel that x happened, that z presented serious challenges, and y could improve it in the future”
  • or, radically, “the respondents (or implementers) feel that a better way to have approached the same goal would have been…”

i am well aware that absolutely none of these things will provide definitive – or even ‘true’ – answers about what worked and why. but surely other people’s opinions besides the researchers’ count and should be collected, despite the extra collection and analysis time? wouldn’t it be fun to have more information about why we are seeing the treatment effect that we see? among other sources, see here.