planning for qualitative data collection and analysis

this blog reflects conversations and on-going work with mike frick (@mwfrick), shagun sabarwal (@shagunsabarwal), and urmy shukla (@urmy_shukla). they should receive no blame if this blog is wacky and plenty of credit if it is not.

a recent post by monkey cage contributors on the washington post, then summarized by BITSS, suggested that “exploratory, qualitative, historical, and case-based research is much harder to present in a results-free manner, and perhaps impossible to pre-register.” this was just a brief point in their larger argument, which i agree with. but it seems worth pausing to consider whether it is likely to be true. below, i discuss research design for qualitative work, sampling considerations, and analysis itself.

throughout, i take a ‘pre-analysis plan’ to be a commitment on what the research will analyze and report on but not a constraint on doing analyses that are ‘off plan.’ rather, the researcher just needs to be explicit about what analyses were ‘on plan’ and which were not and there is a commitment to report everything that was ‘on plan’ – or why such reporting is infeasible.

my conclusion: a conversation on pre-analysis plans needs to distinguish whether planning is possible from whether planning is currently done. in general, i feel planning for analysis is possible when you plan to do analysis.

disclaimer: in this post, my reference to ‘qualitative research’ is to the slice of social science research that has to do with talking to and/or observing and/or participating with living people. i claim no knowledge of the analysis of historical manuscripts or of a wide array of other qualitative research. by extension, i am mostly talking about planning for the collection and analysis of data from in-depth interviews, focus group discussions, and forms of (participant-) observation.

designing research: working with and asking questions of living people implies an ethics-review process, for which the researcher will have to lay out at least the ‘domains’ (aka themes, topics, categories) of information s/he hopes to observe and ask people about. usually, one does not get away with saying “i am just going to head over here, hang out, and see what i find.” this requires planning. like a pre-analysis plan, the domains for qualitative work can set up some bounds for the minimum of what will be collected and reported (“we will collect and analyze data on the following topics: x, y, z”), even if the final report is that a particular domain ended up being a flop because no one wanted to talk about it or it proved uninteresting for some reason.

some of the most famous ethnographies (say, nisa and tuhami) focus on a single person, often to try to give perspective on a larger culture — which may not be what the ethnographer initially set out to study. but the ethnographer can still tell you that (“i went to look at x, and here’s what i found — but i also found this really interesting person and that’s what the rest of the book is about”). so this does not seem inconsistent with the underlying logic of a plan, with the understanding that such a plan does not dictate everything that follows but does mandate that one reports why things changed.

which brings us to the nature of qualitative work: it is often iterative and the researcher often conducts data collection, entry, and analysis in parallel. analysis from an early set of interviews informs questions that are asked and observations attended to later on. this is one of the exciting (to me) elements of qualitative research, that you get to keep incorporating new learnings as you go along. this approach need not be inconsistent with having a set of domains that you intend to explore. within each, maybe the questions get sharper, deeper, or more elaborate over time. or maybe one planned domain turns out to be way too sensitive or way off-base. again, the researcher can report, relative to the initial plan, that this is what happened between designing the research and actually doing it.

sampling: certain aspects of qualitative research can be planned in advance. usually the aim is to be in some way representative. one way to aim for representation is to consider sub-groups of interest. in large-n analysis, the researcher may be able to hope that sufficient numbers from each sub-group will appear in the sample by default. in smaller-n analysis, more purposive sampling plans may be needed to be sure that different sub-groups are engaged in conversation. either way, specifying sub-groups of interest can be done in advance and is hence plannable. that said, at least some branches of qualitative research suggest that representativeness is about outputs rather than inputs, and that what the researcher is seeking is saturation (i am thinking of lincoln and guba here), which has implications for planning.

‘saturation’ relates to whether the researcher is starting to hear the same answer over and over. in some cases, though, inputs are the determinant of representation, similar to the approach that can be taken in large-n work. let’s say that you want to get the input of the members of an elite government entity: a particular committee with 20 people on it. fine, plan to talk to all of them. representativeness is here achieved by talking to all of the relevant people (the whole population of interest) and then finding some way of summarizing and analyzing the viewpoints of all of them, even if it’s 20 different viewpoints. there’s your sampling plan. (this may or may not be part of a pre-analysis plan, depending how that is defined and at what stage of the research process it is required. i take these to be open questions.)

for less clearly bounded groups that nevertheless have clear characteristics and may be expected to think or behave differently (let’s say men versus women, older versus younger people, different economic quantiles, different ethnic groups, whatever), planning for saturation may look more like: plan to talk to men until we start getting repeated answers on key questions of interest, or conduct focus groups that are internally homogeneous with respect to ethnicity until we start to hear similar answers within each ethnicity (because it may take different numbers within each). that is, if representativeness is focused on output, then it is insufficient to plan at the beginning “we will do two focus groups in each village in which we collect data.” the researcher can specify the sub-groups of interest but probably not the number of interviews, focus groups, or hours of observation required.

i make this point for two reasons. first, a pre-analysis plan for qualitative work should plan for iteration between considering what has been collected and whether more questions are necessary to make sense of the phenomena of interest. this makes it different in practice than a quantitative plan but the underlying principle holds. second, a pre-analysis plan, if it covers sampling, probably cannot plan for specific numbers of inputs unless the population is clearly bounded (like the committee members). rather, the plan is to aim for saturation within each sub-group of interest.
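to make ‘aim for saturation’ a bit more concrete, here is a minimal python sketch of one possible stopping rule. it is an illustration under my own assumptions, not a standard from the qualitative methods literature: each interview is reduced to the set of themes coded in it, and a sub-group counts as saturated once a pre-specified number of consecutive interviews (the hypothetical `patience` parameter) raises no new theme. the function name, the theme labels, and the example data are all made up.

```python
from typing import Iterable, Set, Tuple


def interviews_until_saturation(
    coded_interviews: Iterable[Set[str]],
    patience: int = 3,
) -> Tuple[int, Set[str]]:
    """Return (interviews conducted, themes heard) for one sub-group.

    Stops as soon as `patience` consecutive interviews add no new theme.
    `patience` is an assumed, researcher-chosen threshold, not a standard.
    """
    seen: Set[str] = set()
    without_new = 0   # consecutive interviews that added nothing new
    conducted = 0
    for themes in coded_interviews:
        conducted += 1
        new_themes = themes - seen
        seen |= themes
        without_new = 0 if new_themes else without_new + 1
        if without_new >= patience:
            break
    return conducted, seen


# hypothetical coded interviews for one sub-group (say, women)
women = [
    {"cost", "distance"},
    {"cost", "trust"},
    {"distance"},
    {"trust", "cost"},
    {"cost"},
    {"stigma"},  # never reached: the rule stops after the fifth interview
]
conducted, themes = interviews_until_saturation(women, patience=3)
print(conducted, sorted(themes))
```

what can be written into a plan in advance is the rule (the sub-groups and the threshold), not the number of interviews it will take. the made-up data also show the rule’s weakness: a theme that would only have come up in a later interview (‘stigma’ here) is missed, so the threshold itself is a judgment call worth reporting.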

analysis: finally! in general, i feel more emphasis in a moving-towards-mixed-methods world needs to be put on analysis of qualitative inputs (and incorporation of those inputs into the larger whole). my hunch is that part of why people think planning for analysis of qualitative work may be difficult is that, often, people don’t plan to ‘analyze’ qualitative data. instead, perhaps, the extent of the plan is to collect data and then to find a good quote or story (“anecdata”), which may raise some questions about whether social science research is being done. not planning for analysis can limit one’s ability to set out a plan for analysis. this is different than saying that planning is not possible: there are plenty of books on qualitative data analysis (e.g. here, here, here, and many others). here are some things that can be planned in advance:

  1. how will interviews and field notes be transcribed? verbatim?
  2. how will notes or transcripts be stored and analyzed? handwritten? in excel (my tendency)? using software such as NVivo?
  3. are you planning to be guided by theory or be guided by the data in your analysis? for example, maybe the domains you initially planned to study were themselves from some theory or framework about how the world works. this suggests that your analysis may be theory-driven and you may at least start out (promise to start out) closed-coding your data: looking for instances and examples of particular theoretical constructs in the transcripts and field notes and “coding” (labeling) them as such.

or, maybe the work is far more exploratory and you set out to learn what people think and do, in general, about a particular topic. it’s more likely that you’ll be open-coding your data, looking for patterns that emerge (ideas that are repeatedly raised). and it’s likely you’ll have some idea in advance that that is what you intend to do. even if you start out closed-coding, it may turn out that a whole lot of your data end up falling outside the initially planned theoretical framework. fine. that doesn’t mean that you can’t report on what did fit in the framework (=plan) and then analyze all that interesting stuff that happened outside it as well. which, i think, is why we are talking about pre-analysis plans rather than pre-analysis straitjackets.

to close, in discussing whether pre-analysis plans for qualitative research (in the sense of talking to and watching living people, perhaps as part of a mixed-methods research agenda) are feasible, i hope the conversation is guided by whether planning is indeed possible in the social sciences as opposed to whether such planning currently takes place.


3 thoughts on “planning for qualitative data collection and analysis”

  1. Jessica says:

    So what might a pre-analysis plan look like for an exploratory assessment of the social, political and economic context surrounding high infant and maternal mortality rates (including the data discrepancies therein) within a given district, to develop a strategy for building government capacity for evidence-informed decision making? Hypothetically speaking, of course.

    Although I don’t entirely appreciate being disabused of the “going places, hanging out and seeing what I find” approach. So thanks for that.


    • quick reply, to be followed by longer reply: IRBs sort of take away the possibility of ‘just hanging out’ if you want to publish much of what you find (use in presentations, etc etc). in terms of exploring data discrepancies, don’t rule out the idea of putting the mystery right in front of people as one of the questions. “we’re outsiders, we’re stumped, please help.” and, of course, my take on evidence-informed decision-making is that it also helps researchers to be informed about what evidence decision-makers think will be convincing… so i’d also build that in somewhere.

