small thoughts on transparency in research (descriptions of methods, analysis)

there is currently a good deal of attention on transparency of social science research – as there should be. much of this is focused on keeping the analysis honest, including pre-analysis plans (e.g.) and opening up data for re-analysis (internal replication, e.g. here and here). some of this will hopefully receive good discussion at an upcoming conference on research transparency, among other fora.

but, it seems at least two points are missing from this discussion, both focused on the generation of the analyzed data itself.

 

intervention description and external replication

first: academic papers in “development” rarely provide a clear description of the contents of an intervention / experiment, such that it could, plausibly, be reproduced. growing up with a neuroscientist / physiological psychologist (that’s my pop), i had the idea that bench scientists had this part down. everyone (simultaneously researchers and implementers) has lab notebooks and they take copious notes. i know because i was particularly bad at that part when interning at the lab.*

then, the researchers report on those notes: for example, on the precise dimensions of a water maze they built (to study rodent behavior in stressful situations), along with a nice diagram so that you could, with a bit of skill, build your own version of the maze and follow their directions to replicate the experiment.

pop tells me i am overly optimistic on the bench guys getting this totally right. he agrees that methods sections are meant to be exact prescriptions for someone else to reproduce your study and its results. for example, they are very detailed on exactly how you ran the experiment, the apparatus used, where reagents (drugs) were purchased from, etc. he also notes that one thing that makes this easier in bench science is that “most experimental equipment is purchased from a manufacturer which means others can buy exactly the same equipment. gone are the dark days when we each made our own mazes and such. reagents are from specific suppliers who keep detailed records on the quality of each batch…”

then he notes: “even with all this, we have found reproducibility to be sketchy, often because the investigators are running a test for the first time. a reader has to accept that whatever methodological details were missed (your grad student only came in between 1 and 3AM when the air-conditioning was off) were not critical to the results.” or maybe this shouldn’t go unreported and accepted.

the basic idea holds in and out of the lab: process reporting on the intervention/treatment needs to get more detailed and more honest. without it, the reader doesn’t really understand what the ‘beta’ in any regression analysis means – and with any ‘real world’ intervention, there’s a chance that beta contains a good deal of messiness, mistakes, and iterative learning resulting in tweaks over time.

as pop says: “an investigator cannot expect others to accept their results until they are reproduced by other researchers.” and the idea that one can reproduce the intervention in a new setting (externally replicate) is a joke unless detailed notes are kept about what happens on a daily or weekly basis with implementation and, moreover, these notes are made available. if ‘beta’ contained some things at one time in a study and a slightly different mix at a different time, shouldn’t this be reported? if research assistants don’t / can’t mention to their PIs when things get a bit messy in ‘the field’, and PIs in turn don’t report glitches and changes to their readers or other audiences, then there’s a problem.

 

coding and internal replication

as was raised not-so-long-ago by the nice folks over at political violence at a glance, the cleaning and coding of data for analysis is critical to interpretation, and therefore critical to transparency. there is not enough conversation happening about this, with “this,” in large part, being about construct validity. there are procedures for coding, usually involving independent coders working with the same codebook, then checking for inter-rater reliability and reporting the resultant kappa or other relevant statistic. otherwise, on the whole “shit in, shit out” principle, the reader really shouldn’t be expected to believe the data.
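for anyone who hasn’t computed one, here is a minimal sketch of cohen’s kappa in python, assuming exactly two coders who each assign one nominal code to every item in the same set. the function name and the example codes (‘protest,’ ‘riot,’ ‘attack’) are hypothetical, made up for illustration rather than drawn from any real codebook. kappa compares observed agreement with the agreement you would expect by chance given each coder’s code frequencies: κ = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning nominal codes to the same items.

    Hypothetical helper for illustration: coder_a[i] and coder_b[i] are the
    codes each coder gave to item i.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same, non-empty set of items")
    n = len(coder_a)

    # observed agreement: share of items where both coders chose the same code
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # chance agreement: product of each coder's marginal share for each code
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)

    if p_expected == 1:
        # both coders used one identical code for every item; kappa is
        # undefined here, and this sketch returns 1.0 since agreement is perfect
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# made-up codes from two coders reading the same ten event reports
coder_a = ["protest", "riot", "protest", "attack", "riot",
           "protest", "attack", "protest", "riot", "protest"]
coder_b = ["protest", "riot", "riot", "attack", "riot",
           "protest", "attack", "protest", "protest", "protest"]

print(round(cohens_kappa(coder_a, coder_b), 3))  # prints 0.677
```

here the coders agree on 8 of 10 items (p_o = 0.80), but chance alone would give p_e = 0.38, so kappa is noticeably lower than raw agreement. with more than two coders or ordinal codes, statistics such as fleiss’ kappa, weighted kappa or krippendorff’s alpha are usually a better fit; the sketch only covers the simplest case.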

in general, checks on data that i have seen relate to double-entry of data. this is important but hardly sufficient to assure the reader that the findings reported are reasonable reflections of the data collected and the process that generated them. the interpretation of the data prior to the analysis – that is, coding and cleaning — is critical, as pointed out by political violence at a glance, for both quantitative and qualitative research. and, if we are going to talk about open data for reanalysis, it should be the raw data, so that it can be re-coded as well as re-analyzed.
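for comparison, a double-entry check usually comes down to something like the sketch below: two people key in the same paper forms independently, and every disagreement is flagged for checking against the original form. this assumes each copy is a python dict mapping a record id to a dict of field values; the function name, record ids and field names are hypothetical.

```python
def double_entry_discrepancies(entry_1, entry_2):
    """Compare two independently keyed copies of the same records.

    Hypothetical helper: each argument maps record_id -> {field: value}.
    Returns (record_id, field, value_1, value_2) for every mismatch, sorted by
    record and field; a record or field missing from one copy shows up as None.
    """
    discrepancies = []
    for record_id in sorted(set(entry_1) | set(entry_2)):
        rec_1 = entry_1.get(record_id, {})
        rec_2 = entry_2.get(record_id, {})
        for field in sorted(set(rec_1) | set(rec_2)):
            value_1 = rec_1.get(field)
            value_2 = rec_2.get(field)
            if value_1 != value_2:
                discrepancies.append((record_id, field, value_1, value_2))
    return discrepancies


# made-up example: the second keyer typed 3 goats instead of 2 for hh001
first_pass = {
    "hh001": {"goats": 2, "literate": "no"},
    "hh002": {"goats": 0, "literate": "yes"},
}
second_pass = {
    "hh001": {"goats": 3, "literate": "no"},
    "hh002": {"goats": 0, "literate": "yes"},
}

for record_id, field, v1, v2 in double_entry_discrepancies(first_pass, second_pass):
    print(f"{record_id}.{field}: first pass {v1!r}, second pass {v2!r}")
# prints: hh001.goats: first pass 2, second pass 3
```

note what this does not catch: if both keyers faithfully typed ‘goats: 2’ but the enumerator applied the codebook’s definition of household livestock inconsistently in the field, the check passes. that is exactly the gap the coding and inter-rater checks discussed above are meant to fill.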

 

in short, there’s more to transparency in research than allowing for internal replication of a clean dataset. i hope the conversation moves in that direction — the academic, published conversation as well as the over-beers conversation.

 

*i credit my background in anthropology, rather than neuroscience, with getting better with note-taking. sorry, pop.

To Get More Out of Science, Show the Rejected Research

Originally posted on Berkeley Initiative for Transparency in the Social Sciences:

CARL WIENS

In a recent opinion piece on the New York Times news portal the Upshot, Brendan Nyhan, an assistant professor of government at Dartmouth College, comments on a host of transparency-related issues.

Closely echoing the mission of BITSS, Nyhan identifies the potential of research transparency to improve the rigor and ultimately the benefits of federally funded scientific research, writing:

The problem is that the research conducted using federal funds is driven — and distorted — by the academic publishing model. The intense competition for space in top journals creates strong pressures for novel, statistically significant effects. As a result, studies that do not turn out as planned or find no evidence of effects claimed in previous research often go unpublished, even though their findings can be important and informative.

COS Now Offering Free Consulting Services

Originally posted on Berkeley Initiative for Transparency in the Social Sciences:

A close partner of BITSS, the Center for Open Science (COS) has launched a free consulting service to anyone seeking help with “statistical and methodological questions related to reproducible practices, research design, data analysis, and data management.”

The Center is dedicated to increasing the “openness, integrity, and reproducibility of scientific research” and is looking to advance its mission through a more hands-on approach. Those with methodological questions can email stats-consulting@cos.io for free assistance from computer and data scientists trained in reproducibility and advanced research methods. If a question is too complicated to be answered via email, researchers can schedule a Google Hangout with a COS consultant to have their questions answered in real time. Visit the COS Google Calendar for availability.

The Center also offers online and on-site workshops for those seeking to gain a greater understanding of open research topics and tools. For more information on the details of COS’s services…

sherlock holmes and the princess bride

when i need to decompress, zone out, and fall asleep, i have a tendency to watch familiar, i-can-recite-by-heart movies or TV shows to help. just enough distraction to close down my other thoughts, not gripping enough to keep me awake.

the ‘sherlock’ TV series has joined these ranks. the more i watch ‘a study in pink’ (if i stay awake to the near-end), the more i have to ask whether i, during the scene of selecting the good or bad bottle, could have resisted making the cab driver a sicilian (or making another princess bride reference).

props to the screenwriters on avoiding the temptation.

theories of change, stakeholders, imagined beneficiaries, & stealing from product design. that is, meet ‘mary.’

this post is also available, lightly edited, here.

i have been thinking a lot about ‘theories of change’ this week (just did some presenting on them here!). actually, i have been thinking more about ‘conceptual models,’ which was the term by which i was first introduced to the general idea (via vic strecher in conceptual models 101) and the term i still prefer because it implies more uncertainty and greater scope for tinkering than does ‘theory.’ (i accept that ‘theory of change‘ has been branded and that i have to live with it but i don’t have to like it. when people start calling them “tocks,” it’ll be a really, really bad day. i can deal with the acronym “ToCs” but please, world, don’t pronounce it “tocks” or switch to writing “tox” or something else dreadful.)

regardless of the term, the approach of thinking seriously about how behavioral, social and economic change will happen is really important, and often overlooked during the planning stages of both projects/programs/policies and evaluations. (too often, the intricacies of how change actually happened, or didn’t, are left to academic speculation in the discussion section of an evaluation paper, speculation that is certainly not informed by talking systematically to the people who were intended to benefit from the program.)

i think there is growing recognition that building a theory of change is something that should happen, at least in part, backwards (among other places, this is discussed in ‘evidence-based policy’, with the idea of a ‘pre-mortem’ and ‘thinking step-by-step and thinking backwards’). that is, you start with the end goal (usually some variant of ‘peace,’ ‘satisfaction,’ ‘wellbeing,’ ‘capabilities,’* etc) in mind and work backwards to how you are going to get there. actually, it’s a bit more like the transcontinental railroad, where you start from both ends (where you are and where you want to get) and build backwards and forwards until the ideas meet in the middle and you have a sense of what needs to be done and what assumptions underlie one step translating to the next.

in teaching us about not only conceptual models but grant writing, vic used the analogy of an island. the island was where you wanted to get: the state of the world as things would be once your intervention was rolled out and fully operational and lasting change had been effected. it wasn’t enough to just say that people would have more money or would be healthier. you had to describe how the state of the world would look, feel, and operate. how would someone’s day look in the new state of the world? what would be different about the way they undertook their daily activities, or indeed what their daily activities would be? then, once you had the new state of the world/island in mind, you could make sense of where you were currently (through one of those ex ante ‘needs assessment’ things i so rarely hear about in planning development projects or building theories of change) and what needed to be done to build a bridge from where you are to the island.

some of this work in understanding where people are and where ‘they,’ and therefore ‘we,’ want to get is meant to be generated through the nebulous terms “stakeholder engagement” and “formative work.” i think we discuss much less how formative engagement and stakeholder work (probably not a great sign of specificity that all the words can be mixed up so easily) actually translate into a robust theory of change. in this regard, i have learnt quite a bit from product and engineering books like the inmates are running the asylum. these are books about product and service design and the ‘user experience,’ far-out concepts we probably (almost certainly) don’t spend enough time thinking about in ‘development.’ thinking about them would probably really benefit our theories of change, making them more detailed and ‘best-fitting’ for a particular situation… not to mention, you know, benefit the beneficiaries.

one of the tools i like best is what is, effectively, imaginary prospective users — in cooper‘s terminology, ‘personas.’ here’s the idea, as i see it translating to development and theories of change. we know stakeholders are important but they cannot (realistically or effectively) all be in the same room, at the same table, at the same time. nor can they all be called up each time we make a small tweak in program design or the underlying assumptions. and, it is likely the intended beneficiaries that are hardest to call up and the most likely not to be at the table. but we can use personas to bring them to the table, so that what happened in ‘the field’ most certainly does not stay there.

let’s say that for a given project and evaluation, widowed women are a key sub-group of interest.

forget widowed women.

start thinking about “mary.”

mary is a widowed woman.

her husband had been a carpenter and died of c cause. she lives in x place while her n children live in z other places and provide her with s amount of support. mary can be a composite of widowed women you did meet in the field during deep, household level needs assessment and formative in-depth interviews with intended beneficiaries. that’s how you might have a picture of mary and know that she lives in h type of house, with e regular access to electricity, and has g goats and l other livestock. it’s how you know she’s illiterate and has a mobile phone onto which she never adds credit. it’s how you know what time she wakes up, what her morning chores are, who she talks to, when and whether she has time to go to the market, how she gets her information, what aspects of her environment will enable change and which will hinder it, and so on.

so, all potential beneficiaries can’t be at the table but personas of key subgroups and heterogeneities of interest can be. if everyone in the room for the design (intervention and evaluation) process is introduced to the personas, then they can speak up for mary. she still gets a voice and the ability to ask, ‘what’s in all this for me?’ will she be able to deal with an extra goat if she gets one as part of a livestock program? does she have the means of transport to collect cash as part of a transfer program? is her neighborhood safe for walking so she can follow up on the health information you provide? is mary going to give a hoot about the sanitation information you provide her?

mary’s obstacles need to be dealt with in your program design, and the places where mary might have trouble engaging with the program need to be put into your theory of change and monitored as part of your M&E (& e) plan. will mary help you think of everything? no, of course not; she’s good but she’s not that good. but your design will probably be nearer to something that can actually work (and don’t forget that street-level workers, other implementers and high-level stakeholders should have personas too!).

please invite mary to the table when you’re designing your intervention and constructing your theory of change. it doesn’t replace the need for actual monitoring and actually asking for beneficiary, implementer and stakeholder feedback.

but have mary describe to you how her life will be different (better!) with your program in place, how the actual structure of her day and decision-making have changed now that she’s on the aforementioned goal island. you’ll be a little closer to making it so.

this post is massively indebted to danielle giuseffi, who introduced me to some of the books above and with whom i have discussed building models more than anyone else! still one of my favorite business-partners-in-waiting, d-funk, and i still like our behavioral bridge.

*yes, i know that ‘capabilities’ were initially from amartya sen and that i should have linked to this. but for planning approaches, i find the 10 laid out by nussbaum more accessible.

ebola and public health ethics (ebolaethics?)

via reuters, KFF recently posted a short article about the ethics of giving experimental treatment to the ‘hero doctor’ Sheik Umar Khan, and, really, to any of the trained health professionals who continue to show up to work even though they were at very real risk in Sierra Leone, in Liberia, in Nigeria and, now, contemporaneously but apparently unrelatedly, in the DRC.

there’s a complex of issues at stake here around rationing a scarce (and experimental) resource when need is greater than supply. i am only going to wade in on the one issue i feel comfortable putting a point on. questions of ethics related to the nationality of who has been treated and where they were treated, or the ethics of using an experimental drug once other options were exhausted, are important issues but beyond the scope of this post.

this post is specifically about priority-setting for who gets treated: those who work in health care and those who do not. whether human resources for health (HRH), such as Dr. Khan, medical volunteers from abroad, Sierra Leonean nurses, etc., should be given priority in the face of a health crisis raises the Kantian issue of whether people should be used as means for ends (with Kant saying “no.”). if HRH are prioritized because of their ability to save more lives by having their own lives saved, it is because they are being viewed as a means to an end: namely, the end of potentially saving more lives. this does not, inherently, treat all individuals as being of equal moral worth.

the case above at least keeps things within the “sphere” of health, rather than raising questions about favoring saving the young versus the old, or the powerful (say, the president or prime minister) over other citizens in the face of disaster. these too are difficult questions.

the debate will continue about whether and how Dr. Khan and others should have been treated with the experimental ZMapp, of which there is expected to be a long-term (rather than a quickly resolved) scarcity relative to need. demand significantly and dauntingly and heartbreakingly outweighs supply. but the question of whether Dr. Khan — or other HRH still fighting the fight — should be given priority to receive the treatment needs to be answered along several lines: should HRH be given preference? if so, should that preference be given based on whether, once cured, they will continue to treat patients? what happens once there are no more treatments to continue treating with — does the prioritization scheme shift?

this is a separate — but important — set of questions from the more politically charged question of the nationalities of recipients of the limited supply of ZMapp. i suspect these questions of power and race and exploitation, of where drugs were developed and where they were tested and who paid for them in what ways, will dominate the discourse. but other questions of ethics, desert, scarcity, and priority-setting require consideration for dealing with the present crisis and planning for future outbreaks, as well as strengthening health systems and equipping them to make decisions more generally.

further reading on these topics includes work by Norm Daniels, Frances Kamm, and Dan Brock. i hope these scholars will discuss and debate these issues in the near future.