there is currently a good deal of attention on transparency of social science research – as there should be. much of this is focused on keeping the analysis honest, including pre-analysis plans (e.g.) and opening up data for re-analysis (internal replication, e.g. here and here). some of this will hopefully receive good discussion at an upcoming conference on research transparency, among other fora.
but, it seems at least two points are missing from this discussion, both focused on the generation of the analyzed data itself.
intervention description and external replication
first: academic papers in “development” rarely provide a clear description of the contents of an intervention / experiment, such that it could plausibly be reproduced. growing up with a neuroscientist / physiological psychologist (that’s my pop), i had the idea that bench scientists had this part down. everyone (simultaneously researchers and implementers) has lab notebooks and they take copious notes. i know because i was particularly bad at that part when interning at the lab.*
then, the researchers report on those notes: for example, the precise dimensions of a water maze they built (to study rodent behavior in stressful situations), along with a nice diagram, so that you could, with a bit of skill, build your own version of the maze and follow their directions to replicate the experiment.
pop tells me i am overly optimistic on the bench guys getting this totally right. he agrees that methods sections are meant to be exact prescriptions for someone else to reproduce your study and its results. for example, they are very detailed on exactly how you ran the experiment, the apparatus used, where reagents (drugs) were purchased, etc. he also notes that one thing that makes this easier in bench science is that “most experimental equipment is purchased from a manufacturer which means others can buy exactly the same equipment. gone are the dark days when we each made our own mazes and such. reagents are from specific suppliers who keep detailed records on the quality of each batch…”
then he notes: “even with all this, we have found reproducibility to be sketchy, often because the investigators are running a test for the first time. a reader has to accept that whatever methodological details were missed (your grad student only came in between 1 and 3AM when the air-conditioning was off) were not critical to the results.” or maybe this shouldn’t go unreported and accepted.
the basic idea holds in and out of the lab: process reporting on the intervention/treatment needs to get more detailed and more honest. without it, the reader doesn’t really understand what the ‘beta’ in any regression analysis means – and with any ‘real world’ intervention, there’s a chance that beta contains a good deal of messiness, mistakes, and iterative learning resulting in tweaks over time.
as pop says: “an investigator cannot expect others to accept their results until they are reproduced by other researchers.” and the idea that one can reproduce the intervention in a new setting (externally replicate) is a joke unless detailed notes are kept about what happens on a daily or weekly basis with implementation and, moreover, these notes are made available. if ‘beta’ contained some things at one time in a study and a slightly different mix at a different time, shouldn’t this be reported? if research assistants don’t / can’t mention to their PIs when things get a bit messy in ‘the field’, and PIs in turn don’t report glitches and changes to their readers or other audiences, then there’s a problem.
coding and internal replication
as was raised not-so-long-ago by the nice folks over at political violence at a glance, the cleaning and coding of data for analysis is critical to interpretation – and therefore critical to transparency. there is not enough conversation happening about this – with “this,” in large part, being about construct validity. there are procedures for coding, usually involving independent coders working with the same codebook, then checking inter-rater reliability and reporting the resultant kappa or other relevant statistic. the reader really shouldn’t be expected to believe the data otherwise, on the whole “shit in, shit out” principle.
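for concreteness, inter-rater reliability for two coders is often summarized with cohen’s kappa: how much the coders agree beyond what their marginal coding rates would produce by chance. a minimal sketch (the coders’ labels below are hypothetical, not from any real codebook):

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length lists of categorical codes."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # observed agreement: share of items the two coders coded identically
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # chance agreement: implied by each coder's marginal frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# two coders applying the same (hypothetical) codebook to six items
a = ["yes", "no", "yes", "yes", "no", "no"]
b = ["yes", "no", "no", "yes", "no", "yes"]
print(round(cohens_kappa(a, b), 2))  # prints 0.33
```

raw agreement here is 4/6, but kappa is only 0.33 once chance agreement is netted out – which is exactly why reporting the statistic, not just “coders mostly agreed,” matters.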
in general, the checks on data that i have seen relate to double-entry of data. this is important but hardly sufficient to assure the reader that the findings reported are reasonable reflections of the data collected and the process that generated them. the interpretation of the data prior to the analysis – that is, coding and cleaning – is critical, as pointed out by political violence at a glance, for both quantitative and qualitative research. and, if we are going to talk about open data for reanalysis, it should be the raw data, so that it can be re-coded as well as re-analyzed.
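a double-entry check, at its simplest, is a field-by-field comparison of two independently keyed copies of the same records, with every disagreement sent back to the source forms. a minimal sketch, assuming each copy is a dict mapping record ids to field values (the household ids and fields below are hypothetical):

```python
def double_entry_mismatches(copy1, copy2):
    """Return (record_id, field, value1, value2) for every disagreement
    between two independently keyed copies of the same records."""
    mismatches = []
    for rid, rec1 in copy1.items():
        rec2 = copy2.get(rid, {})
        for field, v1 in rec1.items():
            v2 = rec2.get(field)
            if v1 != v2:
                mismatches.append((rid, field, v1, v2))
    return mismatches

# the same two (hypothetical) household records, keyed twice
entry1 = {"hh_01": {"age": 34, "income": 1200}, "hh_02": {"age": 27, "income": 800}}
entry2 = {"hh_01": {"age": 34, "income": 2100}, "hh_02": {"age": 27, "income": 800}}
print(double_entry_mismatches(entry1, entry2))
# prints [('hh_01', 'income', 1200, 2100)]
```

note what this catches and what it cannot: a keystroke slip shows up as a mismatch, but an error made identically twice – or baked into the coding scheme itself – sails through, which is the point about double-entry being necessary but not sufficient.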
in short, there’s more to transparency in research than allowing for internal replication of a clean dataset. i hope the conversation moves in that direction — the academic, published conversation as well as the over-beers conversation.
*i credit my background in anthropology, rather than neuroscience, with getting better with note-taking. sorry, pop.