Refereeing an academic paper

The below list is 100% taken from the following sources; my only contribution is to mix them up into a three-page document.

Nevertheless, may prove useful. Additions, of course, welcome.

  • Assume that no referee reports are truly anonymous.  It is fine to be critical but always be polite.
  • Skim the paper within a couple of days receiving the request- my metro rides are good for this – you can quickly tell whether this is a paper that is well below the bar for some obvious reason and can be rejected as quickly as possible.
    • Unless it is immediate junk, read the paper once and return to it a week later with deeper thoughts and a fresh mind.
    • Referee within one month.
  • Remember you are the referee, not a co-author. I hear a lot that young referees in particular write very long reports, which try and do way more than is needed to help make a paper clear, believable and correct. I think 2 pages or less is enough for most reports.
  • Your report should not assume that the editor has a working knowledge of the paper.
    • The first paragraph should summarize the contribution. Reviewers should provide a concise summary of the paper they review at the start of their report and then provide a critical but polite evaluation of the paper.
    • Explain why you recommend that the paper be accepted, rejected, or revised.
      • If you would like the editor to accept the paper, your recommendation must be strong. The more likely you think the paper is to merit a revision the more detailed should be the comments.
      • The referee report itself should not include an explicit editorial recommendation. That recommendation should be in a separate letter to the editor.
      • If you consistently recommend rejection, then the editor recognizes you are a stingy, overly critical person. Do not assume that the editor will not reveal your identity to the authors. In the long run, there are no secrets.
      • If you recommend acceptance of all papers, then the editor knows you are not a discriminating referee.

Possible considerations:

  • Research question and hypothesis:
    • Is the researcher focused on well‐defined questions?
    • Is the question interesting and important?
    • Are the propositions falsifiable?
    • Has the alternative hypothesis been clearly stated?
    • Is the approach inductive, deductive, or an exercise in data mining? Is this the right structure?
  • Research design:
    • Is the author attempting to identify a causal impact?
    • Is the “cause” clear? Is there a cause/treatment/program/fist stage?
    • Is the relevant counterfactual clearly defined? Is it compelling?
    • Does the research design identify a very narrow or a very general source of variation?
    • Could the question be addressed with another approach?
    • Useful trick: ask yourself, “What experiment would someone run to answer this question?”
  • Theory/Model:
    • Is the theory/model clear, insightful, and appropriate?
    • Could the theory benefit from being more explicit, developed, or formal?
    • Are there clear predictions that can be falsified? Are these predictions “risky” enough?
      • Does the theory generate any prohibitions that can be tested?
      • Would an alternative theory/model be more appropriate?
        • Could there be alternative models that produce similar predictions—that is, does evidence on the predictions necessarily weigh on the model or explanation?
      • Is the theory a theory, or a list of predictions?
      • Is the estimating equation clearly related to or derived from the model?
  • Data:
    • Are the data clearly described?
    • Is the choice of data well‐suited to the question and test?
    • Are there any worrying sources of measurement error or missing data?
    • Are there sample size or power issues?
    • How were data collected? Is recruitment and attrition clear?
      • Is it clear who collected the data?
      • If data are self-reported, is this clear?
      • Could the data sources or collection method be biased?
      • Are there better sources of data that you would recommend?
      • Are there types of data that should have been reported, or would have been useful or essential in the empirical analysis?
      • Is attrition correlated with treatment assignment or with baseline characteristics in any treatment arm?
  • Empirical analysis:
    • Are the statistical techniques well suited to the problem at hand?
    • What are the endogenous and exogenous variables?
    • Has the paper adequately dealt with concerns about measurement error, simultaneity, omitted variables, selection, and other forms of bias and identification problems?
    • Is there selection not just in who receives the “treatment”, but in who we observe, or who we measure?
    • Is the empirical strategy convincing?
    • Could differencing, or the use of fixed effects, exacerbate any measurement error?
    • Are there assumptions for identification (e.g. of distributions, exogeneity?)
      • Were these assumptions tested and, if not, how would you test them?
      • Are the results demonstrated to be robust to alternative assumptions?
      • Does the disturbance term have an interpretation, or is it just tacked on?
      • Are the observations i.i.d., and if not, have corrections to the standard errors been made?
      • What additional tests of the empirical strategy would you suggest for robustness and confidence in the research strategy?
      • Are there any dangers in the empirical strategy (e.g. sensitivity to identification assumptions)?
      • Is there potential for Hawthorne effects or John Henry-type biases?
  • Results:
    • Do the results adequately answer the question at hand?
    • Are the conclusions convincing? Are appropriate caveats mentioned?
    • What variation in the data identifies the elements of the model?
    • Are there alternative explanations for the results, and can we test for them?
    • Could the author have taken the analysis further, to look for impact heterogeneity, for causal mechanisms, for effects on other variables, etc?
    • Is absence of evidence confused with evidence of absence?
    • Are there appropriate corrections for multiple comparisons, multiple hypothesis testing?
  • Scope:
    • Can we generalize these results?
    • Has the author specified the scope conditions?
    • Have casual mechanisms been explored?
    • Are there further types of analysis that would illuminate the external validity, or the causal mechanism at work?
    • Are there other data or approaches that would complement the current one?

Share your thoughts, please! The more minds, the merrier

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s