Refereeing an academic paper

The list below is taken 100% from the following sources; my only contribution is to mix them into a three-page document.

Nevertheless, it may prove useful. Additions are, of course, welcome.

  • Assume that no referee reports are truly anonymous.  It is fine to be critical but always be polite.
  • Skim the paper within a couple of days of receiving the request (my metro rides are good for this). You can quickly tell whether the paper is well below the bar for some obvious reason and can be rejected as quickly as possible.
    • Unless it is immediate junk, read the paper once and return to it a week later with deeper thoughts and a fresh mind.
    • Referee within one month.
  • Remember you are the referee, not a co-author. I often hear that young referees in particular write very long reports that try to do far more than is needed to help make a paper clear, believable, and correct. I think two pages or fewer is enough for most reports.
  • Your report should not assume that the editor has a working knowledge of the paper.
    • The first paragraph should concisely summarize the paper and its contribution. A critical but polite evaluation of the paper should follow.
    • Explain why you recommend that the paper be accepted, rejected, or revised.
      • If you would like the editor to accept the paper, your recommendation must be strong. The more likely you think the paper is to merit a revision, the more detailed your comments should be.
      • The referee report itself should not include an explicit editorial recommendation. That recommendation should be in a separate letter to the editor.
      • If you consistently recommend rejection, then the editor recognizes you are a stingy, overly critical person. Do not assume that the editor will not reveal your identity to the authors. In the long run, there are no secrets.
      • If you recommend acceptance of all papers, then the editor knows you are not a discriminating referee.

Possible considerations:

  • Research question and hypothesis:
    • Is the researcher focused on well‐defined questions?
    • Is the question interesting and important?
    • Are the propositions falsifiable?
    • Has the alternative hypothesis been clearly stated?
    • Is the approach inductive, deductive, or an exercise in data mining? Is this the right structure?
  • Research design:
    • Is the author attempting to identify a causal impact?
    • Is the “cause” clear? Is there a cause/treatment/program/first stage?
    • Is the relevant counterfactual clearly defined? Is it compelling?
    • Does the research design identify a very narrow or a very general source of variation?
    • Could the question be addressed with another approach?
    • Useful trick: ask yourself, “What experiment would someone run to answer this question?”
  • Theory/Model:
    • Is the theory/model clear, insightful, and appropriate?
    • Could the theory benefit from being more explicit, developed, or formal?
    • Are there clear predictions that can be falsified? Are these predictions “risky” enough?
      • Does the theory generate any prohibitions that can be tested?
      • Would an alternative theory/model be more appropriate?
        • Could there be alternative models that produce similar predictions—that is, does evidence on the predictions necessarily weigh on the model or explanation?
      • Is the theory a theory, or a list of predictions?
      • Is the estimating equation clearly related to or derived from the model?
  • Data:
    • Are the data clearly described?
    • Is the choice of data well‐suited to the question and test?
    • Are there any worrying sources of measurement error or missing data?
    • Are there sample size or power issues?
    • How were the data collected? Are recruitment and attrition clear?
      • Is it clear who collected the data?
      • If data are self-reported, is this clear?
      • Could the data sources or collection method be biased?
      • Are there better sources of data that you would recommend?
      • Are there types of data that should have been reported, or would have been useful or essential in the empirical analysis?
      • Is attrition correlated with treatment assignment or with baseline characteristics in any treatment arm?
  • Empirical analysis:
    • Are the statistical techniques well suited to the problem at hand?
    • What are the endogenous and exogenous variables?
    • Has the paper adequately dealt with concerns about measurement error, simultaneity, omitted variables, selection, and other forms of bias and identification problems?
    • Is there selection not just in who receives the “treatment”, but in who we observe, or who we measure?
    • Is the empirical strategy convincing?
    • Could differencing, or the use of fixed effects, exacerbate any measurement error?
    • Are there assumptions for identification (e.g. of distributions, exogeneity)?
      • Were these assumptions tested and, if not, how would you test them?
      • Are the results demonstrated to be robust to alternative assumptions?
      • Does the disturbance term have an interpretation, or is it just tacked on?
      • Are the observations i.i.d., and if not, have corrections to the standard errors been made?
      • What additional tests of the empirical strategy would you suggest for robustness and confidence in the research strategy?
      • Are there any dangers in the empirical strategy (e.g. sensitivity to identification assumptions)?
      • Is there potential for Hawthorne effects or John Henry-type biases?
  • Results:
    • Do the results adequately answer the question at hand?
    • Are the conclusions convincing? Are appropriate caveats mentioned?
    • What variation in the data identifies the elements of the model?
    • Are there alternative explanations for the results, and can we test for them?
    • Could the author have taken the analysis further, to look for impact heterogeneity, for causal mechanisms, for effects on other variables, etc.?
    • Is absence of evidence confused with evidence of absence?
    • Are there appropriate corrections for multiple comparisons or multiple hypothesis testing?
  • Scope:
    • Can we generalize these results?
    • Has the author specified the scope conditions?
    • Have causal mechanisms been explored?
    • Are there further types of analysis that would illuminate the external validity, or the causal mechanism at work?
    • Are there other data or approaches that would complement the current one?
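
On the multiple-comparisons question under Results, a referee can quickly check whether a paper's reported p-values survive a standard correction. Below is a minimal sketch in Python, assuming the Holm step-down procedure. That choice is mine for illustration; the sources above do not name a method. The function name `holm_reject` and the p-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Return a list of booleans: True where the Holm step-down
    procedure rejects the null at family-wise error rate alpha."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared
        # with alpha / (m - k + 1), i.e. alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Once one hypothesis is retained, all larger p-values are retained too.
            break
    return reject


# Hypothetical p-values for four outcomes reported in a paper.
p_values = [0.010, 0.040, 0.030, 0.005]
print(holm_reject(p_values))  # [True, False, False, True]
```

In this made-up example all four p-values fall below 0.05 unadjusted, but only two survive the correction. That is the kind of gap worth raising in a report.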