The list below is taken entirely from the following sources; my only contribution is to mix them into a three-page document.
- http://blogs.worldbank.org/impactevaluations/how-much-to-referee-and-how-to-do-it
- http://marginalrevolution.com/marginalrevolution/2006/10/how_to_be_a_goo.html
- http://chrisblattman.com/files/2009/07/PLSC508-Syllabus-Spring2010.pdf
- http://www.roie.org/howr.pdf
Nevertheless, it may prove useful. Additions are, of course, welcome.
- Assume that no referee reports are truly anonymous. It is fine to be critical but always be polite.
- Skim the paper within a couple of days of receiving the request (my metro rides are good for this); you can quickly tell whether a paper is well below the bar for some obvious reason and should be rejected right away.
- Unless it is immediate junk, read the paper once and return to it a week later with deeper thoughts and a fresh mind.
- Referee within one month.
- Remember you are the referee, not a co-author. I hear a lot that young referees in particular write very long reports, which try to do far more than is needed to make a paper clear, believable, and correct. I think two pages or less is enough for most reports.
- Your report should not assume that the editor has a working knowledge of the paper.
- The first paragraph should concisely summarize the paper's contribution; the rest of the report should give a critical but polite evaluation.
- Explain why you recommend that the paper be accepted, rejected, or revised.
- If you would like the editor to accept the paper, your recommendation must be strong. The more likely you think the paper is to merit a revision, the more detailed your comments should be.
- The referee report itself should not include an explicit editorial recommendation. That recommendation should be in a separate letter to the editor.
- If you consistently recommend rejection, the editor will recognize you as a stingy, overly critical referee. Do not assume that the editor will not reveal your identity to the authors. In the long run, there are no secrets.
- If you recommend acceptance of all papers, then the editor knows you are not a discriminating referee.
Possible considerations:
- Research question and hypothesis:
- Is the researcher focused on well‐defined questions?
- Is the question interesting and important?
- Are the propositions falsifiable?
- Has the alternative hypothesis been clearly stated?
- Is the approach inductive, deductive, or an exercise in data mining? Is this the right structure?
- Research design:
- Is the author attempting to identify a causal impact?
- Is the “cause” clear? Is there a cause/treatment/program/first stage?
- Is the relevant counterfactual clearly defined? Is it compelling?
- Does the research design identify a very narrow or a very general source of variation?
- Could the question be addressed with another approach?
- Useful trick: ask yourself, “What experiment would someone run to answer this question?”
- Theory/Model:
- Is the theory/model clear, insightful, and appropriate?
- Could the theory benefit from being more explicit, developed, or formal?
- Are there clear predictions that can be falsified? Are these predictions “risky” enough?
- Does the theory generate any prohibitions that can be tested?
- Would an alternative theory/model be more appropriate?
- Could there be alternative models that produce similar predictions—that is, does evidence on the predictions necessarily weigh on the model or explanation?
- Is the theory a theory, or a list of predictions?
- Is the estimating equation clearly related to or derived from the model?
- Data:
- Are the data clearly described?
- Is the choice of data well‐suited to the question and test?
- Are there any worrying sources of measurement error or missing data?
- Are there sample size or power issues?
- How were the data collected? Are recruitment and attrition clearly described?
- Is it clear who collected the data?
- If data are self-reported, is this clear?
- Could the data sources or collection method be biased?
- Are there better sources of data that you would recommend?
- Are there types of data that should have been reported, or would have been useful or essential in the empirical analysis?
- Is attrition correlated with treatment assignment or with baseline characteristics in any treatment arm? (A sketch of this check, along with a quick power calculation, follows this list.)
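
Two of the data checks above (statistical power, and attrition by treatment arm) are mechanical enough that a sketch may help. Here is a minimal illustration in Python using statsmodels; the file and column names (trial_data.csv, treated, attrited, age) are hypothetical stand-ins, not anything from the sources above.

```python
# Minimal sketch of two data checks, assuming hypothetical column names.
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.power import TTestIndPower

# Power: sample size per arm needed to detect a small effect (d = 0.2)
# at alpha = 0.05 with 80% power.
n_per_arm = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
print(f"required n per arm: {n_per_arm:.0f}")  # roughly 394

df = pd.read_csv("trial_data.csv")  # hypothetical data set

# Differential attrition: does treatment assignment predict dropout?
differential = smf.logit("attrited ~ treated", data=df).fit()
print(differential.summary())

# Selective attrition: do baseline characteristics predict dropout
# differently across arms? (treatment-by-covariate interaction)
selective = smf.logit("attrited ~ treated * age", data=df).fit()
print(selective.summary())
```
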
- Empirical analysis:
- Are the statistical techniques well suited to the problem at hand?
- What are the endogenous and exogenous variables?
- Has the paper adequately dealt with concerns about measurement error, simultaneity, omitted variables, selection, and other forms of bias and identification problems?
- Is there selection not just in who receives the “treatment”, but in who we observe, or who we measure?
- Is the empirical strategy convincing?
- Could differencing, or the use of fixed effects, exacerbate any measurement error?
- Are there assumptions needed for identification (e.g., distributional assumptions, exogeneity)?
- Were these assumptions tested and, if not, how would you test them?
- Are the results demonstrated to be robust to alternative assumptions?
- Does the disturbance term have an interpretation, or is it just tacked on?
- Are the observations i.i.d., and if not, have corrections to the standard errors been made? (See the sketch after this list.)
- What additional tests of the empirical strategy would you suggest for robustness and confidence in the research strategy?
- Are there any dangers in the empirical strategy (e.g. sensitivity to identification assumptions)?
- Is there potential for Hawthorne effects or John Henry-type biases?
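
On the i.i.d. question above, the standard remedy when observations are grouped is to cluster the standard errors at the group level. A minimal sketch in Python with statsmodels, assuming a hypothetical data set in which students are grouped within schools (the file and column names are made up):

```python
# Minimal sketch: clustered standard errors for grouped (non-i.i.d.) data.
# The file and column names (outcome, treated, school_id) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("trial_data.csv")

model = smf.ols("outcome ~ treated", data=df)
naive = model.fit()  # assumes i.i.d. errors
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": df["school_id"]})

# Point estimates are identical; only the standard errors change, often
# substantially when outcomes are correlated within groups.
print(naive.bse["treated"], clustered.bse["treated"])
```
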
- Results:
- Do the results adequately answer the question at hand?
- Are the conclusions convincing? Are appropriate caveats mentioned?
- What variation in the data identifies the elements of the model?
- Are there alternative explanations for the results, and can we test for them?
- Could the author have taken the analysis further, to look for impact heterogeneity, causal mechanisms, effects on other variables, etc.?
- Is absence of evidence confused with evidence of absence?
- Are there appropriate corrections for multiple comparisons and multiple hypothesis testing? (See the sketch below.)
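
Multiple-testing corrections are easy to apply once the raw p-values are in hand. A minimal sketch with statsmodels, using made-up p-values for ten hypothetical outcomes:

```python
# Minimal sketch: adjusting p-values for multiple hypothesis testing.
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values, one per outcome tested.
pvals = [0.001, 0.012, 0.034, 0.049, 0.21, 0.33, 0.46, 0.58, 0.72, 0.91]

# Holm's step-down method controls the family-wise error rate.
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
for p, adj, r in zip(pvals, p_adj, reject):
    print(f"raw {p:.3f} -> adjusted {adj:.3f}  reject: {r}")
# Four raw p-values fall below 0.05, but only the smallest survives Holm.
```
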
- Scope:
- Can we generalize these results?
- Has the author specified the scope conditions?
- Have causal mechanisms been explored?
- Are there further types of analysis that would illuminate the external validity, or the causal mechanism at work?
- Are there other data or approaches that would complement the current one?