first of all, i admit that i had to use the interwebs to make certain of the meaning of ‘put that in your pipe and smoke it,’ having initially heard the phrase via west side story and only inferring the meaning. wiktionary suggests that it is truth-telling of a surprising or unpleasant nature, which might be a bold claim (the truth part, anyway) for the blog below. meanwhile, the phrase finder links it with dickens and more with meditation than a particular claim to truth. this dickensian/sherlockian/caterpillarian interpretation seems slightly less intimidating.
in any case, i am sticking with it, as it is mildly witty since i want to meditate on (, second of all,) ‘pipeline’ or ‘phase-in’ designs for impact evaluation. i’ve been bothered about them for some time, which means i either need to fix them or blog about them. one of those is easier to do over beer than the other, so i’ll try that first. in a fanciful theory-of-change (bolstered by a different pipe, different smoke, perhaps), a blog post can change the practice of policy evaluation. so here goes…
phase-in designs (for example, here, with no intention of picking on this example), with a staggered roll-in of a *real! live!* policy idea, provide an exciting opportunity for evaluation. because of supply-side constraints, say, a government opts to introduce a policy/program/project (p/p/p) only among some areas or groups in the first go. in this scenario of constrained resources, we (evaluators, onlookers, policymakers) can feel extra-special-good about randomizing the sequence of intervention receipt. in fact, a random lottery seems the fairest way to deal with the binding supply-side constraint, when contrasted with convenience or political favoritism (somehow, this just came to mind).
randomization of the phase-in schedule allows the researchers to create a clean counterfactual – and to do so guilt-free, since everyone will get the policy/treatment eventually and the order is strictly determined by luck. with such a comparison group (proxying the counterfactual), researchers are in a good position to make causal statements about the impact of the p/p/p (by comparing the observed results to what effects would have been achieved in the absence of the p/p/p).
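for the concretely minded, here is a minimal sketch of that logic in python: a lottery decides which units land in which phase, and the phase-1 estimate is a simple difference in mean endline outcomes between phase-1 units and the not-yet-treated. everything named here (`assign_phases`, `phase1_effect`, the districts, the outcome numbers) is made up for illustration, and a real evaluation would want standard errors, clustering and covariates on top of this.

```python
import random
from statistics import mean

def assign_phases(units, n_phases, seed=0):
    """Shuffle units with a seeded lottery, then deal them into phases 1..n_phases."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: (i % n_phases) + 1 for i, unit in enumerate(shuffled)}

def phase1_effect(outcomes, phases):
    """Difference in mean endline outcomes: phase-1 units minus not-yet-treated units."""
    treated = [y for unit, y in outcomes.items() if phases[unit] == 1]
    waiting = [y for unit, y in outcomes.items() if phases[unit] != 1]
    return mean(treated) - mean(waiting)

# hypothetical units: 12 districts rolled in over 3 phases
districts = [f"district_{i}" for i in range(12)]
phases = assign_phases(districts, n_phases=3, seed=42)

# fabricated endline outcomes, purely illustrative: a baseline of 10.0,
# plus 2.0 for districts that received the p/p/p in phase 1
outcomes = {d: 10.0 + (2.0 if phases[d] == 1 else 0.0) for d in districts}

estimate = phase1_effect(outcomes, phases)
print(f"estimated phase-1 effect: {estimate:.1f}")
```

the seed is there so the lottery can be re-run and audited, which matters if the fairness of the draw is part of the pitch.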
so, at the end of phase 1 of a roll-in, the researchers can collect their endline data and the government can (drumroll!)… continue with phase 2 of the roll-in. as-is, as-promised.
this is where things feel a bit goofy. surely the most policy-relevant application of the evaluation described above is to draw lessons for phase 2 of the same policy. and yet, it does not seem common (with ‘not common’ here defined as “i haven’t read about it and i did ask a few people”) to read about plans to use the results from the impact evaluation (or, really, even the accompanying process data) of phase 1 to inform phase 2, the most immediate and obvious application.
i imagine this damn-the-results-full-speed-ahead plan comes about for one of (at least) two reasons.
- after endline data are collected, it will still likely be months before the data are entered, cleaned and analyzed to a point at which the researchers are comfortable sharing preliminary results. such a timeline may well be out of sync with political cycles, even accounting for the binding constraints on time that led to the pipeline design in the first place.
- the phased-in strategy was a response to a budgetary (or other supply) constraint but the politicians in question have already publicly promised a full rollout of the p/p/p under study as soon as funds allowed.
but what this means is that a phased-in design is primarily a nifty identification strategy, not a tool for directly improving policy. this seems unfulfilling (right? because it’s not just about niftiness, right?). i see at least two possible (and linked) ways forward on this.
- if a set of policymakers or implementers agree to randomize roll-out of their p/p/p in the face of a supply-side constraint, might they also be willing to agree to leaving a sufficient time gap between the completion of phase n and the initiation of phase n+1 to allow for, at least, rough analysis of the results before deciding whether and how to proceed with phase n + 1?
- the types of p/p/p that are so resource-intensive that they result in supply-side constraints are likely quite visible – even flagship programs. this can put policymakers in a tough position; even if the evaluation results from phase n come back null/negative/confusing, public promises have likely been made. what to do then? elected officials pulling a syke/sike/psych doesn’t always work out so well. i feel this creates a space for us to (re-)consider what is being tested and therefore what type of policy advice can come out of the test. let’s say that a government body has, broadly, decided that it wants to offer an old-age pension program. even if phase n data suggest that the pension does nothing, it’ll be pretty hard for a government not to carry forward with a pension plan (or, moreover, to revoke the program from those given it in phase n). however, if the mindset for phased-in designs shifts away from testing treated against not-treated and, rather, toward testing Option A versus Option B in pension scheme design, and the policymakers can wait for the results, then at least the policymaker can be offered evidence-informed advice on the “best” (among the options) way to move forward, rather than facing the politically untenable question of ‘whether’ to move forward.
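the Option A versus Option B idea can be sketched the same way. below, every unit in phase n gets the program, but the lottery picks which design it gets, and the comparison is between designs rather than against no program at all. this is only an illustration under my own assumptions: `assign_options`, `compare_options`, the villages and the outcome values are all hypothetical, and ‘best’ here just means the higher mean on a single made-up outcome.

```python
import random
from statistics import mean

def assign_options(units, options=("A", "B"), seed=0):
    """Randomly split the units of one phase across program designs (everyone is treated)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: options[i % len(options)] for i, unit in enumerate(shuffled)}

def compare_options(outcomes, arms):
    """Mean endline outcome for each design option."""
    by_arm = {}
    for unit, y in outcomes.items():
        by_arm.setdefault(arms[unit], []).append(y)
    return {arm: mean(ys) for arm, ys in by_arm.items()}

# hypothetical phase-n units, all of which receive some version of the pension
phase_n_villages = [f"village_{i}" for i in range(10)]
arms = assign_options(phase_n_villages, seed=7)

# fabricated outcomes, purely illustrative: option B does a bit better than option A
fake_outcomes = {v: (5.0 if arms[v] == "A" else 6.5) for v in phase_n_villages}

arm_means = compare_options(fake_outcomes, arms)
best = max(arm_means, key=arm_means.get)
print(f"option means: {arm_means}; carry forward option {best} into phase n+1")
```

the advice that comes out is ‘which version’, not ‘whether’, which is the point: nobody has to revoke a promised pension for the evaluation to be useful in phase n+1.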
to sum up: phase-in or pipeline designs offer a unique identification opportunity with regard to the causal effects of a ‘real’ p/p/p. however, by (lack of?) design, researchers using such designs often fail to provide evidence that informs the most immediate policy question: what to do in phase n+1, if anything. timing is one barrier between analyzing phase n and deciding what to do in phase n+1. but so are political promises, entrenchment and irrevocability. if the pipeline approach shifts away from taking advantage of a program/no-program discontinuity and instead looks at program options, then these data, if timely, may actually improve programs rather than only generating nice papers.
this is part of a larger issue on the plan to use evidence generated in evaluations, on which i hope to write soon (here’s looking at you, @suvojitc).
(also, with discussion of pipelines, i feel like i should add at least a ‘boo’ on the keystone pipeline.)