A civil servant’s guide to policy evaluation: six lessons for a robust approach

25 Jun 2018

Evaluating the impact of policies is vital, but often beset by difficulties. Dan Hodges of Innovate UK sets out how the agency measures its effectiveness – and what it can teach the rest of government

Evaluating the impact of public policy is of ever-increasing importance. Policy makers expect robust evaluation methods in order to understand which policies and programmes are most effective in which circumstances, and whether their overall benefits justify the costs. However, in many areas robust evaluation is difficult. At Innovate UK, we have recently published our evaluation framework, setting out our approach to overcoming the challenges that we – and many other parts of the public sector – face in evaluating our impact.

Innovate UK is an agency of the non-departmental public body UK Research and Innovation and is funded by government grant. We support UK business innovation through funding and connecting innovative businesses, researchers and investors.

However, evaluating the impact of this support on businesses and the wider economy is notoriously difficult. The returns to innovation occur over long time periods, in unpredictable ways, and many of the impacts spill over to businesses that are not directly engaged with programmes. Added to that, innovation support programmes have relatively small sample sizes for statistical analysis, and the primary outcome of innovation activity – knowledge – is unobservable, with limited standard measures for capturing its creation and flow.

We published our evaluation framework in January, setting out the main challenges we face in evaluating our programmes and the solutions we have been developing to mitigate them. We have improved our monitoring processes and begun conducting evaluations over longer time periods and with larger cohorts, using a logic model approach that demonstrates outputs and outcomes before full impacts are realised. We have used third-party data to improve our understanding of impact, and enhanced our own data collection. We have also successfully implemented more robust evaluation methods, including randomised controlled trials and regression discontinuity design – a quasi-experimental method that can mimic randomised treatment where randomisation is infeasible. Where challenges limit the use of such methods, we have implemented theory-based approaches such as contribution analysis to build the most complete picture of impact possible.
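
As an illustration of regression discontinuity design, consider a competitive grant programme in which applications are scored and only those at or above a threshold are funded. Applicants just either side of the threshold should be broadly similar, so a jump in outcomes at the threshold can be read as the effect of funding. The Python sketch below assumes a sharp cut-off on a hypothetical assessment score, fits a straight line to applicants within a chosen bandwidth on each side, and reports the gap between the two lines at the threshold. The function names, threshold, bandwidth and data are illustrative assumptions, not a description of Innovate UK's own analysis.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two applicants on each side of the threshold")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("scores on one side of the threshold do not vary")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def rdd_estimate(scores: List[float], outcomes: List[float],
                 threshold: float, bandwidth: float) -> float:
    """Sharp RDD: gap between fitted outcomes just above and just below the threshold.

    Assumes (hypothetically) that every applicant scoring >= threshold is funded
    and nobody below it is; only applicants within `bandwidth` are used.
    """
    # Centre scores on the threshold so each intercept is the fitted value there.
    below = [(s - threshold, y) for s, y in zip(scores, outcomes)
             if threshold - bandwidth <= s < threshold]
    above = [(s - threshold, y) for s, y in zip(scores, outcomes)
             if threshold <= s <= threshold + bandwidth]
    a_below, _ = fit_line([x for x, _ in below], [y for _, y in below])
    a_above, _ = fit_line([x for x, _ in above], [y for _, y in above])
    return a_above - a_below


# Illustrative data: outcome rises smoothly with score, plus a jump of 3 at a score of 70.
scores = list(range(60, 81))
outcomes = [10 + 0.5 * s + (3.0 if s >= 70 else 0.0) for s in scores]
effect = rdd_estimate(scores, outcomes, threshold=70, bandwidth=10)
print(round(effect, 6))
```

A real analysis would also need standard errors, a principled choice of bandwidth, and checks that scores around the threshold have not been manipulated; the sketch shows only the core estimate.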

The approach we have taken to improving our evaluation framework offers many lessons that apply across the public sector. Implementing even some of them can help establish a consistent and transparent approach to evaluation, which in turn brings more robust evidence, a greater understanding of what works and why, and better programme design and delivery.

Design evaluation into the programme

Impact might not materialise for several years, but it is never too soon to start evaluating a programme. For the most robust approach, evaluation should be designed into a programme’s implementation and delivery.

Research questions should be agreed at the outset to provide consistent measures against which the programme will be evaluated; a logic model (setting out inputs, activities, outputs and intended outcomes or impacts) should be drawn up; and data collection mechanisms capable of measuring performance against it need to be put in place.

Following a logic model approach means that even where the final impacts might be several years down the road, the evaluation is active early on, assessing progress against inputs, activities, and outputs as they’re happening.

This provides leading indicators of impact. It can give early evidence that a programme is reaching its target audience and that planned activities are producing the expected outputs, or it can flag any emerging issues.
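
A logic model can also be held as simple structured data, so that each indicator is tied to a stage and progress can be checked as data arrives. The sketch below is a hypothetical Python representation: the stage names follow the logic model described above, while the indicator names, targets and the flagging rule (observed value below target) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Stages of the logic model, from earliest to latest.
STAGES = ["inputs", "activities", "outputs", "outcomes", "impacts"]


@dataclass
class Indicator:
    name: str
    stage: str                        # one of STAGES
    target: float
    observed: Optional[float] = None  # None until data has been collected


@dataclass
class LogicModel:
    indicators: List[Indicator] = field(default_factory=list)

    def add(self, name: str, stage: str, target: float) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.indicators.append(Indicator(name, stage, target))

    def record(self, name: str, value: float) -> None:
        """Store an observed value for a named indicator."""
        for ind in self.indicators:
            if ind.name == name:
                ind.observed = value
                return
        raise KeyError(name)

    def progress(self) -> Dict[str, List[str]]:
        """Sort indicators, in stage order, into on track, flagged or not yet measured."""
        report: Dict[str, List[str]] = {"on_track": [], "flagged": [], "not_yet_measured": []}
        for ind in sorted(self.indicators, key=lambda i: STAGES.index(i.stage)):
            if ind.observed is None:
                report["not_yet_measured"].append(ind.name)
            elif ind.observed >= ind.target:
                report["on_track"].append(ind.name)
            else:
                report["flagged"].append(ind.name)
        return report


# Hypothetical example: early stages measured, later ones not yet.
model = LogicModel()
model.add("grant funding committed (GBP m)", "inputs", 20)
model.add("projects started", "activities", 100)
model.add("prototypes completed", "outputs", 60)
model.add("follow-on investment raised (GBP m)", "outcomes", 50)
model.record("grant funding committed (GBP m)", 20)
model.record("projects started", 95)
print(model.progress())
```

Here the shortfall in projects started would be flagged long before any outcome data exists, which is the kind of early warning the logic model approach is meant to give.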

Where projects or programmes are already under way, there is still a lot of value in implementing an evaluation. Where a programme regularly recruits new participants, such as a continuing programme that awards government grants, an evaluation can be designed around a particular cohort of those participants.

The evaluation can follow that cohort through the lifecycle of a beneficiary, achieving many of the same benefits as if evaluation had been designed into the programme: real-time data collection from baseline to impact, a consistent set of research questions, and less scope for cherry-picking evaluation participants.

Even where evaluation is restricted to being implemented once projects are under way, there is value in the exercise.

Baseline data could be collected from verified historical sources where possible or, failing that, retrospectively at the earliest opportunity, provided there is confidence that the results won’t be entirely spurious (it is important to bear in mind the old adage, “bad data is worse than no data”). Where baseline data can’t be reliably collected but the evaluation is active as projects are completing, data on outputs and longer-term impacts can still be collected. Combined with qualitative methods, this should still make it possible to build a reasonable picture of the types and magnitude of impact realised.

Evaluation is often very difficult

There are many barriers to robust results, but this doesn’t mean evaluation should not be attempted. Instead, apply the most robust practical methods to each element of the programme to build as strong a story of impact as possible.

A mixed-methods approach is usually most suitable. It might be that some aspects of a programme lend themselves more to quantitative analysis whilst others require a more qualitative approach. Inputs and activities are usually straightforward to capture, but as we move further down the logic model, towards outcomes and impacts, it may be that for some programmes consistent quantification isn’t possible. Consider this in the evaluation design, and capture evidence from a broad range of methods, looking for any patterns, trends, or inconsistencies in the findings.

Data is key

There are not many areas of public policy where reliable data on programme inputs, activities, outputs, outcomes, and impacts is readily available.

It is crucial to consider data collection during the evaluation design phase. For something like a competitive grant programme, this will mean ensuring you are collecting – and retaining – basic data on the characteristics of applicants, both successful and unsuccessful.

This should include contact details and suitable permissions to use those details for the purposes of evaluation. If you are commissioning the evaluation externally, make sure those permissions also allow you to share the data with third parties.
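
In practice, this means each applicant record carries its permissions alongside its characteristics, so that the evaluation sample can be filtered correctly. The Python sketch below is hypothetical: the field names and the two boolean permission flags stand in for whatever permissions an organisation’s own data protection arrangements require.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ApplicantRecord:
    applicant_id: str
    successful: bool            # funded (treatment) or not (potential comparison group)
    sector: str
    employees: int
    contact_email: str
    consent_evaluation: bool    # hypothetical flag: may be contacted for evaluation
    consent_third_party: bool   # hypothetical flag: details may go to an external evaluator


def evaluation_sample(records: List[ApplicantRecord], external: bool) -> List[ApplicantRecord]:
    """Keep records usable for evaluation; an external evaluator also needs sharing permission."""
    return [
        r for r in records
        if r.consent_evaluation and (r.consent_third_party or not external)
    ]


records = [
    ApplicantRecord("A1", True, "manufacturing", 12, "a1@example.com", True, True),
    ApplicantRecord("A2", False, "software", 5, "a2@example.com", True, False),
    ApplicantRecord("A3", False, "health", 40, "a3@example.com", False, False),
]
in_house = evaluation_sample(records, external=False)
commissioned = evaluation_sample(records, external=True)
print([r.applicant_id for r in in_house])
print([r.applicant_id for r in commissioned])
```

Keeping unsuccessful applicants in the same structure as successful ones matters because they are the natural source of a comparison group.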

Beyond the initial point of engagement with a programme, you’ll need to ensure data is collected at key points throughout the customer journey – and in many cases this will involve engagement beyond the end of the programme to capture longer-term impacts.
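
One way to keep collection points consistent is to schedule each survey wave relative to a participant’s own start date rather than on fixed calendar dates. The Python sketch below assumes hypothetical waves at baseline, 12 months and 24 months; for a comparison group of unsuccessful applicants, the decision date on their application could serve as the reference date. The offsets, identifiers and dates are illustrative.

```python
import calendar
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical survey waves: baseline, one year and two years after the reference date.
SURVEY_OFFSETS_MONTHS = [0, 12, 24]


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def survey_schedule(reference_dates: Dict[str, date]) -> List[Tuple[date, str, int]]:
    """Return (survey date, participant id, months since reference) for every wave,
    with each participant surveyed at the same offsets from their own reference date."""
    waves = [
        (add_months(ref, offset), pid, offset)
        for pid, ref in reference_dates.items()
        for offset in SURVEY_OFFSETS_MONTHS
    ]
    return sorted(waves)


# Hypothetical cohort: support start dates for funded firms, decision date for the comparison firm.
cohort = {
    "funded-001": date(2017, 1, 31),
    "funded-002": date(2018, 6, 15),
    "comparison-001": date(2017, 3, 10),
}
for when, pid, offset in survey_schedule(cohort):
    print(when.isoformat(), pid, f"+{offset} months")
```

Because every participant is surveyed the same number of months after their own start, responses from firms that joined at different times can be pooled and compared like for like.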

Data collection should be proportionate, minimising the burden on participants and using third-party sources where possible, but it is often reasonable to expect programme beneficiaries to take part in evaluation surveys.

Sample size is fundamental

Where a programme has a relatively low number of participants over any particular time period, consider using longer time periods to increase the sample.

We have used a two-year cohort in some evaluations for this purpose, implementing a system of rolling surveys so that data is collected at a consistent time relative to when each participant receives support. Provided a control group is constructed in the same way, it should be possible to control for external shocks.
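
A rough power calculation shows why pooling participants over a longer period matters. The Python sketch below uses the standard normal-approximation sample-size formula for comparing two group means with a two-sided test. The effect size, significance level, power and number of participants per year are assumptions chosen for illustration; a real evaluation would also need to allow for attrition and survey non-response.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Participants needed per group (treatment and control) to detect a
    standardised mean difference `effect_size` with a two-sided test.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})**2 / effect_size**2
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


def periods_needed(effect_size: float, per_period: int,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """How many recruitment periods must be pooled to reach the required sample,
    given `per_period` participants per group in each period (hypothetical)."""
    return math.ceil(n_per_group(effect_size, alpha, power) / per_period)


# Hypothetical: an effect of 0.3 standard deviations, 100 participants per group per year.
print(n_per_group(0.3))          # participants needed per group
print(periods_needed(0.3, 100))  # years of recruitment to pool
```

Halving the detectable effect size roughly quadruples the required sample, which is why small programmes often cannot reach an adequate sample within a single year.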

Do not get too preoccupied with a single number

Evaluation findings will always come with some gaps and uncertainties. The headline return-on-investment figure is important, but there will be a significant margin of error around it. The narrative and lessons around it will inform decision-making just as much, if not more.
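
One common way to show the margin of error around a headline figure is a bootstrap interval: resample the supported projects with replacement many times, recompute the return on investment each time, and read off the spread. The Python sketch below uses a percentile bootstrap on invented benefit and cost figures for eight hypothetical projects; it illustrates the idea rather than any method or result from Innovate UK’s evaluations.

```python
import random
from typing import List, Tuple


def roi(benefits: List[float], costs: List[float]) -> float:
    """Return on investment: total benefit per unit of total cost."""
    return sum(benefits) / sum(costs)


def bootstrap_roi_interval(benefits: List[float], costs: List[float],
                           n_boot: int = 2000, level: float = 0.95,
                           seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval for the ROI, resampling whole projects."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(benefits)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(roi([benefits[i] for i in idx], [costs[i] for i in idx]))
    estimates.sort()
    tail = round((1 - level) / 2 * n_boot)  # e.g. 50 of 2000 for a 95% interval
    return estimates[tail], estimates[n_boot - 1 - tail]


# Invented figures (GBP thousands) for eight hypothetical projects.
costs = [100, 120, 80, 150, 90, 110, 130, 95]
benefits = [150, 400, 60, 900, 0, 220, 310, 180]

point = roi(benefits, costs)
low, high = bootstrap_roi_interval(benefits, costs)
print(f"ROI {point:.2f} (95% interval {low:.2f} to {high:.2f})")
```

With so few projects, and one large outlier among them, the interval is likely to be wide; reporting that spread alongside the headline figure is exactly the point of this lesson.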

Be innovative when evaluating

It is important to ensure robust measurement. But with this core in place, look at where you can try novel techniques, and push the boundaries of the evaluation a bit further.

We try to learn something new about how to evaluate innovation support with every evaluation we run.