Dods

Navigating the metrics maze

25 Apr 2012

Getting the right metrics to drive effective markets is no easy task, but it’s key to payment by results. It can also be a dry topic – which may be why Joshua Chambers begins his investigation with Pacman and Jurassic Park.

While payment by results systems can seem intimidatingly complex, we can throw light on them by dipping into a couple of pop culture classics. The venerable video game Pacman, for example, teaches the importance of focusing on outcomes, and shows how people can be focused on desired goals through a judicious mix of carrots and sticks (or pills and ghosts, in this particular case).

Meanwhile, the movie Jurassic Park contains two salutary lessons for all civil servants. First, systems must be flexible, because nothing remains unchanged for long. Second, companies must closely scrutinise contractors to prevent bad behaviour.

The bioengineers of the film’s prehistoric theme park assumed that their dinosaur population would remain fixed – but the creatures managed to breed, multiplying the number of potential problems that had to be controlled. Similarly, designers of the Department for Work and Pensions’ (DWP’s) Work Programme assumed a far lower level of unemployment than the current reality. No matter how carefully laid your plans, if you assume that the environment will remain static, you’re likely to come a cropper.

As for controlling contractors – well, at Jurassic Park an IT supplier overcame security checks to defraud his commissioner, disabling electric fences to steal dinosaur embryos. In the real world, allegations fly that the subversion of metrics systems by contractors at welfare-to-work programmes has led to systemic fraud.

The topic of metrics is particularly pertinent because government wants to increase its use of contractors, often by building complex payment by results (PBR) systems. Until 2010, such schemes were limited to welfare programmes, but there are now numerous pilots across government; they’re coming to a department near you. And while Pacman and Jurassic Park are useful parables, they do seem to have missed a few important details. So CSW has researched the key factors and potential pitfalls that civil servants should consider when developing PBR metrics, examining most closely their use in welfare schemes – the biggest and oldest of the UK’s PBR programmes.

What results can you sensibly measure?
It may sound obvious, but it’s important to first properly understand what results you want to achieve – as David Clarke, director of commissioning and civil society at the National Audit Office (NAO), explains. Clarke recently led a team of auditors analysing the Work Programme, and says that defining the results for which you want to pay people “can be difficult in the sort of services that are provided in the public services arena”.

Rob Wormald agrees. As the head of market development at the DWP from 2009 until 2011, he was in charge of the contracting for multiple welfare-to-work programmes. Now working as a consultant for Publico, he warns that “in social policy, there’s a range of gains, some of which may be interim means to an end. The end itself may not be directly measurable.” Where that’s the case, he suggests identifying “things that will be close substitutes, or tell you that there’s progress being made towards a destination you are seeking”. For example, if the ultimate aim is to get someone into employment, interim measurements may include individuals’ timekeeping, their participation in skills development courses and their ability to show up for an interview – all factors indicating their engagement with the world of work. Similarly, in an education scheme designed to produce qualifications, one interim measure might be levels of truancy.

Clarke explains that once you’ve defined your outcomes, you then have to figure out how you measure the contribution that contractors are making to any observed progress. On an employment scheme, some people will find work regardless of the assistance of a particular provider. So how do you measure and pay for only the results a supplier has achieved?
While you’re mulling that over, there are other issues to consider, too. Departments need a clear understanding of the costs involved in tackling a problem, Clarke says. If payment is too generous, you may be wasting money; if it’s too tight, there’s a risk that providers won’t be able to achieve your ambitions.

Measuring success
So, how do you cut out the white noise and ensure that you’re paying contractors only – and fully – for the results of their work? There are numerous metrics available, each with their own strengths and weaknesses.
The Ministry of Justice has just considered a number of methods for its own pilot schemes (see box), while the DWP has trialled two approaches on its welfare-to-work schemes. One of these is to measure outcomes within both the client group, and a control group where no intervention is made. Wormald explains that this approach was taken in the Pathways to Work programme, but because it means leaving some people without the service it can’t be used on national programmes: “The risk is that you’re effectively saying: ‘Let’s leave some individuals with no hope at all’,” he points out. “And if you’re the MP for the constituency where the [control group] is based, you get pretty exercised if they’re left as guinea pigs.” Pathways to Work was able to demonstrate a marked improvement in the areas where the system was used, but “it’s quite a difficult road to take in social policy,” Wormald warns.

Instead, the Work Programme simply pays contractors per client who gets into work and stays there. Providers do receive a small payment for each new participant they take on, but the majority of their remuneration is for their jobseekers sustaining employment for up to two years, paid in monthly sums after an initial 26-week period. The NAO approves of the department’s definition of success, with Clarke stating that “it seems like a sensible way of defining an outcome in this case.” It is, however, critical of the rewards that providers can receive.

Everything changes eventually
It’s a given that government can’t predict the future in precise detail. According to the NAO, when calculating the payment levels for each job outcome, the DWP underestimated the impact of the recession on job numbers. With the current high levels of unemployment, contractors are struggling to help people find jobs and thus each outcome is costing them more than they’d expected.
Clarke led the team that analysed the DWP’s figures, and which concluded that the Work Programme was underpinned by “over-optimistic” assumptions about likely performance. The department still holds that 40 per cent of its largest group of jobseekers will be placed into jobs for which providers will be paid, but the NAO – building its calculation on the latest forecasts by the Office for Budget Responsibility – put that figure at just 26 per cent.
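
The gap between the two forecasts can be made concrete with some simple arithmetic. The sketch below uses the 40 and 26 per cent figures quoted above; the cohort size and the idea that a provider's delivery spending stays fixed are illustrative assumptions, not figures from the DWP or the NAO.

```python
# Illustrative arithmetic only: the two rates come from the article,
# everything else (cohort size, fixed delivery spending) is assumed.

DWP_ASSUMED_RATE = 0.40    # share of jobseekers the DWP expects to reach a paid outcome
NAO_FORECAST_RATE = 0.26   # the NAO's estimate for the same group


def paid_outcomes(participants: int, outcome_rate: float) -> float:
    """Number of paid job outcomes expected from a group of participants."""
    return participants * outcome_rate


participants = 1_000  # hypothetical cohort size

planned = paid_outcomes(participants, DWP_ASSUMED_RATE)
forecast = paid_outcomes(participants, NAO_FORECAST_RATE)

# Share of the planned outcome income that would actually arrive.
income_ratio = forecast / planned

# If delivery spending is unchanged, each paid outcome costs this much more.
cost_multiplier = planned / forecast

print(f"Outcome income as a share of plan: {income_ratio:.0%}")
print(f"Cost per paid outcome, relative to plan: {cost_multiplier:.2f}x")
```

On these assumptions a provider would earn roughly two-thirds of the outcome income it had planned for, while each paid outcome would cost it about half as much again to deliver.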

If contractors have to put in extra, unpaid work to produce job outcomes, on the face of it the department will do well out of its miscalculation. “In some ways, it’s a good deal for the Exchequer,” Clarke says. But the consequences could be negative in the longer term: if contractors find they can’t turn a profit they may try to find shortcuts, cut the quality of services, further squeeze SME and charity subcontractors, or drop out altogether, leaving the department to carry the can. They may also prove reluctant to bid for future DWP contracts, leaving the department short of providers.

One provider is Tomorrow’s People, a subcontractor in four regions of England. The charity’s chief executive, Baroness Stedman-Scott, says that “we are getting people into work, but not at the levels prior to the difficult economic downturn – and we’re having to work twice, three times as hard to make sure we do our best for them.” Another charity, SHP, has already dropped out of the scheme on the basis that “the Work Programme does not adequately fund the intensive preparatory work required to achieve these results. We just do not have the resources to effectively subsidise a national government programme.”

Be careful what you wish for
Another problem faced by all PBR systems is that targets can create perverse incentives that encourage unproductive or even harmful activity. If badly designed, metrics can reward contractors for doing things that don’t help realise intended goals.

In these circumstances contractors may engage in a practice called ‘gaming’: playing the system to maximise their gains, regardless of the social outcome. For example, they may claim multiple payments from different government organisations for helping the same person: in some cases suppliers may seek funding from a welfare programme for providing skills that get someone back into work, while simultaneously claiming funding from the business department for providing adult education to the same person.

Another method of gaming the system is to meet the targets set, but not in the spirit in which they were intended. For example, Wormald says that under previous welfare programmes, contractors were paid for finding people employment within a 13-week deadline. Some were simply hiring individuals for the 13th week and dumping them again afterwards.

These examples show how all PBR schemes need to have auditing systems in place to scrutinise contractors. One approach is to ask contractors to employ their own auditors, says Wormald – but that approach is vulnerable to obvious conflicts of interest. He argues that the DWP was right to run its own auditing operation on the Work Programme, more effectively deterring companies from playing the system.

The contractors that get the cream
Even if contractors are deterred from gaming, they may engage in a practice called ‘creaming’. This is where they choose the candidates most likely to succeed and focus their resources on those individuals. This means that people are receiving variable services, of course; and, worse, that the greatest resources are focused on those who need help the least. Wormald is relaxed about creaming, arguing that it produces good value for money – but it does suggest the need for PBR systems to ensure that those requiring more assistance aren’t neglected. The Work Programme, for example, pays contractors more when those ‘furthest from work’ find jobs; a differentiated payment system incentivises contractors to focus resources on need.
Another dubious tactic suppliers may engage in is ‘parking’, where contractors ignore candidates entirely, failing to provide a minimum level of service. This can be tackled by setting mandatory minimum standards of provision, says Wormald, policed through spot checks where people are selected at random and contractors are forced to prove they have helped them. He also suggests an extra lever: those who fail such tests should be blocked from tendering for future contracts. Abroad, countries such as Australia are examining the option of financial sanctions for companies which misreport their progress, or ‘park’ people they should be helping.

Finally, there’s the problem of fraud, where providers claim payments for results that are manufactured or fictional. Clarke warns that this is an inherent risk in all PBR schemes because “providers tend to have aggressive incentive arrangements to get their staff to deliver [against outcome targets], and this means that unless there are compensating controls, there’s pressure to perhaps overstep the mark entirely and fraudulently claim outcomes that don’t exist.”
One welfare provider, A4e, is alleged to have done this on a previous scheme. The Work Programme has a dedicated DWP team checking contractors, and a whistleblower’s charter to encourage accusations of wrongdoing to emerge. Further, the department uses data including information from HMRC to check that claimants are off benefits before a contractor gets paid.

New tricks
Be aware: Clarke says that implementing PBR schemes requires new skills. Traditionally, contractors have been paid on the basis of outputs, not outcomes – work done rather than results achieved – so paying for results requires new commissioning capabilities. Managing a market and setting an appropriate pricing level is a complex skill, and one that departments will need to improve, he thinks.

Wormald, meanwhile, says that departments should work together closely when establishing PBR schemes, ensuring that contractors are focused on broad social aims rather than narrow targets. The Department for Communities and Local Government is piloting this approach in a scheme to help troubled families. As well as its own budget, it draws on those of the health, education, and work and pensions departments, as well as the Home Office and the Ministry of Justice. It could be a forerunner to other joined-up approaches.

Don’t get carried away, though. PBR can’t address every problem, according to Dan Corry, chief executive of New Philanthropy Capital (NPC) – a think-tank that helps charities measure their social impact. He warns that complex problems may not have outcomes that are easy to fit into a PBR model. And PBR is not necessarily the best way of addressing problems best tackled by charities. Because contractors are only paid once the results are in, they need lots of upfront capital, disadvantaging smaller, often more innovative charities in favour of larger commercial suppliers.
What’s more, some provider markets may not be mature enough to cope with particular PBR schemes. Clarke recommends piloting small schemes, rather than rushing headlong into national-scale programmes. That enables appropriate metrics to be created and interest to develop in the private and third sectors.

These examples show that PBR isn’t a panacea for all ills; it forms just one weapon in a department’s armoury. Clarke says “a lesson here is what used to happen on the PFI [private finance initiative] scheme, where it became the only game in town, and was applied to circumstances where it shouldn’t be. It would be a terrible shame if that happened to PBR”.
In many fields, however, PBR is the future. And where departments do pursue such systems, don’t forget to spend the time and energy developing good metrics. The ultimate social outcomes may be the glittering prize, but the complex measurements and assumptions underpinning these schemes are where the story really is. No special effects required.

The metrics considered by the Ministry of Justice
The Ministry of Justice has numerous PBR schemes, but its most advanced are two schemes running in the prisons system. They use different metrics to measure outcomes, and demonstrate the pros and cons of each approach.

In Peterborough, the ministry is running a scheme that analyses the frequency of reoffending, measuring contractors’ success in reducing the proportion of recently-released prisoners who commit a crime against the results for a cohort where there’s no intervention. Working with three cohorts of 10,000 people, contractors will be charged with ensuring that the reconviction rate in each group is at least 10 percentage points lower than that within the 10,000-strong control group: where they hit this target for a group, they’ll be paid a fee for each ‘missing’ reconviction above that 10 per cent differential. Should the contractors fail to deliver at least 10 per cent in any cohort, but still reduce reoffending by 7.5 per cent across all three, they will receive a smaller payment. Anything less, and contractors will not receive any payment.
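
One way to read the Peterborough rule is sketched below. The two thresholds (10 and 7.5) come from the description above; the fee levels are hypothetical. The sketch also makes three assumptions: both thresholds are percentage-point gaps against the control group, the fee applies only to reconvictions avoided beyond the 10-point gap, and the smaller payment applies only when no single cohort reaches its target. The ministry's actual contract terms may differ.

```python
from dataclasses import dataclass

COHORT_THRESHOLD = 10.0  # gap to control needed in a single cohort (from the article)
POOLED_THRESHOLD = 7.5   # gap to control needed across all cohorts (from the article)


@dataclass
class Cohort:
    size: int
    reconviction_rate: float  # per cent of the cohort reconvicted


def cohort_payment(cohort: Cohort, control_rate: float, fee: float) -> float:
    """Fee for each reconviction avoided beyond the threshold (assumed reading)."""
    reduction = control_rate - cohort.reconviction_rate
    if reduction < COHORT_THRESHOLD:
        return 0.0
    avoided_beyond_threshold = (reduction - COHORT_THRESHOLD) / 100 * cohort.size
    return avoided_beyond_threshold * fee


def scheme_payment(cohorts: list[Cohort], control_rate: float,
                   fee: float, fallback_payment: float) -> float:
    """Total payment: per-cohort fees, else the pooled fallback, else nothing."""
    hit_target = [c for c in cohorts
                  if control_rate - c.reconviction_rate >= COHORT_THRESHOLD]
    if hit_target:
        return sum(cohort_payment(c, control_rate, fee) for c in hit_target)

    # No cohort reached its target: test the size-weighted rate across all cohorts.
    total_size = sum(c.size for c in cohorts)
    pooled_rate = sum(c.reconviction_rate * c.size for c in cohorts) / total_size
    if control_rate - pooled_rate >= POOLED_THRESHOLD:
        return fallback_payment
    return 0.0


# Hypothetical figures: a 60 per cent control rate and a fee of 100 per reconviction.
cohorts = [Cohort(10_000, 48.0), Cohort(10_000, 55.0), Cohort(10_000, 52.0)]
print(scheme_payment(cohorts, control_rate=60.0, fee=100.0, fallback_payment=5_000.0))
```

In this example only the first cohort clears the 10-point gap, so the contractor is paid for that cohort alone. Had all three cohorts come in at 52 per cent, none would have qualified individually, but the pooled eight-point gap would have triggered the smaller fallback payment.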

Rebecca Endean is director of analytical services at the ministry, and says that it’s important to match the demographics and characteristics of the control group with those of the cohorts receiving services, because “if the frequency of reoffending falls [in the Peterborough groups], we can be sure that it’s not to do with the fact that people leaving Peterborough have different characteristics.”

Meanwhile, in Doncaster prison, the MoJ is trying out the Work Programme’s approach: doing without a control, and simply paying contractors if they do better than the department’s calculation of likely future results if there’s no intervention. This ‘before and after’ approach is much simpler: reoffending will be measured as the percentage of offenders who commit an offence during the 12 months following discharge, proved by a court conviction during this period or in the subsequent six months. Providers will be paid if they achieve a five percentage point reduction in overall reoffending.
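
The Doncaster measure can be sketched in a few lines. The 12-month offence window, the further six months allowed for a conviction and the threshold of five are taken from the description above, with the threshold read as percentage points in line with Endean's comment below. The record layout, the function names and the baseline figure are hypothetical, and the ministry's own counting rules may be more detailed.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Return the date `months` later, clamped to the end of shorter months."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


@dataclass
class Release:
    discharged: date
    offence: Optional[date] = None    # date of first offence after discharge, if any
    convicted: Optional[date] = None  # date of the court conviction, if any


def is_proven_reoffence(r: Release) -> bool:
    """Offence within 12 months of discharge, convicted within 18 months."""
    if r.offence is None or r.convicted is None:
        return False
    offence_in_window = r.discharged <= r.offence <= add_months(r.discharged, 12)
    convicted_in_window = r.convicted <= add_months(r.discharged, 18)
    return offence_in_window and convicted_in_window


def reoffending_rate(releases: list[Release]) -> float:
    """Percentage of released offenders with a proven reoffence."""
    if not releases:
        raise ValueError("need at least one release to compute a rate")
    proven = sum(is_proven_reoffence(r) for r in releases)
    return 100 * proven / len(releases)


def payment_due(baseline_rate: float, observed_rate: float,
                required_reduction: float = 5.0) -> bool:
    """True if the observed rate beats the baseline by the required margin."""
    return baseline_rate - observed_rate >= required_reduction


releases = [
    Release(date(2012, 1, 10), date(2012, 6, 1), date(2013, 3, 1)),   # counts
    Release(date(2012, 1, 10), date(2013, 2, 1), date(2013, 4, 1)),   # offence too late
    Release(date(2012, 1, 10), date(2012, 12, 1), date(2013, 8, 1)),  # conviction too late
    Release(date(2012, 1, 10)),                                       # no offence
]
rate = reoffending_rate(releases)
print(rate, payment_due(baseline_rate=30.0, observed_rate=rate))
```

Because there is no control group, everything rests on the baseline: if the department's estimate of what would have happened anyway is too high or too low, the same provider performance can flip between payment and no payment.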

One problem has been determining what reoffending reduction warrants a payment to contractors. “Our concern is that five percentage points is quite difficult to achieve, so people might not want to bid for all of the contracts,” says Endean. The Doncaster pilot is up and running, but when the results come in a few years down the line, it may be that a follow-up scheme requires a lower target to ensure contractors can gain some payment, she says.
