Why embracing messy data can help us make better policy

It is time for government to fill its data gaps via more alternative and disruptive information sources, argues Keren Pakes

By Keren Pakes

26 Nov 2021


Our real-time economy relies on data, a key resource that is being created in vast quantities every minute. With this in mind, Ed Humpherson’s interesting Civil Service World analysis piece on data paucity in the social sector (The civil society data gap matters – and this is how we fill it) was important in highlighting a pressing issue.

As Humpherson points out, timely, high-quality information is vital if policymakers are to see quickly where problems lie and where policy intervention and help are most needed.

But as I digested the piece, I felt there was something missing. With such a yawning data gap to address, is now the time to adopt more alternative and disruptive methods of gathering the information required?

The piece drew heavily on a valuable new report from Pro Bono Economics for The Law Family Commission on Civil Society. While there is mention in the report of modernising the collection of social sector data, there is nothing on the role of new technologies, which could help rapidly fill the holes in our knowledge.

The solutions proposed are all good – but could easily be expanded beyond making “better use of existing administrative and survey data” and working to “modernise the submission of annual reports and accounts”.

One huge source of data that was originally designed to be public is the internet – the largest database ever created. In an ideal world, all public data would be open and accessible by design, reducing the human time needed to gather and analyse it.

But the huge volume of unstructured, “messy” public data worldwide is set to keep growing – and the only way to gather real-time insights from it and make sense of it is by using automated tools and platforms. For example, imagine developing a Covid-19 vaccine without a full picture of the global reaction to the virus and without taking into account real-time experiences on a global scale. Would that be possible? The answer is, obviously, “no”.

The cumbersome PDF documents that Humpherson highlights in his piece are certainly not best practice when it comes to open data standards. But I fear they aren’t going to be superseded overnight. With new tech tools and the magic of data science, these awkward, unstructured data nuggets can be tamed – giving us key insights into the state of the social sector and other parts of society.

A brief footnote in the Pro Bono Economics report does, encouragingly, suggest that alternative data “may be worth exploring”. Here we can take a cue from other industries. Non-traditional web-based datasets are being used right now to make big, important decisions; for example, insurers are using new tools to find insightful alternative datasets that allow them to assess risk at a more granular level.

At the Office for National Statistics, the use of data gathered from the web is being trialled to inform consumer price statistics. The Bright Initiative has also been supporting the Department for Digital, Culture, Media and Sport to explore the aggregation of public job vacancies, helping to inform skills policy in real time.

International Data Corporation estimates that 80% of data worldwide will be unstructured by 2025. It will flow from social media, streaming audio and video, as well as machines and sensors. This type of alternative data isn’t neat and well-ordered in spreadsheets; it is sprawling and unruly.

If we are to truly leverage data to “improve a range of public services”, as outlined in the National Data Strategy, we must look to harness new and alternative data sources – whether that’s from individual and organisational social media posts, blogs or even online reviews of services.

Of course, alternative data is not a panacea. It should be used responsibly and carefully in conjunction with traditional sources to ensure that policymaking is inclusive and informed – almost up to the minute – and as good as it can be.

And we must do it in a way that is open, compliant and responsible. Research from the Open Data Institute suggests that people may be happy with their data being used to benefit society – but not so happy with it being used to assist the investment decisions of hedge funds, for example. Transparency and open debate around the use of alternative data are key.

As well as expending our energy and efforts on making traditional data more structured, we should also now embrace the opportunity of unstructured data – to help us address the civil society data gap that threatens to undermine effective policymaking.

Keren Pakes is general manager of the Bright Initiative, a global programme that uses public web data to drive positive societal change
