Cabinet Office taps data analytics firm for ‘spring clean’ of millions of files

Department extends existing supplier engagement in £500k deal

By Sam Trendall

22 Feb 2023

The Cabinet Office has signed a £500,000 deal to expand the department’s use of automated data analytics to support “spring clean” exercises in which millions of files will be sifted – and, where applicable, deleted.

Freshly released commercial documents indicate that, two years ago, the department “decommissioned a legacy IT system called Apollo… [which] contained significant volumes of unstructured and semi-structured data” – amounting to about 4TB, across 11 million individual files held in 170 differing formats.

Initial analysis suggested “there were significant volumes of ROT – redundant, outdated, trivial (ROT) – [information] as well as historically valuable information that must be retained by the Cabinet Office for long-term preservation”. 

On 1 February 2021, the department signed a contract with Belfast-based data analytics firm Automated Intelligence. Under a £124,000-a-year deal that ran until 31 January 2023, the firm’s AI.Datalift technology was used by the Cabinet Office as the foundation for developing an “automated decision-making tool… [that] was piloted on the legacy Apollo data set”.

“The tool has proven to deliver accurate and reliable disposal decisions, aiding digital archivists to greatly reduce the volume of ROT held by the Cabinet Office for this data set,” according to newly published procurement documents.

Senior executives on the department’s People and Operations Committee last year approved an expansion of the automated analysis tool for use on other data sets held by the government’s central agency.

The Cabinet Office has thus awarded a new two-year deal to Automated Intelligence, running directly on from the conclusion of the previous deal on 31 January. This contract will be worth £236,500 a year – almost double the value of the prior engagement.

The contract outlines that the firm is expected to provide a “data analytics tool that is programmable so that data can be classified as either ROT or as valuable… [and] that is able to action a deletion on data so designated”.

The system developed using the company’s technology will now be used to support annual “spring clean” projects in which millions more files will be analysed.

These exercises – led by the Cabinet Office’s Digital Knowledge and Information Management (DKIM) team – are designed to sort through digital documents and categorise them for either deletion or retention.

This process has created an accumulation of “legacy information awaiting a disposal decision”.

“DKIM currently holds 4.9million digital files in its ‘holding pens’ which is made up of information collated from across Cabinet Office through the annual DKIM-led spring clean process, an annual muster of records identified for disposal and retention according to Cabinet Office policy,” the text of the new contract says.

Automation technology will be used to further analyse files that are set to be retained, in order to identify and remove any remaining redundant, outdated, and trivial data.

Once the sifting of this legacy data is complete, the system will be used to “analyse information collected through the DKIM spring clean in future years, currently estimated to be 350,000 documents per annum”.

The deal with Automated Intelligence will see the company work with the Cabinet Office to “explore the use of the tool to support business units to identify information in scope for future spring clean activity – [which] is currently a resource intensive, manual process”, the contract says.

Sam Trendall is editor of CSW's sister title PublicTechnology, where this story first appeared
