Can we secure mass data collection as IoT networks become ubiquitous?

20 Mar 2018

BT's Mike Pannell on the different ways of anonymising information and their application to IoT data

When I recently visited the national cycling centre, it was no surprise to see British athletes discussing training with their coach whilst peering into a laptop screen. Data such as heart rate, power, speed and elevation all feed into complex mathematical models. This information can be used to predict performance and highlight areas to improve.

As a keen athlete, I’m accustomed to collecting data from races and training sessions. I can choose whether to share that data with companies like Garmin or Strava, and I have an idea how they use that data. Clearly, as with any social media platform, users need to control their privacy settings or run the risk of having their data misused or, at worst, stolen. From the cycling routes you have shared, thieves can work out where you live and when you are away, and then break into your home.

When Strava published heat maps showing where people exercise the most, military bases could be identified from them. To some this was a great publicity stunt, but it should also be a warning about the risks of collating large volumes of data.

Despite these issues, most athletes actively use these fitness devices and share their data. Generally, users are given options to express a clear personal consent to data collection and sharing by companies like Strava and Garmin.

Compare this to Internet-of-Things (IoT) networks, which are becoming popular with organisations looking to automate their business. Sensors in household bins, rivers and car parks are just some applications of IoT technology. Unlike fitness devices, where I can tick a box to opt in to or out of the service, many IoT applications presume a level of consent.

Any IoT data that may be personal to an individual are still likely to be governed by the GDPR. These regulations call for informed consent specific to the application, with a clear opt-out option. No large-scale IoT deployment is going to work if everybody affected needs to consent. Instead, data anonymization is seen as the answer to this problem. After all, anonymous data are thought not to be personal.

Is data anonymization good enough? That depends upon how it’s done and what safeguards are in place. There are two approaches to anonymising information: Unlinked, where any sensitive data are replaced with random information, and Linked, where sensitive data are replaced with a code known by somebody.

Unlinked data are fine for general statistics gathering, but Linked Anonymised Data are more useful if you want to take action on a particular event. Take the scenario of a sensor in household bins. If anonymous data are sent to the refuse company purely to gather statistics, then there is little sensitive information. However, if a link exists to an address ID, and the local authority has a list of IDs and actual addresses with the householder’s identity, then together these form personal information and a GDPR risk exists.
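To make the distinction concrete, here is a minimal sketch of the two approaches applied to a bin sensor reading. Everything in it is an assumption made for illustration: the reading layout, the function names and the `Codebook` class are hypothetical and do not describe any real deployment. It only shows the principle that an unlinked token cannot be traced back, while a linked code can be resolved by whoever holds the codebook.

```python
import secrets


def anonymise_unlinked(reading):
    """Unlinked: swap the address for a random token that nobody records."""
    out = dict(reading)
    out["address"] = secrets.token_hex(8)  # thrown-away random value
    return out


class Codebook:
    """Hypothetical codebook, held by one party only (e.g. the local authority)."""

    def __init__(self):
        self._code_by_address = {}
        self._address_by_code = {}

    def code_for(self, address):
        # Reuse the same code for the same address so events can be followed up.
        if address not in self._code_by_address:
            code = secrets.token_hex(8)
            self._code_by_address[address] = code
            self._address_by_code[code] = address
        return self._code_by_address[address]

    def resolve(self, code):
        # Only the codebook holder can turn a code back into an address.
        return self._address_by_code[code]


def anonymise_linked(reading, codebook):
    """Linked: swap the address for a stable code recorded in the codebook."""
    out = dict(reading)
    out["address"] = codebook.code_for(reading["address"])
    return out


# Illustrative reading; the address and field names are made up.
reading = {"address": "1 Example Street", "fill_level": 0.8}
codebook = Codebook()

unlinked = anonymise_unlinked(reading)
linked = anonymise_linked(reading, codebook)
```

The refuse company would only ever see `unlinked` or `linked`. On its own the linked record carries no address, but combined with the codebook it becomes personal information again, which is why the codebook needs the same level of protection as the original data.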

Therefore, all elements of a linked data-set must be equally protected and GDPR compliance requires complete control of data usage by all parties. This is a complex issue and requires careful thought on how to protect personal information and comply with GDPR regulation. The Information Commissioner’s Office has produced a comprehensive code of practice on using anonymised data.

Consider another example: a housing company that wants to use IoT sensors to monitor individual properties for damp, cold or energy consumption. If it detected a cold, damp property, then the heating system may have failed or the tenant may not be able to afford to turn the heat on. Knowing this in real time can bring help more quickly, and issues could be addressed in a single visit. There are clear social benefits in identifying people at risk of health problems exacerbated by poor housing.

Knowledge of building occupancy, or of whether the tenant can afford to turn the heating on, is an example of linked personal information that would be governed by GDPR. It is not easy to gain clear and informed consent from every tenant, and to review it regularly.

Will tenants accept this level of monitoring of their private lives? Will the benefits outweigh the perceived risks?

If as a society we want to benefit from sensors incorporated into all aspects of our lives, the people who collect and process this information on our behalf should be totally clear on how they protect our privacy.

We will solve this conundrum, but it’ll take time to resolve the complex technical and legal issues. There are good anonymization techniques and strong mechanisms to protect the codebook used to protect identity. These are fundamental principles of cryptography and if we apply that rigour to protecting IoT data then we will solve this problem.