Was 2020 a watershed for public sector use of algorithms?

Last year saw public protests about the use of predictive analytics in decision making. As the technology becomes more widely used in many public services, Sam Trendall asks experts about the implications for government, the law – and the citizens affected

A protest about the use of algorithms in exam results. Photo: PA

By Sam Trendall

25 Jan 2021

If government did not know it had an algorithm problem, it certainly does now.

“There is a clear, and understandable, nervousness about the use and consequences of algorithms, exacerbated by the events of this summer,” said the Centre for Data Ethics and Innovation in a major report published in late November.

On the subject of those events – the use of an algorithm to calculate grades for hundreds of thousands of pupils sitting GCSE and A-level exams or their Scottish equivalents – Sharon Witherspoon, vice president for education and statistical literacy at the Royal Statistical Society, said in August: “The lack of transparency around the process has not only caused significant distress for thousands of students, it has threatened to undermine public trust in statistics and their use.”

Roger Taylor, chair of the CDEI and also the soon-to-depart chair of Ofqual – whose resignation announcement last month referenced the “very difficult summer” the exams regulator had endured – offered another damning assessment of what took place. “The fundamental mistake was to believe this would ever be acceptable to the public,” he said.

But the loudest of all the critical voices was the one that rang out in the form of a chant heard at numerous student protests that took place in the days after the results were published: “F**k the algorithm.”

“Everybody is getting more worried – which is good... because it is absolutely necessary and important to address these problems, now that our life chances can be decided by technology” Prof Sandra Wachter, Oxford Internet Institute

Many students found that the algorithm had marked them one or more grades lower than their teachers had projected, with about two in five of all results lowered. Higher-performing pupils from more disadvantaged backgrounds were worst affected.

Following a national outcry, all algorithmic grades across the country were scrapped, and replaced with teacher assessments – albeit not before the university places of thousands of students had been imperilled.

Even if all of those affected were able to rectify the potential damage – and it is not a given that they were – the summer exams disaster seemed to mark a watershed moment in the public’s relationship with algorithms.

In recent years, the use of predictive technology has increased across a range of industries – including in some high-stakes use cases in areas such as insurance, employment, healthcare and even policing. It has, on occasion, attracted negative headlines for the decisions it has made, the way it has made them, and the perceived opacity with which it has done so.

But the fallout from last summer’s grading scandal was the first time large numbers of citizens – accompanied by national journalists – took to the streets to protest the use of an algorithm, and the impact of technology-powered decision-making. It may not be the last.

Speaking to a range of experts in the days and weeks following the exams debacle, PublicTechnology first asked: what went wrong?

Many of those who protested would surely contend that an algorithm should never have been used to make such an important determination. Those whose idea it was to do so might prefer to suggest that the problem lay only in how the technology was designed and implemented.

Jeni Tennison, vice president of the Open Data Institute, told us: “The outcome that was being asked from the algorithm and from data was not within the scope of what the algorithm and data could provide. Different choices could have been made… in particular, how much to rely on using human methods to correct for teacher over-predictions, versus using data and algorithmic methods.

“They plumped for using data and algorithms to get to the result that they wanted – when really the data and algorithms were not capable of providing what they needed. Ofqual tried as hard as possible to make it fair, but the education system as a whole isn’t fair, so you end up replicating and accentuating the unfairness that is already in the system.”

Even having decided to pursue a method that relied so heavily on an algorithm, Tennison says three things could have mitigated the eventual problems.

The first would have been to engage earlier and more extensively with data science experts, as well as with students, parents and teachers. The second would have been to conduct more ongoing monitoring and testing of likely outcomes – including opening up the tool to scrutiny by third parties.

The final step the exams regulator could have taken was to adopt a more proactive approach to detecting errors made by the algorithm, and to build in a system of redress for affected students. Tennison points to Northern Ireland, where a divergence of at least two grades from teacher-predicted results automatically prompted human review.

“When people have to appeal errors, that puts a big burden on them – especially when you’re talking about something as important as qualification results which dictate your future life opportunities,” she said. “They could have done more proactively to detect and create some redress around any errors that were in the algorithm.”

Eleonora Harwich, director of research and head of tech innovation at think tank Reform, echoes the view that the biggest mistake was “a lack of real engagement with people who might be able to help develop solutions” – such as the RSS, which expressed concerns and offered to advise on the design of the algorithm, but was told by Ofqual that its experts would need to sign a five-year non-disclosure agreement.

“Why would you ever need to sign an NDA?” Harwich asked. “In so far as is possible, there needs to be definite transparency and oversight of the process.”

She added: “It is just about bringing in those people who would be a critical friend. There were quite a lot of people voicing concerns, which leads to quite a lot of frustration for people that thought this would be an issue.”

Attila Tomaschek, a researcher at advocacy organisation ProPrivacy, said the exams algorithm – and any similar system – ought to be developed by as diverse a team as possible, to ensure that all perspectives and potential impacts are considered.

“They did not provide any real information or context of what they were doing to limit the bias or how they were working to remove those biases from the system,” he said. “In developing AI teams that develop these systems, there should be a push to have a greater level of diversity.”

Tomaschek added that regulation requiring greater openness would help. “There should be a level of transparency and these algorithms should be opened up to scrutiny by the public and by experts; that would go a long way,” he said.

Tackling transparency
This seems to be a view shared by the Centre for Data Ethics and Innovation, a unit set up within the Department for Digital, Culture, Media and Sport in 2018 to advise the government on ethical issues related to artificial intelligence and the use of personal data.

The CDEI recently published the findings of an 18-month review of bias in algorithmic decision making. The most significant of the three recommendations made by the report – which is quoted at the start of this article – is that the use of the technology in public services should be made transparent.

“Government should place a mandatory transparency obligation on all public sector organisations using algorithms that have an impact on significant decisions affecting individuals,” it said.

The CDEI also recommended that organisations across all sectors “should be actively using data to identify and mitigate bias”, as well as urging an update to discrimination law to account for the prevalence of AI tools.

“Government should issue guidance that clarifies the application of the Equality Act to algorithmic decision making,” the review said. “This should include guidance on the collection of data to measure bias, as well as the lawfulness of bias-mitigation techniques – some of which risk introducing positive discrimination, which is illegal under the Equality Act.”

Professor Sandra Wachter, a senior research fellow at the Oxford Internet Institute focused on the ethical and legal issues of AI, echoes the view that discrimination law needs to be re-examined to consider the implications of algorithmic bias. She adds that this process may yet be prompted by “the tipping point where a court says this has gone too far”.

According to Wachter, there is at present invariably “a very high – almost impossible to escape – chance that there will be some form of bias” in algorithmic decision making.

In the case of students from more disadvantaged backgrounds being more likely to see their results downgraded, the algorithm is a symptom of an underlying problem.

“If you go to the doctor when you’re having pain, it is a symptom of something; you can give somebody a pain pill to fix the symptom… but what you really want to do is find out the cause of the pain,” Wachter said. “It is about using statistical methods that allow you to figure out what discrimination [is inherent].”

According to Wachter, the challenge of tackling bias should not be met solely by people in her field of computer science and AI.

At the OII, she works on the Governance of Emerging Technologies project that aims to bring together experts from different fields to examine the legal and ethical issues of AI. “The thing that is still missing is enough interaction between the disciplines,” Wachter says. “I work together with ethicists, computer scientists and lawyers.”

Harwich from Reform agrees that addressing the problem of bias will require input from those outside traditional STEM fields.

“We are dealing with questions that are part of social science – this is not the laws of physics,” she said.

“It is not so much to do with the tools themselves… it is more about the research method, and the structure that you’re going to build around that project to ensure that you have safeguards at different stages.”

“They plumped for using data and algorithms to get to the result that they wanted – when really they were not capable of doing so. Ofqual tried as hard as possible to make it fair, but the education system as a whole isn’t fair, so you end up replicating and accentuating the unfairness” Jeni Tennison, Open Data Institute

Battling bias
Many would contend that, so long as data is gathered in a world where injustice and inequality persist, it will always reflect and reinforce those problems. It is questionable whether the biases of the world can ever be removed from the data designed to tell us more about it.

“I think that stripping the bias out of data is not the right way of thinking about it,” Tennison from the ODI says. “Because every data set is biased through the context in which it’s collected, the mechanism by which it’s collected, the choices that are made about how to model things – data is always simplified, and is always a partial reflection of reality.”

She added: “Data can tell us about how the world was, and how it is – it can’t tell us about the world we want, because that’s to do with our values. What becomes important is how any output of an algorithm gets interpreted and used, versus various other bits of evidence that we might have to help inform a decision. We need to have a more nuanced understanding around what algorithms come out with – the fact that there is uncertainty in any kind of outcome from an algorithm, and how you treat that uncertainty and the people who are affected by it – become very important.”

Dr David Leslie, ethics theme lead at The Alan Turing Institute – the UK’s national institute for AI and data science – believes there are effective measures that can be taken during the development of systems to mitigate bias. But the wider context and the instance in which the technology is being used should always be borne in mind.

“If you have bias-aware practices – starting with the moment that you scan horizons and then onto thinking about how you formulate your problem and what your target variable is – in all of those moments of initial decision making, if there is bias awareness across that whole pipeline, then bias mitigation can be effective and help to rectify strains of societal discrimination and bias that cause discriminatory outcomes when you use these systems. But that doesn’t address bigger picture issues about how, at the very moment when we decide to use a system, there are larger issues – equality, social inequity – that externally frame the justifiability of using it.”

To help understand that external context, the Turing has previously hosted citizen juries to examine AI issues, and Leslie was involved with a project by the institute and Oxford’s Rees Centre which assessed the attitudes of families to the use of AI and data in the context of children’s social care.

“If data analytics is going to support decision making that isn’t biased against them – that is able to possibly correct some human biases in a way that a social-care worker assesses a situation – then families were encouraging of it,” he said. “At the same time, they were also very concerned that the data that was being used in those systems wasn’t accurate. So, there’s this balancing that, when used correctly, there could be a positive dimension of data analytics – but, at the same time, trepidation about the ways in which the quality of the data might not be helpful.”

What next?
In the case of grading A-level and GCSE students, another potential algorithm disaster has been headed off ahead of next summer – as, perhaps unsurprisingly, the technology will not be used.

Exams have, once again, been cancelled across the UK. But all four countries have made it clear they will be replaced with a system based on human assessments.

Or as education secretary Gavin Williamson put it: “This year we are going to put our trust in teachers rather than algorithms.”

But there are plenty of other potentially contentious uses of predictive decision-making.

During the same week that the exam-results story blew up – but receiving significantly fewer column inches – the Government Legal Department, acting on behalf of the Home Office, announced that the department would suspend the use of a “streaming” algorithm used to filter visa applications.

The program was subject to a High Court judicial review, launched by charity the Joint Council for the Welfare of Immigrants and campaign group Foxglove, which challenged the lawfulness of the tool and claimed it was inherently racist and discriminatory.

The Home Office pre-empted the court’s decision by announcing it would halt the use of the algorithm, pending a review and a redesign.

The department has previously put millions of pounds into controversial police trials of predictive analytics to combat knife crime, while government has also invested heavily in exploring the use of AI in cancer diagnosis.

With the public and the press becoming ever more aware of the use of algorithms, each one of the potentially millions of individual decisions stemming from these uses of technology has the potential to become a national scandal.

Wachter from the OII believes that the increased public awareness exemplified by last summer’s outcry should be welcomed.

“Everybody is getting more worried about it – which is good, because there are obviously problems there, and it is important that we [address them],” she says. “People might be hesitant to engage with the technology because it is complex. But once you explain how it works, it is demystifying. What you’re left with is a certain set of problems that always emerge – you will always have a fairness problem, you will always have an explainability problem, a privacy problem. It is absolutely necessary and important to look at these problems, now that our chances in life can be decided by technology.”

Sam Trendall is editor of CSW’s sister title PublicTechnology, where a version of this feature first appeared