Five data-ethics and AI experts on... what we can learn from the qualifications algorithm fiasco

The Open Data Institute asked experts to explain where Ofqual and other qualifications regulators went wrong in this week's A-Level grade controversy
Students protesting their algorithm-determined A-Level results. Photo: Dominika Zarzycka/NurPhoto/PA Images

Algorithms are increasingly encroaching on our lives, and being used to make decisions that affect our futures.

This has been brought into sharp relief by the recent furore around the use of algorithms to grade GCSE, AS, A-Level and BTEC results. Teachers' grade predictions tend to favour students by giving them the benefit of the doubt, so Ofqual and other qualifications regulators attempted to adjust them to bring them into line with prior results.
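
To make that mechanism concrete, here is a minimal, hypothetical sketch of rank-based standardisation – not Ofqual’s actual model, which was considerably more elaborate – in which students are ranked within their school and those ranks are mapped onto the school’s historical grade distribution for a subject. Every name, number and cut-off below is invented for illustration.

```python
# Purely illustrative sketch of rank-based standardisation (not Ofqual's model):
# students are ranked within their school by teacher assessment, and those ranks
# are mapped onto the school's historical grade distribution for the subject.

GRADES = ["U", "E", "D", "C", "B", "A", "A*"]  # lowest to highest

def standardise(students, historical_share):
    """students: list of (name, rank) where rank 1 is the strongest student.
    historical_share: fraction of the school's past entrants awarded each grade,
    indexed like GRADES and summing to 1.0."""
    n = len(students)
    # Turn the historical distribution into rank cut-offs, best grades first.
    cutoffs, cumulative = [], 0.0
    for share in reversed(historical_share):  # A* share first
        cumulative += share
        cutoffs.append(round(cumulative * n))
    awarded = {}
    for name, rank in students:
        for i, cutoff in enumerate(cutoffs):
            if rank <= cutoff:
                awarded[name] = GRADES[len(GRADES) - 1 - i]
                break
        else:
            awarded[name] = GRADES[0]
    return awarded

# Five students at a school whose past results cluster around the middle grades:
# the top-ranked student cannot be awarded a grade the school has rarely
# produced, however strong their teacher assessment was.
cohort = [("p1", 1), ("p2", 2), ("p3", 3), ("p4", 4), ("p5", 5)]
history = [0.0, 0.2, 0.2, 0.4, 0.2, 0.0, 0.0]  # no historical A or A* grades
print(standardise(cohort, history))  # p1 gets at most a B under this sketch
```

Even in this toy version the design choice is visible: the school’s history, rather than the individual’s own work, caps the grade any student can be awarded.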

But the results were traumatic, with some A-Level grades marked down from predicted results and students consequently losing hoped-for university places. People took to the streets with “F**k the algorithm” chants. The government and Ofqual went from backing their approach as the least-worst solution, and insisting that grades needed to stay in line with prior results, to abandoning it and guaranteeing that students could use whichever was higher of their teacher-assessed grade and the grade the algorithm awarded.

"The government and Ofqual went from backing their approach as the least-worst solution, to abandoning it"

This experience will join a litany of touchstone examples – care.data, DeepMind/Royal Free, and Facebook/Cambridge Analytica – that illustrate how public trust in the use of data, algorithms and AI can be irreparably damaged.

We asked experts in data ethics and responsible AI to describe where they think Ofqual and other qualifications regulators went wrong. 

Carly Kind, Ada Lovelace Institute

The A-Level results algorithm has done immeasurable harm to trust and confidence in the use of algorithmic systems by public bodies. Images of disadvantaged students directing their legitimate protest against injustice at an algorithm will persist in people’s minds. 

How should the government rebuild trust in data-driven decision-making? Our report Confidence in a crisis? analysed the public’s response to another recent instance of public-sector technology deployment: the NHS contact tracing app.

"The A-Level results algorithm has done immeasurable harm to trust and confidence in public bodies' use of algorithmic systems" 

We distilled four conditions for trustworthy technology deployment that are as applicable to public-sector algorithms as they are to public-sector apps:

  • Provide the public with a transparent evidence base. The public wants to see clear, accessible evidence on whether technologies (and algorithms) are effective and accurate.
  • Offer independent assessment and review. Trust can be strengthened with the inclusion of independent reviewers, assessors and evaluators to shape the development and use of algorithmic systems.
  • Clarify boundaries on data use, rights and responsibilities. It must be easy to discover what data would be held, by whom, for what purpose and for how long.
  • Proactively address the needs of, and risks relating to, vulnerable groups. People want reassurance that a system is fair for everyone, and that benefits will be distributed equally.

The failure of the A-Level algorithm highlights the need for a more transparent, accountable and inclusive process in the deployment of algorithms, to earn back public trust.

Jeni Tennison, Open Data Institute

Awarding pupils grades without exam results is an extraordinary ask in extraordinary times. It’s unsurprising that qualifications regulators turned to data and algorithms to do so, given the growing trend towards data-driven decision-making in the public sector.

But the mess that has unfolded illustrates the limits of data-driven approaches to making decisions about people’s futures: the data is frequently biased, simplified, and of poor quality.

"It’s clear Ofqual was aware of the challenges around subjective things like the definition of fairness and tried to counter them, but not to the extent it should have"

Reading Ofqual’s technical report, it’s clear it was aware of the challenges around subjective things like the definition of fairness, and that it tried to counter them, but not to the extent it should have:

  • Transparency and engagement: While Ofqual did make information about the algorithm public, it was only transparent about the details on A-Level results day. It should have been open throughout the process and engaged with education and data experts. 
  • Detecting and redressing errors: Any algorithm produces errors; the challenge is to build a surrounding process that identifies and corrects them quickly and with the minimum long-term impact on those affected.
  • Monitoring and evaluation: Ofqual’s technical report indicates that it did examine the overall impact of the algorithm on the distribution of grades to students from different genders, ethnicities, socioeconomic backgrounds and so forth. However, this did not detect the bias favouring private school pupils that was later picked up in the press (a simple audit of this kind is sketched below).
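
As an illustration of the kind of group-level check this describes, the hypothetical audit below – the records, field names and points scale are all invented – compares awarded grades against teacher assessments by school type, reporting both the mean change and the share of entries downgraded.

```python
# Hypothetical equalities audit: compare algorithm-awarded grades against
# teacher-assessed grades (in illustrative points) across school types.
# All records and field names below are invented for illustration.
from collections import defaultdict
from statistics import mean

records = [
    # (school_type, teacher_assessed_points, awarded_points)
    ("independent", 48, 48),
    ("independent", 40, 40),
    ("comprehensive", 48, 40),
    ("comprehensive", 40, 32),
    ("sixth_form_college", 40, 32),
]

changes = defaultdict(list)
for school_type, teacher, awarded in records:
    changes[school_type].append(awarded - teacher)

for school_type, deltas in sorted(changes.items()):
    downgraded = sum(1 for d in deltas if d < 0) / len(deltas)
    print(f"{school_type:>20}: mean change {mean(deltas):+.1f} points, "
          f"{downgraded:.0%} of entries downgraded")
```

A breakdown by gender, ethnicity or socioeconomic background alone would not surface a disparity that only appears when results are grouped by school type – which is why the choice of groupings in an evaluation matters as much as whether one is run at all.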

The bigger picture here is around choosing the right balance between algorithmic and human decision-making. Regulators could have chosen an approach that enabled teachers to work together to standardise grade predictions, rather than relying on an algorithm. However well they are implemented, algorithmic solutions are not magic bullets.

Rachel Coldicutt, independent technologist, formerly Doteveryone

Exams are contentious and unfair for all sorts of reasons that have nothing to do with technology, but people know how they work and accept the risks involved. 

Automated decisions, meanwhile, bring high-definition clarity to unfairness. Bias, prejudice and unfair advantages that are hidden in daily life are given focus and coherence when analysed at speed and at scale.

Dropping an algorithm into systems with time-sensitive dependencies – university admissions and job offers – was always going to require care and diligence, as well as frank and clear public communication. Unfortunately, none of those were present.

"A data-driven government must be accompanied by transparency and openness"

A data-driven government must be accompanied by transparency and openness. But how could it have been done differently?

  • Underlying models, assumptions and success criteria could have been made available for external scrutiny and challenge. 
  • The system should have been designed inclusively and robustly tested to make sure it accommodated outliers and wildcards.
  • Real-world consequences should have been considered. The appeals process should have been set up and communicated in advance, and a buffer period imposed on universities confirming places. 
  • Communication should have been clearer and more frequent.

Swee Leng Harris, Luminate

The rule of law requires that government and public agencies act in accordance with the law, including when using algorithmic systems. There was a lack of due consideration by Ofqual of two key aspects of law:

  • Ofqual’s power as defined by law: As Foxglove argued, the algorithmic system graded individuals based on their school’s historic performance and their rank at their school, which was not a reliable indication of individual students’ knowledge, understanding, or achievement as required under the Apprenticeships, Skills, Children and Learning Act.
  • Article 22 of the GDPR on automated decision-making: Ofqual’s Privacy Impact Statement (PIS) asserts that Article 22 does not apply because there was human intervention in the algorithmic grading system. But pointing to human involvement in the inputs misunderstands automated decision-making, and if the human intervention in signing off the results was not meaningful – e.g. if all of the algorithmic awards were signed off without change – then this was automated decision-making under Article 22.

"Government and public agencies must understand the applicable laws and how their algorithmic systems operate.  This means understanding a system as a whole, including the role of people interacting with the technology"

Government and public agencies must understand the applicable laws and how their algorithmic systems operate. This means understanding a system as a whole, including the role of people interacting with the technology, in order to assess whether it complies with the law.

Cori Crider, Foxglove

This fiasco was political, not technical. Several lessons emerge:

State your purpose honestly. Imagine the government had simply said: “To avoid a one-time uptick in grade inflation, we’ve replaced teachers’ grades with a statistical prediction based on schools’ prior performance. This will limit inflation, but lead to unfair results, such as downgrading bright students in large subjects and at struggling schools.”

Had the government explained what it had prioritised – and who it had left behind – it might have corrected course sooner.

No more permissionless systems. No one sought a public mandate for algorithmic grading. It was not the public’s job to submit statistical analyses to the Ofqual consultation website. For decisions which affect thousands of lives, meaningful democratic engagement is not a step you can skip.

Don’t be afraid of scrutiny.  Experts from the Royal Statistical Society saw risks early and offered to assist. Ofqual demanded a five-year silence from them. Understandably, they refused. 

Don’t assume the algorithm. Procedural protections – reducing bias, increasing transparency – can set a floor of fairness. But focusing on them can elide a key question: when are algorithmic systems democratically acceptable? We have rejected the grading algorithm. Is one appropriate for visas? Policing? Benefits? We need to engage this core democratic question.
