Don’t End Up on This Artificial Intelligence Hall of Shame

A list of incidents that caused, or nearly caused, harm aims to prompt developers to think more carefully about the tech they create….

When a person dies in a car crash in the US, data on the incident is typically reported to the National Highway Traffic Safety Administration. Federal law requires that civilian airplane pilots notify the National Transportation Safety Board of in-flight fires and some other incidents.

The grim registries are intended to give authorities and manufacturers better insights on ways to improve safety. They helped inspire a crowdsourced repository of artificial intelligence incidents aimed at improving safety in much less regulated areas, such as autonomous vehicles and robotics. The AI Incident Database launched late in 2020 and now contains 100 incidents, including #68, the security robot that flopped into a fountain, and #16, in which Google’s photo organizing service tagged Black people as “gorillas.” Think of it as the AI Hall of Shame.

The AI Incident Database is hosted by Partnership on AI, a nonprofit founded by large tech companies to research the downsides of the technology. The roll of dishonor was started by Sean McGregor, who works as a machine learning engineer at voice processor startup Syntiant. He says it’s needed because AI allows machines to intervene more directly in people’s lives, but the culture of software engineering does not encourage safety.

“Often I’ll speak with my fellow engineers and they’ll have an idea that is quite smart, but you need to say ‘Have you thought about how you’re making a dystopia?’” McGregor says. He hopes the incident database can work as both a carrot and stick on tech companies, by providing a form of public accountability that encourages companies to stay off the list, while helping engineering teams craft AI deployments less likely to go wrong.

The database uses a broad definition of an AI incident as a “situation in which AI systems caused, or nearly caused, real-world harm.” The first entry in the database collects accusations that YouTube Kids displayed adult content, including sexually explicit language. The most recent, #100, concerns a glitch in a French welfare system that can incorrectly determine people owe the state money. In between there are autonomous vehicle crashes, like Uber’s fatal incident in 2018, and wrongful arrests due to failures of automatic translation or facial recognition.

Anyone can submit an item to the catalog of AI calamity. McGregor approves additions for now and has a sizable backlog to process, but he hopes the database will eventually become a self-sustaining open source project with its own community and curation process. One of his favorite incidents is an AI blooper by a face-recognition-powered jaywalking-detection system in Ningbo, China, which incorrectly accused a woman whose face appeared in an ad on the side of a bus.

The 100 incidents logged so far include 16 involving Google, more than any other company. Amazon has seven, and Microsoft two. “We are aware of the database and fully support the partnership’s mission and aims in publishing the database,” Amazon said in a statement. “Earning and maintaining the trust of our customers is our highest priority, and we have designed rigorous processes to continuously improve our services and customers’ experiences.” Google and Microsoft did not respond to requests for comment.

Georgetown’s Center for Security and Emerging Technology is trying to make the database more powerful. Entries are currently based on media reports, such as incident 79, which cites WIRED reporting on an algorithm for estimating kidney function that by design rates Black patients’ disease as less severe. Georgetown students are working to create a companion database that includes details of an incident, such as whether the harm was intentional or not, and whether the problem algorithm acted autonomously or with human input.
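To make the idea of a companion database concrete, here is a minimal sketch of what one annotated record might look like and how such records could be tallied. It is an illustration only, not the Georgetown team's actual schema: the class name, field names, and function are assumptions, and the sample records are invented placeholders rather than real database entries.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentAnnotation:
    """Hypothetical annotation for one incident (field names are assumptions)."""
    incident_id: int           # number of the incident in the main database
    harm_intentional: bool     # was the harm intended?
    acted_autonomously: bool   # did the algorithm act without human input?


def tally_annotations(annotations):
    """Count incidents by (harm_intentional, acted_autonomously)."""
    return Counter(
        (a.harm_intentional, a.acted_autonomously) for a in annotations
    )


# Invented placeholder records, NOT real entries from the database.
sample = [
    IncidentAnnotation(9001, harm_intentional=False, acted_autonomously=True),
    IncidentAnnotation(9002, harm_intentional=False, acted_autonomously=True),
    IncidentAnnotation(9003, harm_intentional=False, acted_autonomously=False),
    IncidentAnnotation(9004, harm_intentional=True, acted_autonomously=False),
]

counts = tally_annotations(sample)
for (intentional, autonomous), n in sorted(counts.items()):
    print(f"intentional={intentional} autonomous={autonomous}: {n}")
```

Structured fields like these would let researchers ask questions that media reports alone cannot answer easily, such as how often unintended harm comes from systems acting without a human in the loop.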

Helen Toner, director of strategy at CSET, says that exercise is informing research on the potential risks of AI accidents. She also believes the database shows why lawmakers or regulators eyeing AI rules might consider mandating some form of incident reporting, similar to that for aviation.

EU and US officials have shown growing interest in regulating AI, but the technology is so varied and broadly applied that crafting clear rules that won’t be quickly outdated is a daunting task. Recent draft proposals from the EU were accused variously of overreach, techno-illiteracy, and being full of loopholes. Toner says requiring reporting of AI accidents could help ground policy discussions. “I think it would be wise for those to be accompanied by feedback from the real world on what we are trying to prevent and what kinds of things are going wrong,” she says.
