Artie releases tool to measure bias in speech recognition models

Researchers at Artie released a tool to detect bias in speech recognition systems. In a proof of concept, they find evidence of bias in a Google model. …


Artie, a startup developing a platform for mobile games on social media, today released a data set and tool for detecting demographic bias in voice apps. The Artie Bias Corpus (ABC), which consists of audio files along with their transcriptions, aims to diagnose and mitigate the impact of factors like age, gender, and accent in voice recognition systems.

Speech recognition has come a long way since IBM’s Shoebox machine and Worlds of Wonder’s Julie doll. But despite progress made possible by AI, voice recognition systems today are at best imperfect and at worst discriminatory. In a study commissioned by the Washington Post, popular smart speakers made by Google and Amazon were 30% less likely to understand non-American accents than those of native-born users. More recently, the Algorithmic Justice League’s Voice Erasure project found that speech recognition systems from Apple, Amazon, Google, IBM, and Microsoft collectively achieve word error rates of 35% for African-American voices versus 19% for white voices.

The Artie Bias Corpus is a curated subset of Mozilla’s Common Voice corpus representing three gender classes, eight age ranges (from 18 to 80), and 17 different English accents. In addition to 2.4 hours of audio (1,712 individual clips) and transcriptions vetted by votes on the Common Voice web platform and native-speaker experts, it comprises self-identified, opt-in demographic data about speakers.

In a proof of concept, Artie researchers applied the Artie Bias Corpus to Mozilla’s open source DeepSpeech models, which were trained on at least one corpus with a known bias toward North American English. In another experiment, they evaluated gender bias in publicly available Google and Amazon U.S. English models.

According to the researchers, the DeepSpeech models indeed showed a bias toward U.S. and British accents but not a gender bias. On the other hand, as of early December 2019, Google’s U.S. English model showed “statistically significant” gender bias compared with Amazon Transcribe’s U.S. English model, performing on average 6.4% worse on female speakers.
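To make the kind of comparison described above concrete, here is a minimal sketch of how error rates might be broken out by demographic group. It assumes word error rate as the metric (the article does not say which metric produced the 6.4% figure), and the function names and sample transcripts are hypothetical, not taken from the Artie Bias Corpus or its tooling. It also does not test for statistical significance.

```python
from collections import defaultdict


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_group(samples):
    """Aggregate word error rate per group.

    samples: iterable of (group, reference_text, hypothesis_text).
    """
    errors = defaultdict(int)
    words = defaultdict(int)
    for group, reference, hypothesis in samples:
        ref_tokens = reference.lower().split()
        hyp_tokens = hypothesis.lower().split()
        errors[group] += edit_distance(ref_tokens, hyp_tokens)
        words[group] += len(ref_tokens)
    return {g: errors[g] / words[g] for g in words if words[g] > 0}


# Made-up transcripts, for illustration only.
samples = [
    ("female", "turn on the lights", "turn on the light"),
    ("female", "play some music", "play some music"),
    ("male", "turn on the lights", "turn on the lights"),
    ("male", "what is the weather", "what is weather"),
]

rates = wer_by_group(samples)
for group, rate in sorted(rates.items()):
    print(f"{group}: WER = {rate:.3f}")
```

A gap between the per-group rates is only a starting point. Deciding whether it is meaningful would require enough clips per group and a significance test, as the researchers’ “statistically significant” wording implies.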

We’ve reached out to Google for comment and will update this article once we hear back.

“As voice technology becomes more common, we discover how fragile it can be … In some cases, demographic bias can render a technology unusable for someone because of their demographic,” Josh Meyer, lead scientist at Artie and a research fellow at Mozilla, wrote in a blog post. “Even for well-resourced languages like English, state of the art speech recognizers cannot understand all native accents reliably, and they often understand men better than women … The solution is to face the problem, and work toward solutions.”
