NIST benchmarks show facial recognition technology still struggles to identify Black faces

NIST benchmarks suggest some facial recognition algorithms haven’t corrected historic bias — and are actually getting worse. …

Every few months, the U.S. National Institute of Standards and Technology (NIST) releases the results of benchmark tests it conducts on facial recognition algorithms submitted by companies, universities, and independent labs. A portion of these tests focuses on demographic performance, that is, how often the algorithms misidentify a Black man as a white man, a Black woman as a Black man, and so on. Stakeholders are quick to say that the algorithms are constantly improving with regard to bias, but a VentureBeat analysis reveals a different story. In fact, our findings cast doubt on the notion that facial recognition algorithms are becoming better at recognizing people of color.

That isn’t surprising, as numerous studies have shown facial recognition algorithms are susceptible to bias. But the newest data point comes as some vendors push to expand their market share, aiming to fill the gap left by Amazon, IBM, Microsoft, and others with self-imposed moratoriums on the sale of facial recognition systems. In Detroit this summer, city subcontractor Rank One began supplying facial recognition to local law enforcement over the objections of privacy advocates and protestors. Last November, Los Angeles-based Trueface was awarded a contract to deploy computer vision tech at U.S. Air Force bases. And the list goes on.

Industrywide trends

NIST uses a mugshot corpus collected over 17 years to look for demographic errors in facial recognition algorithms. Specifically, it measures the rates at which:

  • White men are misidentified as Black men
  • White men are misidentified as different white men
  • Black men are misidentified as white men
  • Black men are misidentified as different Black men
  • White women are misidentified as Black women
  • White women are misidentified as different white women
  • Black women are misidentified as white women
  • Black women are misidentified as different Black women

NIST determines the error rate for each category, known as the false match rate (FMR), by recording how often an algorithm returns a wrong face for 10,000 mugshots. An FMR of 0.0001 implies one mistaken identity for every 10,000, while an FMR of 0.1 implies one mistake for every 10.
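
As a quick check on that arithmetic, the minimal Python sketch below converts an FMR into the "one mistake in every N" form. The helper name one_in_n is our own illustration and is not part of NIST's tooling.

```python
def one_in_n(fmr: float) -> int:
    """Return N such that the FMR equals roughly one false match per N comparisons."""
    if not 0 < fmr <= 1:
        raise ValueError("FMR must be greater than 0 and at most 1")
    return round(1 / fmr)


# The two example rates discussed in the text.
for fmr in (0.0001, 0.1):
    print(f"FMR {fmr}: about 1 false match in every {one_in_n(fmr):,}")
```

Reading an FMR as its reciprocal makes it easier to compare rates that differ by orders of magnitude.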

To get a sense of whether FMRs have decreased or increased in recent years, we plotted the NIST-measured FMRs of algorithms from organizations with commercial deployments, two algorithms per organization. Comparing an organization’s earlier and later submissions gave us an idea of how bias has changed over time.

NIST’s benchmarks don’t account for adjustments vendors make before the algorithms are deployed, and some vendors might never deploy the algorithms commercially. Because the algorithms submitted to NIST are often optimized for best overall accuracy, they’re also not necessarily representative of how facial recognition systems behave in the wild. As the AI Now Institute notes in its recent report: While current standards like the NIST benchmarks “are a step in the right direction, it would be premature to rely on them to assess performance … [because there] is currently no standard practice to document and communicate the histories and limits of benchmarking datasets … and thus no way to determine their applicability to a particular system or suitability for a given context.”

Still, the NIST benchmarks are perhaps the closest thing the industry has to an objective measure of facial recognition bias.

Rank One Computing

Rank One Computing, whose facial recognition software is currently being used by the Detroit Police Department (DPD), improved across all demographic categories from November 2019 to July 2020, particularly with respect to the number of Black women it misidentifies. However, the FMRs of its latest algorithm remain high; NIST reports that Rank One’s software misidentifies Black men between 1 and 2 times in 1,000 and Black women between 2 and 3 times in 1,000. That error rate could translate to substantial numbers, considering that roughly 78% of Detroit’s approximately 670,000 residents are Black (according to 2018 Census Bureau estimates).

Above: FMR rates as measured by NIST. Higher is worse.
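
To illustrate how those rates scale, here is a rough Python sketch that multiplies an FMR by a number of comparisons. The 10,000-comparison figure simply mirrors the mugshot count in the description of NIST's test and is an illustrative assumption, not a measure of how often Detroit police run searches. The function and variable names are ours.

```python
def expected_false_matches(fmr: float, comparisons: int) -> float:
    """Expected false matches, treating every comparison as independent."""
    return fmr * comparisons


# FMR ranges cited in the text for Rank One's latest algorithm.
RANK_ONE_FMR = {
    "Black men": (0.001, 0.002),    # 1 to 2 in 1,000
    "Black women": (0.002, 0.003),  # 2 to 3 in 1,000
}

COMPARISONS = 10_000  # illustrative assumption, not a real search volume

for group, (low, high) in RANK_ONE_FMR.items():
    fewest = expected_false_matches(low, COMPARISONS)
    most = expected_false_matches(high, COMPARISONS)
    print(f"{group}: roughly {fewest:.0f} to {most:.0f} false matches "
          f"per {COMPARISONS:,} comparisons")
```

Real deployments differ from this simple model, since operators can tune match thresholds and a human is supposed to review each lead.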

Perhaps predictably, Rank One’s algorithm was involved in a wrongful arrest that some publications mistakenly characterized as the first of its kind in the U.S. (Following a firestorm of criticism, Rank One said it would add “legal means” to thwart misuse and the DPD pledged to limit facial recognition to violent crimes and home invasions.) In the case of the arrest, the DPD violated its own procedural rules, which restrict the use of the system to lead generation. But there’s evidence of bias in the transparency reports from the DPD, which show that nearly all (96 out of 98) of the photos Detroit police officers have run through Rank One’s software to date are of Black suspects.

Detroit’s three-year, $1 million facial recognition technology contract with DataWorks Plus, a reseller of Rank One’s algorithm, expired on July 24. But DataWorks agreed last year to extend its service contract through September 30. Beyond that, there’s nothing preventing the city’s IT department from servicing the software itself in perpetuity.

Trueface

Trueface’s technology, which early next year will begin powering facial recognition and weapon identification systems on a U.S. Air Force base, became worse at identifying Black women from October 2019 to July 2020. The latest version of the algorithm has an FMR between 0.015 and 0.020 for misidentifying Black women compared with the previous version’s FMR of between 0.010 and 0.015. U.S. Air Force Personnel Center statistics show there were more than 49,200 Black service members enlisted as of January 2020.

Above: FMR rates as measured by NIST. Higher is worse.
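
One way to size the change between the two versions is to compare the midpoints of the reported FMR ranges. The Python sketch below does that for the Black women category. Using midpoints is our simplifying assumption, since only ranges are cited here, and the function names are hypothetical.

```python
from typing import Tuple

FmrRange = Tuple[float, float]


def midpoint(fmr_range: FmrRange) -> float:
    """Midpoint of a (low, high) FMR range."""
    low, high = fmr_range
    return (low + high) / 2


def relative_change(old_range: FmrRange, new_range: FmrRange) -> float:
    """Fractional change in midpoint FMR; positive means more false matches."""
    old_mid = midpoint(old_range)
    new_mid = midpoint(new_range)
    return (new_mid - old_mid) / old_mid


# Ranges cited in the text for misidentifying Black women.
previous_version = (0.010, 0.015)
latest_version = (0.015, 0.020)

change = relative_change(previous_version, latest_version)
print(f"Midpoint FMR changed by {change:+.0%}")
```

By this rough measure the midpoint FMR rose by about 40%, though the true change could be smaller or larger depending on where each version falls within its range.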

RealNetworks and AnyVision

Equally troubling are the results for algorithms from RealNetworks and from AnyVision, an alleged supplier for Israeli army checkpoints in the West Bank.

AnyVision, which recently raised $43 million from undisclosed investors, told Wired its facial recognition software has been piloted in hundreds of sites around the world, including schools in Putnam City, Oklahoma and Texas City, Texas. RealNetworks offers facial recognition for military drones and body cameras through a subsidiary called SAFR. After the Parkland, Florida school shooting in 2018, SAFR made its facial recognition tech free to schools across the U.S. and Canada.

While AnyVision’s and RealNetworks’ algorithms misidentify fewer Black women than before, they perform worse with Black men. Regarding other demographic groups, they show little to no improvement when measured against FMR.

Above: FMR rates as measured by NIST. Higher is worse.


NtechLab

NtechLab’s algorithm exhibits a comparable regression in FMR. The company, which gained notoriety for an app that allowed users to match pictures of people’s faces to a Russian social network, recently received a $3.2 million contract to deploy its facial recognition tools throughout Moscow. NtechLab also has contracts in Saint Petersburg and in Jurmala, Latvia.

Above: FMR rates as measured by NIST. Higher is worse.

While the company’s newest algorithm achieved reductions in FMR for white men and women, it performs worse with Black men than its predecessor. FMR in this category is closer to 0.005, up from just over 0.0025 in June 2019.

Gorilla Technologies

Another contender is Gorilla Technologies, which claims to have installed facial recognition technology in Taiwanese prisons. NIST data shows the company’s algorithm became measurably worse at identifying Black women and men. The newest version of Gorilla’s algorithm has an FMR score of between 0.004 and 0.005 for misidentifying Black women and a score of between 0.001 and 0.002 for misidentifying white women.

Above: FMR rates as measured by NIST. Higher is worse.

Dangerous applications

These are just a few examples of facial recognition algorithms whose biases have been exacerbated over time, at least according to NIST data. The trend points to the intractable problem of mitigating bias in AI systems, particularly computer vision systems. One issue in facial recognition is that the data sets used to train algorithms skew white and male. IBM found that 81% of people in the three face-image collections most widely cited in academic studies have lighter-colored skin. Academics have found that photographic technology and techniques can also favor lighter skin, including everything from sepia-tinged film to low-contrast digital cameras.

The algorithms are often misused in the field, as well, which tends to amplify their underlying biases. A report from Georgetown Law’s Center on Privacy and Technology details how police feed facial recognition software flawed data, including composite sketches and pictures of celebrities who share physical features with suspects. The New York Police Department and others reportedly edit photos with blur effects and 3D modelers to make them more conducive to algorithmic face searches.

Whatever the reasons for the bias, an increasing number of cities and states have expressed concerns about facial recognition technology — particularly in the absence of federal guidelines. Oakland and San Francisco in California; Portland, Oregon; and Somerville, Massachusetts are among the metros where law enforcement is prohibited from using facial recognition. In Illinois, companies must get consent before collecting biometric information, including face images. And in Massachusetts, lawmakers are considering a moratorium on government use of any biometric surveillance system in the state.

Congress, too, has put forth a bill — the Facial Recognition and Biometric Technology Moratorium Act of 2020 — that would sharply limit federal government officials’ use of facial recognition systems. The bill’s introduction follows the European Commission’s consideration of a five-year moratorium on facial recognition in public places.

“Facial recognition is a uniquely dangerous form of surveillance. This is not just some Orwellian technology of the future — it’s being used by law enforcement agencies across the country right now, and doing harm to communities right now,” Fight for the Future deputy director Evan Greer said earlier this year in a statement regarding proposed legislation. “Facial recognition is the perfect technology for tyranny. It automates discriminatory policing … in our deeply racist criminal justice system. This legislation effectively bans law enforcement use of facial recognition in the United States. That’s exactly what we need right now. We give this bill our full endorsement.”
