The Efforts to Make Text-Based AI Less Racist and Terrible

Language models like GPT-3 can write poetry, but they often amplify negative stereotypes. Researchers are trying different approaches to address the problem….

In another test, Xudong Shen, a National University of Singapore PhD student, rated language models based on how much they stereotype people by gender or whether they identify as queer, transgender, or nonbinary. He found that larger AI programs tended to engage in more stereotyping. Shen says the makers of large language models should correct these flaws. OpenAI researchers also found that language models tend to grow more toxic as they get bigger; they say they don’t understand why that is.

Text generated by large language models is coming ever closer to language that looks or sounds like it came from a human, yet the models still fail at things requiring reasoning that almost all people grasp. In other words, as some researchers put it, this AI is a fantastic bullshitter, capable of convincing both AI researchers and other people that the machine understands the words it generates.

UC Berkeley psychology professor Alison Gopnik studies how toddlers and young people learn, with the aim of applying that understanding to computing. Children, she said, are the best learners, and the way kids learn language stems largely from their knowledge of and interaction with the world around them. Large language models, by contrast, have no connection to the world, making their output less grounded in reality.

“The definition of bullshitting is you talk a lot and it kind of sounds plausible, but there’s no common sense behind it,” Gopnik says.

Yejin Choi, an associate professor at the University of Washington and leader of a group studying common sense at the Allen Institute for AI, has put GPT-3 through dozens of tests and experiments to document how it can make mistakes. Sometimes it repeats itself. Other times it devolves into generating toxic language even when beginning with inoffensive or harmless text.

To teach AI more about the world, Choi and a team of researchers created PIGLeT, an AI trained in a simulated environment to understand things about physical experience that people learn growing up, such as that it’s a bad idea to touch a hot stove. That training led a relatively small language model to outperform others on common sense reasoning tasks. Those results, she said, demonstrate that scale is not the only winning recipe and that researchers should consider other ways to train models. Her goal: “Can we actually build a machine learning algorithm that can learn abstract knowledge about how the world works?”

Choi is also working on ways to reduce the toxicity of language models. Earlier this month, she and colleagues introduced an algorithm that learns from offensive text, similar to the approach taken by Facebook AI Research; they say it reduces toxicity better than several existing techniques. Large language models can be toxic because of humans, she says. “That’s the language that’s out there.”

Perversely, some researchers have found that attempts to fine-tune and remove bias from models can end up hurting marginalized people. In a paper published in April, researchers from UC Berkeley and the University of Washington found that Black people, Muslims, and people who identify as LGBT are particularly disadvantaged.

The authors say the problem stems, in part, from the humans who label data misjudging whether language is toxic or not. That leads to bias against people who use language differently than white people. Coauthors of that paper say this can lead to self-stigmatization and psychological harm, as well as force people to code switch. OpenAI researchers did not address this issue in their recent paper.

Jesse Dodge, a research scientist at the Allen Institute for AI, reached a similar conclusion. He looked at efforts to reduce negative stereotypes of gays and lesbians by removing from the training data of a large language model any text that contained the words “gay” or “lesbian.” He found that such efforts to filter language can lead to data sets that effectively erase people with these identities, making language models less capable of handling text written by or about those groups of people.

Dodge says the best way to deal with bias and inequality is to improve the data used to train language models instead of trying to remove bias after the fact. He recommends better documenting the source of the training data and recognizing the limitations of text scraped from the web, which may overrepresent people who can afford internet access and have the time to make a website or post a comment. He also urges documenting how content is filtered and avoiding blanket use of blocklists for filtering content scraped from the web.
