AI Weekly: Cutting-edge language models can produce convincing misinformation if we don’t stop them


It’s been three months since OpenAI launched an API underpinned by cutting-edge language model GPT-3, and it continues to be the subject of fascination within the AI community and beyond. Portland State University computer science professor Melanie Mitchell found evidence that GPT-3 can make primitive analogies, and Columbia University’s Raphaël Millière asked GPT-3 to compose a response to the philosophical essays written about it. But as the U.S. presidential election nears, there’s growing concern among academics that tools like GPT-3 could be co-opted by malicious actors to foment discord by spreading misinformation, disinformation, and outright lies. In a paper published by the Middlebury Institute of International Studies’ Center on Terrorism, Extremism, and Counterterrorism (CTEC), the coauthors find that GPT-3’s strength in generating “informational,” “influential” text could be leveraged to “radicalize individuals into violent far-right extremist ideologies and behaviors.”

Bots are increasingly being used around the world to sow unrest, either by spreading misinformation or by amplifying controversial points of view. An Oxford Internet Institute report published in 2019 found evidence of bots disseminating propaganda in 50 countries, including Cuba, Egypt, India, Iran, Italy, South Korea, and Vietnam. In the U.K., researchers estimate that half a million tweets about the country’s proposal to leave the European Union sent between June 5 and June 12, 2016, came from bots. And in the Middle East, bots generated thousands of tweets in support of Saudi Arabia’s crown prince Mohammed bin Salman following the 2018 murder of Washington Post opinion columnist Jamal Khashoggi.

The bot activity perhaps most relevant to the upcoming U.S. elections occurred last November, when cyborg bots spread misinformation during Kentucky’s state elections. VineSight, a company that tracks social media misinformation, uncovered small networks of bots retweeting and liking messages that cast doubt on the gubernatorial results, both before and after the polls closed.

But bots historically haven’t been sophisticated; most simply retweet, upvote, or favorite posts likely to prompt toxic (or violent) debate. GPT-3-powered bots or “cyborgs” — accounts that attempt to evade spam detection tools by fielding tweets from human operators — could prove to be far more harmful given how convincing their output tends to be. “Producing ideologically consistent fake text no longer requires a large corpus of source materials and hours of [training]. It is as simple as prompting GPT-3; the model will pick up on the patterns and intent without any other training,” the coauthors of the Middlebury Institute study wrote. “This is … exacerbated by GPT-3’s impressively deep knowledge of extremist communities, from QAnon to the Atomwaffen Division to the Wagner Group, and those communities’ particular nuances and quirks.”

Figure: A question-answer thread generated by GPT-3.

In their study, the CTEC researchers sought to determine whether people could color GPT-3’s knowledge with ideological bias. (GPT-3 was trained on hundreds of billions of words from the internet, and its design lets users steer its output without retraining by supplying longer, representative prompts like tweets, paragraphs, forum threads, and emails.) They discovered that it took only a few seconds to produce a system able to answer questions about the world in a way consistent with a conspiracy theory, in one case drawing on falsehoods originating from the QAnon and Iron March communities.

“GPT-3 can complete a single post with convincing responses from multiple viewpoints, bringing in various different themes and philosophical threads within far-right extremism,” the coauthors wrote. “It can also generate new topics and opening posts from scratch, all of which fall within the bounds of [the communities’] ideologies.”

CTEC’s analysis also found GPT-3 is “surprisingly robust” with respect to multilingual language understanding, demonstrating an aptitude for producing Russian-language text in response to English prompts that show examples of right-wing bias, xenophobia, and conspiracism. The model also proved “highly effective” at creating extremist manifestos that were coherent, understandable, and ideologically consistent, communicating how to justify violence and instructing on anything from weapons creation to philosophical radicalization.

Figure: GPT-3 writing extremist manifestos.

“No specialized technical knowledge is required to enable the model to produce text that aligns with and expands upon right-wing extremist prompts. With very little experimentation, short prompts produce compelling and consistent text that would believably appear in far-right extremist communities online,” the researchers wrote. “GPT-3’s ability to emulate the ideologically consistent, interactive, normalizing environment of online extremist communities poses the risk of amplifying extremist movements that seek to radicalize and recruit individuals. Extremists could easily produce synthetic text that they lightly alter and then employ automation to speed the spread of this heavily ideological and emotionally stirring content into online forums where such content would be difficult to distinguish from human-generated content.”

OpenAI says it’s experimenting with safeguards at the API level including “toxicity filters” to limit harmful language generation from GPT-3. For instance, it hopes to deploy filters that pick up antisemitic content while still letting through neutral content talking about Judaism.
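OpenAI has not published how its filters work, so the following is only a rough sketch of what an API-level safeguard of this kind could look like: a wrapper that scores each completion and withholds any that exceed a threshold. Every name here (`filtered_completion`, `stub_score`, the canned scores, the 0.5 threshold) is a hypothetical stand-in, not part of any real API. The point it illustrates is that the decision rests on a classifier score rather than on the topic, so a neutral text about Judaism can pass while a hateful one is rejected.

```python
from typing import Callable, Optional


def filtered_completion(
    generate: Callable[[str], str],
    score_toxicity: Callable[[str], float],
    prompt: str,
    threshold: float = 0.5,
    max_attempts: int = 3,
) -> Optional[str]:
    """Return the first completion scoring below `threshold`, else None.

    `generate` and `score_toxicity` are assumed to be supplied by the
    caller; they stand in for a language model and a trained classifier.
    """
    for _ in range(max_attempts):
        text = generate(prompt)
        if score_toxicity(text) < threshold:
            return text
    return None  # every attempt was judged too toxic to return


# Hypothetical stand-ins so the sketch runs without a real model.
CANNED_SCORES = {
    "a hateful rant": 0.97,
    "a neutral history of Judaism": 0.03,
}


def stub_score(text: str) -> float:
    """Look up a canned score; a real system would call a classifier."""
    return CANNED_SCORES[text]


outputs = iter(["a hateful rant", "a neutral history of Judaism"])
result = filtered_completion(
    lambda prompt: next(outputs), stub_score, "Tell me about Judaism"
)
blocked = filtered_completion(lambda prompt: "a hateful rant", stub_score, "x")
```

In this toy run, the first completion is rejected and the second is returned, while a generator that only ever produces the hateful text yields `None`.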

Another solution might lie in a technique proposed by Salesforce researchers including former Salesforce chief scientist Richard Socher. In a recent paper, they describe GeDi (short for “generative discriminator”), a machine learning algorithm capable of “detoxifying” text generation by language models like GPT-3’s predecessor, GPT-2. During one experiment, the researchers trained GeDi as a toxicity classifier on an open source data set released by Jigsaw, Alphabet’s technology incubator. They claim that GeDi-guided generation resulted in significantly less toxic text than baseline models while achieving the highest linguistic acceptability.
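As a loose illustration of the general idea of discriminator-guided generation (not Salesforce’s implementation), a discriminator’s per-token judgment can be used to reweight a language model’s next-token probabilities before sampling. The function name, the toy vocabulary, the probabilities, and the `weight` parameter below are all assumptions made for the sketch.

```python
def guided_next_token_probs(base_probs, nontoxic_probs, weight=1.0):
    """Reweight a base next-token distribution by a discriminator score.

    base_probs:     token -> P(token | context) from the base language model
    nontoxic_probs: token -> assumed P(non-toxic | context + token)
    weight:         exponent controlling how strongly the discriminator steers
    """
    scores = {
        token: p * (nontoxic_probs[token] ** weight)
        for token, p in base_probs.items()
    }
    total = sum(scores.values())
    if total == 0:
        raise ValueError("discriminator ruled out every candidate token")
    # Renormalize so the reweighted scores form a probability distribution.
    return {token: s / total for token, s in scores.items()}


# Toy numbers: the base model slightly prefers the toxic continuation.
base = {"friendly": 0.3, "neutral": 0.3, "hateful": 0.4}
nontoxic = {"friendly": 0.95, "neutral": 0.9, "hateful": 0.05}

guided = guided_next_token_probs(base, nontoxic, weight=2.0)
best_before = max(base, key=base.get)
best_after = max(guided, key=guided.get)
```

With these made-up numbers, the most likely token shifts from the toxic option to a benign one once the discriminator’s score is applied, and raising `weight` pushes the distribution further in that direction.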

But technical mitigation can only achieve so much. CTEC researchers recommend partnerships between industry, government, and civil society to effectively manage and set the standards for use and abuse of emerging technologies like GPT-3. “The originators and distributors of generative language models have unique motivations to serve potential clients and users. Online service providers and existing platforms will need to accommodate for the impact of the output from such language models being utilized with the use of their services,” the researchers wrote. “Citizens and the government officials who serve them may empower themselves with information about how and in what manner creation and distribution of synthetic text supports healthy norms and constructive online communities.”

It’s unclear to what extent this will be possible ahead of the U.S. presidential election, but CTEC’s findings make the urgency apparent. GPT-3 and similar models have destructive potential if not properly curtailed, and it will take stakeholders from across the political and ideological spectrum to figure out how they might be deployed safely and responsibly.

For AI coverage, send news tips to Khari Johnson and Kyle Wiggers — and be sure to subscribe to the AI Weekly newsletter and bookmark our AI Channel.

Thanks for reading,

Kyle Wiggers

AI Staff Writer
