Twitter CTO on machine learning challenges: ‘I’m not proud that we miss a lot of misinformation’

Twitter is making big investments in machine learning to weed out manipulation on its platform. These efforts show the promise of ML as well as the limits. …



Twitter considers itself a hub of global conversation, but any regular user knows how frequently the discourse veers into angry rants or misinformation. While the company’s investments in machine learning are intended to address these issues, executives understand the company has a long way to go.

According to Twitter CTO Parag Agrawal, the company will likely never be able to declare victory, because tools like conversational AI in the hands of adversaries keep the problems evolving rapidly. But Agrawal said he’s determined to turn the tide and help Twitter fulfill its potential for good.

“It’s become increasingly clear what our role is in the world,” Agrawal said. “It is to serve the public conversation. And these last few months, whether they be around the implications on public health due to COVID, or to have a conversation around racial injustices in this country, have emphasized the role of public conversation as a concept.”

Agrawal made his remarks during VentureBeat’s Transform 2020 conference in a conversation with VentureBeat founder and CEO Matt Marshall.

During the interview, Agrawal noted that Twitter has been investing more in highlighting positive and productive conversations. That led to the introduction of the ability to follow topics, a way to get people out of silos and help them discover a broader range of views.

That said, much of his work still focuses on adversaries who try to manipulate the public conversation, and on how they might use these new techniques. He broke these adversaries into four categories:

  1. Machine-powered bots.
  2. A machine-powered bot but with a human in the loop.
  3. An entirely human manipulator being coordinated by a single entity.
  4. Real accounts that get compromised by an adversary.

“Typically, an attempt at manipulating the conversation uses some combination of all of these four to achieve some sort of objective,” he said.

The most harmful are those bots that manage to disguise themselves successfully as humans using the most advanced conversational AI. “These mislead people into believing that they’re real people and allow people to be influenced by them,” he said.

This multi-layered strategy makes fighting manipulation extraordinarily complex. Worse, those techniques advance and change constantly. And the impact of bad content is swift.

“If a piece of content is going to matter in a good or a bad way, it’s going to have its impact within minutes and hours, and not days,” he said. “So, it’s not okay for me to wait a day for my model to catch up and learn what to do with it. And I need to learn in real-time.”

Twitter has won some praise recently for taking steps toward labeling misleading or violent tweets posted by President Trump when other platforms such as Facebook have been more reluctant to take action. Beyond those headline-making decisions, however, Agrawal said the task of monitoring the platform has grown even more difficult in recent months as issues like the pandemic and then Black Lives Matter sparked global conversations.

“We’ve had to work with an increased amount of passion on the service on whatever the topic of conversation because of the heightened importance of these topics,” he said. “And I’ve had to prioritize our work to best help people and improve the health of the conversation during this time.”

Agrawal does believe the company is making progress.

“We quickly worked on a policy around misinformation around COVID-19 as we saw that threat emerge,” he said. “Our policy was meant specifically to mitigate harms. Our strategy in this space is not to tackle all misinformation in the world. There’s too much of it and we don’t have clinical approaches to navigate…Our efforts are not focused on determining what’s true or false. They’re focused on providing labels and annotations, so people can find easy access to reliable information, as well as the greater conversation around the topic so that they can make up their mind.”

The company will continue to expand its use of machine learning to flag bad content, he said. Currently, those machine learning systems catch about 50% of the content that Twitter takes enforcement action against for violating its terms of service.

Still, there remains a sense of disappointment that more has not been done. Agrawal acknowledges that, noting that the process of turning policy into standards that can be enforced by machine learning remains a practical challenge.

“We build systems,” he said. “That’s why we ground solutions in policy, and then build using product and technology and our processes. It’s designed to avoid biases. At the same time, it puts us in a situation where things move slower than most of us would like. It takes us a while to develop a process to scale, to have automation to enforce the policy. I’m not proud that we missed a large amount of misinformation even where we have a policy because we haven’t been able to build these automated systems.”
