How to Stop Your Data From Being Used to Train AI

Some companies let you opt out of allowing your content to be used for generative AI. Here’s how to take back (at least a little) control from ChatGPT, Google’s Gemini, and more….

On its help pages, OpenAI says ChatGPT web users without accounts should navigate to Settings and then uncheck Improve the model for everyone. If you have an account and are logged in through a web browser, select ChatGPT, Settings, Data Controls, and then turn off Chat History & Training. If you’re using ChatGPT’s mobile apps, go to Settings, pick Data Controls, and turn off Chat History & Training. Changing these settings, OpenAI’s support pages say, won’t sync across different browsers or devices, so you need to make the change everywhere you use ChatGPT.

OpenAI is about a lot more than ChatGPT. For its Dall-E 3 image generator, the startup has a form that allows you to send images to be removed from “future training datasets.” It asks for your name, email, whether you own the image rights or are getting in touch on behalf of a company, details of the image, and any uploads of the image(s). OpenAI also says if you have a “high volume” of images hosted online that you want removed from training data, then it may be “more efficient” to add GPTBot to the robots.txt file of the website where the images are hosted.

Traditionally a website’s robots.txt file—a simple text file that usually sits at websitename.com/robots.txt—has been used to tell search engines, and others, whether they can include your pages in their results. It can now also be used to tell AI crawlers not to scrape what you have published—and AI companies have said they’ll honor this arrangement.

Perplexity

Perplexity is a startup that uses AI to help you search the web and find answers to questions. Like all of the other software on this list, you are automatically opted in to having your interactions and data used to train Perplexity’s AI further. Turn this off by clicking on your account name, scrolling down to the Account section, and turning off the AI Data Retention toggle.

Quora

Image may contain Page Text File and Webpage

Quora via Matt Burgess

Quora says it “currently” doesn’t use answers to people’s questions, posts, or comments for training AI. It also hasn’t sold any user data for AI training, a spokesperson says. However, it does offer opt-outs in case this changes in the future. To do this, visit its Settings page, click to Privacy, and turn off the “Allow large language models to be trained on your content” option. Despite this choice, there are some Quora posts that may be used for training LLMs. If you reply to a machine-generated answer, the company’s help pages say, then those answers may be used for AI training. It points out that third parties may just scrape its content anyway.

Rev

Rev, a voice transcription service that uses both human freelancers and AI to transcribe audio, says it uses data “perpetually” and “anonymously” to train its AI systems. Even if you delete your account, it will still train its AI on that information.

Kendell Kelton, head of brand and corporate communications at Rev, says it has the “largest and most diverse data set of voices,” made up of more than 6.5 million hours of voice recording. Kelton says Rev does not sell user data to any third parties. The firm’s terms of service say data will be used for training, and that customers are able to opt out. People can opt out of their data being used by sending an email to support@rev.com, its help pages say.

Slack

All of those random Slack messages at work might be used by the company to train its models as well. “Slack has used machine learning in its product for many years. This includes platform-level machine-learning models for things like channel and emoji recommendations,” says Jackie Rocca, a vice president of product at Slack who’s focused on AI.

Even though the company does not use customer data to train a large language model for its Slack AI product, Slack may use your interactions to improve the software’s machine-learning capabilities. “To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack,” says Slack’s privacy page. Similar to Adobe, there’s not much you can do on an individual level to opt out if you’re using an enterprise account.

Live Updates for COVID-19 CASES