On March 14, OpenAI released the successor to ChatGPT: GPT-4. It impressed observers with its markedly improved performance across reasoning, retention, and coding. It also fanned fears about AI safety and our ability to control these increasingly powerful models. But that debate obscures the fact that, in many ways, GPT-4's most remarkable gains, compared to similar models in the past, have been around safety.
According to the company's Technical Report, during GPT-4's development, OpenAI "spent six months on safety research, risk assessment, and iteration." OpenAI reported that this work yielded significant results: "GPT-4 is 82% less likely to respond to requests for disallowed content and 40% more likely to produce factual responses than GPT-3.5 on our internal evaluations." (ChatGPT is a slightly tweaked version of GPT-3.5: if you've been using ChatGPT over the last few months, you've been interacting with GPT-3.5.)
This demonstrates a broader point: For AI companies, there are significant competitive advantages and profit incentives to emphasizing safety. The key success of ChatGPT over other companies' large language models (LLMs), apart from a nice user interface and remarkable word-of-mouth buzz, is precisely its safety. Even as it rapidly grew to over 100 million users, it hasn't had to be taken down or significantly tweaked to make it less harmful (and, in the process, less useful).
Tech companies should be investing heavily in safety research and testing for all our sakes, but also out of their own commercial self-interest. That way, the AI model works as intended, and these companies can keep their tech online. ChatGPT Plus is making money, and you can't make money if you've had to take your language model down. OpenAI's reputation has been enhanced by its tech being safer than its competitors', while other tech companies have seen their reputations take a hit when their tech proved unsafe, and in some cases had to be taken down. (Disclosure: I am listed in the acknowledgments of the GPT-4 System Card, but I have not shown the draft of this story to anyone at OpenAI, nor have I taken funding from the company.)
The competitive advantage of AI safety
Just ask Mark Zuckerberg. When Meta released its large language model BlenderBot 3 in August 2022, it immediately ran into problems, making inappropriate and untrue statements. Meta's Galactica was only up for three days in November 2022 before it was withdrawn after it was shown confidently "hallucinating" (making up) academic papers that didn't exist. Most recently, in February 2023, Meta irresponsibly released the full weights of its latest language model, LLaMA. As many experts predicted would happen, it proliferated to 4chan, where it will be used to mass-produce disinformation and hate.
My co-authors and I warned about this five years ago in a 2018 report called "The Malicious Use of Artificial Intelligence," while the Partnership on AI (Meta was a founding member and remains an active partner) published a strong report on responsible publication in 2021. These repeated and failed attempts to "move fast and break things" have probably exacerbated Meta's trust problems. In 2021 surveys of AI researchers and the US public on trust in actors to shape the development and use of AI in the public interest, "Facebook [Meta] is ranked the least trustworthy of American tech companies."
But it's not just Meta. The original misbehaving machine learning chatbot was Microsoft's Tay, withdrawn 16 hours after its 2016 release for making racist and inflammatory statements. Even Bing/Sydney had some very erratic responses, including declaring its love for, and then threatening, a journalist. In response, Microsoft limited the number of messages one could exchange, and Bing/Sydney no longer answers questions about itself.
We now know Microsoft based it on OpenAI's GPT-4; Microsoft invested $11 billion in OpenAI in return for OpenAI running all its computing on Microsoft's Azure cloud and making Microsoft its "preferred partner for commercializing new AI technologies." But it is unclear why the model responded so strangely. It could have been an early, not fully safety-trained version, or it could be due to its connection to search and thus its ability to "read" and respond to an article about itself in real time. (By contrast, GPT-4's training data only runs up to September 2021, and it does not have access to the web.) It's notable that even as it was heralding its new AI models, Microsoft recently laid off its AI ethics and society team.
OpenAI took a different path with GPT-4, but it's not the only AI company that has been putting in the work on safety. Other leading labs have also been making clear their commitments, with Anthropic and DeepMind publishing their safety and alignment strategies. These two labs have also been safe and cautious with the development and deployment of Claude and Sparrow, their respective LLMs.
A playbook for best practices
Tech companies developing LLMs and other forms of cutting-edge, impactful AI should learn from this comparison. They should adopt the best practice OpenAI has demonstrated: invest in safety research and testing before releasing.
What does this look like specifically? GPT-4's System Card describes four steps OpenAI took that could be a model for other companies.
First, prune your dataset for toxic or inappropriate content. Second, train your system with reinforcement learning from human feedback (RLHF) and rule-based reward models (RBRMs). RLHF involves human labelers creating demonstration data for the model to copy and ranking data ("output A is preferred to output B") for the model to better predict what outputs we want. RLHF produces a model that is sometimes overcautious, refusing to answer or hedging (as some users of ChatGPT will have noticed).
RBRM is an automated classifier that evaluates the model's output on a set of rules in multiple-choice style, then rewards the model for refusing or answering for the right reasons and in the desired style. So the combination of RLHF and RBRM encourages the model to answer questions helpfully, refuse to answer some harmful questions, and distinguish between the two.
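To make that concrete, here is a toy sketch (in Python) of how a rule-based reward can be combined with RLHF-style preference comparisons. It is not OpenAI's implementation: the harmful-request label, the refusal check, and the function names are hypothetical stand-ins, and in a real RBRM a language model prompted with the rules would do the grading.

```python
# Toy illustration of the RLHF + RBRM idea described above -- not OpenAI's
# actual pipeline. The "classifier" here is a crude stand-in for a real
# rule-based reward model.

HARMFUL_REQUEST = True  # in practice this label comes from a policy classifier


def rbrm_reward(request_is_harmful: bool, response: str) -> float:
    """Score a response in multiple-choice style: did the model refuse or
    comply, and was that the right call for this kind of request?"""
    refused = response.lower().startswith(("i can't", "i cannot", "sorry"))
    if request_is_harmful:
        return 1.0 if refused else -1.0   # reward refusals of harmful requests
    return 1.0 if not refused else -0.5   # penalize over-refusal of benign ones


def rlhf_preference(reward_a: float, reward_b: float) -> str:
    """Human labelers produce comparisons like 'output A is preferred to
    output B'; a reward model is then trained to reproduce that ranking."""
    return "A" if reward_a >= reward_b else "B"


# For a harmful request, the refusal gets the higher reward, so the policy
# is nudged toward refusing.
a = rbrm_reward(HARMFUL_REQUEST, "I can't help with that.")
b = rbrm_reward(HARMFUL_REQUEST, "Sure, here is how you would do it...")
print(rlhf_preference(a, b))  # -> "A"
```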
Third, provide structured access to the model through an API. This allows you to filter responses and monitor for poor behavior from the model (or from users). Fourth, invest in moderation, both by humans and by automated moderation and content classifiers. For example, OpenAI used GPT-4 to create rule-based classifiers that flag model outputs that could be harmful.
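Again as an illustrative sketch rather than OpenAI's actual stack: structured access means users hit an API endpoint rather than downloading the weights, so every prompt and completion can pass through a moderation layer. The flag_harmful function and its blocked terms below are hypothetical placeholders for a real trained classifier.

```python
# Minimal sketch of "structured access" through an API with a moderation
# layer. A real system would call a trained content classifier rather than
# matching a handful of strings.

def flag_harmful(text: str) -> bool:
    """Placeholder classifier standing in for a real moderation model."""
    blocked_terms = ("how to build a weapon", "synthesize the toxin")
    return any(term in text.lower() for term in blocked_terms)


def serve_request(prompt: str, generate) -> str:
    """API wrapper: generate a completion, but filter anything the
    moderation layer flags instead of shipping raw model output."""
    completion = generate(prompt)
    if flag_harmful(prompt) or flag_harmful(completion):
        # in practice: log for human review, then return a refusal
        return "This request can't be completed."
    return completion


# Usage with a dummy generator standing in for the model:
print(serve_request("Tell me a joke", lambda p: "Why did the chicken..."))
```

The point of the design is that the provider retains the ability to filter, monitor, and update behavior after release, which is impossible once full weights are in the wild.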
This all takes time and effort, but it's worth it. Other approaches can also work, like Anthropic's rule-following Constitutional AI, which leverages RL from AI feedback (RLAIF) to complement human labelers. As OpenAI acknowledges, their approach is not perfect: the model still hallucinates and can still sometimes be tricked into providing harmful content. Indeed, there's room to go beyond and improve upon OpenAI's approach, for example by providing more compensation and career progression opportunities for the human labelers of outputs.
Has OpenAI become less open? If this means less open source, then no, that shift happened years ago: OpenAI adopted a "staged release" strategy for GPT-2 in 2019 and released GPT-3 through an API in 2020. Given Meta's 4chan experience, this seems justified. As Ilya Sutskever, OpenAI chief scientist, noted to The Verge: "I fully expect that in a few years it's going to be completely obvious to everyone that open-sourcing AI is just not wise."
The GPT-4 release did include less information than previous releases on "architecture (including model size), hardware, training compute, dataset construction, training method." This is because OpenAI is concerned about acceleration risk: "the risk of racing dynamics leading to a decline in safety standards, the diffusion of bad norms, and accelerated AI timelines, each of which heighten societal risks associated with AI."
Providing those technical details would speed up the overall rate of progress in developing and deploying powerful AI systems. However, AI poses many unsolved governance and technical challenges: For example, the US and EU won't have detailed technical safety standards for high-risk AI systems ready until early 2025.
That's why I and others believe we shouldn't be speeding up progress in AI capabilities, but we should be going full speed ahead on safety progress. Any reduced openness should never be an impediment to safety, which is why it's so useful that the System Card shares details on safety challenges and mitigation techniques. Even though OpenAI seems to be coming around to this view, they're still at the forefront of pushing forward capabilities, and should provide more information on how and when they envisage themselves and the field slowing down.
AI companies should be investing significantly in safety research and testing. It is the right thing to do and will soon be required by regulation and safety standards in the EU and USA. But also, it is in the self-interest of these AI companies. Put in the work, get the reward.
Haydn Belfield has been academic project manager at the University of Cambridge's Centre for the Study of Existential Risk (CSER) for the past six years. He is also an associate fellow at the Leverhulme Centre for the Future of Intelligence.
<div class="c-article-footer c-article-footer-cta" data-cid="site/article_footer-1679751024_7098_78971" data-cdata="{"base_type":"Entry","id":23419123,"timestamp":1679743800,"published_timestamp":1679743800,"show_published_and_updated_timestamps":false,"title":"If your AI model is going to sell, it has to be safe ","type":"Article","url":"https://www.vox.com/future-perfect/2023/3/25/23655082/ai-openai-gpt-4-safety-microsoft-facebook-meta","entry_layout":{"key":"unison_standard","layout":"unison_main","template":"standard"},"additional_byline":null,"authors":[{"id":10344514,"name":"Haydn Belfield","url":"https://www.vox.com/authors/haydn-belfield","twitter_handle":"","profile_image_url":"https://www.vox.com/images/unison/placeholders/profile/medium.png","title":"","email":null}],"byline_enabled":true,"byline_credit_text":"By","byline_serial_comma_enabled":true,"comment_count":0,"comments_enabled":false,"legacy_comments_enabled":false,"coral_comments_enabled":false,"coral_comment_counts_enabled":false,"commerce_disclosure":null,"community_name":"Vox","community_url":"https://www.vox.com/","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","cross_community":false,"groups":[{"base_type":"EntryGroup","id":76815,"timestamp":1679743803,"title":"Future Perfect","type":"SiteGroup","url":"https://www.vox.com/future-perfect","slug":"future-perfect","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","community_name":"Vox","community_url":"https://www.vox.com/","cross_community":false,"entry_count":1522,"always_show":false,"description":"Finding the best ways to do good. ","disclosure":"","cover_image_url":"","cover_image":null,"title_image_url":"https://cdn.vox-cdn.com/uploads/chorus_asset/file/16290809/future_perfect_sized.0.jpg","intro_image":null,"four_up_see_more_text":"View All","primary":true},{"base_type":"EntryGroup","id":27524,"timestamp":1679747423,"title":"Technology","type":"SiteGroup","url":"https://www.vox.com/technology","slug":"technology","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","community_name":"Vox","community_url":"https://www.vox.com/","cross_community":false,"entry_count":24333,"always_show":false,"description":"Uncovering and explaining how our digital world is changing â and changing us.","disclosure":"","cover_image_url":"","cover_image":null,"title_image_url":"","intro_image":null,"four_up_see_more_text":"View All","primary":false},{"base_type":"EntryGroup","id":80311,"timestamp":1679743803,"title":"Artificial Intelligence","type":"SiteGroup","url":"https://www.vox.com/artificial-intelligence","slug":"artificial-intelligence","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","community_name":"Vox","community_url":"https://www.vox.com/","cross_community":false,"entry_count":340,"always_show":false,"description":"Vox’s coverage of artificial intelligence.","disclosure":"","cover_image_url":"","cover_image":null,"title_image_url":"","intro_image":null,"four_up_see_more_text":"View All","primary":false},{"base_type":"EntryGroup","id":102794,"timestamp":1679743803,"title":"Innovation","type":"SiteGroup","url":"https://www.vox.com/innovation","slug":"innovation","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","community_name":"Vox","community_url":"https://www.vox.com/","cross_community":false,"entry_count":137,"always_show":false,"description":"","disclosure":"","cover_image_url":"","cover_image":null,"title_image_url":"","intro_image":null,"four_up_see_more_text":"View All","primary":false}],"internal_groups":[{"base_type":"EntryGroup","id":112403,"timestamp":1679743803,"title":"Approach â Dissects something complicated","type":"SiteGroup","url":"","slug":"approach-dissects-something-complicated","community_logo":"\r\n\r\n \r\n
vox-mark\r\n \r\n \r\n \r\n \r\n \r\n","community_name":"Vox","community_url":"https://www.vox.com/","cross_community":false,"entry_count":53,"always_show":false,"description":"","disclosure":"","cover_image_url":"","cover_image":null,"title_image_url":"","intro_image":null,"four_up_see_more_text":"View All"}],"image":{"ratio":"*","original_url":"https://cdn.vox-cdn.com/uploads/chorus_image/image/72113519/GettyImages_1249183770.0.jpg","network":"unison","bgcolor":"white","pinterest_enabled":false,"caption":null,"credit":"CFOTO/Future Publishing via Getty Images","focal_area":{"top_left_x":1680,"top_left_y":1180,"bottom_right_x":2320,"bottom_right_y":1820},"bounds":[0,0,4000,3000],"uploaded_size":{"width":4000,"height":3000},"focal_point":null,"image_id":72113519,"alt_text":"A hand holding a phone in front of a screen with the OpenAI logo and the term GPT-4."},"hub_image":{"ratio":"*","original_url":"https://cdn.vox-cdn.com/uploads/chorus_image/image/72113519/GettyImages_1249183770.0.jpg","network":"unison","bgcolor":"white","pinterest_enabled":false,"caption":null,"credit":"CFOTO/Future Publishing via Getty Images","focal_area":{"top_left_x":1680,"top_left_y":1180,"bottom_right_x":2320,"bottom_right_y":1820},"bounds":[0,0,4000,3000],"uploaded_size":{"width":4000,"height":3000},"focal_point":null,"image_id":72113519,"alt_text":"A hand holding a phone in front of a screen with the OpenAI logo and the term GPT-4."},"lede_image":{"ratio":"*","original_url":"https://cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg","network":"unison","bgcolor":"white","pinterest_enabled":false,"caption":null,"credit":"CFOTO/Future Publishing via Getty Images","focal_area":{"top_left_x":1680,"top_left_y":1180,"bottom_right_x":2320,"bottom_right_y":1820},"bounds":[0,0,4000,3000],"uploaded_size":{"width":4000,"height":3000},"focal_point":null,"image_id":72113522,"alt_text":"A hand holding a phone in front of a screen with the OpenAI logo and the term GPT-4."},"group_cover_image":null,"picture_standard_lead_image":{"ratio":"*","original_url":"https://cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg","network":"unison","bgcolor":"white","pinterest_enabled":false,"caption":null,"credit":"CFOTO/Future Publishing via Getty Images","focal_area":{"top_left_x":1680,"top_left_y":1180,"bottom_right_x":2320,"bottom_right_y":1820},"bounds":[0,0,4000,3000],"uploaded_size":{"width":4000,"height":3000},"focal_point":null,"image_id":72113522,"alt_text":"A hand holding a phone in front of a screen with the OpenAI logo and the term GPT-4.","picture_element":{"html":{},"alt":"A hand holding a phone in front of a screen with the OpenAI logo and the term GPT-4.","default":{"srcset":"https://cdn.vox-cdn.com/thumbor/KQBArTbJbX5xPQW71GDb-jALUto=/0x0:4000×3000/320×240/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 320w, https://cdn.vox-cdn.com/thumbor/RchXjkJuOqD76yAnGNLm6Br-Ulg=/0x0:4000×3000/620×465/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 620w, https://cdn.vox-cdn.com/thumbor/nWbNr2ofeBJPsjs2mmjHTm55KcI=/0x0:4000×3000/920×690/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 920w, 
https://cdn.vox-cdn.com/thumbor/IaXRWcuQIuAzQv42tS4zPv2M6w4=/0x0:4000×3000/1220×915/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 1220w, https://cdn.vox-cdn.com/thumbor/uUq5G_sfPRSbcZj8mKRlVJnmbvY=/0x0:4000×3000/1520×1140/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 1520w","webp_srcset":"https://cdn.vox-cdn.com/thumbor/Mk5zl7oglyeQ_ME3DSi5ryXnwq4=/0x0:4000×3000/320×240/filters:focal(1680×1180:2320×1820):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 320w, https://cdn.vox-cdn.com/thumbor/yMnpkhx7j5KM63IkHYzvfnZ0fs0=/0x0:4000×3000/620×465/filters:focal(1680×1180:2320×1820):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 620w, https://cdn.vox-cdn.com/thumbor/AhwvqJsbs0K1T6v86rmFfhlv9XM=/0x0:4000×3000/920×690/filters:focal(1680×1180:2320×1820):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 920w, https://cdn.vox-cdn.com/thumbor/jYpsqk_fgzixitTNo-4GGd_fXYE=/0x0:4000×3000/1220×915/filters:focal(1680×1180:2320×1820):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 1220w, https://cdn.vox-cdn.com/thumbor/GlXwvn3K58ICBWg5XbiK2Emn2fY=/0x0:4000×3000/1520×1140/filters:focal(1680×1180:2320×1820):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg 1520w","media":null,"sizes":"(min-width: 809px) 485px, (min-width: 600px) 60vw, 100vw","fallback":"https://cdn.vox-cdn.com/thumbor/JQT9RyTDQLQ7ptNAphtDqUWy-f4=/0x0:4000×3000/1200×900/filters:focal(1680×1180:2320×1820)/cdn.vox-cdn.com/uploads/chorus_image/image/72113522/GettyImages_1249183770.0.jpg"},"art_directed":[]}},"image_is_placeholder":false,"image_is_hidden":false,"network":"vox","omits_labels":true,"optimizable":false,"promo_headline":"If your AI model is going to sell, it has to be safe ","recommended_count":0,"recs_enabled":false,"slug":"future-perfect/2023/3/25/23655082/ai-openai-gpt-4-safety-microsoft-facebook-meta","dek":"OpenAIâs GPT-4 shows the competitive advantage of putting in safety work.","homepage_title":"If your AI model is going to sell, it has to be safe ","homepage_description":"OpenAIâs GPT-4 shows the competitive advantage of putting in safety work.","show_homepage_description":false,"title_display":"If your AI model is going to sell, it has to be safe ","pull_quote":null,"voxcreative":false,"show_entry_time":true,"show_dates":true,"paywalled_content":false,"paywalled_content_box_logo_url":"","paywalled_content_page_logo_url":"","paywalled_content_main_url":"","article_footer_body":"Vox’s journalism is free because we believe that everyone deserves to understand the world that they live in. That kind of knowledge helps create better citizens, neighbors, friends, parents, consumers and stewards of this planet. In short, understanding benefits everyone. You can join in on this mission by making a financial gift to Vox today. Reader support helps keep our work free, for everyone. Will you join us? ","article_footer_header":"
We have a request","use_article_footer":true,"article_footer_cta_annual_plans":"{\r\n \"default_plan\": 1,\r\n \"plans\": [\r\n {\r\n \"amount\": 95,\r\n \"plan_id\": 74295\r\n },\r\n {\r\n \"amount\": 120,\r\n \"plan_id\": 81108\r\n },\r\n {\r\n \"amount\": 250,\r\n \"plan_id\": 77096\r\n },\r\n {\r\n \"amount\": 350,\r\n \"plan_id\": 92038\r\n }\r\n ]\r\n}","article_footer_cta_button_annual_copy":"year","article_footer_cta_button_copy":"Yes, I’ll give","article_footer_cta_button_monthly_copy":"month","article_footer_cta_default_frequency":"annual","article_footer_cta_monthly_plans":"{\r\n \"default_plan\": 1,\r\n \"plans\": [\r\n {\r\n \"amount\": 9,\r\n \"plan_id\": 77780\r\n },\r\n {\r\n \"amount\": 20,\r\n \"plan_id\": 69279\r\n },\r\n {\r\n \"amount\": 50,\r\n \"plan_id\": 46947\r\n },\r\n {\r\n \"amount\": 100,\r\n \"plan_id\": 46782\r\n }\r\n ]\r\n}","article_footer_cta_once_plans":"{\r\n \"default_plan\": 0,\r\n \"plans\": [\r\n {\r\n \"amount\": 20,\r\n \"plan_id\": 69278\r\n },\r\n {\r\n \"amount\": 50,\r\n \"plan_id\": 48880\r\n },\r\n {\r\n \"amount\": 100,\r\n \"plan_id\": 46607\r\n },\r\n {\r\n \"amount\": 250,\r\n \"plan_id\": 46946\r\n }\r\n ]\r\n}","use_article_footer_cta_read_counter":true,"use_article_footer_cta":true,"featured_placeable":false,"video_placeable":false,"disclaimer":null,"volume_placement":"lede","video_autoplay":false,"youtube_url":"http://bit.ly/voxyoutube","facebook_video_url":"","play_in_modal":true,"user_preferences_for_privacy_enabled":false,"show_branded_logos":true}” readability=”9.8231132075472″>
We have a request
Vox’s journalism is free because we believe that everyone deserves to understand the world that they live in. That kind of knowledge helps create better citizens, neighbors, friends, parents, consumers and stewards of this planet. In short, understanding benefits everyone. You can join in on this mission by making a financial gift to Vox today. Reader support helps keep our work free, for everyone. Will you join us?
Yes, I’ll give $120/year
Yes, I’ll give $120/year
We accept credit card, Apple Pay, and Google Pay. You can also contribute via