Is OpenAI’s superalignment team dead in the water after two key departures?

It wasn’t just Ilya Sutskever, the former Chief Scientist and co-founder of OpenAI, who departed the company yesterday.

Sutskever was followed out the door shortly afterward by colleague Jan Leike, co-lead of OpenAI’s “superalignment” team, who announced his departure on X with the simple message “I resigned.”

Leike joined OpenAI in early 2021, writing on X at the time that he “love[d] the work that OpenAI has been doing on reward modeling, most notably aligning #gpt3 using human preferences. Looking forward to building on it!” and linking to this OpenAI blog post.

Leike described some of his work at OpenAI on his Substack, “Aligned,” writing in December 2022 that he was “optimistic about our alignment approach” at the company.

Prior to joining OpenAI, Leike worked at Google’s DeepMind AI laboratory.

The departure of the two co-leaders of OpenAI’s superalignment team had many on X cracking jokes and wondering whether the company has given up on, or is struggling with, its effort to design ways to control powerful new AI systems. That effort extends to OpenAI’s eventual goal of artificial general intelligence (AGI), which the company defines as AI that outperforms humans at most economically valuable tasks.

What is superalignment?

Large language models (LLMs) such as OpenAI’s new GPT-4o and rivals like Google’s Gemini and Meta’s Llama can function in mysterious ways. To ensure they deliver consistent performance and don’t give users harmful or undesired responses, such as nonsense, the model makers and software engineers behind them must first “align” the models, that is, get them to behave the way they want.

This is accomplished through machine learning techniques such as reinforcement learning from human feedback (RLHF), which is often implemented with algorithms like proximal policy optimization (PPO).

IBM Research of all places has a decent overview on alignment for those looking to read more.

It follows, then, that superalignment would be a more intensive effort designed to align AI models even more powerful than those available today: superintelligences.

OpenAI first announced the formation of the superalignment team back in July 2023, writing at the time in a company blog post:

While superintelligence seems far off now, we believe it could arrive this decade.

Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:

How do we ensure AI systems much smarter than humans follow human intent?

Currently, we don’t have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.

Interestingly, OpenAI also pledged in this blog post to dedicate “20% of the compute we’ve secured to date to this effort,” meaning that 20% of its rarified and incredibly valuable graphics processing units (GPUs) from Nvidia and other AI training and deployment hardware would be taken up by the superalignment team.

What happens to superalignment in a post-Sutskever and post-Leike world?

Now that its two co-leads are out, the question remains as to whether or not the venture will continue, and in what capacity. Will OpenAI still devote the 20% of its compute earmarked for superalignment to this purpose, or will it redirect it to something else?

After all, some have concluded that Sutskever, who was among the group that briefly ousted OpenAI co-founder Sam Altman as CEO last year, was a so-called “doomer,” or someone focused on the capacity for AI to lead to existential risks for humanity (also known as “x-risk”).

There is ample reporting, along with previous statements from Sutskever himself, to support this idea.

Yet the narrative emerging from observers is that Altman and others at OpenAI are not as concerned about x-risk as Sutskever, and so perhaps the less concerned faction won out.

We’ve reached out to OpenAI contacts to ask about what will become of the superalignment team and will update when we hear back.

