Ex-Microsoft engineer’s AI video startup scores $60M from top VCs, Jared Leto




As the demand for generative AI video grows, startups in this space are attracting big money from venture capital firms. Recently, RunwayML was reported to be raising a large round of funding. Now, Captions, an AI video startup founded by former Microsoft engineer Gaurav Misra, has secured $60 million in Series C funding.

Founded in 2021, Captions started as a camera app in which users recorded ‘talking videos’ by engaging with the camera directly. Over the past year, the company has shifted its focus to AI, enabling users to create videos with avatars from scratch. It has essentially become an AI-powered creative suite for video creators.

Misra stated that the investment, led by Index Ventures, values the company at $500 million. Existing investors Kleiner Perkins, Sequoia Capital and Andreessen Horowitz also participated in the round, as did new investors Adobe Ventures, HubSpot Ventures and actor and singer Jared Leto (who previously backed Pika).

Captions will use the capital to grow its ML team in New York and launch new generative innovations to position itself as the leader in the AI video space. 




Captions’ holistic suite for AI videos

From the roots of manual video creation, where users had to talk, record and edit on their own, Captions has evolved into a holistic AI-powered suite that provides everything a brand or individual needs to generate high-quality videos.

To achieve this shift, the company first expanded its editing offerings with a range of AI-assisted tools, including those for correcting eye contact, dubbing a speaker’s words into another language, executing human-like voiceovers and producing short clips from a longer piece of video (among other things). 

The features drew several million users to Captions, but the real milestone came earlier this year when it announced two generative AI capabilities, AI Creator and AI Edit.


AI Creator allowed users to add custom 3D avatars to their video content, sparing them from speaking to the camera in every video. AI Edit, meanwhile, let users tweak their content, complete with custom graphics, B-roll, transitions and zooms, with a single tap.

“These innovative features (have) catalyzed a new age for creativity, empowering an entirely new group of creators who otherwise may have been blocked by the barriers that come with recording or editing,” Misra noted in a blog post.

A lot more in the pipeline

While the AI smarts have drawn a large number of users — more than 10 million of them creating three million-plus videos every month — to Captions, Misra says this is just the beginning for the company. 

With this round of funding, the startup will expand its machine learning (ML) team and invest more in in-house research and technical infrastructure to launch more AI capabilities for its video platform. 

“Looking to the future, we’re excited to share our plans to invest $100 million into advancing generative video research here in New York City,” he wrote. “We believe that New York is emerging as the epicenter for AI research and look forward to building our world-class team here.”

While it remains unclear what exactly Captions has in the pipeline, the move to double down on existing features could turn the company’s suite into a unified platform for producing avatar-based videos. This could make the company a strong competitor for niche players like ElevenLabs and Synthesia.

The enhanced set of capabilities will also make Captions more valuable to businesses, especially those involved in social media management, advertising, content marketing and growth marketing.

“Short-form video has become the most dominant content format, and Captions is uniquely positioned to transform how marketers create videos and engage with customers,” said Brandon Greer, head of HubSpot Ventures. “We’re excited to support Captions as they enable businesses of all sizes to produce high-quality video content faster than ever.”

Some notable names currently using the Captions platform are Disney-owned sports network ESPN, “Mr. Wonderful” Kevin O’Leary of Shark Tank fame, Twitch co-founder Justin Kan and the influencer Unnecessary Inventions.