Pika adds generative AI sound effects to its video maker
Pika is offering sound effects only to those who are creating AI video as part of its super-collaborators program or are paying $58/month. …
OpenAI continues to share clips of Sora, its impressively smooth and photorealistic generative AI video model, but it remains an internal model only for now. Meanwhile, rivals in the fast-moving video AI space including Pika aren’t letting the moment go to waste.
This weekend, Pika added a new feature that allows users to automatically generate sound effects for their AI videos made on the web platform, pika.art. The move promises to add a whole new dimension to AI-generated videos, most of which are soundless and previously required users to add their own sound files through other editing software. Now, with Pika’s new addition, they can do it right within the app and create new sound files without sourcing them separately.
The move comes less than two weeks after its launch of lip-syncing capabilities and makes AI-generated content better suited to individual creators and enterprise use cases. Together with lip sync and the existing generative AI video maker, Pika has created one of the first major “all-in-one” generative AI video creation platforms, where users can produce sound effects, voiceovers, and visuals with AI in one place.
By offering these three major capabilities, Pika could prove to be an attractive proposition to filmmakers and eliminate the need for separate cinematographers, videographers, sound designers, or the responsibilities of a sole filmmaker sourcing all this content on their own — at least for certain projects. Instead of the filmmaker going out to film in the field or sorting through different stock image and sound databases and programs to find all the files to stitch together a movie, Pika now lets the user type into it and generate all of them much faster than before, and more directly from their imagination.
However, as of now, Pika says the capability is being offered only to those who are a part of its super-collaborators program or are paying $58/month for its Pro subscription. Eventually, it plans to move it out of the beta stage and make it available to all users of the Pika platform.
How will Pika’s AI videos get sound effects?
In a press statement and X post announcing the capability, Pika confirmed users will get sound effects in two ways.
One would be contextual generation, where the AI models underneath the platform would decide what audio would go best with the clip being produced from the text prompt.
Meanwhile, the second would be a follow-up approach, where the user could add specific AI-generated sounds after they have generated or uploaded an audio-less clip on the platform.
For the former, the company explained, all a user would have to do is turn the “sound effects” toggle on when entering the prompt. The proprietary model will do the rest of the job and provide a complete audiovisual output – with sounds relevant to the scene – in a matter of seconds.
However, for the follow-up approach, the user would have to click on ‘Edit’ and ‘Sound Effects’ (available next to the modify region and expand canvas functions) and then write a complete text prompt describing the kind of sound they want to add to the clip in question. Based on the provided prompt, the model will generate multiple sound options, allowing the user to pick and add what works best for their needs.
While the feature has just been announced, its rollout is expected to give AI video creators a much-needed tool to enhance their creations. Previously, users were forced to use audio from other sources – which brought friction to the process and took more time. Pika claims to be the first one in the AI video space to include generated audio as part of the video output.
That said, it’s important to note that Pika is not the only one exploring sound generation with text prompts. Just recently, ElevenLabs, known for its text-to-speech and speech-to-speech AI technology, also opened early signups for its text-to-sound AI that will allow creators to generate sound effects by simply describing their imagination in words.
Meta also offers a similar technology called AudioGen. Yet neither of these rivals offers a built-in generative AI video model.
Roll-out expected over time
Pika says the new sound effects feature will roll out gradually to users. Currently, it is in the beta stage and accessible only to those who are in the company’s super-collaborators program or are paying $58/month for its Pro subscription. Using feedback from these early users, the company plans to iterate on and improve the capability, making it suitable for all users of the platform.
Since launching its web platform in December 2023, Pika has been going all in to strengthen its offering against competition, especially OpenAI’s yet-to-launch Sora.
Just recently, it launched lip-sync in partnership with ElevenLabs, allowing users to add AI voices to their videos, while also adding matching animation to ensure the speaking characters’ mouths move in time with the dialog. Pika says even these lip-synced videos can be enhanced with sound effects to create a much more complete and immersive scene.
As the next step, the startup plans to build on this work with more features in the pipeline. It has raised $55 million in funding at a valuation of nearly $200 million and is taking on not just OpenAI but also other heavily funded players in the creative AI space, including Adobe, Runway, Stability AI and the recently introduced Haiper.