DeepMind’s New AI Can Generate Sound and Dialogue for Videos

June 27, 2024
DeepMind AI

Video generation models are advancing rapidly, but many existing systems still produce only silent videos. Now, Google’s AI research lab DeepMind has revealed that it is working on video-to-audio (V2A) technology to enable synchronized audiovisual generation.

In a post on its official blog, DeepMind describes the technology, V2A (short for “video-to-audio”), as a crucial component of AI-generated media. Many companies, including DeepMind, have built AI models that can generate videos, but these models cannot produce sound effects that match the videos they create.

DeepMind’s V2A technology takes a description of a soundtrack (for example, “A drummer on a stage at a concert surrounded by flashing lights and a cheering crowd”) coupled with a video and generates dialogue, sound effects, and music that complement the characters and tone of the video. The generated audio is watermarked with DeepMind’s SynthID technology. According to DeepMind, the AI model underlying V2A, a diffusion model, was trained on a blend of audio and transcripts of dialogue in addition to short video clips.

Now, AI-powered audio generation is not new. Companies like ElevenLabs and platforms like Pika and GenreX have all released something similar. However, DeepMind asserts that its V2A technology is unique in that it can interpret a video’s raw pixels and automatically synchronize generated sounds with the video, even without a description.

“We experimented with autoregressive and diffusion approaches to discover the most scalable AI architecture and the diffusion-based approach for audio generation gave the most realistic and compelling results for synchronizing video and audio information,” the company explains.

The technology has limitations. For one, the underlying model was not trained on many videos with artifacts or distortions, so the audio it produces for such videos is not especially high-quality. DeepMind says it won’t make the technology available to the general public anytime soon, if at all, for these reasons and to prevent misuse. The company has also yet to clarify whether the tool was trained using copyrighted data.
