Google Announces Imagen Video, Its Text-to-Video Tool

October 6, 2022

https://dailyalts.com/wp-content/uploads/2022/10/imagen.gif

It’s Google’s reply to Meta’s video-generating tool.

Google has announced Imagen Video, an AI video generator that can create 1280×768 HD video at 24 fps from written text. Although it is still a research project, it looks like a quick riposte to Meta’s recently unveiled text-to-video AI tool, Make-A-Video. It also arrives within six months of OpenAI’s DALL-E 2 text-to-image generator. (Ars Technica)

Below are the text prompts used to generate some example Imagen Video clips.

An astronaut riding a horse.
Flying through an intense battle between pirate ships in a stormy ocean.
A panda eating bamboo on a rock.
View of a castle with fantastically high towers reaching into the clouds in a hilly forest at dawn.

Imagen Video first generates a 16-frame, 3-frames-per-second video at 24-by-48-pixel resolution from the input text prompt. A subsequent “cascade” of models then adds frames and raises the resolution step by step, finally producing a 128-frame, 24-frames-per-second video at 1280×768 (roughly 720p) that is 5.3 seconds long.
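The frame counts and rates above can be checked with a little arithmetic: 16 frames at 3 fps and 128 frames at 24 fps both last 16/3 ≈ 5.33 seconds, so the cascade multiplies frame count and frame rate by 8 while keeping the clip length fixed. The Python sketch below assumes, purely for illustration, that this happens in three steps that each double both frame count and fps; the article does not say how many temporal steps the cascade uses, and `temporal_cascade` is a hypothetical helper.

```python
from fractions import Fraction

# Figures reported in the article
BASE_FRAMES, BASE_FPS = 16, 3
FINAL_FRAMES, FINAL_FPS = 128, 24


def duration(frames: int, fps: int) -> Fraction:
    # Clip length in seconds = number of frames / frames per second
    return Fraction(frames, fps)


def temporal_cascade(frames: int, fps: int, stages: int) -> list[tuple[int, int]]:
    # ASSUMPTION (hypothetical): each temporal stage doubles both the frame
    # count and the frame rate, so clip duration never changes.
    history = [(frames, fps)]
    for _ in range(stages):
        frames, fps = frames * 2, fps * 2
        history.append((frames, fps))
    return history


stages = temporal_cascade(BASE_FRAMES, BASE_FPS, stages=3)
for frames, fps in stages:
    print(f"{frames:>3} frames @ {fps:>2} fps -> {float(duration(frames, fps)):.2f} s")
```

Under that assumption, three doublings take the 16-frame, 3 fps base clip to exactly 128 frames at 24 fps, and every intermediate stage still lasts about 5.33 seconds.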

Imagen Video also shows some notable abilities: it can generate clips in the style of famous painters such as Vincent van Gogh, render 3D objects rotating without distorting their shape, and produce a variety of animation styles.

Imagen Video is what is known as a “diffusion” model, and activity in this area of generative AI is growing rapidly. Apart from Meta’s Make-A-Video, other recent projects include Phenaki and DreamFusion, both also from Google.
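In broad terms, a diffusion model learns to reverse a process that gradually corrupts training data with Gaussian noise; generation then starts from pure noise and removes it step by step. The Python sketch below shows only the forward (noising) half, in its generic DDPM form x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, with an assumed linear noise schedule. It illustrates the general technique only, not Imagen Video’s actual schedule, parameterization, or code, and the helper names and pixel values are hypothetical.

```python
import math
import random


def linear_betas(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    # ASSUMPTION: per-step noise variances rise linearly (a common DDPM choice);
    # Imagen Video's own schedule is not described in the article.
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas: list[float], t: int) -> float:
    # Share of the original signal variance kept after steps 0..t:
    # the running product of (1 - beta)
    kept = 1.0
    for beta in betas[: t + 1]:
        kept *= 1.0 - beta
    return kept


def add_noise(x0: list[float], t: int, betas: list[float], rng: random.Random) -> list[float]:
    # Closed-form forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps,
    # where eps is drawn independently from a standard normal for each value
    a_bar = alpha_bar(betas, t)
    return [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_betas(1000)
rng = random.Random(0)
pixels = [0.5, -0.2, 0.9]  # hypothetical normalized pixel values
for t in (0, 250, 999):
    print(t, round(alpha_bar(betas, t), 5), add_noise(pixels, t, betas, rng))
```

With 1,000 steps, ᾱ falls well below 0.001 by the last step, so almost none of the original signal survives; this is why a trained model can begin generating from pure noise.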

The model was trained on an internal dataset of 14 million video-text pairs and 60 million image-text pairs, together with the publicly available LAION-400M image-text dataset.

Caveat

Google warns in its research paper: “Imagen Video and its frozen T5-XXL text encoder were trained on problematic data… While our internal testing suggests much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter.”

“We have decided not to release the Imagen Video model or its source code until these concerns are mitigated.”