Google VideoPoet: a revolution in video creation through artificial intelligence

Google has unveiled VideoPoet, a new experimental large language model (LLM) that uses artificial intelligence to generate videos from text and images, and even to edit existing footage. Depending on the context of the user's input, it can also generate matching audio for its videos. According to Google researchers, VideoPoet is the first model of its kind able to generate consistent motion, an area where AI-based video production still faces challenges.

  • VideoPoet uses visual tokens tailored to video, enabling clips with a higher level of detail.
  • VideoPoet’s feature set includes adding new elements to videos, converting images into animated videos, AI-based video editing, stylization and effects.
  • VideoPoet can create longer videos with consistent detail, pointing to potential applications in the entertainment sector and in technological development.

While AI content generators such as Midjourney and DALL-E 3 have proven their potential, these services cannot bring to video the same fluid motion they deliver in still-image detail. This is where the animated content created by Google VideoPoet comes in, produced by a model carefully trained using the techniques of large language models (LLMs).

The options offered by VideoPoet go beyond simple video creation. Below is a list of the functions Google’s new AI model can perform:

Text to video: Use text to create videos

Image to video: Convert photos into animated videos

Video editing: Use artificial intelligence to add elements to videos, e.g. moving objects

Stylization: Add effects to videos, e.g. colour correction, clip art styling and more

Inpainting: Add details to a video, e.g. backgrounds, or fill in masked or blank areas

The way VideoPoet fills in missing content sets it apart from other AI content generators. VideoPoet uses visual tokens learned from video data, whereas Midjourney uses a diffusion-based method that generates imagery from random noise.

VideoPoet then generates matching audio tokens using an audio encoder. In this way, it can produce audio that fits the main theme and concept of the video.
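The token-based approach described above can be illustrated with a minimal sketch. This is not the real VideoPoet API (which Google has not published); the codebook size, "model" function and token layout below are toy stand-ins. The point is the mechanism: a clip is treated as a sequence of discrete visual tokens, and the model predicts them one at a time, the way a text LLM predicts words.

```python
VOCAB_SIZE = 256       # toy codebook of discrete visual tokens
TOKENS_PER_FRAME = 4   # toy spatial resolution per frame


def next_token(context):
    """Toy stand-in for a transformer's next-token prediction:
    a deterministic function of the tokens seen so far."""
    return sum(context) % VOCAB_SIZE if context else 0


def generate_clip(prompt_tokens, num_frames):
    """Autoregressively extend the prompt into num_frames of tokens."""
    seq = list(prompt_tokens)
    for _ in range(num_frames * TOKENS_PER_FRAME):
        seq.append(next_token(seq))
    # Group generated tokens into frames for a decoder to render.
    gen = seq[len(prompt_tokens):]
    return [gen[i:i + TOKENS_PER_FRAME]
            for i in range(0, len(gen), TOKENS_PER_FRAME)]


frames = generate_clip([7, 42], num_frames=3)  # 3 frames, 4 tokens each
```

In a real system the prompt tokens would come from encoded text or image inputs, and a learned decoder would turn the generated visual tokens back into pixels; a diffusion model like Midjourney's instead starts from random noise and denoises it, with no token sequence at all.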

How VideoPoet creates “realistic” videos

Similar to how ChatGPT and Google Bard select responses based on word combinations, VideoPoet generates video token by token, recognising the subject and objects in a scene. This allows VideoPoet to create videos that are more realistic than the less detailed or blurred output of other services.

VideoPoet has countless applications. With the ability to add objects to moving scenes and fill in empty areas, VideoPoet could revolutionise the entertainment sector, where computer-generated imagery remains a laborious process.

According to Google, VideoPoet can also create longer videos. Currently, it can generate 8-10 second animations with consistent detail.

The company’s website contains several short clips with examples of text-to-video conversions. All examples can be seen on the Google Research Blog.

However, as this is only a preview, VideoPoet is not yet publicly available. Google’s current selection of clips may not be the most visually appealing, but the technology behind VideoPoet is intriguing. We have already seen how far Midjourney has come in the last two years, from creating pixelated images to stunning portraits. We can therefore expect VideoPoet to have even more far-reaching capabilities when it is released to the general public.
