Google has built a text-to-video generation system on top of its Imagen neural network. Imagen Video can create roughly 5-second videos at a resolution of 1280×768 pixels and 24 frames per second.
As Google researchers explain, Imagen Video takes a text description and first creates a 16-frame video at 24×48 pixels and 3 FPS. The system then upscales it and “predicts” additional frames, resulting in a 128-frame animation at 1280×768 pixels and 24 FPS, SearchEngines reports.
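The figures above can be checked with a little arithmetic. The sketch below is only an illustration built from the numbers quoted in this article, not code from Google; the constant names and the helper `clip_duration` are hypothetical. It shows that the low-resolution draft and the final video cover the same stretch of time, so the extra frames fill in motion rather than lengthen the clip.

```python
from fractions import Fraction

# Figures quoted in the article (names are illustrative, not Google's).
BASE_FRAMES, BASE_FPS = 16, 3
FINAL_FRAMES, FINAL_FPS = 128, 24


def clip_duration(frames: int, fps: int) -> Fraction:
    """Return the clip length in seconds as an exact fraction."""
    return Fraction(frames, fps)


base_seconds = clip_duration(BASE_FRAMES, BASE_FPS)
final_seconds = clip_duration(FINAL_FRAMES, FINAL_FPS)
frame_factor = FINAL_FRAMES // BASE_FRAMES  # how many times more frames

print(f"base clip:  {float(base_seconds):.2f} s")   # 5.33 s
print(f"final clip: {float(final_seconds):.2f} s")  # 5.33 s
print(f"frames multiplied by {frame_factor}")       # 8
```

Both clips last 16/3 seconds, or about 5.3 seconds, which matches the "5-second" figure; the frame count and the frame rate each grow by a factor of 8.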
During testing, the researchers found that the algorithm can create “watercolor” videos or mimic the style of Van Gogh, and can add depth and three-dimensionality to the footage, as if it were shot with a moving camera or a drone.
Recall that Google introduced the Imagen neural network, its counterpart to OpenAI's DALL-E 2, in May of this year. To interpret a text query, the network relies on large language models of the kind that underpin natural language processing algorithms.
Both Imagen algorithms, for generating images and for generating videos, work on the same principle: the system refines the initial draft until it can no longer improve it under the given parameters, and then enlarges it to the target size. The developers emphasize that the resolution does not increase through simple scaling: at each of the three stages, the neural network adds detail to the image.
NIX Solutions notes that, for now, Imagen Video, like the image generator, is in closed beta and not available to the public. The developers fear that users will generate inappropriate videos and images with the neural network, reinforcing prejudices and stereotypes already present in society.