Google’s DeepMind division has introduced Veo 2, its next-generation AI video generator, which can create 4K (4096 × 2160 pixels) videos up to two minutes long. On paper, that is roughly four times the pixel count and six times the clip length of OpenAI’s Sora. However, these capabilities remain largely theoretical for now, as real-world use comes with significant limitations.
Limited Testing and Current Capabilities
At present, Veo 2 is only accessible through the experimental VideoFX platform, where video resolution is capped at 720p and video length at 8 seconds. In comparison, OpenAI’s Sora produces videos at up to 1080p resolution and 20 seconds in length. To access VideoFX, users need to join a waiting list, though Google promises a broader rollout soon. Business users can expect Veo 2 to appear on the Vertex AI platform, but no specific launch date has been announced yet.
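For readers who want to check the comparison themselves, here is a minimal Python sketch that works out the ratios from the figures quoted in this article. The 1280 × 720 and 1920 × 1080 frame sizes are assumptions (the usual 16:9 dimensions for 720p and 1080p), and the names `SPECS` and `compare` are made up for this illustration.

```python
# Figures quoted in the article. The 720p and 1080p frame sizes are
# assumed to be the usual 16:9 dimensions; the article gives only "720p"/"1080p".
SPECS = {
    "Veo 2 (announced)": {"width": 4096, "height": 2160, "seconds": 120},
    "Veo 2 (VideoFX)": {"width": 1280, "height": 720, "seconds": 8},
    "Sora": {"width": 1920, "height": 1080, "seconds": 20},
}


def compare(name, baseline):
    """Return (pixel-count ratio, clip-length ratio) of `name` relative to `baseline`."""
    a, b = SPECS[name], SPECS[baseline]
    pixel_ratio = (a["width"] * a["height"]) / (b["width"] * b["height"])
    length_ratio = a["seconds"] / b["seconds"]
    return pixel_ratio, length_ratio


if __name__ == "__main__":
    for name in ("Veo 2 (announced)", "Veo 2 (VideoFX)"):
        pixels, length = compare(name, "Sora")
        print(f"{name} vs Sora: {pixels:.2f}x pixels, {length:.2f}x length")
```

Under these assumptions, the announced Veo 2 specification works out to about 4.27 times Sora's pixel count and 6 times its clip length, while the version currently available in VideoFX delivers about 0.44 times the pixels and 0.4 times the length.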
Like its predecessor, Veo 2 generates videos from text prompts, which can be accompanied by images. The new version understands physical properties significantly better, delivers clearer visuals, and handles virtual camera movements more convincingly. For example, it more accurately models motion, such as coffee pouring into a mug, and renders lighting effects like shadows and reflections. Cinematic effects and virtual lenses are now mimicked more realistically, contributing to a more lifelike viewing experience.
Improvements and Lingering Challenges
One of the key advancements in Veo 2 is a reduction in hallucinations, such as extra fingers or unexpected objects. DeepMind says these issues have been minimized but not entirely eliminated. For instance, videos can still exhibit the “uncanny valley” effect. In a test video featuring a moving car, closer inspection revealed that the road appeared unnaturally smooth, pedestrians blended into one another, and building facades defied physical logic.
Veo 2 was trained on a vast library of videos, though DeepMind has not disclosed specific sources. YouTube, which Google owns, is likely among them. To address concerns about misuse, such as deepfakes, Veo 2 incorporates SynthID, an invisible watermark system that identifies AI-generated content.
DeepMind has also made strides in static image generation. Imagen 3, the upgraded static image generator, produces brighter, more detailed images and better adheres to user instructions. The ImageFX interface now includes drop-down lists directly within the prompt field, helping users achieve more accurate results.
We’ll keep you updated as more integrations and features become available, especially as Veo 2 moves closer to its full potential.