NIX Solutions: Nvidia's AI Text-to-Image Translation System

GauGAN2 combines segmentation mapping, inpainting and text-to-image generation in a single model. Compared with other existing models, its underlying neural network produces more varied, higher-quality images. Users enter a short phrase, and the system generates the key features and theme of an image from it, says Dev.


“This starting block can then be completed by making this or that mountain higher and adding trees in the background or clouds in the sky,” said Nvidia team member Isha Salian.

GauGAN2 is an improved version of the GauGAN system introduced in 2019, which was trained on over a million publicly available images from the Flickr platform. The new version understands the relationships between objects such as snow, trees, water, flowers, bushes, hills and mountains; for example, the neural network “realizes” what types of precipitation are typical for each season.

The system is based on a generative adversarial network consisting of a generator and a discriminator, notes NIX Solutions. The generator takes samples of images with accompanying text and suggests which words correspond to the elements of the image. The discriminator evaluates whether this assumption is true.
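The tug-of-war between the two networks can be illustrated with the standard textbook GAN objective. This is only a minimal sketch: the article does not say which loss GauGAN2 actually uses, so the binary cross-entropy form, the function names and the probability values below are all assumptions made for illustration.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy loss for the discriminator (assumed objective).

    d_real: probability the discriminator assigns to a real image/text pair
            being real.
    d_fake: probability it assigns to a generated pair being real.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: small when the discriminator is fooled."""
    return -math.log(d_fake)


# Hypothetical discriminator outputs, chosen only for illustration.
sharp_discriminator = discriminator_loss(d_real=0.9, d_fake=0.1)
fooled_discriminator = discriminator_loss(d_real=0.9, d_fake=0.8)

# The discriminator is penalised more when it believes a generated pair...
print(fooled_discriminator > sharp_discriminator)
# ...while the generator is penalised less in exactly that situation.
print(generator_loss(0.8) < generator_loss(0.1))
```

During training the two losses pull in opposite directions: whatever raises the discriminator's score for generated pairs lowers the generator's loss and raises the discriminator's own.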

GauGAN2 itself was trained on 10 million images. If you enter the text “sunset on the beach”, the network generates the corresponding image; if you expand the phrase to “sunset on a rocky beach” or replace “sunset” with “rainy day”, it picks up the change in meaning and adjusts the image accordingly.