One of the many talents that I wish I had and unfortunately was not blessed with is the ability to draw realistically. As a kid, I was obsessed with colouring books, patterns and drawing. I used to record the TV screen using the VHS video player (yes, this was before the dawn of Netflix and DVDs) and then try to recreate what was on the screen. Needless to say, none of them were any good and I don’t have any masterpieces hanging in any galleries around the world. But, with the help of the latest developments from Nvidia Research, I now get to see my ‘masterpieces’ come to life.
This week at the Nvidia GPU Technology Conference (GTC) 2019, Nvidia unveiled its latest image synthesis research, and it’s pretty incredible. Trained on 1 million images from Flickr (Burns, M., 2019) and leveraging generative adversarial networks (GANs), the deep learning model transforms basic segmentation maps into highly realistic images.
Video 01: GauGAN
The software allows you to draw an image using basic shapes and then, by labelling each region, transform those shapes into realistic landscape imagery. It also has a built-in awareness of what has been generated elsewhere in the scene, allowing the image to adapt dynamically to the surrounding textures. For example, if you have an image of a green leafy tree on a summer’s day and then label the ground as snow, the tree will turn barren. If you then introduce a lake, the tree and nearby elements will automatically be reflected in the water.
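To make that concrete, here is a minimal sketch of the kind of input such a model consumes: a 2D segmentation map in which every pixel carries a class label. The class names, IDs and one-hot encoding below are my own illustrative assumptions, not Nvidia’s actual label set or pipeline.

```python
import numpy as np

# Illustrative label IDs -- not Nvidia's actual class set.
SKY, WATER, TREE = 0, 1, 2
NUM_CLASSES = 3
H, W = 256, 256

# A segmentation map: each pixel holds the class it should become.
label_map = np.zeros((H, W), dtype=np.int64)
label_map[:H // 2, :] = SKY            # top half: sky
label_map[H // 2:, :] = WATER          # bottom half: water
label_map[60:H // 2, 100:160] = TREE   # a rough tree shape on the horizon

# Generators like GauGAN's typically receive the map one-hot encoded,
# one channel per class, and output an RGB image of the same size.
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[label_map]
print(one_hot.shape)  # (256, 256, 3)
```

Relabelling a region is then just a matter of rewriting those pixel IDs, which is why a single brushstroke (snow instead of grass, say) can change how the whole generated scene renders.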
“It’s like a coloring book picture that describes where a tree is, where the sun is, where the sky is…and then the neural network is able to fill in all of the detail and texture, and the reflections, shadows and colors, based on what it has learned about real images” - Bryan Catanzaro, VP of Applied Deep Learning Research, Nvidia
Invented in 2014, GANs can learn the natural features of a dataset, giving them the ability to imitate almost any distribution of data. The concept pits two neural networks against each other: a generator and a discriminator. The discriminator learns from real data what authentic samples look like. The generator then produces new, synthetic data instances and presents them to the discriminator, which evaluates their authenticity against what it has learned from the real examples and feeds that judgement back to the generator. Training the two in competition improves both: the generator gets better at creating realistic images, and the discriminator gets better at detecting synthetic ones.
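For the curious, here is a minimal, self-contained sketch of that adversarial loop in PyTorch. It learns a toy one-dimensional distribution rather than images, and every network size and hyperparameter is an illustrative assumption; GauGAN’s actual conditional architecture is far larger and is not reproduced here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Generator: maps random noise to a synthetic sample.
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
# Discriminator: outputs a logit scoring how "real" a sample looks.
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0  # "real" data: N(3, 0.5)
    fake = G(torch.randn(64, 8))           # generator's synthetic samples

    # Discriminator step: learn to label real as 1 and fake as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: fool the discriminator into labelling fakes as real.
    g_loss = bce(D(G(torch.randn(64, 8))), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

# The generated samples' mean should drift towards the real mean of 3.0.
print(G(torch.randn(1000, 8)).mean().item())
```

Each pass trains the discriminator on a mix of real and synthetic samples, then trains the generator against the discriminator’s feedback; that competitive loop is what GauGAN scales up to full landscape images.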
This approach has the potential to completely transform the creation of scenic environments. Not only does it allow people who are not highly skilled to start creating realistic scenery, it also lets professionals focus their efforts on prototyping more interesting scenes and iterating on changes more quickly. This powerful tool could have applications in architecture, landscaping, game design and even filmmaking. Although GauGAN at present focuses purely on natural elements like rocks, land and sea, “the underlying neural network is capable of filling in other landscape features, including buildings, roads and people” (Nvidia, 2019). With over 170 full-time researchers, and with 104 publications, 51 patent applications and 12 open-source software packages released in 2018 (Medium, 2019), I’m really interested to see what further developments are made in this space.
The following links provide further information: