Generative Adversarial Networks: An Overview

In this article, we’ll explain GANs by applying them to the task of generating images. GANs are one of the few successful techniques in unsupervised machine learning, and they are quickly revolutionizing our ability to perform generative tasks.

Challenges

The most critical challenge in training GANs is the possibility of non-convergence; this problem is sometimes also called mode collapse. To explain it simply, let’s consider an example. Suppose the task is to generate images of digits such as those in the MNIST dataset. One issue that can arise (and does arise in practice) is that G might start producing images of the digit 6 and no other digit. Once D adapts to G’s current behavior, in order to maximize classification accuracy it will start classifying all 6s as fake and all other digits as real (assuming it can’t tell fake 6s apart from real 6s). Then G adapts to D’s current behavior and starts generating only the digit 8. D adapts in turn, classifying all 8s as fake and everything else as real. Then G moves on to 3s, and so on.

In short, G only produces images similar to a (very) small subset of the training data, and once D starts discriminating that subset from the rest, G switches to some other subset: the two networks simply oscillate. Although this problem is not completely resolved, there are partial solutions. We won’t discuss them in detail here, but one of them involves minibatch features and/or backpropagating through many updates of D. To learn more, check out the suggested readings in the next section.
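
To make these training dynamics concrete, below is a minimal sketch of the alternating update loop, assuming PyTorch; the tiny fully connected G and D, the optimizers, and all hyperparameters are illustrative placeholders rather than the setup from any particular paper.

```python
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 28 * 28  # e.g. flattened MNIST images

# Toy generator and discriminator (illustrative architectures).
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, img_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))  # raw logit: real vs. fake

opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):  # real: (batch, img_dim) tensor of training images
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) D adapts to G's current behavior: push real toward 1, fake toward 0.
    fake = G(torch.randn(batch, latent_dim)).detach()
    loss_D = bce(D(real), ones) + bce(D(fake), zeros)
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # 2) G adapts to D's current behavior: make D label its samples as real.
    fake = G(torch.randn(batch, latent_dim))
    loss_G = bce(D(fake), ones)
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```

Note that nothing in this loop rewards G for covering all ten digits at once, which is exactly the gap that lets it chase D from one mode to the next.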

Further reading

If you would like to learn about GANs in much more depth, I suggest checking out the ICCV 2017 tutorials on GANs. There are multiple tutorials, each focusing on a different aspect of GANs, and they are quite recent.

I’d also like to mention Conditional GANs. Conditional GANs are GANs whose output is conditioned on an additional input. For example, the task might be to generate an image matching an input description: if the input is “dog”, then the output should be an image of a dog.
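
As a concrete (and hedged) illustration of the conditioning idea, here is a minimal sketch of a conditional generator in PyTorch: the class label is embedded and concatenated with the noise vector before being fed through the network. The class name, layer sizes, and dimensions are assumptions for illustration, not the architecture from any specific paper.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, latent_dim=64, n_classes=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)  # label -> vector
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 256), nn.ReLU(),
            nn.Linear(256, img_dim), nn.Tanh())

    def forward(self, z, labels):
        # Conditioning: the generator sees both the noise and the desired
        # class, so the same z can yield different outputs per label.
        return self.net(torch.cat([z, self.embed(labels)], dim=1))

z = torch.randn(8, 64)
labels = torch.randint(0, 10, (8,))       # e.g. "generate these digits"
imgs = ConditionalGenerator()(z, labels)  # (8, 784) generated images
```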

Below are results from some recent research papers:

Results for ‘Text to Image Synthesis’ by Reed et al.

Results for Image Super-Resolution by Ledig et al.

Results for Image-to-Image Translation by Isola et al.

Generating high-resolution ‘celebrity-like’ images by Karras et al.

Last but not least, if you would like to do a lot more reading on GANs, check out this list of GAN papers categorized by application and this list of 100+ different GAN variations.

Conclusion

I hope this article has given you a good understanding of Generative Adversarial Networks, one of the few successful techniques in unsupervised machine learning and one that is quickly revolutionizing our ability to perform generative tasks. Over the last few years, we’ve seen some very impressive results. There is a lot of active research in the field to apply GANs to language tasks, to improve their stability and ease of training, and so on. They are already being applied in industry for a variety of applications ranging from interactive image editing and 3D shape estimation to drug discovery, semi-supervised learning, and robotics. I hope this is just the beginning of your journey into adversarial machine learning.

Original. Reposted with permission.

Bios: Keshav is a cofounder of Compose Labs (commonlounge.com) and has spoken on GANs at international conferences including DataSciCon.Tech in Atlanta and the DataHack Summit in Bengaluru, India. He did his master’s in Artificial Intelligence at MIT, where his research focused on natural language processing and, before that, on computer vision and recommendation systems.

Arash previously worked on data science at MIT and is the cofounder of Orderly, an SF-based startup using machine learning to help businesses with customer segmentation and feedback analysis.
