More Deep Learning “Magic”: Paintings to photos, horses to zebras, and more amazing image-to-image translation
This is an introduction to recent research that presents an approach for learning to translate an image from a source domain X to a target domain Y without paired training examples.
CycleGAN is the implementation of recent research by Jun-Yan Zhu, Taesung Park, Phillip Isola & Alexei A. Efros, which is "software that can generate photos from paintings, turn horses into zebras, perform style transfer, and more." The research builds on the authors' earlier work pix2pix (paper: Image-to-Image Translation with Conditional Adversarial Networks).
Collection style transfer.
The following is from the abstract of the authors' paper, Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G:X→Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F:Y→X and introduce a cycle consistency loss to push F(G(X))≈X (and vice versa).
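The cycle-consistency idea above can be sketched in a few lines of PyTorch. This is a hedged illustration, not the authors' implementation (their released code is in Torch/Lua, and the real G and F are ResNet-based generators); the single-layer "generators" below are stand-ins, and the weighting `lam=10.0` follows the λ value reported in the paper.

```python
import torch
import torch.nn as nn

# Stand-in generators: G maps domain X -> Y, F maps Y -> X.
# (In the paper these are full ResNet generators; single convs are used
# here only so the loss computation is runnable end to end.)
G = nn.Conv2d(3, 3, kernel_size=3, padding=1)
F = nn.Conv2d(3, 3, kernel_size=3, padding=1)

l1 = nn.L1Loss()

def cycle_consistency_loss(real_x, real_y, lam=10.0):
    """L1 cycle loss: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y."""
    forward_cycle = l1(F(G(real_x)), real_x)    # x -> translated -> back to x
    backward_cycle = l1(G(F(real_y)), real_y)   # y -> translated -> back to y
    return lam * (forward_cycle + backward_cycle)

# Random tensors standing in for image batches from each domain.
x = torch.randn(1, 3, 64, 64)
y = torch.randn(1, 3, 64, 64)
loss = cycle_consistency_loss(x, y)
loss.backward()  # gradients flow into both generators
```

In training, this term is added to the adversarial losses of the two discriminators; the cycle term is what keeps the highly under-constrained mapping G from collapsing, since G must stay invertible enough for F to recover the input.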
A sample of possible applications, based on pre-trained models by the authors, is shown below (see their paper for further details):
- Photo generation from paintings
- Collection style transfer - the model learns to render landscape photographs in the style of an entire collection of artworks; unlike neural style transfer, which mimics a single image's style, it captures the style of the whole set
- Object transfiguration - the model is trained to translate one object class from ImageNet into another class
- Season transfer - the model is trained on winter and summer photos of Yosemite National Park
CycleGAN also works for video-to-video transfer.
While the research and results are promising, the authors acknowledge the method's limitations, pointing out the following (from their paper):
On translation tasks that involve color and texture changes, like many of those reported above, the method often succeeds. We have also explored tasks that require geometric changes, with little success. For example, on the task of dog → cat transfiguration, the learned translation degenerates to making minimal changes to the input (Figure 17). Handling more varied and extreme transformations, especially geometric changes, is an important problem for future work.
The accompanying paper also distinguishes the authors' work from similar recent research, notably A Neural Algorithm of Artistic Style (Leon A. Gatys, Alexander S. Ecker & Matthias Bethge). The results are promising, and the work is at the forefront of image-to-image translation.
The Torch code, which runs on both Linux and OS X, can be found here, as can the pre-trained models. The accompanying research paper is located here. A project page has also been set up, which you can find here.