Machine learning is a tool to build intelligence into a machine so that it can make decisions. The learning can be either supervised or unsupervised. Supervised algorithms require large quantities of labelled data. Unsupervised algorithms, on the contrary, do not require any labelled data; they draw inferences directly from the data itself. Both have their own advantages and disadvantages. The main disadvantage of supervised learning is the requirement of a labelled dataset, because labelling is a time-consuming and costly process. The scenario is worse in the medical field, since the annotation of medical data requires the time and effort of a medical expert. Hence a lot of research is being performed in the area of unsupervised learning. Generative Adversarial Networks, widely called GANs, are one such attempt to improve unsupervised learning.
Two main statistical models in machine learning are ‘generative models’ and ‘discriminative models’. These models are generally used to categorize data. A generative model learns how the data is generated in order to categorize it; in other words, it learns the structure of the data. This also allows the system to generate new samples with similar statistical properties. A discriminative model, on the other hand, learns the relation between the data and the label associated with it, so it simply categorizes the input data without knowing how the data is generated. GAN exploits the concepts behind both models to arrive at a better network architecture.
- Generative models, e.g. Gaussian mixture model, Naive Bayes, etc.
- Discriminative models, e.g. SVM, logistic regression, neural network.
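The difference between the two families can be made concrete with a tiny numpy sketch (the 1-D data, class means and training details below are all illustrative choices, not from the article): a generative classifier fits a Gaussian per class and, because it has learned the data distribution, can also sample new points; a discriminative logistic regression only learns the boundary between the classes.

```python
import numpy as np

np.random.seed(0)

# Two 1-D classes: class 0 ~ N(0, 1), class 1 ~ N(4, 1).
x0 = np.random.normal(0.0, 1.0, 200)
x1 = np.random.normal(4.0, 1.0, 200)
x = np.concatenate([x0, x1])
y = np.concatenate([np.zeros(200), np.ones(200)])

# --- Generative: fit a Gaussian per class, classify via Bayes rule ---
mu0, s0 = x0.mean(), x0.std()
mu1, s1 = x1.mean(), x1.std()

def log_gauss(v, mu, s):
    # Log-density of N(mu, s^2), dropping the constant term.
    return -0.5 * ((v - mu) / s) ** 2 - np.log(s)

gen_pred = (log_gauss(x, mu1, s1) > log_gauss(x, mu0, s0)).astype(float)
gen_acc = (gen_pred == y).mean()

# Because the model learned the distribution, it can also *generate*
# new class-1 samples:
new_samples = np.random.normal(mu1, s1, 5)

# --- Discriminative: logistic regression, learns only the boundary ---
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.1 * ((p - y) * x).mean()
    b -= 0.1 * (p - y).mean()

disc_pred = (w * x + b > 0).astype(float)
disc_acc = (disc_pred == y).mean()

print(gen_acc, disc_acc)  # both high on this well-separated problem
```

Both models classify well here, but only the generative one can produce new samples — the ability GANs push to its extreme.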
These two concepts are the main components of GANs.
“Generative Adversarial Network is the most interesting idea in the last ten years of machine learning”
Yann LeCun, Director of AI Research, Facebook.
Generative Adversarial Networks originated at the University of Montreal in 2014, proposed by Ian Goodfellow during his Ph.D. He proposed a novel way to train a neural network. A basic GAN consists of two blocks, a generator (G) and a discriminator (D). These two blocks work in an adversarial fashion, and both get better and better over time. G generates images from noise, and those are fed to the discriminator. The discriminator is trained using real images, so it learns to distinguish between real and fake images. The generator, in turn, learns the qualities of a real image using the classification output of the discriminator. The training of a GAN can be better explained using the example shown in the figure.
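Formally, Goodfellow et al. pose this as a two-player minimax game: the discriminator D maximises its ability to separate real samples from generated ones, while the generator G minimises it.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}[\log D(x)]
+ \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
```

Here $p_{\mathrm{data}}$ is the real data distribution and $p_z$ is the noise distribution the generator samples from.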
In this example, the generator (G) is working hard to make fake currency notes. It has access only to raw materials like ink, paper, etc. Using these materials the generator produces currency notes, which are then given to a bank official (the discriminator). The bank official says whether the note is fake or real. In the initial stage our printing machine has no clue how a real note looks, so it is easy for the bank official to call the note a fake. This information is fed back to the printing machine, which fine-tunes its parameters. Over time the machine becomes better and better at generating look-alike notes. In other words, the machine keeps learning new features of the actual notes, making the bank official's job increasingly difficult.
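The alternating training described above can be sketched in a runnable toy form. Everything below is an illustrative simplification, not from the article: the "real" data is a 1-D Gaussian, the generator is a linear map of noise, and the discriminator is a logistic unit, with gradients written out by hand.

```python
import numpy as np

np.random.seed(0)

# Toy 1-D GAN: real data ~ N(4, 1). Generator G(z) = a*z + b must learn
# to map noise z ~ N(0, 1) onto that distribution. Discriminator
# D(x) = sigmoid(w*x + c) learns to tell real from generated samples.
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr, batch = 0.01, 64

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

for step in range(5000):
    real = np.random.normal(4.0, 1.0, batch)
    z = np.random.normal(0.0, 1.0, batch)
    fake = a * z + b

    # --- discriminator step: maximise log D(real) + log(1 - D(fake)) ---
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (((1 - d_real) * real).mean() - (d_fake * fake).mean())
    c += lr * ((1 - d_real).mean() - d_fake.mean())

    # --- generator step: maximise log D(fake) (non-saturating loss) ---
    d_fake = sigmoid(w * fake + c)
    grad_x = (1 - d_fake) * w        # d log D(fake) / d fake
    a += lr * (grad_x * z).mean()
    b += lr * grad_x.mean()

samples = a * np.random.normal(0.0, 1.0, 1000) + b
print(round(samples.mean(), 2))  # drifts toward the real mean of 4
```

Just as the counterfeiter only ever sees the official's verdicts, the generator here never sees real notes directly; its only learning signal is the gradient flowing back through the discriminator.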
GANs have a lot of interesting real-world applications as well. The basic GAN setup fits directly into any situation where the system has to generate samples from a distribution. Examples that fall into this category are image editing and image super-resolution. Ledig et al. (computer vision researchers at Twitter) address the problem of recovering finer texture detail when super-resolving at large upscaling factors. Super-Resolution GAN (SRGAN) is one such attempt, which optimises a perceptual loss. The perceptual loss is a combination of a content loss and an adversarial loss. The adversarial loss makes the generator-discriminator duo capable of generating photo-realistic super-resolved images.
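In the SRGAN paper this perceptual loss is written as a weighted sum, with the adversarial term scaled down by a small factor:

```latex
l^{SR} = \underbrace{l^{SR}_{X}}_{\text{content loss}}
       + \underbrace{10^{-3}\, l^{SR}_{Gen}}_{\text{adversarial loss}}
```

The content loss measures how close the super-resolved image is to the ground truth (e.g. in pixel or feature space), while the adversarial term pushes the output toward the manifold of natural-looking images.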
There is another interesting work, by Zhang et al. (of Rutgers University), where the network creates a photo-realistic image from a text description. This network is created by stacking two GAN networks, hence the name StackGAN. The first GAN creates a low-resolution outline of the image from the text description, and the second GAN adds photo-realistic details to that output.
Isola et al. (MIT) address the problem of image-to-image translation using a conditional GAN. Compared to an unconditional GAN, here the generator network receives an input image along with the random noise. The discriminator receives the input image together with a second image, which is either the generator's output or the corresponding image from the dataset, and it has to say whether that second image is real or fake. The output of the generator depends on the kind of second input given to the discriminator: it can be a colour image, a map image, a photograph, etc. Based on this, the network can convert a label image to an actual image, a sketch to a photo-realistic image, aerial photos to maps, black-and-white images to colour, and so on.
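Mechanically, the "pairing" above is often implemented by concatenating the input image and the candidate output along the channel axis before feeding them to the discriminator. A minimal shape-level sketch (the image sizes are illustrative):

```python
import numpy as np

# Conditional discriminator input for image-to-image translation:
# the input image and a candidate output (generated or real target)
# are stacked channel-wise into one tensor the discriminator judges.
h, w = 256, 256
input_image = np.zeros((3, h, w))   # e.g. a sketch, label map or aerial photo
candidate = np.zeros((3, h, w))     # generator output OR real target image

pair = np.concatenate([input_image, candidate], axis=0)
print(pair.shape)  # (6, 256, 256): a single 6-channel input for D
```

Because the discriminator always sees the input image alongside the candidate, it can penalise outputs that look realistic but do not match the input — which is exactly what conditioning buys over an unconditional GAN.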
The applications of GANs are not limited to these; there are lots of other applications in video, robotics, etc. as well.
With the advent of deep learning networks (DLNs) there has been tremendous advancement in areas such as computer vision, natural language processing and speech processing. But DLNs require a huge amount of data, which is a roadblock in areas where data procurement is tedious. Hence the whole community is very interested in research in the direction of unsupervised learning. GAN is one such attempt to address these kinds of issues. In the coming days a lot of research is going to happen around GANs for them to reach their full potential.
Nirmal Jith of Aindra.
Other interesting reads: