
Selection of GAN vs Adversarial Autoencoder models

Last Updated : 21 Feb, 2022

In this article, we will look at the Generative Adversarial Network (GAN) and the Adversarial Autoencoder (AAE) and at how to choose between the two models.

Generative Adversarial Network (GAN)

The Generative Adversarial Network, or GAN, is currently one of the most prominent deep generative modeling methodologies. The primary distinction between a GAN and a VAE lies in how each fits the model distribution to the true data distribution: a VAE optimizes an explicit likelihood-based objective, while a GAN matches the distribution implicitly, through an adversarial game played over generated samples.

How does a GAN create an image? An image is nothing more than a vector of pixel values, but arbitrary random values cannot serve as a picture of an object. For a picture of a dog to look like a dog, the pixels must take particular values arranged in a particular way; in other words, the vector must follow a specific distribution. The goal of a GAN is therefore to take a random vector as input and transform it into a sample from the pixel distribution that matches our intended output.

The latent variables, or GAN inputs, therefore have no significance in themselves beyond the meaning the network assigns to them: the model maps any point in the latent space to a meaningful output.
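To make this concrete, here is a minimal sketch of a generator, assuming PyTorch; the latent dimension, layer sizes, and 28x28 output shape are illustrative choices, not something the article prescribes:

```python
import torch
import torch.nn as nn

# Hypothetical generator: maps a 100-dim random latent vector to a
# flattened 28x28 image. All sizes here are illustrative assumptions.
class Generator(nn.Module):
    def __init__(self, latent_dim=100, img_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, img_dim),
            nn.Tanh(),  # squashes pixel values into [-1, 1]
        )

    def forward(self, z):
        return self.net(z)

g = Generator()
z = torch.randn(16, 100)                # 16 random latent points
fake_images = g(z).view(-1, 1, 28, 28)  # every z maps to an image-shaped output
```

Untrained, this network maps random vectors to noise; the training procedure described below is what shapes its outputs into the target pixel distribution.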

Training a GAN:

Although we know a GAN transforms its input to follow the target distribution, how does it optimize the network to learn that distribution? There are “direct” and “indirect” ways of doing this. The “direct” technique compares the true distribution with the generated distribution, measures the error, and adjusts the network accordingly; the Generative Moment Matching Network (GMMN) uses this method. The true distribution of the output, however, is likely to be complicated: unlike a Gaussian, which can be described by just its mean and variance, it would be difficult to express either the real or the generated distribution explicitly. Instead, the distributions are compared through samples: with batches of real and generated data, we can estimate each distribution and measure the difference between them.
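As a hedged illustration of the “direct”, sample-based comparison, the sketch below estimates the maximum mean discrepancy (MMD), the statistic GMMN optimizes, between a batch of real and a batch of generated samples; the Gaussian kernel and its bandwidth are illustrative assumptions:

```python
import torch

def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel values between two batches of samples,
    # each of shape (n, d). sigma is an illustrative bandwidth choice.
    sq_dists = torch.cdist(a, b) ** 2
    return torch.exp(-sq_dists / (2 * sigma ** 2))

def mmd(real, fake, sigma=1.0):
    # Biased squared-MMD estimate: near zero when the two sample sets
    # come from the same distribution, large when they differ.
    k_rr = gaussian_kernel(real, real, sigma).mean()
    k_ff = gaussian_kernel(fake, fake, sigma).mean()
    k_rf = gaussian_kernel(real, fake, sigma).mean()
    return k_rr + k_ff - 2 * k_rf

# A "direct" model would backpropagate through mmd(real, generator(z)).
```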

A GAN, on the other hand, takes an “indirect” approach, using a separate network, the discriminator, to classify real and generated data. In a nutshell, the GAN architecture is made up of two parts: the discriminator, which is trained to distinguish real samples from generated ones, and the generator, which is trained to fool the discriminator into classifying its fake samples as real.

Process of training a GAN:

  1. Train the discriminator by freezing the generator’s weights (only the discriminator is updated): use the generator to create fake samples (initially these will be noise, since the generator starts untrained), sample real data, and train the discriminator on both, using the corresponding real and fake labels.
  2. Then train the generator by freezing the discriminator’s weights (only the generator is updated): use the generator to create fake samples and train the generator against the discriminator’s output on those samples, with “real” as the target label.
  3. Repeat steps 1 and 2; a sketch of one such iteration follows this list.
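Here is a minimal sketch of one such iteration, assuming PyTorch, a discriminator that ends in a sigmoid, and networks and optimizers defined elsewhere (as in the earlier generator sketch):

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def train_step(generator, discriminator, g_opt, d_opt, real_batch,
               latent_dim=100):
    n = real_batch.size(0)
    real_labels = torch.ones(n, 1)
    fake_labels = torch.zeros(n, 1)

    # Step 1: update only the discriminator. The generator is "frozen"
    # simply by detaching its output and never stepping g_opt here.
    fake_batch = generator(torch.randn(n, latent_dim)).detach()
    d_loss = (bce(discriminator(real_batch), real_labels) +
              bce(discriminator(fake_batch), fake_labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Step 2: update only the generator. It is rewarded when the
    # discriminator labels its fake samples as real.
    g_loss = bce(discriminator(generator(torch.randn(n, latent_dim))),
                 real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```

Repeating this step over many batches alternates the two updates exactly as described above.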

Adversarial Autoencoder (AAE)

The Adversarial Autoencoder (AAE) is a clever concept that combines the autoencoder architecture with the adversarial loss idea from GANs. It works much like the Variational Autoencoder (VAE), except that it regularizes the latent code with an adversarial loss instead of the KL-divergence.

To fit the encoded latent code to a normal distribution (or any other chosen distribution), a VAE uses the KL-divergence, a measure of the difference between distributions. The AAE replaces this with an adversarial loss, which adds an extra discriminator component and makes the encoder play the role of the generator. Unlike in a GAN, where the generator’s output is a generated image and the discriminator’s inputs are real and fake images, the AAE’s generator produces a latent code and tries to convince the discriminator that this code was sampled from the chosen distribution. The discriminator, in turn, determines whether a given latent code was produced by the autoencoder (fake) or drawn from the chosen distribution (real).

There are three types of encoder to choose from:

  1. Deterministic encoder: the same encoder used in a plain autoencoder, which compresses the input into specified characteristics expressed as a vector z.
  2. Gaussian posterior: instead of encoding each feature into a single value, this encoder represents each feature as a Gaussian distribution described by two variables, a mean and a variance, and samples z from it (see the sketch after this list).
  3. Universal approximator posterior: the characteristics are again encoded as a distribution, but without assuming it is Gaussian. The encoder in this case is a function f(x, n), where x is the input and n is random noise with an arbitrary distribution.
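Here is a minimal sketch of the second option, a Gaussian posterior encoder that samples z with the reparameterization trick; the framework (PyTorch) and all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical Gaussian-posterior encoder: each latent feature is
# described by a mean and a log-variance, and z is sampled from the
# resulting Gaussian. Layer sizes are illustrative assumptions.
class GaussianEncoder(nn.Module):
    def __init__(self, img_dim=28 * 28, latent_dim=8):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(img_dim, 256), nn.ReLU())
        self.mean = nn.Linear(256, latent_dim)
        self.log_var = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.hidden(x)
        mu, log_var = self.mean(h), self.log_var(h)
        std = torch.exp(0.5 * log_var)
        return mu + std * torch.randn_like(std)  # z ~ N(mu, std^2)
```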

As a result, the AAE architecture is made up of the following elements:

  • The encoder takes the input and converts it into a lower-dimensional representation (the latent code z).
  • The decoder turns the latent code z back into an output image.
  • The discriminator receives both the encoded latent code z from the autoencoder (fake) and a random vector z drawn from the chosen distribution (real), and verifies whether its input is genuine.

As this architecture shows, the encoder and the discriminator are the two fundamental distinctions between an AAE and a GAN. Unlike a GAN, an AAE takes an image as its input rather than a random vector z; this is accomplished by adding an encoder at the front. Furthermore, the AAE tries to make the latent code follow a normal distribution (or a distribution of your choosing). This is done by changing the discriminator’s job: rather than predicting whether a given picture is real or fake, as in a GAN, it predicts whether a given latent code z was produced by the autoencoder (fake) or drawn from the chosen distribution (real).
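Putting the three components together, one AAE training iteration might look like the sketch below; the networks, their optimizers, and the standard-normal prior are assumptions made for illustration, not details taken from the article:

```python
import torch
import torch.nn as nn

mse, bce = nn.MSELoss(), nn.BCELoss()

def aae_step(encoder, decoder, latent_disc, ae_opt, d_opt, g_opt, x,
             latent_dim=8):
    # Assumes: ae_opt holds encoder+decoder params, latent_disc ends
    # in a sigmoid, and the chosen prior is a standard normal.
    n = x.size(0)

    # 1) Reconstruction phase: an ordinary autoencoder update.
    recon_loss = mse(decoder(encoder(x)), x)
    ae_opt.zero_grad()
    recon_loss.backward()
    ae_opt.step()

    # 2) Regularization phase: the discriminator separates prior
    #    samples ("real") from encoded latent codes ("fake").
    z_real = torch.randn(n, latent_dim)
    z_fake = encoder(x).detach()
    d_loss = (bce(latent_disc(z_real), torch.ones(n, 1)) +
              bce(latent_disc(z_fake), torch.zeros(n, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # 3) The encoder, acting as the generator, tries to make its
    #    codes look like prior samples to the discriminator.
    g_loss = bce(latent_disc(encoder(x)), torch.ones(n, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```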

Applications

One exciting application of the Adversarial Autoencoder is anomaly detection and localization, where an unsupervised method of detecting anomalies is needed. An autoencoder can be trained to reconstruct an anomalous image as a normal one; the anomaly can then be detected by computing the difference between the reconstructed, anomaly-free image and the original anomalous image.
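As a rough sketch of that idea (the per-pixel error measure and the threshold are illustrative assumptions, not values from the article), an anomaly score can be derived from the reconstruction error of a trained autoencoder:

```python
import torch

def anomaly_map(encoder, decoder, image):
    # Per-pixel reconstruction error: regions the trained model cannot
    # reconstruct as "normal" show up with large values, which also
    # localizes the anomaly within the image.
    with torch.no_grad():
        recon = decoder(encoder(image))
    return (image - recon).abs()

def is_anomalous(encoder, decoder, image, threshold=0.1):
    # threshold is a hypothetical value, tuned on anomaly-free data.
    return anomaly_map(encoder, decoder, image).mean() > threshold
```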

