ML | Auto-Encoders

A typical use of a Neural Network is a case of supervised learning. It involves training data which contains an output label. The neural network tries to learn the mapping from the given input to the given output label. But what if the output label is replaced by the input vector itself? Then the network will try to find the mapping from the input to itself. This would be the identity function which is a trivial mapping.

But if the network is not allowed to simply copy the input, then the network will be forced to capture only the salient features. This constraint opens up a different field of applications for Neural Networks which was unknown. The primary applications are dimensionality reduction and specific data compression.

The network is first trained on the given input. The network tries to reconstruct the given input from the features it picked up and gives an approximation to the input as the output. The training step involves the computation of the error and backpropagating the error. The typical architecture of an Auto-encoder resembles a bottleneck.

The schematic structure of an autoencoder is as follows:

The encoder part of the network is used for encoding and sometimes even for data compression purposes although it is not very effective as compared to other general compression techniques like JPEG. Encoding is achieved by the encoder part of the network which has decreasing number of hidden units in each layer. Thus this part is forced to pick up only the most significant and representative features of the data. The second half of the network performs the Decoding function. This part has the increasing number of hidden units in each layer and thus tries to reconstruct the original input from the encoded data.

Thus Auto-encoders are an unsupervised learning technique.

Training of an Auto-encoder for data compression: For a data compression procedure, the most important aspect of the compression is the reliability of the reconstruction of the compressed data. This requirement dictates the structure of the Auto-encoder as a bottleneck.

Step 1: Encoding the input data

The Auto-encoder first tries to encode the data using the initialized weights and biases.

Step 2: Decoding the input data

The Auto-encoder tries to reconstruct the original input from the encoded data to test the reliability of the encoding.

Step 3: Backpropagating the error

After the reconstruction, the loss function is computed to determine the reliability of the encoding. The error generated is backpropagated.

The above-described training process is reiterated several times until an acceptable level of reconstruction is reached.

After the training process, only the encoder part of the Auto-encoder is retained to encode a similar type of data used in the training process.

The different ways to constrain the network are:-

  • Keep small Hidden Layers: If the size of each hidden layer is kept as small as possible, then the network will be forced to pick up only the representative features of the data thus encoding the data.
  • Regularization: In this method, a loss term is added to the cost function which encourages the network to train in ways other than copying the input.
  • Denoising: Another way of constraining the network is to add noise to the input and teaching the network how to remove the noise from the data.
  • Tuning the Activation Functions: This method involves changing the activation functions of various nodes so that a majority of the nodes are dormant thus effectively reducing the size of the hidden layers.

The different variations of Auto-encoders are:-

  • Denoising Auto-encoder: This type of auto-encoder works on a partially corrupted input and trains to recover the original undistorted image. As mentioned above, this method is an effective way to constrain the network from simply copying the input.
  • Sparse Auto-encoder: This type of auto-encoder typically contains more hidden units than the input but only a few are allowed to be active at once. This property is called the sparsity of the network. The sparsity of the network can be controlled by either manually zeroing the required hidden units, tuning the activation functions or by adding a loss term to the cost function.
  • Variational Auto-encoder: This type of auto-encoder makes strong assumptions about the distribution of latent variables and uses the Stochastic Gradient Variational Bayes estimator in the training process. It assumes that the data is generated by a Directed Graphical Model and tries to learn an approximation to q_{\phi}(z|x) to the conditional property q_{\theta}(z|x) where \phi and \theta are the parameters of the encoder and the decoder respectively.

My Personal Notes arrow_drop_up

Check out this Author's contributed articles.

If you like GeeksforGeeks and would like to contribute, you can also write an article using or mail your article to See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please Improve this article if you find anything incorrect by clicking on the "Improve Article" button below.

Article Tags :
Practice Tags :

Be the First to upvote.

Please write to us at to report any issue with the above content.