Autoencoders - Machine Learning
Last Updated: 06 Dec, 2023
At the heart of deep learning lies the neural network, an interconnected system of nodes loosely inspired by the human brain’s neural architecture. Neural networks excel at discerning intricate patterns and representations within large datasets, which lets them make predictions, classify information, and generate new insights. Autoencoders are a subset of neural networks that take a distinctive approach to unsupervised learning: they learn compact representations of data without labels. Because of this ability, they have received considerable attention and are useful in a wide variety of areas, from image processing to anomaly detection.
What are Autoencoders?
Autoencoders are a class of artificial neural networks that learn efficient representations of input data without labels, which makes them a tool for unsupervised learning. The essential principle of an autoencoder is to compress the input data and then reconstruct it. This is accomplished with a two-part structure that consists of an encoder and a decoder. The encoder transforms the input data into a reduced-dimensional representation, which is often referred to as the “latent space” or “encoding”. From that representation, the decoder rebuilds the original input. Because the network has to reconstruct the input from the compressed representation alone, the process of encoding and decoding forces it to identify the essential features of the data.
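The encoder and decoder described above can be written compactly in symbols. The notation here is an assumption chosen for illustration: the encoder is a function f with parameters φ, the decoder is a function g with parameters θ, and 𝓛 is any reconstruction loss averaged over N training examples.

```latex
z = f_{\phi}(x), \qquad \hat{x} = g_{\theta}(z), \qquad
\min_{\phi,\,\theta} \; \frac{1}{N} \sum_{i=1}^{N}
\mathcal{L}\bigl(x^{(i)},\, g_{\theta}(f_{\phi}(x^{(i)}))\bigr)
```

Here z is the latent encoding and x̂ is the reconstruction; no labels appear anywhere in the objective, which is why the training is unsupervised.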
Architecture of Autoencoder in Deep Learning
The general architecture of an autoencoder includes an encoder, decoder, and bottleneck layer.

- Encoder
  - The input layer takes the raw input data.
  - The hidden layers progressively reduce the dimensionality of the input, capturing important features and patterns. These layers compose the encoder.
  - The bottleneck layer (latent space) is the final hidden layer of the encoder, where the dimensionality is significantly reduced. This layer holds the compressed encoding of the input data.
- Decoder
  - The decoder takes the encoded representation from the bottleneck layer and expands it back toward the dimensionality of the original input.
  - The hidden layers progressively increase the dimensionality and aim to reconstruct the original input.
  - The output layer produces the reconstructed output, which ideally should be as close as possible to the input data.
- Loss function: training typically uses a reconstruction loss that measures the difference between the input and the reconstructed output. Common choices include mean squared error (MSE) for continuous data and binary cross-entropy for binary data.
- Training: the autoencoder learns to minimize the reconstruction loss, which forces the network to capture the most important features of the input data in the bottleneck layer.
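The two reconstruction losses mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation any particular framework uses; the function names `mse_loss` and `bce_loss` and the clipping constant `eps` are assumptions made for this example.

```python
import numpy as np


def mse_loss(x, x_hat):
    """Mean squared error between the input x and the reconstruction x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))


def bce_loss(x, x_hat, eps=1e-7):
    """Binary cross-entropy; x holds values in [0, 1], x_hat holds predicted probabilities."""
    x = np.asarray(x, dtype=float)
    # Clip predictions away from 0 and 1 so the logarithms stay finite.
    x_hat = np.clip(np.asarray(x_hat, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat)))


x = np.array([0.0, 1.0, 1.0, 0.0])        # toy "input"
perfect = x.copy()                        # a perfect reconstruction
rough = np.array([0.5, 0.5, 0.5, 0.5])    # an uninformative reconstruction

print(mse_loss(x, perfect), mse_loss(x, rough))
print(bce_loss(x, rough))
```

Both losses shrink toward zero as the reconstruction approaches the input, which is the quantity the optimizer drives down during training.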
After training, often only the encoder part of the autoencoder is retained, and it is used to encode data similar to the data seen during training. Because an unconstrained network could simply copy its input to its output, the network is constrained during training in one of the following ways:
- Keep small hidden layers: If the size of each hidden layer is kept as small as possible, the network is forced to pick up only the representative features of the data, thus encoding the data.
- Regularization: In this method, a loss term is added to the cost function which encourages the network to train in ways other than copying the input.
- Denoising: Another way of constraining the network is to add noise to the input and teach the network how to remove the noise from the data.
- Tuning the activation functions: This method involves changing the activation functions of various nodes so that a majority of the nodes are dormant, effectively reducing the size of the hidden layers.
Types of Autoencoders
Several types of autoencoder exist. The following sections describe the main variants along with their advantages and disadvantages:
Denoising Autoencoder
A denoising autoencoder takes a partially corrupted input and is trained to recover the original, undistorted input. As mentioned above, this method is an effective way to keep the network from simply copying the input, so that it learns the underlying structure and important features of the data.
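The corruption step can be sketched as follows. This is a minimal example assuming additive Gaussian noise on pixel values scaled to [0, 1]; the helper name `add_gaussian_noise`, the noise level `noise_std=0.2`, and the stand-in array `clean` are illustrative choices, not values from the article.

```python
import numpy as np


def add_gaussian_noise(x, noise_std=0.2, seed=0):
    """Return a corrupted copy of x, clipped back to the valid [0, 1] range."""
    rng = np.random.default_rng(seed)
    noisy = x + noise_std * rng.standard_normal(x.shape)
    return np.clip(noisy, 0.0, 1.0).astype(x.dtype)


# Stand-in for a batch of normalised 28x28 images (hypothetical data).
clean = np.full((4, 28, 28), 0.5, dtype=np.float32)
noisy = add_gaussian_noise(clean)

# A denoising autoencoder is then trained on (noisy input, clean target) pairs,
# for example fit(noisy, clean) instead of fit(clean, clean).
print(noisy.shape, float(noisy.min()), float(noisy.max()))
```

The only change relative to a plain autoencoder is the training pair: the input is corrupted, while the target stays clean.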
Advantages
- This type of autoencoder can extract important features and reduce the noise or the useless features.
- Denoising autoencoders can be used as a form of data augmentation: the restored images can serve as additional training samples.
Disadvantages
- Selecting the right type and level of noise to introduce can be challenging and may require domain knowledge.
- The denoising process can result in the loss of some information needed from the original input, which can reduce the accuracy of the output.
Sparse Autoencoder
This type of autoencoder typically contains more hidden units than input units, but only a few are allowed to be active at once. This property is called the sparsity of the network. Sparsity can be controlled by manually zeroing the required hidden units, by tuning the activation functions, or by adding a penalty term to the cost function.
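The penalty-term option can be sketched with two common choices: an L1 penalty on the hidden activations, and a KL-divergence penalty that pushes each hidden unit's average activation toward a small target rate. The function names, the weight `1e-3`, and the target rate `0.05` are assumptions for illustration; the KL form assumes activations in (0, 1), for example from a sigmoid.

```python
import numpy as np


def l1_sparsity_penalty(activations, weight=1e-3):
    """Weighted mean absolute activation; larger when many units are active."""
    return weight * float(np.mean(np.abs(activations)))


def kl_sparsity_penalty(activations, target=0.05, eps=1e-8):
    """Sum over hidden units of KL(target || mean activation of that unit)."""
    # Average activation of each hidden unit over the batch (axis 0).
    rho_hat = np.clip(np.mean(activations, axis=0), eps, 1.0 - eps)
    kl = (target * np.log(target / rho_hat)
          + (1.0 - target) * np.log((1.0 - target) / (1.0 - rho_hat)))
    return float(np.sum(kl))


# Hypothetical hidden activations: batch of 8 examples, 16 hidden units.
dense_acts = np.full((8, 16), 0.5)    # every unit half-active
sparse_acts = np.full((8, 16), 0.05)  # every unit at the target rate

print(kl_sparsity_penalty(dense_acts), kl_sparsity_penalty(sparse_acts))
print(l1_sparsity_penalty(dense_acts), l1_sparsity_penalty(sparse_acts))
```

Either penalty would be added to the reconstruction loss, so the optimizer trades reconstruction quality against keeping most hidden units inactive.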
Advantages
- The sparsity constraint in sparse autoencoders helps in filtering out noise and irrelevant features during the encoding process.
- These autoencoders often learn important and meaningful features due to their emphasis on sparse activations.
Disadvantages
- The choice of hyperparameters plays a significant role in the performance of this autoencoder. Different inputs should result in the activation of different nodes of the network.
- The application of sparsity constraint increases computational complexity.
Variational Autoencoder
A variational autoencoder makes strong assumptions about the distribution of the latent variables and uses the Stochastic Gradient Variational Bayes (SGVB) estimator in the training process. It assumes that the data is generated by a directed graphical model [Tex]p_{\theta}(x|z)[/Tex] and that the encoder learns an approximation [Tex]q_{\phi}(z|x)[/Tex] to the posterior distribution [Tex]p_{\theta}(z|x)[/Tex], where [Tex]\phi[/Tex] and [Tex]\theta[/Tex] are the parameters of the encoder and the decoder respectively.
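Two pieces of the training process can be sketched in NumPy: drawing a latent sample in a way that keeps it a differentiable function of the encoder outputs (the reparameterization used by the SGVB estimator), and the closed-form KL term that keeps the encoder distribution close to the prior. The sketch assumes a diagonal Gaussian encoder distribution and a standard normal prior, which is the common choice but is not stated in the text above; the function names are illustrative.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), one value per example."""
    return -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=-1)


rng = np.random.default_rng(0)
mu = np.zeros((2, 4))        # hypothetical encoder means: batch of 2, latent size 4
log_var = np.zeros((2, 4))   # hypothetical encoder log-variances

z = reparameterize(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)
print(z.shape, kl)
```

In a full model the training loss would combine this KL term with a reconstruction loss computed on the decoder output for `z`.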
Advantages
- Variational autoencoders can generate new data points that resemble the original training data. New samples are produced by decoding points drawn from the latent space.
- A variational autoencoder is a probabilistic framework that learns a compressed representation of the data capturing its underlying structure and variations, so it is useful for anomaly detection and data exploration.
Disadvantages
- Variational autoencoders use approximations to estimate the true distribution of the latent variables. This approximation introduces some level of error, which can affect the quality of generated samples.
- The generated samples may only cover a limited subset of the true data distribution. This can result in a lack of diversity in generated samples.
Convolutional Autoencoder
Convolutional autoencoders use convolutional neural networks (CNNs) as their building blocks. The encoder takes an image or a grid as input and passes it through several convolution layers, forming a compressed representation of the input. The decoder is the mirror image of the encoder: it deconvolves the compressed representation and tries to reconstruct the original image.
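The mirror-image relationship between encoder and decoder shows up in the spatial sizes of the feature maps. The sketch below uses the explicit-padding size formulas (dilation 1) for a strided convolution and a transposed convolution; the layer settings (kernel 3, stride 2, padding 1, output padding 1) and the 28-pixel input are assumptions for illustration. Frameworks that use "same"/"valid" padding modes compute sizes with different rules.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a strided convolution (explicit padding, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(n, kernel, stride=1, padding=0, output_padding=0):
    """Spatial size after a transposed convolution (explicit padding, dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


sizes = [28]  # width/height of the input image

# Encoder: two stride-2 convolutions shrink the feature map.
for _ in range(2):
    sizes.append(conv_output_size(sizes[-1], kernel=3, stride=2, padding=1))

# Decoder: two stride-2 transposed convolutions grow it back.
for _ in range(2):
    sizes.append(transposed_conv_output_size(
        sizes[-1], kernel=3, stride=2, padding=1, output_padding=1))

print(sizes)  # [28, 14, 7, 14, 28]
```

The decoder retraces the encoder's sizes in reverse, so the reconstruction has the same spatial shape as the input.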
Advantages
- A convolutional autoencoder can compress high-dimensional image data into a lower-dimensional representation, which improves storage efficiency and transmission of image data.
- A convolutional autoencoder can reconstruct missing parts of an image. It can also handle images with slight variations in object position or orientation.
Disadvantages
- These autoencoders are prone to overfitting. Proper regularization techniques should be used to tackle this issue.
- Compression of data can cause data loss which can result in reconstruction of a lower quality image.
Implementation of Autoencoders
We build an autoencoder with two Dense layers: an encoder layer that condenses each image into a 64-dimensional latent vector, and a decoder layer that reconstructs the original image from this latent vector.
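Before turning to Keras, the data flow of such a two-layer model can be traced in plain NumPy. This is only a shape-level sketch with randomly initialised, untrained weights; the names `W_enc`, `W_dec`, `encode` and `decode` are hypothetical, and the ReLU and sigmoid activations are chosen to match the Keras model in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, latent_dim = 28 * 28, 64

# Randomly initialised (untrained) weights, used only to show the shapes.
W_enc = rng.normal(0.0, 0.01, (input_dim, latent_dim))
b_enc = np.zeros(latent_dim)
W_dec = rng.normal(0.0, 0.01, (latent_dim, input_dim))
b_dec = np.zeros(input_dim)


def encode(x):
    """Flatten each image and apply one dense layer with ReLU."""
    flat = x.reshape(len(x), -1)
    return np.maximum(flat @ W_enc + b_enc, 0.0)


def decode(z):
    """Apply one dense layer with sigmoid and reshape back to 28x28."""
    out = 1.0 / (1.0 + np.exp(-(z @ W_dec + b_dec)))
    return out.reshape(-1, 28, 28)


images = rng.random((5, 28, 28))   # stand-in for five normalised images
codes = encode(images)
recon = decode(codes)
n_params = W_enc.size + b_enc.size + W_dec.size + b_dec.size

print(codes.shape, recon.shape, n_params)  # (5, 64) (5, 28, 28) 101200
```

Each 784-pixel image passes through a 64-value code, so the network has to summarise the image rather than copy it.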
Import necessary libraries
For the implementation, we import matplotlib, numpy, pandas, sklearn, tensorflow and keras.
Python3
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from keras import layers, losses
from keras.datasets import mnist
from keras.models import Model
Load the MNIST dataset
Python3
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
print("Shape of the training data:", x_train.shape)
print("Shape of the testing data:", x_test.shape)
Output:
Shape of the training data: (60000, 28, 28)
Shape of the testing data: (10000, 28, 28)
Define a basic Autoencoder
In the following code snippet:
- The SimpleAutoencoder class is defined.
- The constructor initializes the autoencoder with the specified latent dimensions and data shape.
- The encoder and decoder architectures are defined using the Sequential model.
- The call method defines the forward pass of the autoencoder, where input data is passed through the encoder to obtain encoded data and then through the decoder to obtain the decoded data.
Python3
class SimpleAutoencoder(Model):
    def __init__(self, latent_dimensions, data_shape):
        super().__init__()
        self.latent_dimensions = latent_dimensions
        self.data_shape = data_shape

        # Encoder: flatten the image and compress it to the latent vector
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dimensions, activation='relu'),
        ])

        # Decoder: expand the latent vector and reshape it back to an image
        self.decoder = tf.keras.Sequential([
            # Pass the number of units as a plain Python int
            layers.Dense(int(np.prod(data_shape)), activation='sigmoid'),
            layers.Reshape(data_shape),
        ])

    def call(self, input_data):
        encoded_data = self.encoder(input_data)
        decoded_data = self.decoder(encoded_data)
        return decoded_data


input_data_shape = x_test.shape[1:]
latent_dimensions = 64
simple_autoencoder = SimpleAutoencoder(latent_dimensions, input_data_shape)
Compile and Fit Autoencoder
Python3
simple_autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())
# Input and target are the same images: the network learns to reconstruct its input
simple_autoencoder.fit(x_train, x_train,
                       epochs=10,
                       shuffle=True,
                       validation_data=(x_test, x_test))
Output:
Epoch 1/10
1875/1875 [==============================] - 12s 6ms/step - loss: 0.0243 - val_loss: 0.0091
Epoch 2/10
1875/1875 [==============================] - 16s 9ms/step - loss: 0.0069 - val_loss: 0.0054
Epoch 3/10
1875/1875 [==============================] - 15s 8ms/step - loss: 0.0051 - val_loss: 0.0046
Epoch 4/10
1875/1875 [==============================] - 8s 5ms/step - loss: 0.0045 - val_loss: 0.0043
Epoch 5/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0043 - val_loss: 0.0041
Epoch 6/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0042 - val_loss: 0.0041
Epoch 7/10
1875/1875 [==============================] - 7s 4ms/step - loss: 0.0041 - val_loss: 0.0040
Visualize the original and reconstructed data
Python3
encoded_imgs = simple_autoencoder.encoder(x_test).numpy()
decoded_imgs = simple_autoencoder.decoder(encoded_imgs).numpy()

n = 6  # number of test images to display
plt.figure(figsize=(8, 4))
for i in range(n):
    # Top row: original images
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()

    # Bottom row: reconstructed images
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    plt.title("reconstructed")
    plt.gray()
plt.show()
Output:
[Figure: six original MNIST test digits (top row) and their reconstructions (bottom row)]
Conclusion
In conclusion, autoencoders are a robust and flexible class of architectures in deep learning. Their ability to learn effective data representations, together with the variety of autoencoder types, underlines their significance in tasks such as image compression and anomaly detection. Autoencoders remain an important tool in deep learning as new approaches for understanding data patterns and representations are explored and developed.
Frequently Asked Questions (FAQs)
1. What is a CNN autoencoder?
A CNN autoencoder is an autoencoder architecture that incorporates convolutional layers in both the encoder and decoder parts of the network. It is suited for handling high-dimensional input data with spatial structure, such as images.
2. How does an Autoencoder work?
The autoencoder works by encoding the input data into a lower-dimensional representation, often called the latent space or bottleneck, using the encoder. The decoder then reconstructs the input data from this lower-dimensional representation. The network is trained to minimize the difference between the input and the reconstructed output.
3. Is an autoencoder supervised or unsupervised?
Autoencoders are primarily used for unsupervised learning, as they do not require labeled data during the training phase. However, they can also be adapted for semi-supervised or supervised tasks by incorporating labeled information into the training process.
4. What is the difference between autoencoders and generative models?
Standard autoencoders focus on learning efficient representations of input data, while generative models, like Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs), are designed to generate new data samples that resemble the training data. A variational autoencoder is itself a type of autoencoder that is also a generative model.
5. How do denoising autoencoders work?
Denoising autoencoders are trained to reconstruct clean data from noisy input. During training, the model is exposed to input data with artificially added noise, forcing it to learn robust features and reduce sensitivity to noise.
6. What is the role of the bottleneck layer in autoencoders?
The bottleneck layer is the final hidden layer of the encoder, where the dimensionality is significantly reduced. It holds the compressed encoding (latent representation) of the input. Because the decoder must reconstruct the input from this layer alone, the bottleneck forces the network to capture the most important features of the data.