
Denoising AutoEncoders In Machine Learning

Autoencoders are a type of neural network architecture used for unsupervised learning. The architecture consists of an encoder and a decoder. The encoder maps the input data into a lower-dimensional space, while the decoder maps the encoded representation back to the original input. The network is trained to minimize the difference between the decoded output and the input. Autoencoders run the risk of becoming an identity function, meaning the output simply equals the input, which makes the whole network useless. This generally happens when there are more nodes in the hidden layer than there are inputs.
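
As an illustration (a minimal sketch, not the model built later in this article), a plain autoencoder in PyTorch could look like the following, where the bottleneck is deliberately much smaller than the input so the network cannot simply copy the input through:

import torch
from torch import nn

# A minimal autoencoder: the 32-unit bottleneck is much smaller than the
# 784-dimensional input, so the network has to learn a compressed representation.
class SimpleAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(784, 32), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(32, 784), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

x = torch.rand(16, 784)                          # a batch of flattened inputs
loss = nn.MSELoss()(SimpleAutoencoder()(x), x)   # reconstruction error vs. the input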

Denoising Autoencoder (DAE)

A denoising autoencoder is a modification of the original autoencoder in which, instead of the original input, we feed a corrupted or noisy version of the input to the encoder, while the decoder's loss is still calculated with respect to the original clean input. This forces the autoencoder to learn useful structure in the data, and the risk of the autoencoder becoming an identity function is significantly reduced.
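
The core idea can be sketched in a few lines of PyTorch (a toy example with made-up sizes, separate from the full implementation later in this article): noise is added to the input, but the reconstruction loss is computed against the clean data.

import torch
from torch import nn, optim

# Toy autoencoder and optimizer, just to illustrate one DAE training step
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),
                      nn.Linear(64, 784), nn.Sigmoid())
optimizer = optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                  # a batch of "clean" inputs in [0, 1]
x_noisy = x + 0.3 * torch.randn_like(x)  # corrupted version fed to the network
x_recon = model(x_noisy)                 # reconstruction from the noisy input
loss = nn.MSELoss()(x_recon, x)          # loss is measured against the CLEAN x
optimizer.zero_grad()
loss.backward()
optimizer.step()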



Architecture of DAE

The denoising autoencoder (DAE) architecture resembles a standard autoencoder and consists of two main components:

Encoder: takes the corrupted (noisy) version of the input and compresses it into a lower-dimensional latent representation.

Decoder: takes this latent representation and reconstructs the original, clean input from it.

What Does a DAE Learn?

Because the input is corrupted but the reconstruction target is the clean data, the DAE cannot simply copy its input; it has to learn the underlying structure of the data in order to remove the noise. Training on a corrupted input in this way helps decrease the risk of overfitting and prevents the DAE from becoming an identity function.



Objective Function of DAE

The objective of a DAE is to minimize the difference between the original input (the clean input, without the noise) and the reconstructed output. This is quantified using a reconstruction loss function. Two types of loss function are generally used, depending on the type of input data.

Mean Squared Error (MSE):

If the input image data consists of real-valued pixel intensities, i.e. values in the range (0 to 1) or (0 to 255), we use the MSE loss:

L_{MSE} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2

Here, x_i is the i-th pixel of the original clean input, \hat{x}_i is the corresponding reconstructed pixel and N is the number of pixels.

Binary Cross-Entropy (log-loss):

If the input image data consists of binary pixel values, i.e. each value is either 0 or 1, we can use the binary cross-entropy loss, computed for each pixel:

L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ x_i \log(\hat{x}_i) + (1 - x_i) \log(1 - \hat{x}_i) \right]

Here, x_i is the i-th pixel of the original clean input, \hat{x}_i is the reconstructed pixel value (interpreted as a probability) and N is the number of pixels.
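
In PyTorch these two choices correspond to the built-in nn.MSELoss and nn.BCELoss criteria. A small illustration with hypothetical tensors x_hat (the reconstruction) and x (the clean target):

import torch
from torch import nn

x_hat = torch.sigmoid(torch.randn(4, 784))  # hypothetical reconstructions in (0, 1)
x = torch.rand(4, 784)                      # hypothetical clean targets in [0, 1]

mse_loss = nn.MSELoss()(x_hat, x)   # for real-valued pixel intensities
bce_loss = nn.BCELoss()(x_hat, x)   # for pixels treated as probabilities / bits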

Training Process of DAE

The training of a DAE consists of the following steps:

1. Take a clean input sample and create a corrupted version of it, typically by adding random (e.g. Gaussian) noise.
2. Pass the corrupted input through the encoder to obtain a latent representation, then through the decoder to obtain a reconstruction.
3. Compute the reconstruction loss between the decoder output and the original clean input.
4. Backpropagate the loss and update the weights of both the encoder and the decoder.

The training is typically done through optimization algorithms like stochastic gradient descent (SGD) or its variants (such as Adam).

Applications of DAE

Denoising autoencoders are commonly used for removing noise from images and audio, for learning robust feature representations that can be reused in downstream tasks, for unsupervised pre-training of deep networks, and for anomaly detection, since samples that the model cannot reconstruct well are likely to be unusual.

Implementation of DAE

Let us implement a DAE in PyTorch for the MNIST dataset.

1. Import Libraries

import torch
import torch.utils.data
from torchvision import datasets, transforms
import numpy as np
import pandas as pd

from torch import nn, optim

# Use the GPU if one is available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

                    

2. Define Dataloader

from torch.utils.data import DataLoader

# Convert images to tensors; Normalize(0, 1) subtracts 0 and divides by 1,
# so the pixel values stay in the [0, 1] range
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize(0, 1)])
# Load the MNIST training dataset
mnist_dataset_train = datasets.MNIST(
    root='./data', train=True, download=True, transform=transform)
# Load the MNIST test dataset
mnist_dataset_test = datasets.MNIST(
    root='./data', train=False, download=True, transform=transform)

batch_size = 128
train_loader = DataLoader(
    mnist_dataset_train, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(
    mnist_dataset_test, batch_size=5, shuffle=False)
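
(Optional) As a quick sanity check, one batch drawn from the train loader should contain 128 images of shape 1 x 28 x 28 together with 128 labels:

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)
# expected: torch.Size([128, 1, 28, 28]) torch.Size([128])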

                    

3. Define our Model

class DAE(nn.Module):
    def __init__(self):
        super().__init__()

        # Encoder layers: 784 -> 512 -> 256 -> 128
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 128)

        # Decoder layers: 128 -> 256 -> 512 -> 784
        self.fc4 = nn.Linear(128, 256)
        self.fc5 = nn.Linear(256, 512)
        self.fc6 = nn.Linear(512, 784)

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

    def encode(self, x):
        # Compress the (noisy) input into a 128-dimensional representation
        h1 = self.relu(self.fc1(x))
        h2 = self.relu(self.fc2(h1))
        return self.relu(self.fc3(h2))

    def decode(self, z):
        # Reconstruct the 784-pixel image; the sigmoid keeps outputs in [0, 1]
        h4 = self.relu(self.fc4(z))
        h5 = self.relu(self.fc5(h4))
        return self.sigmoid(self.fc6(h5))

    def forward(self, x):
        # Flatten the 28x28 image to a 784-vector, encode it, then decode it
        q = self.encode(x.view(-1, 784))
        return self.decode(q)
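
(Optional) A quick sanity check of the untrained model: passing a dummy batch through it should return flattened reconstructions of shape [batch_size, 784]:

model_check = DAE()
dummy_batch = torch.randn(8, 1, 28, 28)   # a fake batch of 8 "images"
print(model_check(dummy_batch).shape)     # expected: torch.Size([8, 784])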

                    

4. Define our train function

We define a train function that:

1. Adds Gaussian noise to each clean batch of images.
2. Passes the noisy batch through the model to obtain a reconstruction.
3. Computes the reconstruction loss against the clean images.
4. Backpropagates the loss, updates the weights and logs the training progress.

def train(epoch, model, train_loader, optimizer, cuda=True):
    model.train()
    train_loss = 0
    for batch_idx, (data, _) in enumerate(train_loader):
        data = data.to(device)
        optimizer.zero_grad()

        # Corrupt the clean batch with standard Gaussian noise
        data_noise = torch.randn(data.shape).to(device)
        data_noise = data + data_noise

        # Reconstruct from the noisy input, but compute the loss
        # against the original clean images
        recon_batch = model(data_noise)
        loss = criterion(recon_batch, data.view(data.size(0), -1))
        loss.backward()

        train_loss += loss.item() * len(data)
        optimizer.step()

        if batch_idx % 100 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

    print('====> Epoch: {} Average loss: {:.4f}'.format(
        epoch, train_loss / len(train_loader.dataset)))
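
Note that the function above corrupts each batch with full-strength standard Gaussian noise. A common variation (not used for the results shown below; noise_factor is just an illustrative name) is to scale the noise so that the corruption level can be tuned:

noise_factor = 0.3
data_noise = data + noise_factor * torch.randn_like(data)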

                    

5. Define model, optimizer, and loss function

epochs = 10
 
model = DAE().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-2)
criterion = nn.MSELoss()

                    


6. Train our model

for epoch in range(1, epochs + 1):
    train(epoch, model, train_loader, optimizer, True)

                    

Output:

Train Epoch: 1 [0/60000 (0%)]    Loss: 0.232055
Train Epoch: 1 [12800/60000 (21%)] Loss: 0.055512
Train Epoch: 1 [25600/60000 (43%)] Loss: 0.050033
Train Epoch: 1 [38400/60000 (64%)] Loss: 0.041521
Train Epoch: 1 [51200/60000 (85%)] Loss: 0.043305
====> Epoch: 1 Average loss: 0.0509
Train Epoch: 2 [0/60000 (0%)] Loss: 0.041658
Train Epoch: 2 [12800/60000 (21%)] Loss: 0.040901
Train Epoch: 2 [25600/60000 (43%)] Loss: 0.040894
Train Epoch: 2 [38400/60000 (64%)] Loss: 0.039513
Train Epoch: 2 [51200/60000 (85%)] Loss: 0.041100
====> Epoch: 2 Average loss: 0.0407
Train Epoch: 3 [0/60000 (0%)] Loss: 0.041685
Train Epoch: 3 [12800/60000 (21%)] Loss: 0.039040
Train Epoch: 3 [25600/60000 (43%)] Loss: 0.038953
Train Epoch: 3 [38400/60000 (64%)] Loss: 0.038851
Train Epoch: 3 [51200/60000 (85%)] Loss: 0.040141

7. Performance of the model

import matplotlib.pyplot as plt

# Take one batch of test images, corrupt it with noise and reconstruct it
model.eval()
with torch.no_grad():
    for batch_idx, (data, labels) in enumerate(test_loader):
        data = data.to(device)

        data_noise = torch.randn(data.shape).to(device)
        data_noise = data + data_noise

        recon_batch = model(data_noise)
        break


plt.figure(figsize=(20, 12))
for i in range(5):
    print(f" Image {i} with label {labels[i]}              ", end="")
    # Row 1: noisy input
    plt.subplot(3, 5, 1 + i)
    plt.imshow(data_noise[i, :, :, :].view(28, 28).cpu().numpy(), cmap='binary')
    plt.axis('off')
    # Row 2: reconstruction
    plt.subplot(3, 5, 6 + i)
    plt.imshow(recon_batch[i, :].view(28, 28).cpu().numpy(), cmap='binary')
    plt.axis('off')
    # Row 3: original clean image
    plt.subplot(3, 5, 11 + i)
    plt.imshow(data[i, :, :, :].view(28, 28).cpu().numpy(), cmap='binary')
    plt.axis('off')
plt.show()

                    

Output:

We see that, with only 10 epochs of training, the model is able to reconstruct the original images quite well from their noisy versions.
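
For a more quantitative check (a small sketch reusing the model, criterion and test_loader defined above), we can also compute the average reconstruction loss over the whole test set:

model.eval()
test_loss = 0.0
with torch.no_grad():
    for data, _ in test_loader:
        data = data.to(device)
        noisy = data + torch.randn_like(data)          # same corruption as in training
        recon = model(noisy)
        test_loss += criterion(recon, data.view(data.size(0), -1)).item() * len(data)
print('Average test loss: {:.4f}'.format(test_loss / len(test_loader.dataset)))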

Conclusion

In this article, we looked at a variation of autoencoders, namely the denoising autoencoder, its applications and its implementation in Python on the MNIST dataset using the PyTorch framework.

