Implement Convolutional Autoencoder in PyTorch with CUDA

Last Updated : 31 Jul, 2023

Autoencoders are a type of neural network architecture used for unsupervised learning tasks such as data compression, dimensionality reduction, and data denoising. The architecture consists of two main components: an encoder and a decoder. The encoder portion of the network compresses the input data into a lower-dimensional representation, while the decoder portion of the network reconstructs the original input data from this lower-dimensional representation.
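The encoder/decoder split can be seen in a minimal fully connected (non-convolutional) autoencoder. This is a sketch under stated assumptions: the class name `DenseAutoencoder` and the sizes (784 inputs, a 32-dimensional code) are illustrative and not part of the original article.

```python
import torch
import torch.nn as nn

class DenseAutoencoder(nn.Module):
    """Hypothetical fully connected autoencoder, for illustration only."""
    def __init__(self, in_features: int = 784, code_size: int = 32):
        super().__init__()
        # Encoder: compress the flattened input into a small code vector
        self.encoder = nn.Sequential(
            nn.Linear(in_features, 128),
            nn.ReLU(),
            nn.Linear(128, code_size),
        )
        # Decoder: map the code back to the original input size
        self.decoder = nn.Sequential(
            nn.Linear(code_size, 128),
            nn.ReLU(),
            nn.Linear(128, in_features),
            nn.Sigmoid(),  # keep outputs in [0, 1], like normalized pixels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        code = self.encoder(x)
        return self.decoder(code)

model = DenseAutoencoder()
x = torch.rand(4, 784)      # a batch of 4 flattened 28x28 "images"
code = model.encoder(x)     # lower-dimensional representation
recon = model(x)            # reconstruction with the input's shape
print(code.shape)   # torch.Size([4, 32])
print(recon.shape)  # torch.Size([4, 784])
```

The code is 32 numbers per image instead of 784, so the network is forced to keep only the information it needs to rebuild the input.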

A Convolutional Autoencoder (CAE) is a type of autoencoder commonly used for unsupervised learning on images, for tasks such as image compression and denoising. It extends the traditional autoencoder architecture by using convolutional layers in both the encoder and the decoder.

Like a standard autoencoder, a Convolutional Autoencoder consists of two main components: an encoder and a decoder. The encoder processes the input image with convolutional layers and pooling operations to produce a lower-dimensional feature representation. The decoder takes this representation and upsamples it back to the original image size using transposed convolution layers (often called deconvolution layers). The final output is a reconstructed image that the network tries to make as close as possible to the original input.
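The downsampling and upsampling can be checked by hand. The sketch below implements the output-size formulas from the PyTorch documentation for `Conv2d`, `MaxPool2d` and `ConvTranspose2d` (assuming `dilation=1` and the default `ceil_mode=False`) and traces a 64x64 input through the layer settings used in the code later in this article. The helper names `conv_out` and `conv_transpose_out` are hypothetical.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial output size of Conv2d / MaxPool2d (dilation=1, ceil_mode=False)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size: int, kernel: int, stride: int = 1,
                       padding: int = 0, output_padding: int = 0) -> int:
    """Spatial output size of ConvTranspose2d (dilation=1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Trace a 64x64 image through the encoder/decoder used later in this article
s = 64
s = conv_out(s, kernel=3, stride=1, padding=1)   # Conv2d(3, 16)  -> 64
s = conv_out(s, kernel=2, stride=2)              # MaxPool2d      -> 32
s = conv_out(s, kernel=3, stride=1, padding=1)   # Conv2d(16, 8)  -> 32
s = conv_out(s, kernel=2, stride=2)              # MaxPool2d      -> 16
latent = s
s = conv_transpose_out(s, 3, stride=2, padding=1, output_padding=1)  # -> 32
s = conv_transpose_out(s, 3, stride=2, padding=1, output_padding=1)  # -> 64
print(latent, s)  # 16 64

# Compression: 3*64*64 input values vs 8*16*16 latent values
print(3 * 64 * 64, 8 * latent * latent)  # 12288 2048
```

With `stride=2`, `padding=1` and `output_padding=1`, each transposed convolution exactly doubles the height and width, which is why two of them undo the two 2x2 max-pooling steps.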

The training process for a Convolutional Autoencoder is the same as for a traditional autoencoder. The network is trained to minimize the difference between the original input image and the reconstructed output image using a loss function such as mean squared error (MSE) or binary cross-entropy (BCE); BCE assumes pixel values and outputs scaled to [0, 1]. Once trained, the encoder can be used on its own for feature extraction, and the decoder can be used to reconstruct or generate images from latent representations.
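Both losses can be written in a few lines of plain Python. This is a minimal sketch for intuition only (the helper names `mse` and `bce` are hypothetical; in PyTorch you would use `nn.MSELoss` or `nn.BCELoss`). It shows that a closer reconstruction gets a lower loss under both, and why BCE needs predictions strictly inside (0, 1), which is what the decoder's final `Sigmoid` provides.

```python
import math

def mse(pred, target):
    """Mean squared error over flattened pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy; pred must lie in (0, 1), e.g. after a sigmoid."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)

target = [0.0, 0.5, 1.0, 0.25]   # "original" pixel intensities in [0, 1]
good   = [0.1, 0.5, 0.9, 0.25]   # close reconstruction
bad    = [0.9, 0.1, 0.2, 0.8]    # poor reconstruction

print(mse(good, target), mse(bad, target))
print(bce(good, target) < bce(bad, target))  # True
```

Note that for non-binary targets such as 0.5, BCE does not reach zero even for a perfect reconstruction (its minimum is the entropy of the target), whereas MSE does; the code below uses MSE.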

Convolutional Autoencoders are widely used in computer vision tasks such as image compression, denoising, and feature extraction, and in applications such as image retrieval, object recognition, and anomaly detection.
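For anomaly detection, a common approach (an assumption here; the article does not show it) is to score each image by its reconstruction error: a model trained only on normal images tends to reconstruct unusual ones poorly. A minimal sketch, where the helper names, the error values, and the 95th-percentile threshold are all hypothetical choices:

```python
import torch

def reconstruction_errors(model: torch.nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Per-image mean squared reconstruction error, shape [batch]."""
    model.eval()
    with torch.no_grad():
        recon = model(images)
    return ((recon - images) ** 2).flatten(start_dim=1).mean(dim=1)

def flag_anomalies(errors: torch.Tensor, threshold: float) -> torch.Tensor:
    """Boolean mask: True where the error exceeds the threshold."""
    return errors > threshold

# Hypothetical errors measured on normal validation images set the threshold
val_errors = torch.tensor([0.010, 0.012, 0.011, 0.013, 0.009])
threshold = torch.quantile(val_errors, 0.95).item()

# Hypothetical errors for two new images: one typical, one unusual
test_errors = torch.tensor([0.011, 0.080])
print(flag_anomalies(test_errors, threshold).tolist())  # [False, True]
```

With a trained autoencoder, `reconstruction_errors(model, batch)` would produce the error values; the threshold trades false alarms against missed anomalies and has to be tuned per dataset.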

Implementation in PyTorch:

Algorithm

Load the dataset (here, torchvision's Flowers102 dataset, resized to 64x64) and define the dataloaders.
Define the Convolutional Autoencoder architecture by creating an Autoencoder class that contains an encoder (convolutional and max-pooling layers) and a decoder (transposed convolutional layers).
Initialize the autoencoder model and move it to the GPU if available using the to() method.
Define the loss function and optimizer to use during training. Typically, mean squared error (MSE) loss is used, and the Adam optimizer is a popular choice for deep learning tasks.
Set the number of epochs to train for and begin the training loop.
In each epoch, iterate through the batches of the dataloader, move the data to the GPU, and perform forward propagation to obtain the autoencoder’s output.
Calculate the loss between the output and the input using the loss function.
Perform backward propagation to calculate the gradients of the model parameters with respect to the loss.
Use the optimizer to update the model parameters based on the calculated gradients.
Print the loss periodically (here, the loss of the last batch every 5 epochs) to monitor the training progress.
Save the trained model's parameters to a file with torch.save(model.state_dict(), ...).
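Saving `state_dict()` stores only the parameters, so reloading requires recreating the same architecture first. A minimal round-trip sketch, using a tiny stand-in module (`TinyNet` is hypothetical, standing in for the `Autoencoder` class below) and a temporary file rather than the article's `conv_autoencoder.pth`:

```python
import os
import tempfile
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for the article's Autoencoder class."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)

original = TinyNet()
path = os.path.join(tempfile.mkdtemp(), "weights.pth")
torch.save(original.state_dict(), path)          # save parameters only

restored = TinyNet()                             # rebuild the same architecture
restored.load_state_dict(torch.load(path, map_location="cpu"))
restored.eval()                                  # switch to inference mode

x = torch.rand(1, 3, 16, 16)
with torch.no_grad():
    same = torch.allclose(original(x), restored(x))
print(same)  # True
```

`map_location="cpu"` lets weights saved on a GPU be loaded on a machine without CUDA; the model can then be moved with `.to(device)` as usual.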

Code:

Python

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
 
# Define the autoencoder architecture
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
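        # Encoder: 3x64x64 image -> 8x16x16 latent feature maps
        # (shapes in the comments assume the 64x64 inputs used below)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),  # -> 16x64x64
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # -> 16x32x32
            nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),  # -> 8x32x32
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)                  # -> 8x16x16
        )
        # Decoder: each transposed convolution doubles height and width
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(8, 16, 
                               kernel_size=3, 
                               stride=2, 
                               padding=1, 
                               output_padding=1),                  # -> 16x32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 
                               kernel_size=3, 
                               stride=2, 
                               padding=1, 
                               output_padding=1),                  # -> 3x64x64
            nn.Sigmoid()  # pixel values in [0, 1], matching ToTensor()
        )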
         
    def forward(self, x):
        x = self.encoder(x)
        x = self.decoder(x)
        return x
 
 
# Initialize the autoencoder
model = Autoencoder()
 
# Define transform
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])
 
# Load dataset
train_dataset = datasets.Flowers102(root='flowers', 
                                    split='train', 
                                    transform=transform, 
                                    download=True)
test_dataset = datasets.Flowers102(root='flowers', 
                                   split='test', 
                                   transform=transform)
# Define the dataloader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=128, 
                                           shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=128)
 
# Move the model to the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(device)
model.to(device)
 
# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
 
# Train the autoencoder
num_epochs = 50
for epoch in range(num_epochs):
    for data in train_loader:
        img, _ = data
        img = img.to(device)
        optimizer.zero_grad()
        output = model(img)
        loss = criterion(output, img)
        loss.backward()
        optimizer.step()
    # Print the loss of the last batch every 5 epochs
    if epoch % 5 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
 
# Save the model
torch.save(model.state_dict(), 'conv_autoencoder.pth')

Output:

cuda
Epoch [1/50], Loss: 0.0919
Epoch [6/50], Loss: 0.0746
Epoch [11/50], Loss: 0.0362
Epoch [16/50], Loss: 0.0239
Epoch [21/50], Loss: 0.0178
Epoch [26/50], Loss: 0.0154
Epoch [31/50], Loss: 0.0144
Epoch [36/50], Loss: 0.0124
Epoch [41/50], Loss: 0.0127
Epoch [46/50], Loss: 0.0101
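The printed values are the loss of a single training batch, which is noisy and says nothing about unseen images. One way to measure generalization, sketched here as an assumption rather than part of the original code, is to average the MSE over the whole test set with `model.eval()` and `torch.no_grad()`. In the article's script you would call `evaluate(model, test_loader, device)` after training; the stand-in model and random data below exist only so the sketch runs on its own.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model: nn.Module, loader: DataLoader, device: torch.device) -> float:
    """Average per-pixel MSE over every image in the loader."""
    model.eval()                         # inference mode (matters for dropout/batchnorm)
    criterion = nn.MSELoss(reduction="sum")
    total, count = 0.0, 0
    with torch.no_grad():                # no gradients needed for evaluation
        for img, _ in loader:
            img = img.to(device)
            total += criterion(model(img), img).item()
            count += img.numel()
    model.train()                        # restore training mode for later epochs
    return total / count

# Quick check with a stand-in model and random data (both hypothetical)
device = torch.device("cpu")
data = TensorDataset(torch.rand(10, 3, 8, 8), torch.zeros(10))
loader = DataLoader(data, batch_size=4)
print(evaluate(nn.Identity(), loader, device))  # 0.0 for a perfect reconstruction
```

Summing with `reduction="sum"` and dividing by the total number of pixels gives the exact dataset mean even when the last batch is smaller than the others.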

Plot the original images alongside their reconstructions

Python3

import matplotlib.pyplot as plt

# Reconstruct one batch of test images (no gradients needed)
model.eval()
with torch.no_grad():
    data, _ = next(iter(test_loader))
    data = data.to(device)
    recon = model(data)

# Top row: original images; bottom row: reconstructions
fig, ax = plt.subplots(2, 7, figsize=(15, 4), dpi=250)
for i in range(7):
    ax[0, i].imshow(data[i].cpu().numpy().transpose((1, 2, 0)))
    ax[1, i].imshow(recon[i].cpu().numpy().transpose((1, 2, 0)))
    ax[0, i].axis('off')
    ax[1, i].axis('off')
plt.show()
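Once trained, the encoder can serve as a feature extractor. With the trained model from above, `model.encoder(data)` would return the latent maps directly; the self-contained sketch below rebuilds the same encoder layout with random (untrained) weights so it runs on its own, flattens each 8x16x16 latent map into a 2048-dimensional vector, and compares images by cosine similarity, an illustrative choice for tasks such as image retrieval.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Same encoder layout as the article's Autoencoder (untrained here, for illustration)
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(16, 8, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)
encoder.eval()

images = torch.rand(5, 3, 64, 64)            # stand-in for a batch from test_loader
with torch.no_grad():
    latent = encoder(images)                 # shape [5, 8, 16, 16]
    features = latent.flatten(start_dim=1)   # shape [5, 2048], one vector per image

# Similarity of the first image to each of the other four (broadcast over rows)
sims = F.cosine_similarity(features[:1], features[1:], dim=1)
print(latent.shape, features.shape, sims.shape)
```

Only trained encoder weights give meaningful similarities; the random weights here just demonstrate the shapes and the calls.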