
Contrastive Divergence in Restricted Boltzmann Machines

Contrastive Divergence (CD) is a fundamental technique in machine learning, used primarily in unsupervised learning and, in particular, for training Restricted Boltzmann Machines (RBMs). It plays a crucial role in the learning process by approximating the gradient needed to update the weights of these models. Introduced by Geoffrey Hinton for training RBMs, CD has since become a cornerstone of many deep-learning algorithms. This article covers Contrastive Divergence from its foundational concepts to its practical applications.

Restricted Boltzmann Machines (RBMs)

Stochastic Artificial Neural Networks (SANNs) are neural network models that incorporate randomness into their computations, often to improve learning dynamics, enable better generalization, or address computational challenges associated with certain types of models.



Boltzmann Machines, which include Restricted Boltzmann Machines (RBMs), are types of stochastic neural networks. These models use Gibbs sampling, a Markov Chain Monte Carlo method, to generate samples from the joint distribution of the visible and hidden units.

RBMs consist of two layers – a visible layer and a hidden layer – each with binary units. Unlike traditional neural networks, there are no connections within the visible or hidden layers. Every visible unit is connected to every hidden unit, and vice versa, forming a fully connected bipartite graph.



The RBM’s energy function determines the compatibility of a configuration of visible and hidden units. In matrix notation it can be written as

E(v, h) = -a^\top v - b^\top h - v^\top W h

where v and h are the visible and hidden unit states, a and b are the corresponding bias vectors, and W is the weight matrix connecting the two layers.

The energy function assigns an energy value to each possible configuration of visible and hidden unit states. The negative signs in front of the terms mean that configurations favored by the weights and biases receive lower energy, and lower-energy configurations are more probable.
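As a concrete illustration, the following minimal NumPy sketch computes this energy for a single configuration. The function name rbm_energy and the small random configuration are illustrative choices made here, not part of any particular library.

import numpy as np

def rbm_energy(v, h, W, a, b):
    # v: (num_visible,) binary visible states, h: (num_hidden,) binary hidden states
    # W: (num_visible, num_hidden) weight matrix, a/b: visible/hidden bias vectors
    return -np.dot(a, v) - np.dot(b, h) - np.dot(v, W @ h)

# A small random configuration, purely for demonstration
rng = np.random.default_rng(0)
W = rng.normal(size=(6, 3))
a = np.zeros(6)
b = np.zeros(3)
v = rng.integers(0, 2, size=6)
h = rng.integers(0, 2, size=3)
print(rbm_energy(v, h, W, a, b))  # a single scalar energy value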

Energy-Based Models

Contrastive Divergence (CD) is intimately connected with Energy-Based Models (EBMs). EBMs are a class of models used in probabilistic machine learning. They define a probability distribution over a set of configurations using an energy function. RBMs are a specific type of EBM, known for their applicability in unsupervised learning tasks such as dimensionality reduction and feature learning.
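For an RBM, as for any energy-based model, the energy function induces a probability distribution over configurations through the Boltzmann distribution:

p(v, h) = e^{-E(v, h)} / Z,   where   Z = \sum_{v', h'} e^{-E(v', h')}

The normalizing constant Z, known as the partition function, sums over every possible configuration of visible and hidden units. Computing Z exactly is intractable for models of realistic size, which is precisely why approximate training methods such as Contrastive Divergence are needed.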

Markov Chain Monte Carlo (MCMC) Methods

CD relies on concepts from MCMC methods. MCMC techniques, including Gibbs Sampling, are employed to approximate complex probability distributions by sampling from them. Gibbs Sampling, in particular, is integral to CD and involves iteratively sampling from conditional probability distributions.

Gibbs Sampling is a key component of CD. It’s an iterative algorithm used to sample from joint probability distributions by sampling from each variable’s conditional probability distribution while holding other variables fixed.
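In an RBM, the bipartite structure makes these conditional distributions factorize over units, which is what makes Gibbs Sampling cheap. For binary units they take the familiar sigmoid form:

p(h_j = 1 | v) = \sigma(b_j + \sum_i v_i w_{ij})
p(v_i = 1 | h) = \sigma(a_i + \sum_j h_j w_{ij})

where \sigma is the logistic sigmoid. One Gibbs step therefore samples all hidden units in parallel given the visible units, then samples all visible units in parallel given the new hidden states; this is exactly what the gibbs_sampling() method in the implementation below does.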

What is Contrastive Divergence?

At its core, Contrastive Divergence is an iterative algorithm employed in training RBMs, which are a type of probabilistic graphical model used for dimensionality reduction, feature learning, and collaborative filtering. The primary objective of CD is to estimate the gradient of the log-likelihood function associated with the RBM.

To understand CD, it helps to recall the structure of an RBM: every node in the visible layer connects to every node in the hidden layer, but there are no connections within a layer. CD operates by updating the weights of these connections to minimize the difference between the observed data and the data reconstructed by the RBM.
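Concretely, CD-k approximates the log-likelihood gradient by contrasting data-driven and reconstruction-driven statistics, which yields the weight update

\Delta W \propto \langle v h^\top \rangle_{data} - \langle v h^\top \rangle_{recon}

The first term (the positive phase) is computed from the training data, while the second (the negative phase) is computed from samples obtained after k steps of Gibbs Sampling started at the data. Analogous updates apply to the visible and hidden bias vectors.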

Contrastive Divergence Algorithm in Restricted Boltzmann Machines

CD operates through several steps:

  1. Initialize the RBM with random weights and zero biases.
  2. Positive phase: for a batch of training data, compute the hidden unit probabilities conditioned on the visible data and the corresponding data-hidden correlations.
  3. Negative phase: run k steps of Gibbs Sampling starting from the data to obtain a reconstruction, and compute the reconstruction-hidden correlations.
  4. Update the weights and biases using the difference between the positive and negative statistics, scaled by the learning rate.
  5. Repeat for a number of epochs or until convergence.

Contrastive Divergence Algorithm Implementations in Python

Implementing Contrastive Divergence in Python involves setting up an RBM and performing the CD steps.

The code below defines an RBM class, implements the CD training loop, and then demonstrates how to use it:

  1. np.random.seed(42): Sets the random seed for reproducibility.
  2. num_visible and num_hidden: Define the number of visible and hidden units.
  3. data: Represents the input dataset used for training (random data is generated here for demonstration purposes).
  4. rbm = RBM(num_visible, num_hidden): Creates an RBM instance.
  5. rbm.contrastive_divergence(data): Trains the RBM using the Contrastive Divergence algorithm on the provided dataset.
import numpy as np
 
class RBM:
    def __init__(self, num_visible, num_hidden):
        self.num_visible = num_visible
        self.num_hidden = num_hidden
        self.weights = np.random.randn(num_visible, num_hidden)
        self.visible_bias = np.zeros(num_visible)
        self.hidden_bias = np.zeros(num_hidden)
 
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
 
    def gibbs_sampling(self, visible_data, k=1):
        # Alternate k times between sampling hidden units and visible units
        for _ in range(k):
            hidden_probs = self.sigmoid(np.dot(visible_data, self.weights) + self.hidden_bias)
            hidden_states = (np.random.rand(len(visible_data), self.num_hidden) < hidden_probs).astype(float)
            visible_probs = self.sigmoid(np.dot(hidden_states, self.weights.T) + self.visible_bias)
            visible_data = (np.random.rand(len(visible_data), self.num_visible) < visible_probs).astype(float)
        # Hidden probabilities conditioned on the final reconstruction (used in the negative phase)
        hidden_probs = self.sigmoid(np.dot(visible_data, self.weights) + self.hidden_bias)
        return visible_data, hidden_probs
 
    def contrastive_divergence(self, data, learning_rate=0.1, k=1, epochs=10):
        for _ in range(epochs):
            # Positive phase: hidden probabilities and correlations driven by the data
            positive_hidden_probs = self.sigmoid(np.dot(data, self.weights) + self.hidden_bias)
            positive_associations = np.dot(data.T, positive_hidden_probs)
 
            # Negative phase: correlations driven by the k-step Gibbs reconstruction
            recon_data, recon_hidden_probs = self.gibbs_sampling(data, k)
            negative_visible_probs = recon_data
            negative_hidden_probs = recon_hidden_probs
            negative_associations = np.dot(recon_data.T, negative_hidden_probs)
 
            # Update parameters by the difference between positive and negative statistics
            self.weights += learning_rate * (positive_associations - negative_associations)
            self.visible_bias += learning_rate * np.mean(data - negative_visible_probs, axis=0)
            self.hidden_bias += learning_rate * np.mean(positive_hidden_probs - negative_hidden_probs, axis=0)
 
# Example usage
np.random.seed(42)  # For reproducibility
num_visible = 6
num_hidden = 3
data = (np.random.rand(100, num_visible) > 0.5).astype(float)  # Random binary sample data
 
rbm = RBM(num_visible, num_hidden)
rbm.contrastive_divergence(data)


RBM Class

The RBM class defines the Restricted Boltzmann Machine used in the example above.

1. __init__() method: stores the layer sizes, initializes the weight matrix with values drawn from a standard normal distribution, and sets both bias vectors to zero.

2. sigmoid() method: computes the logistic sigmoid activation, turning net inputs into probabilities between 0 and 1.

3. gibbs_sampling() method: performs k alternating sampling steps (hidden units given the visible units, then visible units given the hidden states) and returns the reconstructed visible data along with the hidden probabilities conditioned on that reconstruction.

4. contrastive_divergence() method: runs the training loop: it computes positive-phase statistics from the data, obtains negative-phase statistics from the Gibbs reconstruction, and updates the weights and biases by the scaled difference between the two.
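Once trained, the RBM's hidden-unit probabilities can serve as learned features for downstream tasks. The short sketch below uses the RBM instance and data from the example above; the helper name transform is our own and not a method of the class.

# Hypothetical helper: use hidden-unit probabilities as learned features
def transform(rbm, visible_data):
    return rbm.sigmoid(np.dot(visible_data, rbm.weights) + rbm.hidden_bias)

features = transform(rbm, data)
print(features.shape)  # (100, 3): one 3-dimensional feature vector per sample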

Advantages and Challenges of Contrastive Divergence

Contrastive Divergence offers several advantages in training RBMs:

  1. Computational efficiency: it avoids running the Markov chain to equilibrium, so each update requires only a few Gibbs Sampling steps.
  2. Simplicity: the algorithm reduces to a handful of matrix operations, as the implementation above shows.
  3. Practical effectiveness: despite being an approximation, it works well enough in practice to make RBM training feasible on real datasets.

However, CD also presents some challenges:

  1. Biased gradient estimates: CD approximates, rather than exactly computes, the gradient of the log-likelihood, so the learned parameters can deviate from the maximum-likelihood solution.
  2. Sensitivity to k: with small k (especially CD-1) the reconstruction stays close to the data, which can make the approximation poor; larger k improves it at additional computational cost (illustrated in the snippet after this list).
  3. Hyperparameter tuning: the learning rate, the number of Gibbs steps, and the number of epochs all affect stability and must be chosen carefully.
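As a simple illustration of the trade-off around k, the implementation above exposes it as a parameter; the values below are arbitrary examples, not recommended settings.

# CD-1: cheap per update, but a coarser gradient approximation
rbm_cd1 = RBM(num_visible, num_hidden)
rbm_cd1.contrastive_divergence(data, k=1, epochs=10)

# CD-5: more Gibbs steps per update, closer to the true gradient at extra cost
rbm_cd5 = RBM(num_visible, num_hidden)
rbm_cd5.contrastive_divergence(data, k=5, epochs=10)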

Applications of Contrastive Divergence

CD-trained RBMs have been used for dimensionality reduction and feature learning, for collaborative filtering in recommendation systems, and historically for the layer-wise pretraining of deep belief networks, all settings in which an unsupervised model of the data distribution is valuable.

Conclusion

Contrastive Divergence stands as a cornerstone algorithm in training Restricted Boltzmann Machines. Despite its approximative nature, it serves as a practical and computationally efficient method for estimating gradients in RBM training. Its iterative steps enable the model to learn and optimize the weights efficiently, contributing significantly to the field of unsupervised learning and deep neural networks. Further research continues to explore variations and improvements to this fundamental technique, aiming to enhance its accuracy and applicability across diverse domains.

