
Applying Batch Normalization in Keras using BatchNormalization Class

Last Updated : 30 Apr, 2024

Training deep neural networks presents difficulties such as vanishing gradients and slow convergence. In 2015, Sergey Ioffe and Christian Szegedy introduced Batch Normalization as a powerful technique to tackle these challenges. This article will explore Batch Normalization and how it can be utilized in Keras, a well-known deep-learning framework.

What is meant by Batch Normalization in Deep Learning?

Batch Normalization is a technique used in deep learning to standardize the inputs of each layer, stabilizing training by reducing internal covariate shift and accelerating convergence. It normalizes the activations using the mean and variance computed over each mini-batch, together with learnable parameters for scaling and shifting.
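For intuition, the core computation can be sketched directly in NumPy: each feature is standardized using its mini-batch mean and variance, then scaled by a learnable gamma and shifted by a learnable beta. This is a simplified illustration of the training-time forward pass only; the actual Keras layer additionally tracks moving statistics for use at inference.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-3):
    # x: mini-batch of activations with shape (batch_size, num_features)
    mean = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                        # per-feature mini-batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardize each feature
    return gamma * x_hat + beta                # learnable scale and shift

# Example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3)
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
print(out.mean(axis=0))  # approximately zero for each feature
print(out.std(axis=0))   # approximately one for each feature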

Applying Batch Normalization in Keras using BatchNormalization Class

The keras.layers.BatchNormalization class in Keras implements Batch Normalization, a technique used to normalize the activations of a layer in a neural network.

Syntax of BatchNormalization Class in Keras

keras.layers.BatchNormalization(
                         axis=-1, 
                         momentum=0.99, 
                         epsilon=0.001, 
                         center=True, 
                         scale=True, 
                         beta_initializer="zeros", 
                         gamma_initializer="ones", 
                         moving_mean_initializer="zeros", 
                         moving_variance_initializer="ones", 
                         beta_regularizer=None, 
                         gamma_regularizer=None, 
                         beta_constraint=None, 
                         gamma_constraint=None, 
                         synchronized=False,
                         **kwargs)

BatchNormalization Class Parameters

Here’s a breakdown of its parameters:

  • axis: Specifies the axis along which normalization is applied. By default, it normalizes along the last axis (usually the features axis).
  • momentum: A float value between 0 and 1 that represents the exponential decay rate for the moving mean and moving variance estimates. A higher momentum value means the statistics from previous batches have more influence.
  • epsilon: A small float value added to the variance to prevent division by zero.
  • center: If True, the layer will learn an offset parameter (beta). If False, this parameter is disabled.
  • scale: If True, the layer will learn a scale parameter (gamma). If False, this parameter is disabled.
  • beta_initializer: Initializer for the beta (offset) parameter.
  • gamma_initializer: Initializer for the gamma (scale) parameter.
  • moving_mean_initializer: Initializer for the moving mean parameter.
  • moving_variance_initializer: Initializer for the moving variance parameter.
  • beta_regularizer: Regularizer function applied to the beta parameter.
  • gamma_regularizer: Regularizer function applied to the gamma parameter.
  • beta_constraint: Constraint function applied to the beta parameter.
  • gamma_constraint: Constraint function applied to the gamma parameter.
  • synchronized: A boolean indicating whether Batch Normalization should be synchronized across replicas during distributed training. This is useful for distributed training setups.
  • kwargs: Additional keyword arguments accepted by the base Layer class.

These parameters allow for fine-tuning and customization of the Batch Normalization layer according to specific requirements and architectural considerations. For example, you can control whether to include learnable parameters (beta and gamma), specify the initialization and regularization methods, and adjust the axis of normalization.
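For example, a layer with non-default settings might be configured as follows; the specific values here are purely illustrative, not recommendations:

from keras.layers import BatchNormalization
from keras import regularizers

# Illustrative configuration: normalize along the last (features) axis,
# use a faster-updating moving average, and L2-regularize gamma
bn = BatchNormalization(
    axis=-1,                                   # normalize the last axis
    momentum=0.9,                              # decay rate for moving mean/variance
    epsilon=1e-3,                              # numerical stability term
    gamma_regularizer=regularizers.l2(1e-4),   # penalize large scale parameters
)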

Implementing BatchNormalization Class in Keras

In this section, we cover the steps required to implement Batch Normalization in Keras with the help of the BatchNormalization class. Let’s walk through the steps:

Step 1: Importing Libraries

import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

Step 2: Create a Dummy Dataset

# Generate toy dataset
np.random.seed(0)
X = np.random.randn(1000, 10)  # 1000 samples, 10 features
y = np.random.randint(2, size=(1000,))  # Binary labels

Step 3: Define the Model

A sequential model is defined using Sequential(). It consists of three dense layers. The first two layers have ReLU activation functions and Batch Normalization layers after them, and the final layer has a sigmoid activation function for binary classification.

# Define the model
model = Sequential()
model.add(Dense(64, input_shape=(10,), activation='relu'))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))

Step 4: Compiling and Training the Model

The model is compiled with the Adam optimizer and binary cross-entropy loss, then trained on the training data (X_train and y_train come from the train_test_split call shown in the complete implementation below).

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)

Complete Implementation of Batch Normalization using Keras Library

Python3
import numpy as np
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

# Generate toy dataset
np.random.seed(0)
X = np.random.randn(1000, 10)  # 1000 samples, 10 features
y = np.random.randint(2, size=(1000,))  # Binary labels

# Split dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the model
model = Sequential()
model.add(Dense(64, input_shape=(10,), activation='relu'))
model.add(BatchNormalization())
model.add(Dense(32, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(1, activation='sigmoid'))

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

#Print Model Summary
model.summary()

Output:

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                704       
                                                                 
 batch_normalization (Batch  (None, 64)                256       
 Normalization)                                                  
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 batch_normalization_1 (Bat  (None, 32)                128       
 chNormalization)                                                
                                                                 
 dense_2 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 3201 (12.50 KB)
Trainable params: 3009 (11.75 KB)
Non-trainable params: 192 (768.00 Byte)
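The complete script above stops at the model summary. To actually fit and evaluate the network on the toy data, a continuation such as the following can be appended; because the labels are random, test accuracy will hover around chance:

# Train on the training split, holding out 10% of it for validation
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.1)

# Evaluate on the held-out test set
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test loss: {loss:.4f}, Test accuracy: {accuracy:.4f}")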

Best Practices for using BatchNormalization Class in Keras

When using Batch Normalization in Keras, several best practices can help ensure optimal performance and stability:

  1. Consistent Application: Apply Batch Normalization consistently across the layers of the network, typically right after each layer’s element-wise activation (as in the model above). Consistent placement keeps the scale of activations predictable during both training and evaluation (inference).
  2. Initialization: The initial values of the learnable Batch Normalization parameters (gamma and beta) influence training dynamics and convergence speed. The standard choice, and the Keras default, is to initialize gamma to 1 and beta to 0, so the layer starts as an identity-like transform of the normalized activations (see the sketch after this list).
  3. Monitoring Convergence: During training, monitor convergence metrics such as training loss and validation accuracy. Batch Normalization changes the training dynamics, so it is important to assess its impact on convergence and adjust hyperparameters accordingly.

By following these best practices, practitioners can effectively leverage Batch Normalization in Keras to develop robust and efficient deep learning models.
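As a quick check of the initialization point above, the parameters of a freshly built BatchNormalization layer can be inspected before training. With the default initializers, gamma starts at ones and beta at zeros; the weight ordering shown in the comments is how Keras commonly returns them and is worth verifying on your Keras version:

# Inspect the first BatchNormalization layer of the model defined earlier
bn_layer = model.layers[1]
gamma, beta, moving_mean, moving_variance = bn_layer.get_weights()

print(gamma[:5])            # scale parameters, initialized to ones
print(beta[:5])             # offset parameters, initialized to zeros
print(moving_mean[:5])      # running mean, initialized to zeros
print(moving_variance[:5])  # running variance, initialized to ones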


