Using Early Stopping to Reduce Overfitting in Neural Networks

Overfitting is a common challenge in training neural networks. It occurs when a model memorizes the training data rather than learning patterns that generalize, leading to poor performance on unseen data. While regularization techniques such as dropout and weight decay help combat overfitting, early stopping stands out as a simple yet effective alternative. In this article, we will demonstrate how early stopping can reduce overfitting in neural networks.

What is Early Stopping?

Early stopping is a form of regularization that halts the training process when the performance of the model on a validation dataset starts to degrade. Instead of training the model until convergence, early stopping monitors the validation error during training and stops the training process when the validation error begins to increase.
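
The rule itself is simple enough to express in a few lines. Below is a minimal, framework-agnostic sketch of the early-stopping logic; the per-epoch validation losses are made-up numbers purely for illustration (in real training they would come from evaluating the model on the validation set each epoch):

# Hypothetical per-epoch validation losses, for illustration only
val_losses = [0.42, 0.31, 0.27, 0.26, 0.27, 0.28, 0.29]

patience = 3                      # epochs to wait for an improvement
best_val_loss = float('inf')
epochs_without_improvement = 0

for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val_loss:  # validation error improved
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:                         # no improvement this epoch
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break

With these example values, training stops at epoch 6: the best loss (0.26) was reached at epoch 3, and three consecutive epochs pass without improving on it.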

Advantages of Early Stopping

Early stopping offers several practical benefits:

- Simplicity: it requires no changes to the model architecture or loss function, only monitoring of a validation metric.
- Reduced training cost: training stops as soon as the validation error stops improving, saving epochs of computation.
- Better generalization: the model is kept near the point where it performed best on held-out data, rather than the point where it best fit the training data.

Using Early Stopping to Reduce Overfitting in Neural Networks in Python

To demonstrate the effectiveness of early stopping in reducing overfitting, let's train two neural network models on the MNIST dataset: one with early stopping and one without. We will then compare their behavior during training and their accuracy on the test set.

Step 1: Data Loading and Preprocessing

import tensorflow as tf
from tensorflow.keras.datasets import mnist

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess data: Scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

Step 2: Model Definition and Compilation

Two neural network models are defined using the Sequential API provided by Keras. Sequential models are created by stacking layers one after another, making them easy to build and understand. Both models consist of the following layers:

- Flatten: converts each 28x28 input image into a 784-element vector.
- Dense (128 units, ReLU activation): a fully connected hidden layer.
- Dropout (rate 0.2): randomly drops 20% of the hidden activations during training, acting as a complementary regularizer.
- Dense (10 units, softmax activation): the output layer, producing a probability for each of the 10 digit classes.

After defining the model architecture, both models are compiled using the Adam optimizer, which is an adaptive learning rate optimization algorithm, and sparse categorical cross-entropy loss, suitable for multi-class classification tasks where the target labels are integers.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout

# Define model architecture without early stopping
model_without_early_stopping = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])

# Compile model without early stopping
model_without_early_stopping.compile(optimizer='adam',
                                     loss='sparse_categorical_crossentropy',
                                     metrics=['accuracy'])

Step 3: Model Training

# Train model without early stopping
history_without_early_stopping = model_without_early_stopping.fit(
    x_train, y_train, epochs=20, validation_split=0.2)

Step 4: Applying Early Stopping to Prevent Overfitting

from tensorflow.keras.callbacks import EarlyStopping

# Define model architecture with early stopping
model_with_early_stopping = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])

# Compile model with early stopping
model_with_early_stopping.compile(optimizer='adam',
                                  loss='sparse_categorical_crossentropy',
                                  metrics=['accuracy'])

# Define early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Train model with early stopping
history_with_early_stopping = model_with_early_stopping.fit(
    x_train, y_train, epochs=20, validation_split=0.2,
    callbacks=[early_stopping])
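
The callback above stops training once the validation loss has failed to improve for 3 consecutive epochs. EarlyStopping also accepts a few other commonly used arguments; the variant below is a sketch using standard Keras options (it is not the configuration used in this experiment, and the name early_stopping_tuned is ours) that additionally ignores negligible improvements and rolls the model back to its best weights:

from tensorflow.keras.callbacks import EarlyStopping

# Optional variant (not used in the experiment above); all arguments
# shown are standard Keras options
early_stopping_tuned = EarlyStopping(
    monitor='val_loss',           # quantity to watch
    patience=3,                   # epochs with no improvement before stopping
    min_delta=0.001,              # smallest change that counts as an improvement
    restore_best_weights=True     # roll back to the weights of the best epoch
)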

Step 5: Model Evaluation

# Evaluate models on test data
test_loss_without_early_stopping, test_acc_without_early_stopping = model_without_early_stopping.evaluate(x_test, y_test)
test_loss_with_early_stopping, test_acc_with_early_stopping = model_with_early_stopping.evaluate(x_test, y_test)

# Print test accuracies
print("Test Accuracy without Early Stopping:", test_acc_without_early_stopping)
print("Test Accuracy with Early Stopping:", test_acc_with_early_stopping)

Complete Code

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.callbacks import EarlyStopping

# Load MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess data: Scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0

# Define model architecture without early stopping
model_without_early_stopping = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])

# Compile model without early stopping
model_without_early_stopping.compile(optimizer='adam',
                                     loss='sparse_categorical_crossentropy',
                                     metrics=['accuracy'])

# Train model without early stopping
history_without_early_stopping = model_without_early_stopping.fit(
    x_train, y_train, epochs=20, validation_split=0.2)

# Define model architecture with early stopping
model_with_early_stopping = Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
])

# Compile model with early stopping
model_with_early_stopping.compile(optimizer='adam',
                                  loss='sparse_categorical_crossentropy',
                                  metrics=['accuracy'])

# Define early stopping callback
early_stopping = EarlyStopping(monitor='val_loss', patience=3)

# Train model with early stopping
history_with_early_stopping = model_with_early_stopping.fit(
    x_train, y_train, epochs=20, validation_split=0.2,
    callbacks=[early_stopping])

# Evaluate models on test data
test_loss_without_early_stopping, test_acc_without_early_stopping = model_without_early_stopping.evaluate(x_test, y_test)
test_loss_with_early_stopping, test_acc_with_early_stopping = model_with_early_stopping.evaluate(x_test, y_test)

# Print test accuracies
print("Test Accuracy without Early Stopping:", test_acc_without_early_stopping)
print("Test Accuracy with Early Stopping:", test_acc_with_early_stopping)

Output:

Test Accuracy without Early Stopping: 0.9782999753952026
Test Accuracy with Early Stopping: 0.9790999889373779

To validate the efficacy of early stopping, we trained two neural network models on the MNIST dataset: one with early stopping and one without. Both models achieve high accuracy, and in this run the model trained with early stopping reaches a slightly higher test accuracy (about 97.91%) than the model trained without it (about 97.83%), despite training for fewer epochs.

The exact numbers will vary between runs, and on a dataset as forgiving as MNIST the gap is small. The important point is the behavior: the model without early stopping runs for all 20 epochs and is more prone to overfitting the training data, whereas the early-stopped model halts as soon as the validation loss stops improving and therefore tends to generalize better to unseen data.

The small difference in test accuracy underscores the trade-off between maximizing training performance and optimizing for generalization. In most real-world scenarios, preventing overfitting and achieving good generalization are paramount, making early stopping a valuable regularization technique even when the gains in test accuracy are modest.
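
To confirm that early stopping actually shortened training, you can compare how many epochs each model ran, which the History object records (this continues from the code above):

# Number of epochs each model actually trained for
print("Epochs without early stopping:", len(history_without_early_stopping.history['loss']))
print("Epochs with early stopping:", len(history_with_early_stopping.history['loss']))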
