Train Neural Networks With Noise to Reduce Overfitting

Neural networks have revolutionized artificial intelligence, but they often overfit: they fit the training data so closely that accuracy and reliability on unseen data suffer.

To address this issue, this article covers noise-based regularization, a technique that can help reduce overfitting.

Table of Contents

Training Neural Networks With Noise
Noise Injection Techniques
Implementation: Training Neural Network with Noise

Training Neural Networks With Noise

In the context of neural networks, noise is random or unwanted variation in the data that obscures the patterns or relationships the model is meant to detect. Uncontrolled noise can impair learning and lower the model’s accuracy.

However, deliberately adding a small amount of noise can improve neural network performance. Introducing randomness during training, known as noise injection, acts as a regularizer.

When the dataset is small, the few available samples give only a sparse picture of the mapping from inputs to outputs. The network can then memorize the training samples instead of learning the underlying mapping, which leads to poor performance on new data.

Noise Injection Techniques

Data augmentation is an effective way to inject noise into the input. It can significantly reduce generalization error.

With adequate training data, a machine learning model generalizes better. In practice, however, the available data is often limited, which restricts the model’s ability to generalize. To address this, we add synthetic samples to the training set, created by perturbing existing samples with noise.

Gaussian noise is one of the most widely used forms of noise for augmenting input data and reducing overfitting. It has zero mean and a controllable standard deviation, which lets us adjust the intensity of the noise. It is typically added to the input variables before they are fed to the network.

Noise Type and Amount: The type and amount of noise are crucial hyperparameters. Too little noise has minimal impact, while too much makes learning difficult. Experimentation is needed to find suitable settings.
Noise Injection Timing: Noise is typically added only during training. The model should be evaluated and used for predictions on clean data, without any noise injection.
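
The Gaussian input noise described above can be sketched with NumPy. This is a minimal illustration rather than a definitive implementation: the helper name `add_gaussian_noise`, the `training` flag, the sample batch, and the standard deviation of 0.1 are assumptions made for the example.

```python
import numpy as np


def add_gaussian_noise(x, stddev=0.1, training=True, rng=None):
    """Return x plus zero-mean Gaussian noise when training, else x unchanged.

    `stddev` controls the noise intensity; it is a hyperparameter to tune.
    """
    if not training or stddev == 0.0:
        # Evaluation and prediction use the clean inputs.
        return x
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=0.0, scale=stddev, size=x.shape)
    return x + noise


rng = np.random.default_rng(seed=0)
x_batch = rng.random((4, 3))  # hypothetical batch: 4 samples, 3 features

noisy = add_gaussian_noise(x_batch, stddev=0.1, training=True, rng=rng)
clean = add_gaussian_noise(x_batch, stddev=0.1, training=False)

print(noisy.shape == x_batch.shape)    # True: noise does not change the shape
print(np.array_equal(clean, x_batch))  # True: no noise outside training
```

Raising `stddev` makes the perturbation stronger; setting `training=False` returns the clean inputs, which matches the advice to evaluate and predict without noise.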

Generating new synthetic data points through augmentation enlarges the training set, giving the model more examples to learn from and helping it capture a more comprehensive representation of the data. Consequently, noise-based data augmentation helps reduce overfitting and improves the overall robustness of the model.
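
As a rough sketch of how noisy copies enlarge a training set, the snippet below appends perturbed versions of every sample and reuses the original labels. The function name `augment_with_noise`, the number of copies, the noise level, and the tiny dataset are all hypothetical choices made for illustration.

```python
import numpy as np


def augment_with_noise(x, y, copies=2, stddev=0.1, seed=0):
    """Append `copies` noisy versions of every sample; labels are reused as-is."""
    rng = np.random.default_rng(seed)
    noisy_x = [x + rng.normal(0.0, stddev, size=x.shape) for _ in range(copies)]
    x_aug = np.concatenate([x] + noisy_x, axis=0)
    y_aug = np.concatenate([y] * (copies + 1), axis=0)
    return x_aug, y_aug


x_small = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # hypothetical tiny dataset
y_small = np.array([0, 1, 1])

x_aug, y_aug = augment_with_noise(x_small, y_small, copies=2)
print(x_aug.shape, y_aug.shape)  # (9, 2) (9,)
```

The original three samples are kept at the front of the augmented arrays, so the clean data is still part of the training set.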

Alternative Noise Injection Techniques

Besides the input variables, Gaussian noise can also be injected into activations, weights, gradients, and outputs.

Injecting noise into activations: Noise is added directly to the activations of a layer, so the perturbation is carried through the remaining layers during the forward pass. This is particularly helpful for very deep neural networks, where it regularizes the network and helps prevent overfitting. Noise can also be injected at the output layer by using a noisy activation function.
Injecting noise into weights: Adding noise to the weights is a useful regularization technique, particularly for recurrent neural networks. Weight noise encourages stability in the function being learned by the network. It acts on the parameters themselves rather than on the input or output layers.
Injecting noise into gradients: Instead of focusing on the structure of the input domain, gradient noise aims to make the optimization process more robust. Much like a decaying learning rate in gradient descent, the amount of noise can start high and decrease over the course of training. This approach is reported to be particularly effective for deep neural networks.
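
The three injection points in the list above can be sketched in NumPy for a tiny one-hidden-layer network. This is an illustrative sketch under stated assumptions, not a reference implementation: the function names, the noise levels (`act_std`, `weight_std`), and the decay schedule with its constants `eta` and `gamma` are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def gaussian(shape, stddev):
    """Zero-mean Gaussian noise with the given shape and standard deviation."""
    return rng.normal(0.0, stddev, size=shape)


def forward(x, w1, w2, training, act_std=0.1, weight_std=0.01):
    """Forward pass of a tiny one-hidden-layer network with optional noise."""
    if training:
        # Weight noise: perturb temporary copies; the stored weights stay clean.
        w1 = w1 + gaussian(w1.shape, weight_std)
        w2 = w2 + gaussian(w2.shape, weight_std)
    hidden = np.maximum(0.0, x @ w1)  # ReLU activations
    if training:
        # Activation noise: perturb the hidden representation.
        hidden = hidden + gaussian(hidden.shape, act_std)
    return hidden @ w2


def annealed_stddev(step, eta=0.3, gamma=0.55):
    """Assumed schedule: noise variance eta / (1 + step) ** gamma, decaying with step."""
    return np.sqrt(eta / (1.0 + step) ** gamma)


def add_gradient_noise(grad, step):
    """Gradient noise: perturb the gradient before the optimizer applies it."""
    return grad + gaussian(grad.shape, annealed_stddev(step))


x = rng.random((5, 4))         # hypothetical batch: 5 samples, 4 features
w1 = rng.normal(size=(4, 8))   # hypothetical hidden-layer weights
w2 = rng.normal(size=(8, 2))   # hypothetical output-layer weights

train_a = forward(x, w1, w2, training=True)
train_b = forward(x, w1, w2, training=True)
eval_a = forward(x, w1, w2, training=False)
eval_b = forward(x, w1, w2, training=False)

print(np.allclose(train_a, train_b))               # False: fresh noise on every training pass
print(np.allclose(eval_a, eval_b))                 # True: evaluation is deterministic
print(annealed_stddev(0) > annealed_stddev(1000))  # True: gradient noise decays over time
```

In a real training loop, `add_gradient_noise` would be applied to each gradient just before the weight update, with `step` counting the updates performed so far.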

Benefits of Adding Random Noise

Prevents overfitting: Introducing noise into the training process adds variability to the data. It blurs the individual data points, so the network cannot fit each one exactly. This prevents the network from fitting the training samples too closely and hence mitigates overfitting.
Lower generalization error: Noise discourages the network from memorizing specific training samples and encourages it to learn generalizable features from the data, which lowers the generalization error.
Improved performance: Injecting noise during training can significantly improve the generalization performance of the model. It also has a regularization effect that can improve the model’s robustness.
Serves as data augmentation: Adding random noise to the input variables during training is a form of data augmentation. Because each sample is perturbed differently every time it is shown to the model, the model is less likely to overfit.

Implementation: Training Neural Network with Noise

In the example below, a neural network is trained on the MNIST dataset with noise injection for regularization. The input layer has a shape of 784, the flattened dimension of the 28×28 MNIST images. During training, Gaussian noise with a standard deviation of 0.1 is added to the input data by a GaussianNoise layer.

Training is performed on the training dataset with noisy inputs.
Validation is conducted on the test dataset, with clean inputs, to evaluate model performance.

Python3

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, GaussianNoise
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy


def build_model(input_shape, num_classes):
    inputs = Input(shape=input_shape)
    # GaussianNoise is active only during training; at inference it passes inputs through unchanged
    noisy_inputs = GaussianNoise(0.1)(inputs)
    x = Dense(128, activation='relu')(noisy_inputs)
    x = Dense(64, activation='relu')(x)
    outputs = Dense(num_classes, activation='softmax')(x)
    return Model(inputs=inputs, outputs=outputs)


input_shape = (784,)
num_classes = 10
model = build_model(input_shape, num_classes)
model.compile(optimizer=Adam(), loss=SparseCategoricalCrossentropy(), metrics=['accuracy'])

# Load the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data: flatten each 28x28 image to 784 values and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

# Train with noisy inputs; validate on the clean test set
history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))

Output:

Epoch 1/10
1875/1875 [==============================] - 13s 6ms/step - loss: 0.2555 - accuracy: 0.9247 - val_loss: 0.1313 - val_accuracy: 0.9601
Epoch 2/10
1875/1875 [==============================] - 8s 5ms/step - loss: 0.1173 - accuracy: 0.9643 - val_loss: 0.0953 - val_accuracy: 0.9702
Epoch 3/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0847 - accuracy: 0.9740 - val_loss: 0.0919 - val_accuracy: 0.9728
Epoch 4/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0688 - accuracy: 0.9780 - val_loss: 0.0803 - val_accuracy: 0.9745
Epoch 5/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0563 - accuracy: 0.9825 - val_loss: 0.0771 - val_accuracy: 0.9768
Epoch 6/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0483 - accuracy: 0.9844 - val_loss: 0.0843 - val_accuracy: 0.9746
Epoch 7/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0423 - accuracy: 0.9859 - val_loss: 0.0796 - val_accuracy: 0.9756
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0363 - accuracy: 0.9875 - val_loss: 0.0860 - val_accuracy: 0.9766
Epoch 9/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0353 - accuracy: 0.9884 - val_loss: 0.0740 - val_accuracy: 0.9790
Epoch 10/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0302 - accuracy: 0.9900 - val_loss: 0.0715 - val_accuracy: 0.9811

The output shows that the validation metrics do not diverge significantly from the training metrics: after 10 epochs, training accuracy is 99.00% and validation accuracy is 98.11%. This suggests that the model is not severely overfitting the training data and generalizes well to unseen validation data. Note that the training metrics are computed on noisy inputs, whereas validation uses clean inputs.

Both training and validation metrics improve overall across the epochs, with small epoch-to-epoch fluctuations in the validation loss, which suggests that the model is learning meaningful patterns instead of memorizing the training data.
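
One simple way to quantify this comparison is to compute the gap between training and validation accuracy per epoch. The sketch below uses the values copied from the training log above. In a live session the same lists would normally be available as `history.history['accuracy']`, `history.history['val_accuracy']`, and `history.history['val_loss']`. The variable names are illustrative.

```python
# Per-epoch metrics copied from the training log above.
train_acc = [0.9247, 0.9643, 0.9740, 0.9780, 0.9825,
             0.9844, 0.9859, 0.9875, 0.9884, 0.9900]
val_acc = [0.9601, 0.9702, 0.9728, 0.9745, 0.9768,
           0.9746, 0.9756, 0.9766, 0.9790, 0.9811]
val_loss = [0.1313, 0.0953, 0.0919, 0.0803, 0.0771,
            0.0843, 0.0796, 0.0860, 0.0740, 0.0715]

# Generalization gap: training accuracy minus validation accuracy per epoch.
gaps = [round(t - v, 4) for t, v in zip(train_acc, val_acc)]

# Epoch with the lowest validation loss (epochs are 1-based in the log).
best_epoch = val_loss.index(min(val_loss)) + 1

print(gaps[-1])    # 0.0089
print(best_epoch)  # 10
```

A gap below one percentage point at the final epoch, together with the lowest validation loss occurring in the last epoch, is consistent with the reading that the model has not started to overfit within these 10 epochs.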

Conclusion

Injecting noise during training can improve a model’s generalization and robustness. It helps control the model’s complexity, prevents it from fitting the training data too closely, and thereby reduces the risk of overfitting.
