
Train Neural Networks With Noise to Reduce Overfitting

Neural networks have revolutionized artificial intelligence, but they often overfit the training data, which reduces a model's accuracy and reliability on unseen data.

To address this issue, this article looks at noise-based regularization, a technique that can help reduce overfitting.

Training Neural Networks With Noise

In the context of neural networks, noise is random or unwanted variation in the data that obscures the patterns or relationships the model is trying to learn. Uncontrolled noise can hinder learning and lower the model's accuracy.

However, deliberately adding a small amount of noise can improve neural network performance. Introducing randomness during training, known as noise injection, acts as a regularizer: because the inputs are perturbed slightly each time, it is harder for the model to memorize individual training samples.

When the dataset is small, there are only a few samples from which to learn the mapping between inputs and outputs. This limits what the model can learn from the training data and consequently leads to poor performance.

Noise Injection Techniques

Data augmentation is an effective way to inject noise into the input, and it can significantly reduce the generalization error of machine learning models.

With adequate training data, a machine learning model can generalize better. In practice, however, the amount of available data is often limited, which restricts how well the model can generalize. To address this, we add synthetic data, created by applying noise to existing samples, to the training set.

Gaussian noise is one of the most widely used ways to inject noise into input data and reduce overfitting. It has zero mean and a controllable standard deviation, which lets us adjust the intensity of the noise. It is typically added to the input variables before they are fed to the network.
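
As a minimal sketch of this idea, assuming NumPy, zero-mean Gaussian noise with a chosen standard deviation can be added to a batch of inputs as follows. The helper name `add_gaussian_noise` and the constant example batch are illustrative assumptions, not part of the original article.

```python
import numpy as np

def add_gaussian_noise(x, stddev=0.1, seed=None):
    """Return a copy of x with zero-mean Gaussian noise added (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # loc=0.0 gives the zero mean; scale controls the noise intensity
    noise = rng.normal(loc=0.0, scale=stddev, size=x.shape)
    return x + noise

# Illustrative batch: 1000 samples with 784 features, all set to 0.5
x = np.full((1000, 784), 0.5, dtype=np.float32)
x_noisy = add_gaussian_noise(x, stddev=0.1, seed=0)

# The empirical mean and standard deviation of the added noise
# should be close to 0 and to the requested stddev, respectively.
noise = x_noisy - x
print(noise.mean(), noise.std())
```

Raising `stddev` makes the perturbation stronger; a value that is too large relative to the input scale can drown out the signal the model is supposed to learn.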

Generating new synthetic data points through augmentation increases the size of the training set, giving the model more examples to learn from and helping it capture a more comprehensive representation of the data. Consequently, data augmentation with injected noise helps reduce overfitting and improves the overall robustness of the model.
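
The following sketch shows how such an enlarged training set could be built, assuming NumPy. The function `augment_with_noise`, the number of copies, and the random example data are assumptions made for illustration only.

```python
import numpy as np

def augment_with_noise(x, y, copies=2, stddev=0.1, seed=None):
    """Append `copies` noisy versions of every sample; labels are reused unchanged."""
    rng = np.random.default_rng(seed)
    xs, ys = [x], [y]
    for _ in range(copies):
        # Each copy gets its own independent noise draw
        xs.append(x + rng.normal(0.0, stddev, size=x.shape))
        ys.append(y)
    return np.concatenate(xs, axis=0), np.concatenate(ys, axis=0)

# Illustrative small dataset: 100 samples, 784 features, 10 classes
x_small = np.random.default_rng(0).random((100, 784))
y_small = np.arange(100) % 10

x_aug, y_aug = augment_with_noise(x_small, y_small, copies=2, stddev=0.1, seed=1)
print(x_aug.shape, y_aug.shape)  # (300, 784) (300,)
```

This variant stores a fixed set of noisy copies. Sampling fresh noise for every batch during training is an alternative that avoids the extra memory and never repeats the same perturbation.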

Alternative Noise Injection Techniques

Gaussian noise is not limited to the input variables: it can also be injected into activations, weights, gradients, and outputs.
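
To make some of these injection points concrete, here is a rough NumPy sketch of a single dense layer with optional noise on its weights and activations, plus a helper that perturbs a gradient. The names `noisy_dense_forward` and `noisy_gradient` and all standard deviations are hypothetical; this is an illustration under those assumptions rather than a reference implementation.

```python
import numpy as np

def noisy_dense_forward(x, W, b, weight_std=0.0, act_std=0.0, rng=None):
    """ReLU dense layer with optional Gaussian noise on weights and activations."""
    if rng is None:
        rng = np.random.default_rng()
    # Weight noise: perturb the weights for this forward pass only
    W_used = W + rng.normal(0.0, weight_std, size=W.shape) if weight_std > 0 else W
    h = np.maximum(0.0, x @ W_used + b)
    # Activation noise: perturb the layer output
    if act_std > 0:
        h = h + rng.normal(0.0, act_std, size=h.shape)
    return h

def noisy_gradient(grad, grad_std, rng):
    """Gradient noise: add zero-mean Gaussian noise to a gradient before the update."""
    return grad + rng.normal(0.0, grad_std, size=grad.shape)

rng = np.random.default_rng(0)
x = rng.random((4, 3))   # 4 samples, 3 features
W = rng.random((3, 2))   # 3 inputs, 2 units
b = np.zeros(2)

clean = noisy_dense_forward(x, W, b)
noisy = noisy_dense_forward(x, W, b, weight_std=0.05, act_std=0.05, rng=rng)
print(np.abs(noisy - clean).max())  # size of the perturbation for this draw
```

With both standard deviations at zero, the function reduces to an ordinary dense layer, which is how the noise would be switched off at inference time.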

Benefits of Adding Random Noise

Implementation: Training Neural Network with Noise

In the example below, a neural network is trained on the MNIST dataset with noise injection for regularization. The model starts with an Input layer of shape (784,), the flattened dimension of a 28×28 MNIST image. During training, a GaussianNoise layer adds Gaussian noise with a standard deviation of 0.1 to the input data; the layer is inactive at inference time, so validation inputs are not perturbed.

import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, GaussianNoise
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy


def build_model(input_shape, num_classes):
    inputs = Input(shape=input_shape)
    # Add zero-mean Gaussian noise (stddev 0.1) to the inputs; active only during training
    noisy_inputs = GaussianNoise(0.1)(inputs)
    x = Dense(128, activation='relu')(noisy_inputs)
    x = Dense(64, activation='relu')(x)
    # Softmax output: one probability per class
    outputs = Dense(num_classes, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    return model


input_shape = (784,)
num_classes = 10
model = build_model(input_shape, num_classes)
# Sparse categorical cross-entropy expects integer class labels
model.compile(optimizer=Adam(), loss=SparseCategoricalCrossentropy(), metrics=['accuracy'])

# Load the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Preprocess the data: flatten each 28x28 image to 784 values and scale pixels to [0, 1]
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0
x_test = x_test.reshape(-1, 784).astype('float32') / 255.0

# Train for 10 epochs, using the test split as validation data
history = model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))

Output:

Epoch 1/10
1875/1875 [==============================] - 13s 6ms/step - loss: 0.2555 - accuracy: 0.9247 - val_loss: 0.1313 - val_accuracy: 0.9601
Epoch 2/10
1875/1875 [==============================] - 8s 5ms/step - loss: 0.1173 - accuracy: 0.9643 - val_loss: 0.0953 - val_accuracy: 0.9702
Epoch 3/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0847 - accuracy: 0.9740 - val_loss: 0.0919 - val_accuracy: 0.9728
Epoch 4/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0688 - accuracy: 0.9780 - val_loss: 0.0803 - val_accuracy: 0.9745
Epoch 5/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0563 - accuracy: 0.9825 - val_loss: 0.0771 - val_accuracy: 0.9768
Epoch 6/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0483 - accuracy: 0.9844 - val_loss: 0.0843 - val_accuracy: 0.9746
Epoch 7/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0423 - accuracy: 0.9859 - val_loss: 0.0796 - val_accuracy: 0.9756
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0363 - accuracy: 0.9875 - val_loss: 0.0860 - val_accuracy: 0.9766
Epoch 9/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0353 - accuracy: 0.9884 - val_loss: 0.0740 - val_accuracy: 0.9790
Epoch 10/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.0302 - accuracy: 0.9900 - val_loss: 0.0715 - val_accuracy: 0.9811

The output shows that the validation metrics stay close to the training metrics: after 10 epochs, training accuracy is 99.00% and validation accuracy is 98.11%. This small gap indicates that the model is not overfitting severely and generalizes well to unseen validation data.

Both training and validation metrics also improve over the epochs overall, with only minor fluctuations in validation loss (for example at epochs 6 and 8), which suggests that the model is learning meaningful patterns instead of memorizing the training data.
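
One way to check this reading of the log is to compute the gap between training and validation accuracy per epoch. The sketch below uses the values copied from the output above; the helper `generalization_gaps` is a hypothetical name introduced for illustration.

```python
# Accuracy and validation-loss values copied from the training log above
train_acc = [0.9247, 0.9643, 0.9740, 0.9780, 0.9825, 0.9844, 0.9859, 0.9875, 0.9884, 0.9900]
val_acc = [0.9601, 0.9702, 0.9728, 0.9745, 0.9768, 0.9746, 0.9756, 0.9766, 0.9790, 0.9811]
val_loss = [0.1313, 0.0953, 0.0919, 0.0803, 0.0771, 0.0843, 0.0796, 0.0860, 0.0740, 0.0715]

def generalization_gaps(train, val):
    """Per-epoch difference between training and validation accuracy."""
    return [round(t - v, 4) for t, v in zip(train, val)]

gaps = generalization_gaps(train_acc, val_acc)
# Epoch (1-based) with the lowest validation loss
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(gaps[-1])    # 0.0089
print(best_epoch)  # 10
```

The gap stays below one percentage point through the final epoch, and the lowest validation loss occurs in the last epoch. In a live run, the same lists could be read from `history.history` instead of being typed in by hand.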

Conclusion

Injecting noise during training improves a model's generalization and robustness. It helps control the model's complexity, prevents it from fitting the training data too closely, and thereby reduces the risk of overfitting.
