
Noise injection for training artificial neural networks

Last Updated : 13 Apr, 2024

In the training of artificial neural networks, noise injection is a technique used to improve the generalization capabilities of a model. By deliberately adding randomness to the input data or internal components during the training phase, the model becomes more robust to slight variations and noise in real-world data.

In this tutorial, we will delve into the concept of noise injection, explore its benefits, and provide a detailed guide on implementing the technique in deep learning models.

What is Noise Injection?

Noise injection is the process of adding random noise to the data or to internal components of the network during training. It can be considered a form of regularization, similar to techniques like dropout or L2 regularization. However, instead of modifying the network structure or weights directly, noise injection introduces randomness into the input data or hidden layers. This randomness helps prevent the model from overfitting to the noise-free training data and encourages the network to learn more meaningful, generalizable patterns.

Noise injection also relates to the concept of robust optimization: by training a model on slightly perturbed versions of the data, the neural network is forced to find solutions that work not only for the training dataset but also for variations of it. This is particularly useful in applications where the data is expected to be noisy or when the model needs to perform well under varying operational conditions.
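
As a minimal sketch of this idea (the toy array and noise level below are assumed, not taken from the article), every pass over the data can see a slightly different perturbed copy of the same inputs, so the network cannot simply memorise exact values:

Python3
import numpy as np

rng = np.random.default_rng(0)
x = np.array([[0.2, 0.7], [0.9, 0.1]])   # two toy input samples
sigma = 0.05                              # assumed noise level
for epoch in range(3):
    x_perturbed = x + rng.normal(0.0, sigma, size=x.shape)
    print(epoch, x_perturbed.round(3))    # a different view of the data each epoch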

Types of Noise Injection

There are several ways to implement noise injection:

  • Input noise: Random noise, typically Gaussian or uniform, is added to the input data during training. Because each epoch sees a slightly different version of the inputs, the model cannot memorise the training samples and is pushed to learn the underlying patterns instead, which reduces overfitting.
  • Weight noise: Noise, again usually Gaussian or uniform, is added to the model's weights during training. This prevents the model from depending too heavily on any single weight or feature and makes it more robust to changes in the input data.
  • Activation noise: Noise is added to the output of each layer's activation before it is passed to the next layer. This also makes the model more robust to variations in the input data and can help it learn more complex patterns.
  • Gradient noise: As the name suggests, noise is added to the gradients during the optimization phase, just before the weights are updated. The extra randomness can help the optimizer escape poor regions of the loss landscape and improves the generalization capabilities of the model.

Each method targets a different aspect of the network and offers its own advantages in training dynamics, and all of them can improve the model's performance on unseen data at test time. For input and activation noise, Keras even provides a ready-made layer, sketched below.
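
As a side note (not part of the original walkthrough), Keras ships a built-in tf.keras.layers.GaussianNoise layer that adds zero-mean Gaussian noise only while training. Placed directly after the input it acts as input noise; placed after a hidden layer it acts as activation noise. The standard deviation of 0.1 below is an assumed value and the model is only a sketch:

Python3
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.GaussianNoise(0.1),           # input noise, active only during training
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.GaussianNoise(0.1),           # activation noise on the hidden layer
    tf.keras.layers.Dense(10, activation='softmax')
])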

Benefits of Noise Injection

  • Enhanced Generalization: By training with noisy data, models are less likely to overfit and more likely to generalize to unseen data.
  • Robustness to Input Perturbations: Models trained with noise are typically more robust to slight changes or disturbances in input data.
  • Prevents Co-adaptation of Features: Similar to dropout, noise injection can prevent neurons from co-adapting too specifically to the training data, promoting independent feature learning.

Training an Artificial Neural Network with Noise: Implementation

The implementation of an artificial neural network (ANN) with noise injection is demonstrated on the MNIST dataset, which consists of images of handwritten digits.

Implementing Input Noise Injection

Gaussian noise with mean 0 and standard deviation 1 is generated with np.random.normal in the same shape as x_train, scaled by a noise factor of 0.5, and added to the training data. The noisy images are then clipped back to the [0, 1] range with np.clip.

Python3
import tensorflow as tf
import numpy as np

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # Normalize data

# Add noise to the training data
noise_factor = 0.5
x_train_noisy = x_train + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)

# Build a simple feed-forward classifier and train it on the noisy inputs
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train_noisy, y_train, epochs=10, validation_data=(x_test, y_test))

Output:

Epoch 1/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.6604 - accuracy: 0.7868 - val_loss: 0.3084 - val_accuracy: 0.9086
Epoch 2/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.4074 - accuracy: 0.8719 - val_loss: 0.2309 - val_accuracy: 0.9339
Epoch 3/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.3202 - accuracy: 0.8979 - val_loss: 0.2030 - val_accuracy: 0.9453
Epoch 4/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2629 - accuracy: 0.9157 - val_loss: 0.1951 - val_accuracy: 0.9504
Epoch 5/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.2266 - accuracy: 0.9267 - val_loss: 0.1996 - val_accuracy: 0.9518
Epoch 6/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.1962 - accuracy: 0.9350 - val_loss: 0.1985 - val_accuracy: 0.9541
Epoch 7/10
1875/1875 [==============================] - 12s 6ms/step - loss: 0.1719 - accuracy: 0.9425 - val_loss: 0.2167 - val_accuracy: 0.9555
Epoch 8/10
1875/1875 [==============================] - 12s 6ms/step - loss: 0.1555 - accuracy: 0.9474 - val_loss: 0.2192 - val_accuracy: 0.9562
Epoch 9/10
1875/1875 [==============================] - 8s 4ms/step - loss: 0.1379 - accuracy: 0.9527 - val_loss: 0.2275 - val_accuracy: 0.9574
Epoch 10/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.1255 - accuracy: 0.9567 - val_loss: 0.2342 - val_accuracy: 0.9571
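
As a quick follow-up (not part of the original run), the robustness of the trained model can be checked by evaluating it on a noisy copy of the test set, built with the same recipe as x_train_noisy:

Python3
# Hypothetical robustness check: corrupt the test images with the same noise settings
x_test_noisy = x_test + noise_factor * np.random.normal(loc=0.0, scale=1.0, size=x_test.shape)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
model.evaluate(x_test_noisy, y_test)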

Implementing Weight Noise Injection

Weight noise can be added directly to the weights during training by defining a custom training loop or by using callbacks. The implementation below uses a callback that introduces Gaussian noise into the weights of each layer (if the layer has trainable weights) at the end of every batch.

In the implementation below, a custom WeightNoise callback class is defined. It inherits from tf.keras.callbacks.Callback and takes two parameters:

  • noise_stddev: Standard deviation of the Gaussian noise to be added to the weights.
  • apply_freq: Frequency at which the noise should be applied, typically measured in batches.

At the end of each batch during training, the callback checks whether the current batch number is a multiple of apply_freq. If it is, it iterates through each layer of the model and, for layers with trainable kernels, adds Gaussian noise to the weights using tf.random.normal. This is achieved by directly modifying the kernel attribute of the layer with assign_add.

Python3
class WeightNoise(tf.keras.callbacks.Callback):
    def __init__(self, noise_stddev=0.01, apply_freq=1):
        super().__init__()
        self.noise_stddev = noise_stddev
        self.apply_freq = apply_freq

    def on_batch_end(self, batch, logs=None):
        if batch % self.apply_freq == 0:
            for layer in self.model.layers:
                if hasattr(layer, 'kernel'):
                    # Perturb the layer's kernel in place with zero-mean Gaussian noise
                    layer.kernel.assign_add(
                        tf.random.normal(shape=layer.kernel.shape, stddev=self.noise_stddev))

# Use the callback during training
model.fit(x_train, y_train, epochs=10, callbacks=[WeightNoise()])

Output:

Epoch 1/10
1875/1875 [==============================] - 35s 19ms/step - loss: 0.4687 - accuracy: 0.8956
Epoch 2/10
1875/1875 [==============================] - 17s 9ms/step - loss: 0.7162 - accuracy: 0.8543
Epoch 3/10
1875/1875 [==============================] - 16s 9ms/step - loss: 0.8129 - accuracy: 0.8335
Epoch 4/10
1875/1875 [==============================] - 18s 10ms/step - loss: 0.8251 - accuracy: 0.8248
Epoch 5/10
1875/1875 [==============================] - 17s 9ms/step - loss: 0.8330 - accuracy: 0.8125
Epoch 6/10
1875/1875 [==============================] - 17s 9ms/step - loss: 0.8875 - accuracy: 0.8037
Epoch 7/10
1875/1875 [==============================] - 18s 10ms/step - loss: 0.8900 - accuracy: 0.8065
Epoch 8/10
1875/1875 [==============================] - 17s 9ms/step - loss: 0.8669 - accuracy: 0.8116
Epoch 9/10
1875/1875 [==============================] - 23s 12ms/step - loss: 0.8927 - accuracy: 0.8028
Epoch 10/10
1875/1875 [==============================] - 18s 10ms/step - loss: 0.9117 - accuracy: 0.7948
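
The steadily rising loss above suggests that a fixed noise level keeps disturbing the weights even once training has largely converged. One possible refinement (not part of the original example) is to decay the noise level after every epoch; the decay factor of 0.5 below is an assumed value.

Python3
class DecayingWeightNoise(WeightNoise):
    def __init__(self, noise_stddev=0.01, apply_freq=1, decay=0.5):
        super().__init__(noise_stddev=noise_stddev, apply_freq=apply_freq)
        self.decay = decay

    def on_epoch_end(self, epoch, logs=None):
        # Shrink the weight-noise level after every epoch so late training can settle
        self.noise_stddev *= self.decay

model.fit(x_train, y_train, epochs=10, callbacks=[DecayingWeightNoise()])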

Implementing Gradient Noise Injection

Gradient noise involves adding noise directly to the gradients during the backpropagation process. This can help in escaping poor local minima and adds an extra source of stochasticity on top of mini-batch sampling, potentially resulting in more robust convergence, especially in complex and rugged loss landscapes.

  • A custom model class called GradientNoiseModel is used, which subclasses tf.keras.Sequential and overrides its train_step method. The network itself is the usual Sequential stack: a Flatten layer, a Dense layer with ReLU activation, a Dropout layer for regularization, and a Dense output layer with softmax activation.
  • Inside train_step, the gradients are computed with tf.GradientTape, Gaussian noise generated by tf.random.normal is added to every gradient, and the optimizer then applies the noisy gradients to the trainable weights.
Python3
import tensorflow as tf
from tensorflow.keras.layers import Dense, Flatten, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import SparseCategoricalCrossentropy

class GradientNoiseModel(tf.keras.Sequential):
    """Sequential model that injects Gaussian noise into the gradients before each update."""
    def __init__(self, layers, stddev=0.1, **kwargs):
        super().__init__(layers, **kwargs)
        # Stored as a tf.Variable so the noise level can be adjusted during training
        self.stddev = tf.Variable(stddev, trainable=False, dtype=tf.float32)

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        gradients = tape.gradient(loss, self.trainable_variables)
        # Add Gaussian noise to every gradient before the optimizer applies the update
        noisy_gradients = [g + tf.random.normal(tf.shape(g), stddev=self.stddev)
                           for g in gradients]
        self.optimizer.apply_gradients(zip(noisy_gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = GradientNoiseModel([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='relu'),
    Dropout(0.2),
    Dense(10, activation='softmax')
], stddev=0.1)

model.compile(optimizer=Adam(), loss=SparseCategoricalCrossentropy(), metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Output:

Epoch 1/10
1875/1875 [==============================] - 14s 7ms/step - loss: 0.2937 - accuracy: 0.9148 - val_loss: 0.1451 - val_accuracy: 0.9581
Epoch 2/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.1426 - accuracy: 0.9574 - val_loss: 0.0971 - val_accuracy: 0.9705
Epoch 3/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.1065 - accuracy: 0.9679 - val_loss: 0.0862 - val_accuracy: 0.9731
Epoch 4/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0873 - accuracy: 0.9729 - val_loss: 0.0745 - val_accuracy: 0.9767
Epoch 5/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0756 - accuracy: 0.9768 - val_loss: 0.0772 - val_accuracy: 0.9755
Epoch 6/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0643 - accuracy: 0.9794 - val_loss: 0.0669 - val_accuracy: 0.9787
Epoch 7/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0574 - accuracy: 0.9816 - val_loss: 0.0726 - val_accuracy: 0.9779
Epoch 8/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0528 - accuracy: 0.9829 - val_loss: 0.0783 - val_accuracy: 0.9776
Epoch 9/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0467 - accuracy: 0.9844 - val_loss: 0.0685 - val_accuracy: 0.9798
Epoch 10/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0429 - accuracy: 0.9859 - val_loss: 0.0719 - val_accuracy: 0.9798
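
A possible refinement (not part of the original example) is to anneal the gradient noise as training progresses, so early exploration gives way to stable convergence. Because stddev is stored as a tf.Variable, a small callback can shrink it after every epoch; the decay factor of 0.7 below is an assumed value.

Python3
class GradientNoiseDecay(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Reduce the gradient-noise level by 30% after every epoch (assumed schedule)
        self.model.stddev.assign(self.model.stddev * 0.7)

model.fit(x_train, y_train, epochs=10,
          validation_data=(x_test, y_test),
          callbacks=[GradientNoiseDecay()])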

Implementing Activation Noise Injection

Activation noise injection modifies the activations of neurons after they have been computed, adding random noise to the output of the activation function. This can help in promoting non-linear relationships and robust feature extraction, which is especially beneficial in layers that are prone to saturation or dead neurons. The level of noise injected into the activation output is a hyperparameter that needs to be tuned during model training: too much noise may degrade the model's performance, while too little may not provide sufficient regularization benefit.

In this model, we replace the typical ReLU activation with a custom NoisyActivation layer that adds Gaussian noise to the activations. NoisyActivation inherits from tf.keras.layers.Layer and takes two parameters: activation, which specifies the activation function to be used (such as 'relu'), and noise_level, which gives the standard deviation of the Gaussian noise added to the output of the activation function. The noise is applied only while training, so predictions at inference time remain deterministic.

Python3
class NoisyActivation(tf.keras.layers.Layer):
    def __init__(self, activation, noise_level=0.1):
        super(NoisyActivation, self).__init__()
        self.activation = tf.keras.activations.get(activation)
        self.noise_level = noise_level

    def call(self, inputs, training=None):
        outputs = self.activation(inputs)
        if training:
            # Add zero-mean Gaussian noise to the activations only during training
            outputs = outputs + tf.random.normal(tf.shape(outputs), stddev=self.noise_level)
        return outputs

# Define the model using the NoisyActivation layer
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128),
    NoisyActivation('relu', noise_level=0.05),  # Replacing standard ReLU with noisy ReLU
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Output:

Epoch 1/10
1875/1875 [==============================] - 15s 7ms/step - loss: 0.2997 - accuracy: 0.9120 - val_loss: 0.1411 - val_accuracy: 0.9580
Epoch 2/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.1411 - accuracy: 0.9580 - val_loss: 0.0949 - val_accuracy: 0.9726
Epoch 3/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.1048 - accuracy: 0.9684 - val_loss: 0.0817 - val_accuracy: 0.9747
Epoch 4/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0861 - accuracy: 0.9730 - val_loss: 0.0793 - val_accuracy: 0.9774
Epoch 5/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0735 - accuracy: 0.9769 - val_loss: 0.0802 - val_accuracy: 0.9760
Epoch 6/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0637 - accuracy: 0.9793 - val_loss: 0.0678 - val_accuracy: 0.9791
Epoch 7/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0561 - accuracy: 0.9819 - val_loss: 0.0672 - val_accuracy: 0.9805
Epoch 8/10
1875/1875 [==============================] - 9s 5ms/step - loss: 0.0521 - accuracy: 0.9828 - val_loss: 0.0667 - val_accuracy: 0.9793
Epoch 9/10
1875/1875 [==============================] - 10s 5ms/step - loss: 0.0475 - accuracy: 0.9845 - val_loss: 0.0666 - val_accuracy: 0.9809
Epoch 10/10
1875/1875 [==============================] - 11s 6ms/step - loss: 0.0444 - accuracy: 0.9852 - val_loss: 0.0742 - val_accuracy: 0.9795
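
Since noise_level is a hyperparameter, one simple way to tune it (an illustrative sketch, not part of the original article, using the candidate values below as assumptions) is to train the model briefly at several levels and keep the one with the best validation accuracy:

Python3
# Hypothetical grid search over the activation-noise level
best_level, best_acc = None, 0.0
for level in [0.01, 0.05, 0.1, 0.2]:
    candidate = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128),
        NoisyActivation('relu', noise_level=level),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    candidate.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    history = candidate.fit(x_train, y_train, epochs=3,
                            validation_data=(x_test, y_test), verbose=0)
    val_acc = history.history['val_accuracy'][-1]
    if val_acc > best_acc:
        best_level, best_acc = level, val_acc
print(best_level, best_acc)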

Conclusion

Training artificial neural networks with noise injection can be as effective as other regularization methods, providing good model accuracy while reducing overfitting in the ANN. By experimenting with different types and amounts of noise, practitioners can significantly enhance model performance, especially in noisy environments. As with any regularization technique, the key is to find the right balance between too little and too much noise, which can typically be achieved through cross-validation and other model-tuning strategies.


