This article implements a deep neural network with an arbitrary number of hidden layers, each containing a different number of neurons. We will build the network from a few helper functions and finally combine them into the L-layer neural network model.
L – layer deep neural network structure (for understanding)
The model's structure is [LINEAR -> tanh](L-1 times) -> LINEAR -> SIGMOID, i.e., it has L-1 layers that use the hyperbolic tangent as the activation function, followed by an output layer with a sigmoid activation function.
Step by step implementation of the neural network:
- Initialize the parameters for the L layers
- Implement the forward propagation module
- Compute the loss at the final layer
- Implement the backward propagation module
- Update the parameters
- Train the model on the training dataset
- Use the trained parameters to test the model
Naming conventions followed in the article to prevent confusion:
- Each layer in the network is represented by two parameters: a W matrix (weight matrix) and a b matrix (bias matrix). For layer i, these parameters are written as Wi and bi respectively.
- The linear output of layer i is written as Zi, and the output after activation is written as Ai. Zi and Ai have the same dimensions.
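Dimensions of the weight and bias matrices:
The input X has shape (x, m), where x is the number of input features and m is the number of images. Below, ni denotes the number of neurons in layer i.
| Layer number | Shape of W | Shape of b | Linear output | Shape of activation |
| Layer 1 | (n1, x) | (n1, 1) | Z1 = W1 * X + b1 | (n1, m) |
| Layer 2 | (n2, n1) | (n2, 1) | Z2 = W2 * A1 + b2 | (n2, m) |
| Layer L – 1 | (n(L – 1), n(L – 2)) | (n(L – 1), 1) | Z(L – 1) = W(L – 1) * A(L – 2) + b(L – 1) | (n(L – 1), m) |
| Layer L | (nL, n(L – 1)) | (nL, 1) | ZL = WL * A(L – 1) + bL | (nL, m) |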
Code: Importing all the required Python libraries.
Initialization:
- We will use random initialization for the weight matrices (to avoid identical output from all neurons in the same layer).
- We will use zero initialization for the biases.
- The number of neurons in each layer is stored in layer_dims, indexed by layer number (the example later in this article passes it as a list, layers_dims = [12288, 20, 7, 5, 1]).
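The initialization rules above can be sketched as follows. This is a minimal illustration, not necessarily the original code of this article: the function name initialize_parameters, the seed argument and the 0.01 weight scale are assumptions made for the example.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Return random weights and zero biases for every layer.

    layer_dims[i] is the number of neurons in layer i;
    layer_dims[0] is the size of the input.
    """
    rng = np.random.default_rng(seed)
    parameters = {}
    L = len(layer_dims) - 1  # number of layers that have parameters
    for i in range(1, L + 1):
        # Small random weights break the symmetry between neurons of a layer.
        # The 0.01 scale is an assumption of this sketch.
        parameters["W" + str(i)] = (
            rng.standard_normal((layer_dims[i], layer_dims[i - 1])) * 0.01
        )
        # Biases can safely start at zero.
        parameters["b" + str(i)] = np.zeros((layer_dims[i], 1))
    return parameters
```

For example, initialize_parameters([5, 4, 3, 1]) returns W1 of shape (4, 5), b1 of shape (4, 1), and so on up to W3 of shape (1, 3).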
Forward propagation module:
The forward propagation module is built in three steps, one function per step, in this order:
- linear_forward (to compute the linear output Z for any layer)
- linear_activation_forward, where the activation is either tanh or sigmoid
- L_model_forward, which chains [LINEAR -> tanh](L-1 times) -> LINEAR -> SIGMOID (the whole model)
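The linear forward module (vectorized over all the examples) computes the following equations, where A0 = X is the input and g is the activation function of the layer (tanh for the hidden layers, sigmoid for the output layer):
Zi = Wi * A(i – 1) + bi
Ai = g(Zi)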
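A minimal sketch of the three forward functions is given here for illustration. The function names follow the list above; the cache layout ((A_prev, W, b), Z) and the parameter keys "W1", "b1", ... are assumptions of this example rather than confirmed details of the original code.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def linear_forward(A_prev, W, b):
    """Compute Z = W . A_prev + b; b broadcasts over the m example columns."""
    Z = np.dot(W, A_prev) + b
    return Z, (A_prev, W, b)

def linear_activation_forward(A_prev, W, b, activation):
    """LINEAR -> ACTIVATION for one layer; activation is 'tanh' or 'sigmoid'."""
    Z, linear_cache = linear_forward(A_prev, W, b)
    if activation == "sigmoid":
        A = sigmoid(Z)
    elif activation == "tanh":
        A = np.tanh(Z)
    else:
        raise ValueError("activation must be 'tanh' or 'sigmoid'")
    # Keep the inputs and Z so the backward pass can reuse them.
    return A, (linear_cache, Z)

def L_model_forward(X, parameters):
    """[LINEAR -> tanh] * (L-1) -> LINEAR -> SIGMOID."""
    caches = []
    A = X
    L = len(parameters) // 2  # each layer contributes one W and one b
    for i in range(1, L):
        A, cache = linear_activation_forward(
            A, parameters["W" + str(i)], parameters["b" + str(i)], "tanh")
        caches.append(cache)
    AL, cache = linear_activation_forward(
        A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches
```

AL has shape (1, m) for a single output neuron, and caches holds one entry per layer.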
Cost function:
We will use a cost function that measures the cost at the output layer over all the training data.
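The cost formula itself is not reproduced in this text. The sketch below assumes the binary cross-entropy cost, J = -(1/m) * sum(Y * log(AL) + (1 - Y) * log(1 - AL)), which is the usual choice for a sigmoid output layer. The function name compute_cost and the eps clipping are assumptions of this example.

```python
import numpy as np

def compute_cost(AL, Y, eps=1e-12):
    """Binary cross-entropy averaged over the m examples (assumed cost)."""
    m = Y.shape[1]
    # Clip predictions so that log(0) never occurs.
    AL = np.clip(AL, eps, 1 - eps)
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)
```

A prediction of 0.5 for every example gives a cost of log(2), and the cost shrinks as the predictions approach the labels.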
Backward Propagation Module:
As in the forward propagation module, we will implement three functions in this module:
- linear_backward (to compute the gradients dW, db and dA(i – 1) for any layer from dZ)
- linear_activation_backward, where the activation is either tanh or sigmoid
- L_model_backward, which runs backward propagation through [LINEAR -> tanh](L-1 times) -> LINEAR -> SIGMOID (the whole model)
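For layer i, the linear part is: Zi = Wi * A(i – 1) + bi
Denoting dZi as the derivative of the per-example loss with respect to Zi (one column per example), we can get dWi, dbi and dA(i – 1) as:
dWi = (1/m) * dZi * A(i – 1)^T
dbi = (1/m) * (sum of the columns of dZi over the m examples)
dA(i – 1) = Wi^T * dZi
These equations follow from differential calculus (the chain rule), with the factors ordered and transposed so that the matrix dimensions are compatible for dot multiplication.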
Code: Python code for Implementation
Here we will be calculating the derivatives of the sigmoid and tanh functions.
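As an illustration, the two derivatives can be written as small helpers that turn dA into dZ. The names sigmoid_backward and tanh_backward are assumptions of this sketch; the formulas used are sigmoid'(Z) = s * (1 - s) with s = sigmoid(Z), and tanh'(Z) = 1 - tanh(Z)^2.

```python
import numpy as np

def sigmoid_backward(dA, Z):
    """dZ = dA * sigmoid'(Z), where sigmoid'(Z) = s * (1 - s)."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1.0 - s)

def tanh_backward(dA, Z):
    """dZ = dA * tanh'(Z), where tanh'(Z) = 1 - tanh(Z)^2."""
    return dA * (1.0 - np.tanh(Z) ** 2)
```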
Recall that when you implemented the L_model_forward function, at each iteration, you stored a cache that contains (X, W, b, and Z). In the backpropagation module, you will use those variables to compute the gradients.
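A minimal sketch of the three backward functions follows. It assumes each cache has the layout ((A_prev, W, b), Z) and that the cost is binary cross-entropy; both are assumptions of this example, as are the gradient keys "dW1", "db1", and so on.

```python
import numpy as np

def linear_backward(dZ, linear_cache):
    """Gradients of the linear part Z = W . A_prev + b."""
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    """Backward step for one LINEAR -> ACTIVATION layer."""
    linear_cache, Z = cache
    if activation == "sigmoid":
        s = 1.0 / (1.0 + np.exp(-Z))
        dZ = dA * s * (1.0 - s)
    elif activation == "tanh":
        dZ = dA * (1.0 - np.tanh(Z) ** 2)
    else:
        raise ValueError("activation must be 'tanh' or 'sigmoid'")
    return linear_backward(dZ, linear_cache)

def L_model_backward(AL, Y, caches):
    """Backward pass through [LINEAR -> tanh] * (L-1) -> LINEAR -> SIGMOID."""
    grads = {}
    L = len(caches)
    Y = Y.reshape(AL.shape)
    # Derivative of the (assumed) cross-entropy loss with respect to AL.
    # AL must lie strictly between 0 and 1 to avoid division by zero.
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    for i in range(L, 0, -1):
        activation = "sigmoid" if i == L else "tanh"
        dA, dW, db = linear_activation_backward(dA, caches[i - 1], activation)
        grads["dW" + str(i)] = dW
        grads["db" + str(i)] = db
    return grads
```

Each dWi has the same shape as Wi and each dbi the same shape as bi, which is a quick sanity check when debugging.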
Update parameters:
Wi = Wi – a*dWi
bi = bi – a*dbi
(where a is an appropriate constant known as the learning rate)
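The gradient descent update can be sketched as below. The function name update_parameters and the dictionary keys ("W1", "dW1", ...) are assumptions of this example; it returns a new dictionary instead of modifying the input in place.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    """Apply one gradient descent step to every W and b."""
    L = len(parameters) // 2  # each layer contributes one W and one b
    updated = {}
    for i in range(1, L + 1):
        updated["W" + str(i)] = (
            parameters["W" + str(i)] - learning_rate * grads["dW" + str(i)]
        )
        updated["b" + str(i)] = (
            parameters["b" + str(i)] - learning_rate * grads["db" + str(i)]
        )
    return updated
```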
Code: Training the model
Now it is time to combine all the functions written so far into the final L-layer neural network model. The argument X of L_layer_model is the training dataset and Y holds the corresponding labels.
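To show how the pieces fit together, here is a compact, self-contained sketch of a training loop. It is not the original code of this article: the helpers forward and backward are condensed stand-ins for the modules described earlier, and the cross-entropy cost, the 0.01 weight scale, the default hyperparameters and the returned cost history are all assumptions of the example.

```python
import numpy as np

def forward(X, params):
    """[LINEAR -> tanh] * (L-1) -> LINEAR -> SIGMOID, keeping per-layer caches."""
    L = len(params) // 2
    A, caches = X, []
    for i in range(1, L + 1):
        W, b = params["W" + str(i)], params["b" + str(i)]
        Z = np.dot(W, A) + b
        caches.append((A, W, Z))  # layer input, weights, linear output
        A = 1.0 / (1.0 + np.exp(-Z)) if i == L else np.tanh(Z)
    return A, caches

def backward(AL, Y, caches):
    """Gradients of the cross-entropy cost for every W and b."""
    L, m = len(caches), Y.shape[1]
    grads = {}
    dZ = AL - Y  # sigmoid output combined with the cross-entropy cost
    for i in range(L, 0, -1):
        A_prev, W, _ = caches[i - 1]
        grads["dW" + str(i)] = np.dot(dZ, A_prev.T) / m
        grads["db" + str(i)] = np.sum(dZ, axis=1, keepdims=True) / m
        if i > 1:
            Z_prev = caches[i - 2][2]
            # Propagate through the tanh of the previous layer.
            dZ = np.dot(W.T, dZ) * (1.0 - np.tanh(Z_prev) ** 2)
    return grads

def L_layer_model(X, Y, layers_dims, learning_rate=0.1,
                  num_iterations=1000, seed=1):
    """Train the network with batch gradient descent; return parameters and costs."""
    rng = np.random.default_rng(seed)
    params = {}
    for i in range(1, len(layers_dims)):
        params["W" + str(i)] = (
            rng.standard_normal((layers_dims[i], layers_dims[i - 1])) * 0.01)
        params["b" + str(i)] = np.zeros((layers_dims[i], 1))
    costs = []
    for it in range(num_iterations):
        AL, caches = forward(X, params)
        AL_safe = np.clip(AL, 1e-12, 1 - 1e-12)  # avoid log(0)
        cost = -np.mean(Y * np.log(AL_safe) + (1 - Y) * np.log(1 - AL_safe))
        grads = backward(AL, Y, caches)
        for key in params:
            params[key] = params[key] - learning_rate * grads["d" + key]
        if it % 100 == 0:
            costs.append(float(cost))  # record the cost every 100 iterations
    return params, costs
```

Plotting the returned costs against the iteration number is a simple way to check that training is converging; a cost that rises or oscillates usually means the learning rate is too high.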
Code: Implementing the predict function to test the image provided.
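A minimal sketch of a predict function is shown here, assuming the parameters are stored under the keys "W1", "b1", ... and that a sigmoid output above 0.5 means class 1. The threshold argument is an assumption of this example.

```python
import numpy as np

def predict(X, parameters, threshold=0.5):
    """Run a forward pass and convert the sigmoid output into 0/1 labels."""
    L = len(parameters) // 2
    A = X
    for i in range(1, L + 1):
        Z = np.dot(parameters["W" + str(i)], A) + parameters["b" + str(i)]
        # tanh for the hidden layers, sigmoid for the output layer
        A = 1.0 / (1.0 + np.exp(-Z)) if i == L else np.tanh(Z)
    return (A > threshold).astype(int)
```

Given test labels Y_test of shape (1, m), the accuracy can then be computed as np.mean(predict(X_test, parameters) == Y_test).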
With layers_dims = [12288, 20, 7, 5, 1] and a sufficiently large training dataset, the trained model is up to 80% accurate on test data.
The parameters are the ones learned during this training.
Testing a custom image
Output with learnt parameters:
y = 1, your L-layer model predicts a Cat picture.