Deep Neural Network with L Layers

This article aims to implement a deep neural network with an arbitrary number of hidden layers, each of which may contain a different number of neurons. We will implement the network using a few helper functions and, finally, combine these functions into the L-layer neural network model.

[Figure: structure of an L-layer deep neural network]

The model's structure is [LINEAR -> tanh] (L-1 times) -> LINEAR -> SIGMOID, i.e., it has L-1 layers that use the hyperbolic tangent as their activation function, followed by an output layer with a sigmoid activation function.

Step-by-step implementation of the neural network:

  • Initialize the parameters for the L layers
  • Implement the forward propagation module
  • Compute the loss at the final layer
  • Implement the backward propagation module
  • Update the parameters
  • Train the model on an existing training dataset
  • Use the trained parameters to test the model

Naming conventions followed in the article to prevent confusion:



  • Each layer in the network is represented by a pair of parameters: a weight matrix W and a bias matrix b. For layer i, these parameters are denoted Wi and bi respectively.
  • The linear output of layer i is denoted Zi, and its output after activation is denoted Ai. Zi and Ai have the same dimensions.

Dimensions of the weight and bias matrices:
The input layer has size (x, m), where x is the number of input features and m is the number of examples (images).

| Layer number | Shape of W       | Shape of b  | Linear output                  | Shape of activation |
| ------------ | ---------------- | ----------- | ------------------------------ | ------------------- |
| Layer 1      | (n[1], x)        | (n[1], 1)   | Z[1] = W[1]X + b[1]            | (n[1], m)           |
| Layer 2      | (n[2], n[1])     | (n[2], 1)   | Z[2] = W[2]A[1] + b[2]         | (n[2], m)           |
| ...          | ...              | ...         | ...                            | ...                 |
| Layer L-1    | (n[L-1], n[L-2]) | (n[L-1], 1) | Z[L-1] = W[L-1]A[L-2] + b[L-1] | (n[L-1], m)         |
| Layer L      | (n[L], n[L-1])   | (n[L], 1)   | Z[L] = W[L]A[L-1] + b[L]       | (n[L], m)           |

Code: Importing all the required Python libraries.

import time
import numpy as np
import h5py
import matplotlib.pyplot as plt
import scipy
from PIL import Image
from scipy import ndimage

Initialization:

  • We will use random initialization for the weight matrices (to avoid identical outputs from all neurons in the same layer).
  • Zero initialization for the biases.
  • The number of neurons in each layer is stored in the layer_dims list, indexed by layer number (index 0 holds the input size).

Code:

def initialize_parameters_deep(layer_dims):
    # layer_dims[0] is the size of the input (0th) layer,
    # so no parameters are created for it.
    parameters = {}
  
    # number of layers in the network
    L = len(layer_dims)            
  
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], 
                                        layer_dims[l - 1])*0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
  
    return parameters
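
As a quick sanity check (this snippet is illustrative and not part of the model itself; the layer sizes are just an example), we can confirm that the initialized parameters have the shapes listed in the table above:

# Illustrative check: parameter shapes for an example network.
layer_dims_check = [12288, 20, 7, 5, 1]
params_check = initialize_parameters_deep(layer_dims_check)
for l in range(1, len(layer_dims_check)):
    print('W' + str(l), params_check['W' + str(l)].shape,
          'b' + str(l), params_check['b' + str(l)].shape)
# Expected output:
# W1 (20, 12288) b1 (20, 1)
# W2 (7, 20) b2 (7, 1)
# W3 (5, 7) b3 (5, 1)
# W4 (1, 5) b4 (1, 1)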

Forward propagation module:
The Forward propagation module will be completed in three steps. We will complete three functions in this order:

  • linear_forward (to compute linear output Z for any layer)
  • linear_activation_forward where activation will be either tanh or Sigmoid.
  • L_model_forward [LINEAR -> tanh](L-1 times) -> LINEAR -> SIGMOID (whole model)

The linear forward module (vectorized over all the examples) computes the following equations:

Zi = Wi * A(i - 1) + bi
Ai = activation_func(Zi)

Code:

def linear_forward(A_prev, W, b):
  
    # cache is stored to be used in the backward propagation module
    Z = np.dot(W, A_prev) + b
    cache = (A_prev, W, b)
    return Z, cache

def sigmoid(Z):
  
    A = 1/(1 + np.exp(-Z))
    return A, {'Z' : Z}
  
def tanh(Z):
  
    A = np.tanh(Z)
    return A, {'Z' : Z}
  
def linear_activation_forward(A_prev, W, b, activation):
  
    # cache is stored to be used in backward propagation module
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "tanh":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = tanh(Z)
    cache = (linear_cache, activation_cache)
  
    return A, cache

def L_model_forward(X, parameters):
    """
    Arguments:
    X -- data, numpy array of shape (input size, number of examples)
    parameters -- output of initialize_parameters_deep()
      
    Returns:
    AL -- last post-activation value
    caches -- list of caches containing:
                every cache of linear_activation_forward() 
           (there are L of them, indexed from 0 to L-1)
    """
  
    caches = []
    A = X
  
    # number of layers in the neural network
    L = len(parameters) // 2                  
      
    # Implement [LINEAR -> TANH]*(L-1). Add "cache" to the "caches" list.
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev,
                           parameters['W' + str(l)], 
                   parameters['b' + str(l)], 'tanh')
  
        caches.append(cache)
      
    # Implement LINEAR -> SIGMOID. Add "cache" to the "caches" list.
    AL, cache = linear_activation_forward(A, parameters['W' + str(L)],
                                  parameters['b' + str(L)], 'sigmoid')
    caches.append(cache)
  
    return AL, caches
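
Before moving on, a quick check on random data (an illustrative snippet with assumed sizes, not part of the model) confirms that the forward pass produces AL of shape (1, number of examples) and one cache per layer:

# Illustrative check: forward pass on random data.
np.random.seed(1)
X_check = np.random.randn(12288, 10)                          # 10 fake examples
params_fwd = initialize_parameters_deep([12288, 20, 7, 5, 1])
AL_check, caches_check = L_model_forward(X_check, params_fwd)
print(AL_check.shape)       # (1, 10)
print(len(caches_check))    # 4 -- one cache per layer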

Cost function:

     \[ J = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log\left(a^{[L](i)}\right) + \left(1 - y^{(i)}\right)\log\left(1 - a^{[L](i)}\right) \right] \]

We will use this cross-entropy cost function, which measures the cost at the output layer over all the training examples.

Code:

def compute_cost(AL, Y):
    """
    Implement the cost function defined by the equation above.
    """
    m = Y.shape[1]
    cost = (-1 / m)*(np.dot(np.log(AL), Y.T) + np.dot(np.log(1 - AL), (1 - Y).T))
  
    # To make sure your cost's shape is what we 
    # expect (e.g. this turns [[20]] into 20).
    cost = np.squeeze(cost)      
  
    return cost
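
A tiny worked example (with arbitrarily chosen values, not taken from any dataset) helps confirm the formula:

# Illustrative example: two examples, one positive and one negative label.
AL_demo = np.array([[0.8, 0.1]])     # predicted probabilities
Y_demo = np.array([[1, 0]])          # true labels
print(compute_cost(AL_demo, Y_demo))
# -(1/2)*(log(0.8) + log(0.9)), which is approximately 0.164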

Backward Propagation Module:
Similar to the forward propagation module, we will be implementing three functions in this module too.

  • linear_backward (to compute dW, db and dA_prev for any layer, given dZ)
  • linear_activation_backward where activation will be either tanh or Sigmoid.
  • L_model_backward [LINEAR -> tanh](L-1 times) -> LINEAR -> SIGMOID (whole model backward propagation)

For layer i, the linear part is Zi = Wi * A(i - 1) + bi.
Denoting dZi = dJ/dZi, we can obtain dWi, dbi and dA(i - 1) as follows:

\[ dW_i = \frac{dJ}{dW_i} = \frac{1}{m}\, dZ_i \cdot A_{i-1}^{T} \]
\[ db_i = \frac{dJ}{db_i} = \frac{1}{m}\sum_{j=1}^{m} dZ_i^{(j)} \]
\[ dA_{i-1} = \frac{dJ}{dA_{i-1}} = W_i^{T} \cdot dZ_i \]

These equations follow from differential calculus, with the matrix dimensions kept consistent so that the products can be computed with np.dot() (a quick shape check follows the code below).

Code: Python implementation

def linear_backward(dZ, cache):
  
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (1 / m)*np.dot(dZ, A_prev.T)
    db = (1 / m)*np.sum(dZ, axis = 1, keepdims = True)
    dA_prev = np.dot(W.T, dZ)
      
    return dA_prev, dW, db
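
As a quick illustration of that dimension bookkeeping (an illustrative check with made-up sizes, not part of the model), the gradients returned by linear_backward have exactly the shapes of the quantities they correspond to:

# Illustrative shape check for linear_backward (sizes are made up).
n_prev, n_curr, m_chk = 4, 3, 5
A_prev_chk = np.random.randn(n_prev, m_chk)
W_chk = np.random.randn(n_curr, n_prev)
b_chk = np.zeros((n_curr, 1))
dZ_chk = np.random.randn(n_curr, m_chk)
dA_prev_chk, dW_chk, db_chk = linear_backward(dZ_chk, (A_prev_chk, W_chk, b_chk))
print(dW_chk.shape, db_chk.shape, dA_prev_chk.shape)
# (3, 4) (3, 1) (4, 5) -- same shapes as W, b and A_prev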

Here we calculate the derivatives of the sigmoid and tanh functions.
Code:

def sigmoid_backward(dA, activation_cache):
  
    Z = activation_cache['Z']
    A, _ = sigmoid(Z)
    return dA * (A*(1 - A))    # A*(1 - A) is the derivative of the sigmoid function
  
def tanh_backward(dA, activation_cache):
  
    Z = activation_cache['Z']
    A, _ = tanh(Z)
    return dA * (1 - np.power(A, 2))   
    # (1 - A^2) is the derivative of the tanh function
  
def linear_activation_backward(dA, cache, activation):
  
    linear_cache, activation_cache = cache
      
    if activation == "tanh":
        dZ = tanh_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
      
    return dA_prev, dW, db
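
One way to gain confidence in these derivative functions is a numerical gradient check (a small illustrative test, not part of the model): the analytic derivative of the sigmoid should match a central finite difference very closely.

# Illustrative numerical check of sigmoid_backward against finite differences.
Z_chk = np.array([[0.5, -1.2, 2.0]])
dA_chk = np.ones_like(Z_chk)                   # pretend dJ/dA = 1 everywhere
analytic = sigmoid_backward(dA_chk, {'Z': Z_chk})

eps = 1e-7
A_plus, _ = sigmoid(Z_chk + eps)
A_minus, _ = sigmoid(Z_chk - eps)
numeric = (A_plus - A_minus) / (2 * eps)       # dA/dZ by central difference

print(np.max(np.abs(analytic - numeric)))      # should be about 1e-9 or smaller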

L-model-backward:
Recall that in L_model_forward we stored, for each layer, a cache containing (A_prev, W, b) and Z. In the backpropagation module, we use those cached values to compute the gradients.

def L_model_backward(AL, Y, caches):
    """
    AL -- probability vector, output of the forward propagation (L_model_forward())
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat)
    caches -- list of caches containing:
                every cache of linear_activation_forward() with "tanh" 
                (it's caches[l], for l in range(L-1) i.e l = 0...L-2)
                the cache of linear_activation_forward() with "sigmoid"
                 (it's caches[L-1])
      
    Returns:
    grads -- A dictionary with the gradients
             grads["dA" + str(l)] = ... 
             grads["dW" + str(l)] = ...
             grads["db" + str(l)] = ... 
    """
    grads = {}
    L = len(caches) # the number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape) # after this line, Y is the same shape as AL
      
    # Initializing the backpropagation
    # derivative of cost with respect to AL
   
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) 
      
    # Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache".
    # Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"]
    current_cache = caches[L - 1]
    grads["dA" + str(L-1)], grads["dW" + str(L)], grads["db" + str(L)] = \
                  linear_activation_backward(dAL, current_cache, 'sigmoid')
      
    # Loop from l = L-2 to l = 0
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(
                    grads['dA' + str(l + 1)], current_cache, 'tanh')
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
  
    return grads

Update Parameters:

Wi = Wi – a*dWi
bi = bi – a*dbi

(where a is an appropriate constant known as the learning rate)

def update_parameters(parameters, grads, learning_rate):
    L = len(parameters) // 2 # number of layers in the neural network
  
    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads['dW' + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters['b' + str(l + 1)] - learning_rate * grads['db' + str(l + 1)]
  
    return parameters

Code: Training the model

Now it is time to combine all the functions written above into the final L-layer neural network model. The argument X of L_layer_model is the training dataset and Y is the corresponding set of labels.

def L_layer_model(X, Y, layers_dims, learning_rate = 0.0075, num_iterations = 3000, print_cost = False):
    """
    Arguments:
    X -- data, numpy array of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 1 if cat, 0 if non-cat),
                                 of shape (1, number of examples)
    layers_dims -- list containing the input size and each layer size,
                                      of length (number of layers + 1).
    learning_rate -- learning rate of the gradient descent update rule
    num_iterations -- number of iterations of the optimization loop
    print_cost -- if True, it prints the cost every 100 steps
      
    Returns:
    parameters -- parameters learned by the model. They can then be used to predict.
    """
  
    np.random.seed(1)
    costs = []                         # keep track of cost
  
    parameters = initialize_parameters_deep(layers_dims)
      
    # Loop (gradient descent)
    for i in range(0, num_iterations):
  
        # Forward propagation: [LINEAR -> TANH]*(L-1) -> LINEAR -> SIGMOID.
        AL, caches = L_model_forward(X, parameters)
  
        # Compute cost.
        cost = compute_cost(AL, Y)
      
        # Backward propagation.
        grads = L_model_backward(AL, Y, caches)
  
        # Update parameters.
        parameters = update_parameters(parameters, grads, learning_rate)
                  
        # Print and record the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)
              
    # plot the cost
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
      
    return parameters
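
A minimal, runnable sketch of how the training call could look (the data here is random toy data, only to demonstrate the call signature; in practice X would hold flattened, normalized images and Y the 0/1 labels):

# Hypothetical usage on random toy data (not a real dataset).
np.random.seed(2)
toy_X = np.random.rand(12288, 50)                    # 50 fake flattened "images"
toy_Y = (np.random.rand(1, 50) > 0.5).astype(int)    # 50 fake 0/1 labels

layers_dims = [12288, 20, 7, 5, 1]                   # 4-layer model
toy_parameters = L_layer_model(toy_X, toy_Y, layers_dims,
                               num_iterations = 200, print_cost = True)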

Code: Implementing the predict function to test the image provided.

def predict(parameters, path_image, num_px = 64):
  
    # Load the image with PIL (scipy's imread/imresize helpers are deprecated)
    # and resize it to (num_px, num_px); num_px must match the image size the
    # network was trained on (64 here, since 64 * 64 * 3 = 12288).
    image = Image.open(path_image).convert('RGB').resize((num_px, num_px))
    my_image = np.array(image).reshape((num_px * num_px * 3, 1))
  
    my_image = my_image / 255.
    output, cache = L_model_forward(my_image, parameters)
    output = np.squeeze(output)
    prediction = round(output)
    if(prediction == 1):
        label = "Cat picture"
    else:
        label = "Non-Cat picture"   # assuming the model was trained to recognize cat images
    print ("y = " + str(prediction) + ", your L-layer model predicts a \"" + label + "\"")

With layers_dims = [12288, 20, 7, 5, 1], this model reaches up to roughly 80% accuracy on test data when trained on a sufficiently large training dataset (a simple accuracy check is sketched after the parameter dump below).
The parameters below (abbreviated) were obtained after such a training run.

{'W1': array([[ 0.01672799, -0.00641608, -0.00338875, ..., -0.00685887,
        -0.00593783,  0.01060475],
       ...,
       [-0.01770891, -0.0067836 ,  0.00756873, ...,  0.01730701,
         0.01297081, -0.00322241]]),
 'b1': array([[  3.85542520e-03],
       ...,
       [ -2.40912130e-03]]),
 'W2': array([[  2.02109232e-01,  -3.08645240e-01,  -3.77620591e-01, ...,
          1.08551174e-01,  -2.18735332e-01],
       ...,
       [  2.06819296e-01,  -2.39537245e-01,  -4.06133490e-01, ...,
          9.28388267e-02,  -1.16167106e-01]]),
 'b2': array([[-0.00088887],
       [ 0.02357712],
       [ 0.01858614],
       [-0.00567557],
       [ 0.00636179],
       [ 0.02362429],
       [-0.00173074]]),
 'W3': array([[ 0.20939786,  0.21977478,  0.77135171, -1.07520777, -0.64307173,
        -0.24097649, -0.15626735],
       ...,
       [-0.96753783, -0.30492002,  0.54060558, -0.18776932, -0.39245146,
         0.20654634, -0.58863038]]),
 'b3': array([[ 0.8623361 ],
       [-0.00826002],
       [-0.01151116],
       [-0.06844291],
       [-0.00833715]]),
 'W4': array([[-0.83045967,  0.18418824,  0.85885352,  1.41024115,  0.12713131]]),
 'b4': array([[-1.73123633]])}
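
The accuracy figure quoted above can be checked with a simple helper (a sketch; test_x and test_y are assumed to be a flattened, normalized test set and its labels, which the article does not define):

# Hypothetical helper: fraction of correct predictions on a labelled set.
def evaluate_accuracy(X, Y, parameters):
    AL, _ = L_model_forward(X, parameters)
    predictions = (AL > 0.5).astype(int)    # threshold the sigmoid output
    return np.mean(predictions == Y)

# Example call (test_x and test_y assumed to exist):
# print("Test accuracy:", evaluate_accuracy(test_x, test_y, parameters))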

Testing a custom image:

[Image: cat picture used for testing]

my_image = "https://www.pexels.com / photo / adorable-animal-blur-cat-617278/"
predict(parameters, my_image)

Output with learnt parameters:

y = 1, your L-layer model predicts a Cat picture.


