
NumPy Gradient Descent Optimizer of Neural Networks


The NumPy Gradient Descent Optimizer is a commonly used optimization algorithm in neural network training that is based on the gradient descent algorithm. It minimizes the cost function of a neural network model by adjusting the model’s weights and biases through a series of iterations.

The basic steps of the NumPy Gradient Descent Optimizer are as follows (a minimal NumPy sketch of these steps is given after the list):

  1. Initialize the model’s weights and biases to small random values.
  2. Calculate the output of the model for a given input using the forward propagation algorithm.
  3. Calculate the error between the predicted output and the actual output using a cost function.
  4. Calculate the gradient of the cost function with respect to the weights and biases using the backpropagation algorithm.
  5. Update the weights and biases using the gradient and a learning rate parameter.
  6. Repeat steps 2-5 for a number of iterations or until convergence.
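
The loop below is a minimal NumPy sketch of these steps for a single linear layer trained with a mean squared error cost. The data, layer size, and learning rate are made up purely for illustration; this is not the GD() function implemented later in the article.

Python3

import numpy as np

rng = np.random.default_rng(0)

# made-up data: 100 samples, 3 features, exactly linear target
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

# step 1: initialize the weights and bias to small random values
w = rng.normal(scale=0.01, size=3)
b = 0.0
lr = 0.1

for _ in range(200):
    # step 2: forward propagation (a single linear layer)
    y_pred = X @ w + b

    # step 3: cost function (mean squared error)
    error = y_pred - y
    cost = np.mean(error ** 2)

    # step 4: gradient of the cost w.r.t. the weights and bias
    grad_w = 2 * X.T @ error / len(y)
    grad_b = 2 * np.mean(error)

    # step 5: update the parameters using the learning rate
    w -= lr * grad_w
    b -= lr * grad_b

# step 6 is the loop itself; w and b end up near [2, -1, 0.5] and 3
print(w, b, cost)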
The advantage of using the NumPy Gradient Descent Optimizer is that it is a simple yet effective algorithm for minimizing the cost function of a neural network model. It can handle large amounts of data, is easy to implement, and can be applied to different types of neural network models.

However, there are some potential disadvantages to using this algorithm. One is that it can be slow to converge, especially if the learning rate is set too low. Another is that it may get stuck in local minima, resulting in suboptimal solutions. To mitigate these issues, several variations of the gradient descent algorithm, such as Stochastic Gradient Descent, Mini-batch Gradient Descent, and Adam optimization, have been developed.
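
A mini-batch update differs only in that each iteration estimates the gradient from a random subset of the data. The sketch below is a hedged illustration of that idea on the same kind of made-up linear model as the previous sketch; it is not the API of any particular library.

Python3

import numpy as np

rng = np.random.default_rng(0)

# made-up data, as in the previous sketch
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

w, b = np.zeros(3), 0.0
lr, batch = 0.1, 16

for _ in range(500):
    # draw a random mini-batch instead of using the full dataset
    idx = rng.choice(len(X), size=batch, replace=False)
    err = X[idx] @ w + b - y[idx]
    w -= lr * 2 * X[idx].T @ err / batch
    b -= lr * 2 * err.mean()

print(w, b)  # ends up close to [2, -1, 0.5] and 3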

Advantages of NumPy Gradient Descent Optimizer:

  1. Simple and easy to implement: NumPy Gradient Descent Optimizer is a simple algorithm that is easy to implement, making it a popular choice for optimizing neural network models.
  2. Can handle large datasets: NumPy Gradient Descent Optimizer is efficient at handling large datasets, making it suitable for training deep neural network models with a large number of parameters.
  3. Can be applied to different neural network architectures: NumPy Gradient Descent Optimizer can be applied to different neural network architectures, including feedforward, convolutional, and recurrent neural networks.
  4. Can be parallelized: The computation involved in NumPy Gradient Descent Optimizer can be easily parallelized, allowing for faster training on multi-core CPUs and GPUs.

Disadvantages of NumPy Gradient Descent Optimizer:

  1. Can be slow to converge: NumPy Gradient Descent Optimizer can be slow to converge, especially if the learning rate is set too low, which can result in longer training times.
  2. May get stuck in local minima: NumPy Gradient Descent Optimizer can get stuck in local minima, which can result in suboptimal solutions.
  3. Requires careful hyperparameter tuning: The performance of NumPy Gradient Descent Optimizer depends on the choice of hyperparameters, such as the learning rate, batch size, and number of iterations, which requires careful tuning.
  4. Sensitive to feature scaling: NumPy Gradient Descent Optimizer can be sensitive to feature scaling, so input features usually need to be normalized or standardized to improve the convergence speed and accuracy of the algorithm (a minimal standardization sketch is given after this list).
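
To illustrate the feature-scaling point in the last item, standardizing each input feature to zero mean and unit variance before training is usually sufficient. The feature matrix below is made up for illustration and is unrelated to the GD() implementation later in the article.

Python3

import numpy as np

# made-up feature matrix: 4 samples, 2 features on very different scales
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 1000.0],
              [4.0, 4000.0]])

# standardize each column to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # approximately [0, 0]
print(X_std.std(axis=0))   # [1, 1]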

In differential calculus, the derivative of a function tells us how much the output changes with a small nudge in the input variable. This idea extends to multivariable functions as well. This article shows an implementation of the Gradient Descent Algorithm using NumPy. The idea is very simple: start from an arbitrary point, repeatedly move in the direction of the negative gradient, and return a point that is as close to the minimum as possible.
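
Written out for a single variable, the update rule is simply x_new = x - learn_rate * f'(x). The sketch below applies it to the toy function f(x) = x**2, whose derivative 2*x is known analytically; it is only an illustration of the rule itself.

Python3

x = 10.0   # arbitrary starting point
lr = 0.1   # learning rate

for _ in range(50):
    grad = 2 * x    # analytic derivative of f(x) = x**2
    x -= lr * grad  # move in the direction of the negative gradient

print(x)  # very close to 0, the minimum of f(x) = x**2

GD(), described next, follows the same idea but estimates the gradient numerically with np.gradient instead of using an analytic derivative.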

GD() is a user-defined function employed for this purpose. It takes the following parameters:

  • f is a NumPy array of sampled values of the function we are trying to minimize; its numerical gradient is computed inside GD() with np.gradient.
  • start is the arbitrary starting point we give to the function. It can be a single value, or a list or NumPy array for the multivariable case.
  • lr is the learning rate, which controls the magnitude by which the vector gets updated in each iteration.
  • n_iter is the maximum number of iterations the update loop should run.
  • tol is the tolerance level that specifies the minimum movement required in each iteration.

Given below is the implementation that produces the required functionality.

Example:

Python3




import numpy as np
 
 
def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start
     
    for _ in range(n_iter):
       
        # gradient is calculated using the np.gradient
        # function.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        res += new_val
         
    # we return a vector, since np.gradient produces the
    # numerical gradient at every sample point of f, so the
    # result has the same shape as f.
    return res
 
 
# Example 1
f = np.array([1, 2, 4, 7, 11, 16], dtype=float)
print(f"The vector notation of global minima:{GD(f,10,0.01)}")
 
# Example 2
f = np.array([2, 4], dtype=float)
print(f'The vector notation of global minima: {GD(f,10,0.1)}')


Output: 

The vector notation of global minima:[9.5  9.25 8.75 8.25 7.75 7.5 ]

The vector notation of global minima: [2.0539126e-15 2.0539126e-15]

Let's look at the relevant concepts used in this function in more detail.

Tolerance Level Application

The line of code below enables GD() to terminate early and return before n_iter is completed if the update in an iteration is less than or equal to the tolerance level. This is particularly useful when we reach a local minimum or a saddle point, where the movement per iteration becomes very small because the gradient is very low; stopping early in that case improves the overall convergence time.

Python3




if np.all(np.abs(new_val) <= tol):
    break


Learning Rate Usage (Hyper-parameter)

  • The learning rate is a very crucial hyper-parameter, as it affects the behavior of the gradient descent algorithm. For example, if we change the learning rate from 0.2 to 0.7, we get another solution that is very close to 0, but because of the high learning rate there is a large change in x at every step, so it passes the minimum value multiple times and oscillates before settling at zero (see the sketch after this list). This oscillation increases the convergence time of the entire algorithm.
  • A small learning rate can lead to slow convergence and, to make matters worse, if the number of iterations is also small, the algorithm may return before it finds the minimum.
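
The oscillation described in the first point can be seen on the toy function f(x) = x**2 with its analytic derivative 2*x, used here purely for illustration and separate from the array-based GD() above. With a learning rate of 0.7, every step multiplies x by (1 - 2 * 0.7) = -0.4, so the iterate keeps flipping sign while shrinking toward zero.

Python3

x = 10.0
lr = 0.7   # deliberately high learning rate

for i in range(6):
    x -= lr * 2 * x  # gradient of f(x) = x**2 is 2*x
    print(i, x)

# the sign alternates: 10 -> -4.0 -> 1.6 -> -0.64 -> ...
# x overshoots the minimum on every step before settling near 0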

Given below is an example to show how a low learning rate affects our result.

Example:

Python3




import numpy as np
 
 
def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start
    for _ in range(n_iter):
        # gradient is calculated using the np.gradient function.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        res += new_val
 
    # we return a vector, since np.gradient produces the numerical
    # gradient at every sample point of f.
    return res
 
 
f = np.array([2, 4], dtype=float)
# a low learning rate does not allow the algorithm to converge
# to the global minimum within n_iter iterations
print(f'The vector notation of global minima: {GD(f,10,0.001)}')


Output:

The vector notation of global minima: [9.9 9.9]

The value returned by the algorithm is not even close to 0, which indicates that the algorithm returned before converging to the global minimum. Increasing the learning rate or the number of iterations would let it get closer.


