Numpy Gradient – Descent Optimizer of Neural Networks
  • Difficulty Level : Hard
  • Last Updated : 16 Mar, 2021

In differential calculus, the derivative of a function tells us how much the output changes for a small nudge in the input variable. This idea extends to multivariable functions, where the gradient collects the partial derivatives with respect to each variable. This article shows an implementation of the gradient descent algorithm using NumPy. The idea is simple: start from an arbitrary point, repeatedly step in the direction of the negative gradient, and return a point that is as close to the minimum as possible.
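As a quick numerical check of the derivative idea, the sketch below samples a function and lets np.gradient estimate its slope from the samples. The function f(x) = x**2 and the grid are illustrative assumptions, not taken from the article.

```python
import numpy as np

# Sample f(x) = x**2 on an evenly spaced grid (an illustrative choice).
x = np.linspace(-2.0, 2.0, 41)   # spacing of 0.1
y = x ** 2

# np.gradient estimates dy/dx from the samples using finite differences;
# passing x tells it where the samples were taken.
dy_dx = np.gradient(y, x)

# At interior points the central difference is exact for a quadratic
# (up to rounding), so it matches the true derivative 2 * x there.
# The two end points use one-sided differences and are less accurate.
max_interior_error = np.max(np.abs(dy_dx[1:-1] - 2 * x[1:-1]))
print(max_interior_error)
```

The key point is that np.gradient works on an array of sampled values, not on a Python function, and returns one slope estimate per sample.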

GD() is a user-defined function employed for this purpose. It takes the following parameters:

  • f is a NumPy array of sampled function values, which the code passes to np.gradient. (In a general gradient descent routine this argument would instead be a Python callable that takes a vector and returns the gradient of the function being minimized.)
  • start is the arbitrary starting point. It can be a single scalar, or a list or NumPy array for the multivariable case.
  • lr is the learning rate, which controls the magnitude by which the result is updated in each iteration.
  • n_iter is the maximum number of iterations to run.
  • tol is the tolerance: iteration stops once every component of the update is no larger than this value.

Given below is the implementation that produces the required functionality.

Example:

Python3






import numpy as np


def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start

    for _ in range(n_iter):
        # np.gradient(f) estimates the slope of the sampled values in f
        # using finite differences. It does not depend on res, so the
        # same step is applied on every iteration.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        # Not in-place, so an array passed as start is left unmodified.
        res = res + new_val

    # After at least one update the result is an array with the same
    # shape as f, because a scalar start is broadcast against the
    # gradient array.
    return res


# Example 1
f = np.array([1, 2, 4, 7, 11, 16], dtype=float)
print(f"The vector notation of global minima:{GD(f,10,0.01)}")

# Example 2
f = np.array([2, 4], dtype=float)
print(f'The vector notation of global minima: {GD(f,10,0.1)}')

Output: 

The vector notation of global minima:[9.5  9.25 8.75 8.25 7.75 7.5 ]

The vector notation of global minima: [2.0539126e-15 2.0539126e-15]
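For comparison, here is a minimal sketch of the more common formulation, in which the gradient is a callable that is re-evaluated at the current point on every iteration. The name gradient_descent, its parameter names, and the test function f(v) = v**2 are assumptions made for illustration; this is not the article's GD().

```python
import numpy as np


def gradient_descent(gradient, start, learn_rate, n_iter=50, tol=1e-05):
    # Hypothetical helper, not the article's GD(): `gradient` is a callable
    # that is re-evaluated at the current point on every iteration.
    res = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        step = -learn_rate * gradient(res)
        if np.all(np.abs(step) <= tol):
            break
        res = res + step
    return res


# Assumed test function: f(v) = v**2, whose gradient is 2 * v
# and whose minimum is at v = 0.
result = gradient_descent(lambda v: 2 * v, start=10.0, learn_rate=0.2)
print(result)
```

Because the gradient shrinks as the point approaches the minimum, the steps in this version get smaller over time, and the tolerance check can end the loop before n_iter is reached.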

Let's look at the relevant concepts used in this function in detail.

Tolerance Level Application

The line of code below lets GD() terminate early and return before n_iter iterations are completed if every component of the update is less than or equal to the tolerance level. This is particularly useful near a local minimum or a saddle point, where the gradient is very small and each step barely moves the result, so further iterations would cost time without changing the answer much.

Python3




if np.all(np.abs(new_val) <= tol):
    break
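The effect of the tolerance can be made visible by counting how many updates are applied before the check fires. The sketch below is an illustration under assumptions: descend_with_count is a hypothetical helper, and the function being minimized is taken to be f(x) = x**2.

```python
import numpy as np


def descend_with_count(gradient, start, learn_rate, n_iter, tol):
    # Hypothetical variant that also reports how many updates were applied.
    res = float(start)
    for i in range(n_iter):
        step = -learn_rate * gradient(res)
        if np.all(np.abs(step) <= tol):
            return res, i          # stopped early by the tolerance check
        res = res + step
    return res, n_iter             # used the full iteration budget


# Assumed test function: f(x) = x**2, with gradient 2 * x.
grad = lambda x: 2 * x

_, loose = descend_with_count(grad, 10.0, 0.2, n_iter=1000, tol=1e-05)
_, tight = descend_with_count(grad, 10.0, 0.2, n_iter=1000, tol=1e-12)
print(loose, tight)
```

A looser tolerance stops sooner and a tighter one runs longer, but both finish well short of the 1000 iterations allowed.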

Learning Rate Usage (Hyper-parameter)

  • The learning rate is a crucial hyper-parameter because it determines how the gradient descent algorithm behaves. For example, for a function whose minimum is at 0, changing the learning rate from 0.2 to 0.7 still gives a solution very close to 0, but the larger steps make x overshoot the minimum several times, so it oscillates before settling near zero. Such oscillation can increase the convergence time of the algorithm.
  • A small learning rate can lead to slow convergence. To make matters worse, if the number of iterations is also small, the algorithm might return before it reaches the minimum.
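The overshoot described in the first point can be seen by recording each iterate. This is a sketch under the assumption that the function is f(x) = x**2, which the article does not state. The helper trajectory is hypothetical.

```python
import numpy as np


def trajectory(learn_rate, start=10.0, n_iter=8):
    # Hypothetical helper: records every iterate of gradient descent
    # on the assumed function f(x) = x**2 (gradient 2 * x).
    xs = [start]
    for _ in range(n_iter):
        xs.append(xs[-1] - learn_rate * 2 * xs[-1])
    return np.array(xs)


slow = trajectory(0.2)   # each step multiplies x by 1 - 0.4 = 0.6
fast = trajectory(0.7)   # each step multiplies x by 1 - 1.4 = -0.4

print(np.round(slow, 3))  # stays positive, shrinking steadily
print(np.round(fast, 3))  # sign flips every step: it jumps past the minimum
```

With a learning rate of 0.2 the iterates approach 0 from one side. With 0.7 every step lands on the opposite side of the minimum, which is the oscillation described above.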

Given below is an example that shows how the learning rate affects the result.

Example:

Python3




import numpy as np


def GD(f, start, lr, n_iter=50, tol=1e-05):
    res = start
    for _ in range(n_iter):
        # np.gradient(f) estimates the slope of the sampled values in f
        # using finite differences.
        new_val = -lr * np.gradient(f)
        if np.all(np.abs(new_val) <= tol):
            break
        # Not in-place, so an array passed as start is left unmodified.
        res = res + new_val

    # After at least one update the result is an array with the same
    # shape as f.
    return res


f = np.array([2, 4], dtype=float)
# A low learning rate does not allow convergence within n_iter iterations.
print(f'The vector notation of global minima: {GD(f,10,0.001)}')

Output:

The vector notation of global minima: [9.9 9.9]

The value returned is not even close to 0, the result reached in Example 2 with the larger learning rate of 0.1. This indicates that the algorithm used up its n_iter iterations and returned before converging to the minimum.
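One way to compensate for a small learning rate is to allow more iterations. The sketch below illustrates this with a callable gradient rather than the article's GD(). The helper gradient_descent and the function f(v) = v**2 are assumptions chosen for the illustration.

```python
import numpy as np


def gradient_descent(gradient, start, learn_rate, n_iter=50, tol=1e-05):
    # Hypothetical helper (not the article's GD()): the gradient is a
    # callable, re-evaluated at the current point on each iteration.
    res = np.asarray(start, dtype=float)
    for _ in range(n_iter):
        step = -learn_rate * gradient(res)
        if np.all(np.abs(step) <= tol):
            break
        res = res + step
    return res


grad = lambda v: 2 * v   # gradient of the assumed function f(v) = v**2

few = gradient_descent(grad, 10.0, learn_rate=0.001, n_iter=50)
many = gradient_descent(grad, 10.0, learn_rate=0.001, n_iter=10_000)
print(few, many)
```

With only 50 iterations the result is still far from the minimum at 0, while the larger budget lets the same learning rate get close to it before the tolerance check stops the loop.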
