Vectorization Of Gradient Descent

In machine learning, regression problems can be solved in the following ways:

1. Using Optimization Algorithms – Gradient Descent

Batch Gradient Descent
Stochastic Gradient Descent
Mini-Batch Gradient Descent
Other advanced optimization algorithms (e.g., Conjugate Descent …)
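The three gradient descent variants listed above differ mainly in how many training samples feed each parameter update. The sketch below is not from the article: `minibatch_gradient_descent` is a hypothetical helper, it assumes `X` already contains a column of ones for the intercept, and reshuffling the data every epoch is one common choice rather than a requirement. Setting `batch_size=len(X)` gives batch gradient descent, and `batch_size=1` gives stochastic gradient descent.

```python
import numpy as np

def minibatch_gradient_descent(X, y, alpha=0.01, num_epochs=100, batch_size=10, seed=0):
    """Gradient descent for linear regression on mini-batches (hypothetical helper).

    batch_size == len(X) -> batch gradient descent
    batch_size == 1      -> stochastic gradient descent
    X is assumed to already include the column of ones (x0 = 1).
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    theta = np.zeros((n, 1))
    for _ in range(num_epochs):
        order = rng.permutation(m)              # reshuffle sample order each epoch
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X[idx], y[idx]
            # Average gradient of the squared-error cost over this batch.
            grad = Xb.T @ (Xb @ theta - yb) / len(idx)
            theta -= alpha * grad
    return theta

# Example on noiseless data from y = 3 + 2x (values chosen only for illustration).
x = np.linspace(0, 1, 50).reshape(-1, 1)
X = np.hstack([np.ones_like(x), x])             # prepend the x0 = 1 column
y = 3 + 2 * x
theta_batch = minibatch_gradient_descent(X, y, alpha=0.5, num_epochs=1000, batch_size=len(X))
theta_mini = minibatch_gradient_descent(X, y, alpha=0.5, num_epochs=1000, batch_size=10)
```

Both calls should end up close to the intercept 3 and slope 2 used to generate the data; the mini-batch run simply makes more, cheaper updates per pass over the data.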

2. Using the Normal Equation:

This approach uses linear algebra to solve for the parameters directly, in closed form, instead of updating them iteratively.
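For reference, the normal equation gives the parameters as $\theta = (X^T X)^{-1} X^T y$. A minimal sketch, assuming `X` already holds a column of ones for the intercept (the helper name `normal_equation` is hypothetical), solves the linear system $X^T X \theta = X^T y$ with `np.linalg.solve` instead of forming the inverse explicitly, which is cheaper and numerically safer:

```python
import numpy as np

def normal_equation(X, y):
    """Closed-form least-squares fit: solve (X^T X) theta = X^T y.

    X is assumed to already include the column of ones for the intercept.
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Illustrative noiseless data from y = 3 + 2x.
x = np.linspace(0, 1, 50).reshape(-1, 1)
X = np.hstack([np.ones_like(x), x])
y = 3 + 2 * x
theta = normal_equation(X, y)   # column vector close to [[3.], [2.]]
```

No learning rate or iteration count is needed, but solving the system costs roughly cubic time in the number of features, which is why gradient descent is preferred when there are very many features.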

Let’s consider Batch Gradient Descent for a univariate linear regression problem.

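The cost function for this regression problem is:

$$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2, \qquad h_\theta(x) = \theta_0 + \theta_1 x$$

where $m$ is the number of training samples.

Goal:

$$\min_{\theta_0,\, \theta_1} J(\theta_0, \theta_1)$$

Batch gradient descent approaches this minimum by repeating the following simultaneous updates, where $\alpha$ is the learning rate:

$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right), \qquad \theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$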
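To make the cost function concrete, here is a small sketch that evaluates the squared-error cost $J(\theta_0, \theta_1) = \frac{1}{2m}\sum (h_\theta(x^{(i)}) - y^{(i)})^2$ one sample at a time. The function name `cost` is a hypothetical helper, not part of the article's code; note that it already uses the loop-based, unvectorized style discussed next.

```python
def cost(theta0, theta1, x, y):
    """Squared-error cost J = 1/(2m) * sum((h(x_i) - y_i)^2), computed with a loop."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        h = theta0 + theta1 * xi      # hypothesis for a single sample
        total += (h - yi) ** 2        # squared error for that sample
    return total / (2 * m)

perfect = cost(0.0, 1.0, [1, 2, 3], [1, 2, 3])   # line y = x fits exactly: 0.0
poor = cost(0.0, 0.0, [1, 2, 3], [1, 2, 3])      # (1 + 4 + 9) / 6 = 2.333...
```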

To solve this problem, we can take either a vectorized approach (using linear algebra) or an unvectorized approach (using for-loops).
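The difference between the two approaches shows up even in a single sum of products, which is the basic building block of both the hypothesis and the gradients. In this illustrative sketch (the arrays are arbitrary), the loop handles one element per Python iteration, while `a.dot(b)` performs the whole computation in one NumPy call:

```python
import numpy as np

a = np.arange(1.0, 6.0)    # [1, 2, 3, 4, 5]
b = np.arange(6.0, 11.0)   # [6, 7, 8, 9, 10]

# Unvectorized: accumulate one product per loop iteration.
loop_result = 0.0
for i in range(len(a)):
    loop_result += a[i] * b[i]

# Vectorized: one call, executed in NumPy's compiled code.
vec_result = a.dot(b)
```

Both give 130.0; the vectorized form avoids per-element Python overhead, which is what makes it faster on large arrays.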

1. Unvectorized Approach:

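Here, we use for-loops to evaluate the following expressions:

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right), \qquad \frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$$

The above expressions are the partial derivatives of the cost function; each sum is accumulated in its own loop.

$$h_\theta(x^{(i)}) = \theta_0 + \theta_1 x^{(i)}$$

The above expression is the hypothesis, evaluated once per sample inside the loops.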

Code: Python implementation of unvectorized Gradient Descent

# Import required modules.
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np
import time

# Create and plot the data set.
x, y = make_regression(n_samples=100, n_features=1,
                       n_informative=1, noise=10, random_state=42)
plt.scatter(x, y, c='red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Training Data')
plt.show()

# Convert y from a 1-D array to a 2-D column array.
y = y.reshape(100, 1)

# Number of iterations for gradient descent.
num_iter = 1000

# Learning rate.
alpha = 0.01

# Number of training samples.
m = len(x)

# Initialize theta: theta[0] is the intercept, theta[1] the slope.
theta = np.zeros((2, 1), dtype=float)

# Accumulators for the two gradient sums.
Grad0 = Grad1 = 0

# Batch gradient descent.
start_time = time.time()
for i in range(num_iter):
    # Gradient 0: sum of the errors over all samples.
    for j in range(m):
        Grad0 = Grad0 + (theta[0] + theta[1] * x[j]) - y[j]
    # Gradient 1: sum of the errors weighted by the feature value.
    for k in range(m):
        Grad1 = Grad1 + ((theta[0] + theta[1] * x[k]) - y[k]) * x[k]
    # Compute both new values before assigning them (simultaneous update).
    t0 = theta[0] - (alpha * (1/m) * Grad0)
    t1 = theta[1] - (alpha * (1/m) * Grad1)
    theta[0] = t0
    theta[1] = t1
    # Reset the accumulators for the next iteration.
    Grad0 = Grad1 = 0

# Print the model parameters.
print('model parameters:', theta, sep='\n')

# Print the time taken for gradient descent to run.
print('Time Taken For Gradient Descent in Sec:', time.time() - start_time)

# Prediction on the same training set.
h = []
for i in range(m):
    h.append(theta[0] + theta[1] * x[i])

# Plot the output.
plt.plot(x, h)
plt.scatter(x, y, c='red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Output')
plt.show()

Output:

model parameters:
[[ 1.15857049]
 [44.42210912]]
 
Time Taken For Gradient Descent in Sec: 2.482538938522339

2. Vectorized Approach:

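Here, we use matrices and vectors (linear algebra) to evaluate the same expressions. Let $X$ be the $m \times 2$ design matrix whose first column is all ones ($x_0 = 1$) and whose second column holds the feature values, and let $\theta = [\theta_0, \theta_1]^T$:

$$X\theta - y$$

The above expression is the vector of errors for all $m$ samples, which appears in the cost function.

$$h_\theta(X) = X\theta$$

The above expression is the hypothesis, evaluated for every sample at once.

Batch Gradient Descent:

$$\theta := \theta - \frac{\alpha}{m} X^T (X\theta - y)$$

Concept to find gradients using matrix operations:

$$\nabla_\theta J(\theta) = \frac{1}{m} X^T (X\theta - y)$$

A single product computes both gradient sums: the first row of $X^T$ is all ones, which yields $\sum (h_\theta(x^{(i)}) - y^{(i)})$, and the second row holds the feature values, which yields $\sum (h_\theta(x^{(i)}) - y^{(i)})\, x^{(i)}$.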
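To check that the matrix product computes exactly the sums that the two for-loops of the unvectorized code accumulate, the sketch below evaluates both on small random data. The helper names and the random inputs are illustrative assumptions, not part of the article.

```python
import numpy as np

def gradients_loop(theta0, theta1, x, y):
    """Gradient sums accumulated sample by sample, as in the unvectorized code."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        err = (theta0 + theta1 * xi) - yi
        g0 += err          # contributes to the theta0 gradient
        g1 += err * xi     # contributes to the theta1 gradient
    return np.array([[g0], [g1]])

def gradients_matrix(theta, X, y):
    """The same two sums in one expression: X^T (X theta - y)."""
    return X.T @ (X @ theta - y)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=(5, 1))
X = np.column_stack([np.ones_like(x), x])   # prepend the x0 = 1 column
theta = np.array([[0.5], [-1.5]])

g_loop = gradients_loop(theta[0, 0], theta[1, 0], x, y[:, 0])
g_mat = gradients_matrix(theta, X, y)
```

The two results agree up to floating-point rounding, so swapping the loops for `X.T @ (X @ theta - y)` changes only the speed, not the algorithm.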

Code: Python implementation of vectorized Gradient Descent approach

# Import required modules.
from sklearn.datasets import make_regression
import matplotlib.pyplot as plt
import numpy as np
import time

# Create and plot the data set.
x, y = make_regression(n_samples=100, n_features=1,
                       n_informative=1, noise=10, random_state=42)
plt.scatter(x, y, c='red')
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Training Data')
plt.show()

# Add the x0 = 1 column to the x array (design matrix of shape (m, 2)).
X_New = np.array([np.ones(len(x)), x.flatten()]).T

# Convert y from a 1-D array to a 2-D column array.
y = y.reshape(100, 1)

# Number of iterations for gradient descent.
num_iter = 1000

# Learning rate.
alpha = 0.01

# Number of training samples.
m = len(x)

# Initialize theta.
theta = np.zeros((2, 1), dtype=float)

# Batch gradient descent.
start_time = time.time()
for i in range(num_iter):
    # Both gradient sums at once: X^T (X theta - y), shape (2, 1).
    gradients = X_New.T.dot(X_New.dot(theta) - y)
    theta = theta - (1/m) * alpha * gradients

# Print the model parameters.
print('model parameters:', theta, sep='\n')

# Print the time taken for gradient descent to run.
print('Time Taken For Gradient Descent in Sec:', time.time() - start_time)

# Hypothesis: predictions on the training data itself.
h = X_New.dot(theta)

# Plot the output.
plt.scatter(x, y, c='red')
plt.plot(x, h)
plt.xlabel('Feature')
plt.ylabel('Target_Variable')
plt.title('Output')
plt.show()

Output:

model parameters:
[[ 1.15857049]
 [44.42210912]]
 
Time Taken For Gradient Descent in Sec: 0.019551515579223633

Observations:

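Both approaches produce the same model parameters, but the vectorized version runs far faster (about 0.02 s versus about 2.5 s in the runs above), because NumPy performs the matrix operations in optimized compiled code instead of Python loops.
The vectorized code is also shorter, which makes it easier to read and debug.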
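Both observations can be reproduced side by side with a sketch like the one below. To keep it free of the scikit-learn dependency, it assumes synthetic data generated directly with NumPy (intercept 2, slope 3, Gaussian noise) instead of `make_regression`, so the parameters will not match the outputs above, and the printed timings will vary from machine to machine. The function names are hypothetical.

```python
import time
import numpy as np

def gd_loop(x, y, alpha=0.01, num_iter=1000):
    """Batch gradient descent with explicit Python loops."""
    m = len(x)
    theta0 = theta1 = 0.0
    for _ in range(num_iter):
        g0 = g1 = 0.0
        for j in range(m):
            err = (theta0 + theta1 * x[j]) - y[j]
            g0 += err
            g1 += err * x[j]
        # Simultaneous update: both right-hand sides use the old parameters.
        theta0, theta1 = theta0 - alpha * g0 / m, theta1 - alpha * g1 / m
    return np.array([theta0, theta1])

def gd_vectorized(x, y, alpha=0.01, num_iter=1000):
    """The same algorithm using the design matrix and X^T (X theta - y)."""
    m = len(x)
    X = np.column_stack([np.ones(m), x])
    theta = np.zeros(2)
    for _ in range(num_iter):
        theta = theta - (alpha / m) * (X.T @ (X @ theta - y))
    return theta

# Assumed synthetic data, for illustration only.
rng = np.random.default_rng(42)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=10.0, size=100)

start = time.perf_counter()
theta_loop = gd_loop(x, y)
t_loop = time.perf_counter() - start

start = time.perf_counter()
theta_vec = gd_vectorized(x, y)
t_vec = time.perf_counter() - start

print("loop:      ", theta_loop, f"{t_loop:.4f} s")
print("vectorized:", theta_vec, f"{t_vec:.4f} s")
```

The parameters from the two functions agree to within floating-point rounding, while the vectorized version typically finishes in a small fraction of the loop version's time.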
