
Applications of Gradient Descent in TensorFlow

Last Updated : 21 Mar, 2023

To minimize a model's cost function, machine learning practitioners frequently employ the gradient descent optimization procedure. It entails incrementally adjusting the model's parameters in the direction of the cost function's steepest descent. TensorFlow, a free and open-source machine learning framework, has built-in support for gradient descent optimization. In this article, we will examine the uses of gradient descent in TensorFlow and how to perform gradient descent with TensorFlow's built-in optimizers.

Gradient Descent:

Gradient descent is an iterative optimization procedure for finding the minimum value of a function. It is frequently used to train machine learning models.

It works by incrementally adjusting a model's parameters in the direction of the steepest descent of the cost function with respect to those parameters. The cost function is a mathematical function that measures the discrepancy between the model's predicted and actual outputs.
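
For example, for a linear regression model evaluated on m training examples, a common cost function is the mean squared error, J(θ) = (1/m) Σᵢ (ŷᵢ − yᵢ)², where ŷᵢ is the model's prediction for the i-th example; this is exactly the loss implemented later in this article.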
Mathematically speaking, the generic update rule for gradient descent is:

θ = θ - α ∇J(θ)

where:

  1. θ is the parameter vector to be optimized.
  2. α is the learning rate, which determines the size of the step taken in each iteration.
  3. ∇J(θ) is the gradient of the cost function J with respect to θ; its negative points in the direction of steepest descent.

The algorithm's aim is to update θ iteratively until a minimum of J is reached. The learning rate is an essential hyperparameter that affects the stability and speed of convergence: if it is too high, the method may overshoot the minimum and fail to converge; if it is too low, the method may take a very long time to converge.
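
To make the update rule concrete, here is a minimal standalone sketch (separate from the TensorFlow model built below) that applies it to the one-dimensional function J(θ) = θ², whose gradient is 2θ; the starting value and learning rate are illustrative choices:

Python3

# Minimal gradient descent on J(theta) = theta^2, whose gradient is 2*theta
theta = 5.0   # illustrative starting point
alpha = 0.1   # illustrative learning rate

for step in range(25):
    grad = 2 * theta              # gradient of J at the current theta
    theta = theta - alpha * grad  # the update rule: theta <- theta - alpha * grad

print(theta)  # close to 0, the minimizer of J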

The types of gradient descent, including batch gradient descent, stochastic gradient descent, and mini-batch gradient descent, vary in how the gradient is computed and how much data is used for each parameter update, as sketched below.
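
The difference between these variants is only how much data feeds each update. A rough sketch, where compute_gradient is a hypothetical helper that returns the gradient of the cost over the given examples:

Python3

import random

# Sketch of the three variants. `compute_gradient(params, examples)` is a
# hypothetical helper returning the gradient of the cost over `examples`.

def batch_step(params, data, alpha, compute_gradient):
    # Batch gradient descent: use the entire training set for each update.
    return params - alpha * compute_gradient(params, data)

def stochastic_step(params, data, alpha, compute_gradient):
    # Stochastic gradient descent: use one randomly chosen example.
    return params - alpha * compute_gradient(params, [random.choice(data)])

def minibatch_step(params, data, alpha, compute_gradient, batch_size=32):
    # Mini-batch gradient descent: use a small random subset.
    return params - alpha * compute_gradient(params, random.sample(data, batch_size))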

Implementation

Suppose we wish to use gradient descent to optimize a simple linear regression model. The aim of the optimization is to find the slope and intercept values that minimize the model's mean squared error on a given batch of training data. Here is how we can perform gradient descent with TensorFlow's built-in optimizers:

Step 1: Import the necessary libraries

Python3




# Import the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt


Step 2: Generate a random dataset

Python3




# Generate some random data
tf.random.set_seed(23)
x = tf.random.uniform(
    shape=(100, 1),
    minval=0,
    maxval=100,
    dtype=tf.dtypes.float32,
)

y = 2*x + tf.random.normal(shape=(100, 1),
                           mean=50.0,
                           stddev=20,
                           dtype=tf.dtypes.float32
                           )
  
plt.scatter(x,y)
plt.show()


Output:

Input Data

Step 3: Define the weight and bias for the model

Python3




# Define the weight and bias for model
W = tf.Variable(tf.random.normal([1]), name="weight")
b = tf.Variable(tf.random.normal([1]), name="bias")
print('Weight :',W)
print('Bias   :',b)


Output:

Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([0.26008585], dtype=float32)>
Bias   : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([0.31952116], dtype=float32)>

Step 4: Define the linear regression model

Python3




# Define the linear regression model
def linear_regression(x):
    return W * x + b


Step 5: Define the mean squared error

Python3




# Define the cost function
def mean_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))


Step 6: Define the optimizer (gradient descent)

Gradient descent is an optimization approach used in machine learning to reduce the discrepancy between a model's predicted and actual output. The model's weights and biases are iteratively adjusted based on the gradient of the loss function with respect to those parameters.

Gradient descent updates the weight and bias by computing the gradients of the error with respect to the parameters and moving in the direction of the negative gradient. The learning rate determines the size of each update. The objective is to find the lowest point on the model's error surface.

Here we define a stochastic gradient descent (SGD) optimizer with a learning rate of 0.00001.

Python3




# Define the optimizer
optimizer = tf.optimizers.SGD(learning_rate=0.00001)
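
For plain SGD without momentum, optimizer.apply_gradients is equivalent to subtracting the scaled gradient from each variable yourself. A sketch of that manual update, using tf.Variable.assign_sub:

Python3

# Sketch: what a vanilla SGD step does under the hood.
# For each (gradient, variable) pair: variable <- variable - learning_rate * gradient
def manual_sgd_step(grads_and_vars, learning_rate=0.00001):
    for grad, var in grads_and_vars:
        var.assign_sub(learning_rate * grad)  # in-place update of the tf.Variable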


Step 7: Define the Training Loop

tf.GradientTape() records operations for automatic differentiation, tape.gradient() computes the gradients of the loss with respect to the parameters, and the optimizer applies those gradients to the weight W and bias b. The training loop is written as a function that takes a batch of data, computes the gradients of the cost function with respect to the model parameters, and updates the parameters via the optimizer.

Python3




# Define the training loop
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = linear_regression(x)
        loss = mean_squared_error(y, y_pred)
    gradients = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(gradients, [W, b]))
    return loss


Step 8: Train the model and plot weight, bias, and loss over the iterations

Iteration vs Weight: Each gradient descent iteration updates the model's weight. Depending on the gradient and learning rate, the weight may increase or decrease.

Iteration vs Bias: Each iteration likewise updates the model's bias, which may rise or fall depending on the gradient and learning rate.

Iteration vs Loss: The loss function measures the discrepancy between the predicted and actual output; gradient descent drives it down over the iterations.

Python3




# Train the model
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), dpi=500)
fig2, (ax) = plt.subplots(1, figsize=(7, 5))
for i in range(50):
    loss = train_step(x, y)
    ax1.plot(i, W, 'b*')
    ax2.plot(i, b, 'g+')
    ax.plot(i, loss, 'ro')
  
ax1.set_title('Weight over iterations')
ax1.set_xlabel('iterations')
ax1.set_ylabel('Weight')
  
ax2.set_title('Bias over iterations')
ax2.set_xlabel('iterations')
ax2.set_ylabel('Bias')
  
ax.set_title('Losses over iterations')
ax.set_xlabel('iterations')
ax.set_ylabel('Losses')
  
plt.show()


Output:

Loss optimization

Step 9: Plot the regression line with the input data

Python3




print('Weight :',W)
print('Bias :',b)
  
plt.scatter(x, y)
plt.plot(x, W * x + b, color='red')
plt.title('Regression Line')
plt.xlabel('Input')
plt.ylabel('Target')
plt.show()


Output:

Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([2.6064723], dtype=float32)>
Bias : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([0.36663133], dtype=float32)>
Regression Line

The blue dots in this figure represent the training data and the red line is the fitted linear regression model. The fitted slope (about 2.6) is reasonably close to the true slope of 2 used to generate the data, although at this very small learning rate the bias moves slowly and is still far from the true intercept of 50 after 50 iterations.
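
For comparison, you can also overlay the true generating line y = 2x + 50 (known here only because we created the data ourselves). A sketch reusing x, y, W, and b from above:

Python3

# Sketch: overlay the true generating line for comparison with the fit.
plt.scatter(x, y)
plt.plot(x, W * x + b, color='red', label='fitted line')
plt.plot(x, 2 * x + 50, color='green', label='true line y = 2x + 50')
plt.legend()
plt.show()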

Full Code:

Python3




# Import the necessary libraries
import tensorflow as tf
import matplotlib.pyplot as plt
  
# Generate some random data
tf.random.set_seed(23)
x = tf.random.uniform(
    shape=(100, 1),
    minval=0,
    maxval=100,
    dtype=tf.dtypes.float32,
)
  
y = 2*x + tf.random.normal(shape=(100, 1),
                           mean=50.0,
                           stddev=20,
                           dtype=tf.dtypes.float32
                           )
  
# Define the weight and bias for model
W = tf.Variable(tf.random.normal([1]), name="weight")
b = tf.Variable(tf.random.normal([1]), name="bias")
  
# Define the linear regression model
  
  
def linear_regression(x):
    return W * x + b
  
# Define the cost function
  
  
def mean_squared_error(y_true, y_pred):
    return tf.reduce_mean(tf.square(y_true - y_pred))
  
  
# Define the optimizer
optimizer = tf.optimizers.SGD(learning_rate=0.00001)
  
# Define the training loop
  
  
def train_step(x, y):
    with tf.GradientTape() as tape:
        y_pred = linear_regression(x)
        loss = mean_squared_error(y, y_pred)
    gradients = tape.gradient(loss, [W, b])
    optimizer.apply_gradients(zip(gradients, [W, b]))
    return loss
  
  
# Train the model
fig1, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4), dpi=500)
fig2, (ax) = plt.subplots(1, figsize=(7, 5))
for i in range(50):
    loss = train_step(x, y)
    ax1.plot(i, W, 'b*')
    ax2.plot(i, b, 'g+')
    ax.plot(i, loss, 'ro')
  
ax1.set_title('Weight over iterations')
ax1.set_xlabel('iterations')
ax1.set_ylabel('Weight')
  
ax2.set_title('Bias over iterations')
ax2.set_xlabel('iterations')
ax2.set_ylabel('Bias')
  
ax.set_title('Losses over iterations')
ax.set_xlabel('iterations')
ax.set_ylabel('Losses')
  
plt.show()
print('Weight :', W)
print('Bias :', b)
  
plt.scatter(x, y)
plt.plot(x, W * x + b, color='red')
plt.show()


Output:

Loss optimization

Weight : <tf.Variable 'weight:0' shape=(1,) dtype=float32, numpy=array([2.6111314], dtype=float32)>
Bias : <tf.Variable 'bias:0' shape=(1,) dtype=float32, numpy=array([-0.5400178], dtype=float32)>
Regression Line

Conclusion:

Gradient descent is an optimization approach used in machine learning to reduce the discrepancy between a model's predicted and actual output. Plotting the weight, bias, and loss against the iteration number lets us see how gradient descent operates. By repeatedly adjusting the model's weights and biases based on the gradient of the loss function with respect to those parameters, we can reduce the loss and improve the model's fit.

TensorFlow supports a number of gradient descent optimization variants, including:

  1. Mini-batch gradient descent: at each iteration, the model parameters are updated using a random subset of the training data.
  2. Momentum: uses a moving average of past gradients to speed up convergence and dampen oscillations.
  3. Adaptive learning-rate methods: adjust the learning rate using the history of the cost function's gradients.

Moreover, TensorFlow supports more sophisticated optimization methods like Adam, Adagrad, and RMSprop. These methods integrate momentum, adaptive learning rates, and other elements to enhance the optimization process’s convergence and stability.
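
In TensorFlow 2, these are available under tf.optimizers (an alias for tf.keras.optimizers); the learning rates below are illustrative:

Python3

# A few of TensorFlow's built-in optimizers (illustrative learning rates).
sgd_momentum = tf.optimizers.SGD(learning_rate=0.01, momentum=0.9)
adagrad = tf.optimizers.Adagrad(learning_rate=0.01)
rmsprop = tf.optimizers.RMSprop(learning_rate=0.001)
adam = tf.optimizers.Adam(learning_rate=0.001)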


