tf.GradientTape in TensorFlow

Last Updated : 20 Feb, 2024

TensorFlow is an open-source library for data science and machine learning. It provides various tools and APIs for building, training, and deploying models. One of the core features of TensorFlow is automatic differentiation (autodiff). Autodiff is the process of computing the gradients of a function with respect to its inputs. Gradients are the slopes or rates of change of a function. They are useful for optimizing the parameters of a model, such as weights and biases. TensorFlow provides the tf.GradientTape API for autodiff.

What is tf.GradientTape in TensorFlow?

The tf.GradientTape class in TensorFlow is a Python tool for calculating the gradients of a computation with respect to certain inputs, typically tf.Variables. TensorFlow keeps track of relevant operations executed within the scope of a tf.GradientTape instance, recording them onto a “tape”. Upon calling the gradient() method on the tape, TensorFlow calculates the gradients of the recorded operations with respect to the specified inputs.

tf.GradientTape is a context manager that records the operations performed on tensors. Tensors are the basic data structures in TensorFlow, similar to arrays or matrices. A tensor can have any number of dimensions, shape, and data type. tf.GradientTape can track any tensor that is watched, either explicitly or implicitly.

Variables:

By default, tf.GradientTape automatically watches any trainable variables accessed while the tape is active. Trainable variables are variables that the optimizer can modify during training; they are usually created with tf.Variable and the argument trainable=True. Constants and non-trainable variables are not watched by default. To watch such a tensor explicitly, we can use the tape.watch() method. To temporarily pause recording inside the tape's context, we can use the tape.stop_recording() context manager.
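
A minimal sketch of explicit watching (using a plain tf.constant as input, since constants are not watched automatically):

Python

import tensorflow as tf

x = tf.constant(3.0)  # a constant, so it is not watched automatically

with tf.GradientTape() as tape:
    tape.watch(x)                 # explicitly watch the constant
    y = x * x                     # recorded on the tape
    with tape.stop_recording():   # operations in this block are not recorded
        z = y + 1.0

print(tape.gradient(y, x))  # tf.Tensor(6.0, shape=(), dtype=float32)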

To compute the gradient of a function with respect to a tensor, we can use the tape.gradient() method. The tape.gradient() method takes two arguments: the output tensor and the input tensor. It returns the gradient tensor, which has the same shape as the input tensor.

The tape.gradient() method can only be called once on a non-persistent tape. A non-persistent tape will release its resources after the gradient is computed. To compute multiple gradients or higher-order derivatives, we need to create a persistent tape. A persistent tape will keep its resources until it is explicitly deleted. To create a persistent tape, we can pass the argument persistent=True to the tf.GradientTape constructor.

A persistent tape is not released automatically; when we are done with it, we should drop the reference with Python's del statement (del tape) so its resources are freed.
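
A small sketch of a persistent tape reused for two gradients and then released with del:

Python

import tensorflow as tf

x = tf.Variable(2.0)

with tf.GradientTape(persistent=True) as tape:
    y = x * x      # y = x^2
    z = y * y      # z = x^4

dy_dx = tape.gradient(y, x)   # 2 * x    -> 4.0
dz_dx = tape.gradient(z, x)   # 4 * x^3  -> 32.0
del tape                      # drop the reference to release the tape's resources

print(dy_dx, dz_dx)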

Important Terminologies

Let’s understand some of the terminologies related to tf.GradientTape.

  • Tape – A tape is the data structure onto which tf.GradientTape records the operations executed inside its context, so that they can later be replayed in reverse to compute gradients.
  • Gradient – A gradient is a vector that represents the direction and magnitude of the steepest ascent of a function.
  • Jacobian – A jacobian is a matrix that represents the partial derivatives of a vector-valued function with respect to its inputs.
  • Persistent Tape – A persistent tape is a tape that can be used multiple times to compute multiple gradients. By default, a tape is not persistent and can only be used once.
  • Watched variable/tensor – A watched variable or tensor is one whose gradient the tape will track. By default, the tape automatically watches any trainable variables accessed while it is active, but you can also manually watch any variable or tensor using the tape.watch() method.

Where to use tf.GradientTape?

For computing the Jacobian matrix:

tf.GradientTape can compute the Jacobian matrix, which is the matrix of partial derivatives of a vector-valued function. The Jacobian matrix can be used for vector-Jacobian products, which are useful for reverse-mode autodiff. To compute the Jacobian matrix, we can use the tape.jacobian() method. The tape.jacobian() method takes two arguments: the output tensor and the input tensor. It returns the Jacobian tensor, whose shape is the shape of the output tensor concatenated with the shape of the input tensor.

For computing the batch Jacobian matrix:

tf.GradientTape can also compute the batch Jacobian matrix, which is the Jacobian matrix for a batch of outputs and inputs. The batch Jacobian matrix can be used for batch vector-Jacobian products, which are useful for parallelizing autodiff. To compute the batch Jacobian matrix, we can use the tape.batch_jacobian() method. The tape.batch_jacobian() method takes two arguments: the output tensor and the input tensor. It returns the batch Jacobian tensor, whose shape is the shape of the output tensor concatenated with the shape of the input tensor without its first (batch) dimension.
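
A short sketch, using an element-wise function on a small random batch purely to illustrate the resulting shape:

Python

import tensorflow as tf

x = tf.random.normal([4, 3])      # batch of 4 examples, 3 features each

with tf.GradientTape() as tape:
    tape.watch(x)
    y = x ** 2                    # element-wise, output shape (4, 3)

jb = tape.batch_jacobian(y, x)
print(jb.shape)  # (4, 3, 3): one (3, 3) Jacobian per example in the batch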

For higher-order derivatives:

tf.GradientTape can be nested to compute higher-order derivatives.

For example, we are going to compute the first- and second-order derivatives of the function y = x^3 with respect to the input x.

  • The constant x is defined with the value 5.0.
  • We create a nested with block for two instances of tf.GradientTape. The outer tape is responsible for computing the second-order derivative, while the inner tape computes the first-order derivative. The inner tape block defines the function, and the watch method ensures that the tape tracks the variable ‘x’.
  • We compute the first-order derivative of y w.r.t. x using the gradient method of inner_tape.
  • We then exit the inner tape block and move to the outer tape block. Inside the outer tape, we compute the second-order derivative of y w.r.t. x by calling the gradient method again.

Python

import tensorflow as tf
 
x = tf.constant(5.0)
 
with tf.GradientTape() as outer_tape:
    outer_tape.watch(x)
     
    with tf.GradientTape() as inner_tape:
        inner_tape.watch(x)
        y = x ** 3  # the function is x cubed

    dy_dx = inner_tape.gradient(y, x)  # First-order derivative of y with respect to x (dy_dx = 3 * x**2)

d2y_dx2 = outer_tape.gradient(dy_dx, x)  # Second-order derivative of y with respect to x (d2y_dx2 = 6 * x)
 
print("First-order derivative (dy_dx):", dy_dx)
print("Second-order derivative (d2y_dx2):", d2y_dx2)

                    

Output:

First-order derivative (dy_dx): tf.Tensor(75.0, shape=(), dtype=float32)
Second-order derivative (d2y_dx2): tf.Tensor(30.0, shape=(), dtype=float32)

The output consists of two tensors, dy_dx and d2y_dx2, whose values depend on the function f(x) and the initial value of x. For f(x) = x**3 and x = 5.0, the first derivative is 3 * 5.0**2 = 75.0 and the second derivative is 6 * 5.0 = 30.0, which matches the output above.

For custom training loops, gradients and layers:

tf.GradientTape can be used for custom training loops, custom gradients, and custom layers. Custom training loops are more flexible and transparent than the built-in tf.keras methods. Custom gradients are useful for modifying or overriding the default gradients of an operation. Custom layers are user-defined layers that can be reused and combined with other layers.
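
As a rough sketch of a custom training loop (a hypothetical one-variable linear model fit to made-up data, not a full Keras workflow):

Python

import tensorflow as tf

# Hypothetical linear model y = w * x + b, with synthetic data following y = 2x + 1
w = tf.Variable(0.0)
b = tf.Variable(0.0)
xs = tf.constant([1.0, 2.0, 3.0, 4.0])
ys = tf.constant([3.0, 5.0, 7.0, 9.0])

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

for step in range(500):
    with tf.GradientTape() as tape:
        preds = w * xs + b
        loss = tf.reduce_mean(tf.square(preds - ys))   # mean squared error
    grads = tape.gradient(loss, [w, b])                # gradients w.r.t. the trainable variables
    optimizer.apply_gradients(zip(grads, [w, b]))      # one gradient-descent step

print(w.numpy(), b.numpy())  # gradually approaches 2.0 and 1.0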

Basically, “tf.GradientTape” is a TensorFlow API for automatic differentiation, which means computing the gradient of a computation with respect to some inputs, usually tf.Variable. It is useful for implementing gradient-based optimization algorithms, such as gradient descent or backpropagation.

Implementation of tf.GradientTape

To use tf.GradientTape effectively, you need to follow these basic steps:

  1. Define your variables – Define your variables and tensors that you want to compute gradients with respect to, and optionally mark them as trainable or watch them manually.
  2. Create a tf.GradientTape – Create a tf.GradientTape context and define your function or model inside it. The tape will record the operations on the watched variables and tensors.
  3. Call the tape.gradient() method – Call the tape.gradient() or tape.jacobian() method to compute the gradient or Jacobian of your function or model with respect to your variables or tensors. You can also weight the output via the output_gradients argument of tape.gradient(), which computes a weighted (vector-Jacobian) product and is useful for implementing the chain rule or custom gradients (see the sketch after this list).
  4. Use the computed gradient – Use the computed gradient or Jacobian to print the result, update your variables or tensors, or perform other calculations. You can also create a persistent tape to reuse it for multiple gradients or Jacobians, but remember to release it with del when you are done.
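
For instance, a small sketch of weighting the gradient with output_gradients (the input values here are made up for illustration):

Python

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as tape:
    y = x ** 2                     # y_i = x_i^2, so dy_i/dx_i = 2 * x_i

weights = tf.constant([1.0, 0.5, 0.0])
grad = tape.gradient(y, x, output_gradients=weights)  # weighted vector-Jacobian product
print(grad)  # [2.0, 2.0, 0.0] = weights * 2 * x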

Here are some examples of using tf.GradientTape to compute gradients and Jacobians:

Example 1: Computing the gradient of a scalar function with respect to a scalar variable

Using this example let’s understand how to compute the gradient of a scalar function y = scalar^2 with respect to a scalar variable scalar using TensorFlow’s tf.GradientTape functionality.

  • In the first step, a scalar variable scalar is defined and initialized with a value of 3.0.
  • Next, a tf.GradientTape is created within a context block. This tape records all operations that involve the defined variable scalar. Within the context of the tape, the scalar variable scalar is squared, and the result is stored in the variable y.
  • After defining the computation, the tape.gradient() method is called to compute the gradient of the variable y with respect to the scalar variable scalar.

Python

# Importing tensorflow
import tensorflow as tf
 
# Step 1: Define your variables
# Defining a scalar variable
scalar = tf.Variable(3.0)
 
# Step 2: Create a tf.GradientTape
# Creating a tape
with tf.GradientTape() as tape:
  y = scalar**2
 
# Step 3: Call the tape.gradient() method 
# Calling the tape.gradient() for computing the gradient of y with respect to scalar
dy_dx = tape.gradient(y, scalar)
 
# Step 4: Use the computed gradient
# Use the compute gradient for printing the gradient as an output
print(dy_dx)

                    

Output:

tf.Tensor(6.0, shape=(), dtype=float32)


Example 2: Computing the Jacobian of a vector function with respect to a vector variable

Let us calculate the Jacobian matrix of a vector-valued function using TensorFlow’s tf.GradientTape.

Firstly, a vector-valued function my_function is defined, which takes a 1-D input x and returns a 2-element output containing the square of the first element and the sine of the second element. Then, the input values x are defined as a constant tensor. Next, a tf.GradientTape context is initiated with the option persistent=True to enable multiple gradient computations, and tape.watch(x) is called because x is a constant. Inside the tape context, the function my_function is called with the input x, and after exiting the context, the jacobian() method is used to compute the Jacobian matrix of the function with respect to x.

Python

import tensorflow as tf
 
# Define the vector-valued function
def my_function(x):
    return tf.stack([x[0] ** 2, tf.sin(x[1])], axis=0)
 
# Define the input values
x = tf.constant([1.0, 2.0, 3.0])
 
# Use tf.GradientTape() to compute Jacobian matrix
with tf.GradientTape(persistent=True) as tape:
    tape.watch(x)
    y = my_function(x)
 
# Compute Jacobian matrix
jacobian = tape.jacobian(y, x)
 
print("Input values (x):", x.numpy())
print("Function values (y):", y.numpy())
print("Jacobian matrix:\n", jacobian.numpy())

                    

Output:

Input values (x): [1. 2. 3.]
Function values (y): [1. 0.9092974]
Jacobian matrix:
[[ 2. 0. 0. ]
[ 0. -0.41614684 0. ]]




