Higher-Order gradients in TensorFlow

Last Updated : 20 Feb, 2024

Higher-order gradients are an important topic in machine learning. TensorFlow's tf.GradientTape API lets us compute them with little effort. In this article, we will first cover first-order and higher-order derivatives and then work through an example of input gradient regularization.

First-Order and Higher-Order Gradients in TensorFlow

In machine learning, a good understanding of a function's gradients helps you get the best performance out of your model. tf.GradientTape is a versatile TensorFlow tool that handles both first-order and higher-order gradients, so let's look at each in turn.

First-Order Gradients

Think of training a neural network as climbing a hill: the steeper the slope, the faster you want to ascend. First-order gradients, often referred to simply as gradients (∇), guide this ascent. Mathematically, for a scalar function f(x), the first-order gradient is given by:

[Tex]\nabla f(x) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right][/Tex]

You do not need to work this equation out by hand: tf.GradientTape handles it by recording operations as they happen during the forward pass. This basic capability is what lets our models learn and adapt with each iteration.

Python

import tensorflow as tf
 
x = tf.Variable(5.0)
y = tf.Variable(2.0)
 
def f(x, y):
  return 2 * x**3 + 5 * y**2 + 11 * x + 5
 
# Calculate derivative w.r.t. x
with tf.GradientTape() as tape:
  z = f(x, y)
dx = tape.gradient(z, x)
 
# Calculate derivative w.r.t. y (create a new tape)
with tf.GradientTape() as tape:
  z = f(x, y)
dy = tape.gradient(z, y)
 
print("Partial derivative of f with respect to x:", dx.numpy())
print("Partial derivative of f with respect to y:", dy.numpy())

Output:

Partial derivative of f with respect to x: 161.0
Partial derivative of f with respect to y: 20.0
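
The snippet above runs the forward pass twice, once per tape. tf.GradientTape.gradient also accepts a list of sources, so, as a minimal sketch of that variant, both partial derivatives can be obtained from a single forward pass:

Python

import tensorflow as tf

x = tf.Variable(5.0)
y = tf.Variable(2.0)

def f(x, y):
  return 2 * x**3 + 5 * y**2 + 11 * x + 5

# One tape, one forward pass; passing a list of sources returns a list of gradients
with tf.GradientTape() as tape:
  z = f(x, y)
dx, dy = tape.gradient(z, [x, y])

print(dx.numpy(), dy.numpy())  # 161.0 20.0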

Higher-Order Gradients

Now that you have a solid understanding of first-order gradients, let's take a look at higher-order gradients. First-order derivatives tell us about the slope of our hill, while higher-order derivatives give us insight into its curvature. TensorFlow computes not only the first but also the second, third, and nth derivatives seamlessly. Mathematically, the second-order gradient (the Hessian matrix) of f(x) is given by:

[Tex]\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2f}{\partial x_{1}^{2}}& \frac{\partial^2f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f }{\partial x_1 \partial x_n} \\ \frac{\partial^2f}{\partial x_2 \partial x_1}& \frac{\partial^2 f}{\partial x_{2}^{2}} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n }\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial^2f}{\partial x_n \partial x_1}& \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_{n}^{2}} \end{bmatrix}[/Tex]

This means we can measure not only how steep our hill is, but also how its steepness is changing, which opens the door to more intricate computations and advanced optimization techniques. TensorFlow's tf.GradientTape can therefore help us refine a model's architecture and explore more sophisticated algorithms.

Operations performed inside the tf.GradientTape context manager are recorded for automatic differentiation. If gradients are computed within that context, the gradient computation itself is recorded as well. As a result, the exact same API works for higher-order gradients.

Consider the following example:

Python3

import tensorflow as tf
 
# Define symbolic variables for x and y
x = tf.Variable(5.0)
y = tf.Variable(2.0)
 
# Define the function
def f(x, y):
  return 2 * x**3 + 5 * y**2 + 11 * x * y + 5
 
# Create a persistent GradientTape for all calculations
with tf.GradientTape(persistent=True) as tape:
  z = f(x, y)
 
  # Calculate all derivatives within the persistent tape
  dx = tape.gradient(z, x)  # first-order partial derivative w.r.t. x
  dy = tape.gradient(z, y)  # first-order partial derivative w.r.t. y
  dxx = tape.gradient(dx, x)  # second-order partial derivative w.r.t. x
  dyy = tape.gradient(dy, y)  # second-order partial derivative w.r.t. y
  dxy = tape.gradient(dx, y)  # mixed derivative
  dyx = tape.gradient(dy, x)  # mixed derivative
 
# Evaluate them at specific values
print("Partial derivative of f w.r.t. x:", dx.numpy())  #6*x**2 +11y
print("Partial derivative of f w.r.t. y:", dy.numpy())  #10*y+11x
print("Second-order derivative of f w.r.t. x (d^2f/dx^2):", dxx.numpy()) # 12*x
print("Second-order derivative of f w.r.t. y (d^2f/dy^2):", dyy.numpy())  #10
print("Mixed derivative (d^2f/dxdy):", dxy.numpy())   #11
print("Mixed derivative (d^2f/dydx):", dyx.numpy())   #11
 
 
# Delete the tape explicitly to avoid memory leaks
del tape

Output:

Partial derivative of f w.r.t. x: 172.0
Partial derivative of f w.r.t. y: 75.0
Second-order derivative of f w.r.t. x (d^2f/dx^2): 60.0
Second-order derivative of f w.r.t. y (d^2f/dy^2): 10.0
Mixed derivative (d^2f/dxdy): 11.0
Mixed derivative (d^2f/dydx): 11.0
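
Collecting these values, the four second-order derivatives are exactly the entries of the Hessian of f, evaluated here at (x, y) = (5, 2):

[Tex]\nabla^2 f(x, y) = \begin{bmatrix} 12x & 11 \\ 11 & 10 \end{bmatrix}, \qquad \nabla^2 f(5, 2) = \begin{bmatrix} 60 & 11 \\ 11 & 10 \end{bmatrix}[/Tex]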

Note that while this pattern does give you the second derivatives of a scalar function, it does not generalize to produce a Hessian matrix, because tf.GradientTape.gradient only computes the gradient of a scalar.
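
If you do need the full Hessian, one option (a minimal sketch, not the only approach) is to take the Jacobian of the gradient using tf.GradientTape.jacobian with nested tapes:

Python

import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = tf.reduce_sum(x ** 3)   # scalar function of a vector
  g = t1.gradient(y, x)         # gradient, shape (3,): 3*x**2
hessian = t2.jacobian(g, x)     # Hessian, shape (3, 3): diag(6*x)

print(hessian.numpy())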

Nested calls to tf.GradientTape.gradient are a useful pattern when you compute a scalar from a gradient and then use that scalar as the source for a second gradient computation, as in the input gradient regularization example below.
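
Before that example, here is a minimal standalone sketch of the nested pattern, using a made-up function: the squared norm of a gradient is itself differentiated.

Python

import tensorflow as tf

x = tf.Variable([1.0, 2.0])

with tf.GradientTape() as t2:
  with tf.GradientTape() as t1:
    y = tf.reduce_sum(x ** 2)        # scalar
  g = t1.gradient(y, x)              # gradient: 2*x
  # Scalar computed from the gradient...
  g_norm_sq = tf.reduce_sum(g ** 2)  # 4*(x1^2 + x2^2)
# ...used as the source of a second gradient computation
dg = t2.gradient(g_norm_sq, x)       # 8*x

print(dg.numpy())  # [ 8. 16.]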

Input gradient regularization

In the ever-challenging landscape of machine learning, models face adversaries aiming to disrupt their accuracy. Adversarial examples, a notorious set of methods, manipulate a model’s input to confuse its output. One simple yet powerful technique in defending against such attacks is “Input Gradient Regularization.”

Imagine the model’s prediction process as a journey through a landscape. Input gradient regularization seeks to make this landscape less prone to adversarial manipulation by diminishing the impact of input perturbations on the model’s output. In simpler terms, a slight change in the input should result in a minimal change in the output.
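
One common way to formalize this idea (a standard formulation, not specific to the code below) is to add the squared norm of the loss's input gradient as a penalty to the usual training loss, weighted by a hyperparameter λ:

[Tex]L_{total} = L_{task} + \lambda \left\| \nabla_x L_{task} \right\|^2[/Tex]

Minimizing this penalty encourages the output to change only slightly when the input is perturbed.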

Implementation Using tf.GradientTape

Let's delve into a practical implementation of input gradient regularization using TensorFlow's tf.GradientTape. The goal is to compute the gradient of the output with respect to the input, measure the magnitude of this input gradient, and then drive it down during training to make the model more robust.

The provided code snippet defines a simple linear model and then applies a regularization technique on the input gradients during the training step. The regularization method used here calculates the L2 norm of the input gradients and applies it as a regularization term during optimization.

Python3

import tensorflow as tf
import numpy as np
 
# Create a simple linear model for illustration
class SimpleModel(tf.keras.Model):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.dense = tf.keras.layers.Dense(units=1, activation='linear')
 
    def call(self, inputs):
        return self.dense(inputs)
 
# Generate synthetic input data
input_data = np.array([[1.0, 2.0, 3.0]])
 
# Instantiate the model
model = SimpleModel()
 
# Define the input gradient regularization function
def input_gradient_regularization(model, input_data):
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(input_data)
        output = model(input_data)
        magnitude = tf.norm(tape.gradient(output, input_data))
 
    # Gradients of the input-gradient magnitude w.r.t. the model parameters;
    # ZERO avoids None entries for parameters the magnitude does not depend on (e.g. the bias)
    gradients = [tape.gradient(magnitude, var,
                               unconnected_gradients=tf.UnconnectedGradients.ZERO)
                 for var in model.trainable_variables]
 
    return gradients
 
# Get the gradients of the input-gradient magnitude
gradients = input_gradient_regularization(model, tf.convert_to_tensor(input_data, dtype=tf.float32))
 
# Simulate a training step (updating the model parameters)
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
 
# Print the updated model parameters
print("Updated Model Parameters:")
for var in model.trainable_variables:
    print(f"{var.name}: {var.numpy()}")

Output:

Updated Model Parameters:
simple_model/dense/kernel: [[-0.01001287]
 [ 0.5474412 ]
 [ 1.2799671 ]]
simple_model/dense/bias: [-0.01001287]


This example showcases a simple linear model being updated using input gradient regularization. Keep in mind that the effectiveness of this technique becomes more apparent in complex models and with real-world data. Adjust the model and data according to your specific use case.
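
To connect this to an actual training objective, here is a minimal sketch of a training step that adds the input gradient penalty to a task loss, following the nested-tape pattern from earlier. The mean squared error loss, the synthetic data, and the regularization weight lambda_reg are all assumed values for illustration.

Python

import tensorflow as tf

# Assumed setup: a tiny regression problem and a made-up penalty weight
model = tf.keras.Sequential([tf.keras.layers.Dense(units=1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
loss_fn = tf.keras.losses.MeanSquaredError()
lambda_reg = 0.1  # hypothetical regularization strength

x = tf.constant([[1.0, 2.0, 3.0]])
y_true = tf.constant([[1.0]])
model(x)  # build the model's variables before taping

def train_step(x, y_true):
    with tf.GradientTape() as outer_tape:
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(x)
            y_pred = model(x)
            task_loss = loss_fn(y_true, y_pred)
        # Gradient of the task loss with respect to the input
        input_grad = inner_tape.gradient(task_loss, x)
        # Penalize large input gradients on top of the task loss
        total_loss = task_loss + lambda_reg * tf.reduce_sum(tf.square(input_grad))
    grads = outer_tape.gradient(total_loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return total_loss

print("Total loss after one step:", train_step(x, y_true).numpy())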


