Multivariate Optimization – Gradient and Hessian

  • Last Updated : 17 Jul, 2020

In a multivariate optimization problem, there are multiple variables that act as decision variables in the optimization problem.

z = f(x_1, x_2, x_3, ..., x_n)

So in these types of problems, z is a general (possibly non-linear) function of the decision variables x_1, x_2, ..., x_n, and there are n variables one can manipulate, or choose, in order to optimize the function z. Notice that univariate optimization can be explained with pictures in two dimensions, because the x-direction carries the value of the decision variable and the y-direction carries the value of the function. In multivariate optimization, however, we need pictures in three dimensions, and once there are more than 2 decision variables the problem becomes difficult to visualize.

Gradient:
Before explaining the gradient, let us first recall the necessary condition from the univariate case. In univariate optimization, the necessary condition for x to be the minimizer of the function f(x) is:

First-order necessary condition: f'(x) = 0

So, the derivative of the single-variable case becomes what we call the gradient in the multivariate case.

The first-order necessary condition in univariate optimization involves the derivative f'(x), which can also be written as df/dx. In the multivariate case, however, there are many variables and hence many partial derivatives, and the gradient of the function f is a vector whose components are the partial derivatives of the function with respect to the corresponding variables. So, for example, \partial f/ \partial x_1 is the first component, \partial f/ \partial x_2 is the second component, and \partial f/ \partial x_n is the last component. The first-order necessary condition then becomes \nabla f = 0, i.e. every partial derivative is zero.

 Gradient = \nabla f = \begin{bmatrix} \partial f/ \partial x_1\\ \partial f/ \partial x_2\\ \vdots\\ \partial f/ \partial x_n\\ \end{bmatrix}



Note: The gradient of a function at a point is orthogonal to the contours of the function at that point.
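
To make this concrete, here is a minimal numerical sketch (not part of the original article) that approximates the gradient with central finite differences on a sample function f(x_1, x_2) = x_1^2 + 3x_2^2; the helper name numerical_gradient and the sample function are illustrative assumptions only.

```python
import numpy as np

def f(x):
    # Sample multivariate function (an assumption for illustration): f(x1, x2) = x1^2 + 3*x2^2
    return x[0] ** 2 + 3 * x[1] ** 2

def numerical_gradient(func, x, h=1e-5):
    # Central finite-difference approximation of each partial derivative,
    # giving the gradient vector [df/dx1, df/dx2, ..., df/dxn]
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (func(x + step) - func(x - step)) / (2 * h)
    return grad

x = np.array([1.0, 2.0])
print(numerical_gradient(f, x))  # analytic gradient is [2*x1, 6*x2] = [2., 12.]
```

The analytic gradient of this sample function is [2x_1, 6x_2], so the printed vector should be approximately [2, 12] at the point (1, 2).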

Hessian:
Similarly, in univariate optimization the sufficient condition for x to be the minimizer of the function f(x) is:
Second-order sufficiency condition: f''(x) > 0, or d^2f/dx^2 > 0

This condition is replaced by what we call the Hessian matrix in the multivariate case. The Hessian is an n*n matrix of second partial derivatives: the (1, 1) entry is \partial ^2f/ \partial x_1^2, the (1, 2) entry is \partial ^2f/\partial x_1 \partial x_2, and so on.

 Hessian = \nabla ^2 f = \begin{bmatrix} \partial ^2f/ \partial x_1^2 & \partial ^2f/\partial x_1 \partial x_2 & ... & \partial ^2f/ \partial x_1 \partial x_n\\ \partial ^2f/\partial x_2 \partial x_1 & \partial ^2f/ \partial x_2^2 & ... & \partial ^2f/ \partial x_2 \partial x_n\\ \vdots & \vdots & \ddots & \vdots\\ \partial ^2f/\partial x_n \partial x_1 & \partial ^2f/\partial x_n \partial x_2 & ... & \partial ^2f/ \partial x_n^2\\ \end{bmatrix}

Note:

  • The Hessian is a symmetric matrix (provided the second partial derivatives are continuous).
  • The Hessian matrix is said to be positive definite at a point if all the eigenvalues of the Hessian matrix at that point are positive. Together with a zero gradient, this is the multivariate analogue of the second-order sufficiency condition f''(x) > 0 (see the numerical sketch below).
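
As a rough illustration of both points (again a sketch, not from the original article), the Hessian of the same kind of sample function can be approximated with finite differences and its eigenvalues checked for positivity; numerical_hessian and f below are illustrative names only.

```python
import numpy as np

def f(x):
    # Sample function (an assumption for illustration): f(x1, x2) = x1^2 + 3*x2^2,
    # whose exact Hessian is [[2, 0], [0, 6]]
    return x[0] ** 2 + 3 * x[1] ** 2

def numerical_hessian(func, x, h=1e-4):
    # Finite-difference approximation of the n x n matrix of second partial derivatives
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (func(x + ei + ej) - func(x + ei - ej)
                       - func(x - ei + ej) + func(x - ei - ej)) / (4 * h ** 2)
    return H

H = numerical_hessian(f, np.array([1.0, 2.0]))
print(H)                                   # approximately [[2, 0], [0, 6]]
print(np.all(np.linalg.eigvalsh(H) > 0))   # True -> Hessian is positive definite here
```

The computed matrix comes out symmetric (up to numerical error), and all of its eigenvalues are positive, so the Hessian of this sample function is positive definite, which is the multivariate analogue of f''(x) > 0.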

