Multivariate Optimization – Gradient and Hessian
Last Updated: 17 Jul, 2020

In a multivariate optimization problem, multiple variables act as the decision variables of the problem.

z = f(x_1, x_2, \ldots, x_n)

So, when you look at these types of problems, the objective z is in general some non-linear function of the decision variables x_1, x_2, ..., x_n, and there are n variables that one can manipulate or choose in order to optimize z. Notice that univariate optimization can be explained with pictures in two dimensions, because the x-direction holds the decision variable value and the y-direction holds the value of the function. In multivariate optimization with two decision variables we already need pictures in three dimensions, and with more than two decision variables the problem becomes difficult to visualize.
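As a minimal sketch (the quadratic objective below is a hypothetical example, not taken from the article), a multivariate objective is simply a function that maps a vector of decision variables to a single scalar value:

import numpy as np

def f(x):
    # Hypothetical objective with two decision variables x1 and x2:
    # f(x1, x2) = x1^2 + 2*x2^2 + x1*x2
    return x[0]**2 + 2 * x[1]**2 + x[0] * x[1]

# Evaluate the objective at a candidate decision vector.
print(f(np.array([1.0, -1.0])))  # 1 + 2 - 1 = 2.0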

Gradient:
Before explaining the gradient, let us first contrast it with the necessary condition of the univariate case. In univariate optimization, the necessary condition for x to be a minimizer of the function f(x) is:

First-order necessary condition: f'(x) = 0

So, the derivative in the single-variable case becomes what we call the gradient in the multivariate case.

According to the first-order necessary condition in univariate optimization, f'(x) = 0, which can also be written as df/dx = 0. In the multivariate case there are many variables, so there are many partial derivatives, and the gradient of the function f is a vector whose components are the derivatives of the function with respect to the corresponding variables. For example, \partial f/ \partial x_1 is the first component, \partial f/ \partial x_2 is the second component, and \partial f/ \partial x_n is the last component. The first-order necessary condition then becomes \nabla f = 0.

 Gradient = \nabla f = \begin{bmatrix} \partial f/ \partial x_1\\ \partial f/ \partial x_2\\ \vdots\\ \partial f/ \partial x_n\\ \end{bmatrix}



Note: The gradient of a function at a point is orthogonal to the contour of the function passing through that point.
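As a rough sketch (reusing the hypothetical quadratic objective from above, which is an assumption and not part of the article), the gradient can be approximated with central finite differences and compared with the analytic partial derivatives:

import numpy as np

def f(x):
    # Hypothetical objective: f(x1, x2) = x1^2 + 2*x2^2 + x1*x2
    return x[0]**2 + 2 * x[1]**2 + x[0] * x[1]

def numerical_gradient(f, x, h=1e-6):
    # Central-difference approximation of each partial derivative df/dx_i.
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

x0 = np.array([1.0, -1.0])
print(numerical_gradient(f, x0))                     # approx [1.0, -3.0]
print(np.array([2*x0[0] + x0[1], 4*x0[1] + x0[0]]))  # analytic gradient [2*x1 + x2, 4*x2 + x1]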

Hessian:
Similarly, in univariate optimization the sufficient condition for x to be a minimizer of the function f(x) is:
Second-order sufficiency condition: f''(x) > 0 or d^2f/dx^2 > 0

This condition is replaced by the Hessian matrix in the multivariate case. The Hessian is a matrix of dimension n*n containing the second partial derivatives: the (1, 1) entry is \partial ^2f/ \partial x_1^2, the (1, 2) entry is \partial ^2f/\partial x_1 \partial x_2, and so on.

 Hessian = \nabla ^2 f = \begin{bmatrix} \partial ^2f/ \partial x_1^2 & \partial ^2f/\partial x_1 \partial x_2 & \cdots & \partial ^2f/ \partial x_1 \partial x_n\\ \partial ^2f/\partial x_2 \partial x_1 & \partial ^2f/ \partial x_2^2 & \cdots & \partial ^2f/ \partial x_2 \partial x_n\\ \vdots & \vdots & \ddots & \vdots\\ \partial ^2f/\partial x_n \partial x_1 & \partial ^2f/\partial x_n \partial x_2 & \cdots & \partial ^2f/ \partial x_n^2\\ \end{bmatrix}
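A minimal numerical sketch (again assuming the hypothetical quadratic objective used earlier) builds the Hessian entry by entry with central differences; for a quadratic the result is a constant matrix:

import numpy as np

def f(x):
    # Hypothetical objective: f(x1, x2) = x1^2 + 2*x2^2 + x1*x2
    return x[0]**2 + 2 * x[1]**2 + x[0] * x[1]

def numerical_hessian(f, x, h=1e-4):
    # Central-difference approximation of each second partial derivative
    # d^2 f / (dx_i dx_j), giving an n x n matrix.
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = h
            ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

x0 = np.array([1.0, -1.0])
print(numerical_hessian(f, x0))  # approx [[2, 1], [1, 4]] for this quadratic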

Note:

  • The Hessian is a symmetric matrix (provided the second partial derivatives are continuous).
  • The Hessian matrix is said to be positive definite at a point if all of its eigenvalues at that point are positive. A positive definite Hessian at a stationary point (where the gradient is zero) is the multivariate counterpart of f''(x) > 0 and guarantees a local minimum; a small eigenvalue check is sketched below.
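As a short sketch (assuming the Hessian of the hypothetical quadratic objective used above), positive definiteness can be checked by inspecting the eigenvalues of the Hessian:

import numpy as np

# Hessian of the hypothetical quadratic f(x1, x2) = x1^2 + 2*x2^2 + x1*x2.
# For a quadratic it is constant; in general it depends on the point.
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh handles symmetric matrices
print(eigenvalues)                   # approx [1.59, 4.41], both positive
print(np.all(eigenvalues > 0))       # True -> Hessian is positive definite here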

