Multivariate Optimization – Gradient and Hessian

  • Last Updated : 24 Sep, 2021

In a multivariate optimization problem, there are multiple variables that act as decision variables: 
z = f(x_1, x_2, x_3, ..., x_n)

So, in these types of problems the objective z is, in general, some non-linear function of the decision variables x_1, x_2, ..., x_n, and there are n variables that one can manipulate to optimize z. Notice that univariate optimization can be explained with pictures in two dimensions, because the x-direction holds the decision-variable value and the y-direction holds the value of the function. Multivariate optimization with two decision variables already requires pictures in three dimensions, and with more than two decision variables it becomes difficult to visualize. 

Before explaining the gradient, let us contrast it with the necessary condition of the univariate case. In univariate optimization, the necessary condition for x to be a minimizer of the function f(x) is: 
First-order necessary condition: f'(x) = 0 
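This condition can be checked numerically. The sketch below uses a made-up quadratic function (not from the article) and a central finite difference to approximate f'(x):

```python
def f(x):
    # Hypothetical univariate function with a minimum at x = 3
    return (x - 3) ** 2 + 1

def derivative(f, x, h=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

# At the minimizer the first-order condition f'(x) = 0 holds;
# away from it the derivative is non-zero.
print(abs(derivative(f, 3.0)) < 1e-6)   # True
print(abs(derivative(f, 5.0)) < 1e-6)   # False: f'(5) = 4
```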

So, the derivative in the single-variable case becomes what we call the gradient in the multivariate case. 

According to the first-order necessary condition in univariate optimization, f'(x) = 0, which can also be written as df/dx = 0. In the multivariate case, however, there are many variables and hence many partial derivatives, and the gradient of the function f is a vector whose components are the derivatives of the function with respect to the corresponding variables. So, for example, \partial f/ \partial x_1 is the first component, \partial f/ \partial x_2 is the second component, and \partial f/ \partial x_n is the last component. 
Gradient = \nabla f = \begin{bmatrix} \partial f/ \partial x_1\\ \partial f/ \partial x_2\\ ...\\ ...\\ \partial f/ \partial x_n\\ \end{bmatrix}

Note: The gradient of a function at a point is orthogonal to the contour of the function passing through that point.
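A minimal sketch of building the gradient component by component with finite differences, using a hypothetical two-variable quadratic for illustration:

```python
def f(x):
    # Hypothetical function of two variables: f(x1, x2) = x1**2 + 3*x2**2
    return x[0] ** 2 + 3 * x[1] ** 2

def gradient(f, x, h=1e-6):
    # Each component is the partial derivative of f with respect to
    # the corresponding variable, as in the vector definition above.
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

print(gradient(f, [1.0, 2.0]))  # analytically [2*x1, 6*x2] = [2.0, 12.0]
```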

Similarly, in univariate optimization, the sufficient condition for x to be a minimizer of the function f(x) is: 
Second-order sufficiency condition: f''(x) > 0, or d^2f/dx^2 > 0 
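The second-order condition can be checked numerically in the same spirit (again with a hypothetical function, assumed for illustration):

```python
def f(x):
    # Hypothetical function: minimum at x = 3, with f''(x) = 2 everywhere
    return (x - 3) ** 2 + 1

def second_derivative(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

# f'(3) = 0 and f''(3) = 2 > 0, so x = 3 is a minimizer.
print(second_derivative(f, 3.0) > 0)   # True
```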

And this is replaced by what we call a Hessian matrix in the multivariate case. The Hessian is a matrix of dimension n*n whose (i, j) entry is the second partial derivative \partial ^2f/\partial x_i \partial x_j: the first entry is \partial ^2f/ \partial x_1^2 , the next is \partial ^2f/\partial x_1 \partial x_2 , and so on. 

Hessian = \nabla ^2 f = \begin{bmatrix} \partial ^2f/ \partial x_1^2 & \partial ^2f/\partial x_1 \partial x_2 & ... & \partial ^2f/ \partial x_1 \partial x_n\\ \partial ^2f/\partial x_2 \partial x_1 & \partial ^2f/ \partial x_2^2 & ... & \partial ^2f/ \partial x_2 \partial x_n\\ ... & ... & ... & ...\\ ... & ... & ... & ...\\ \partial ^2f/\partial x_n \partial x_1 & \partial ^2f/\partial x_n \partial x_2 & ... & \partial ^2f/ \partial x_n^2\\ \end{bmatrix}



  • The Hessian is a symmetric matrix (provided the second-order partial derivatives are continuous).
  • The Hessian matrix is said to be positive definite at a point if all the eigenvalues of the Hessian matrix at that point are positive.
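Both bullet points can be verified numerically. The sketch below assumes a made-up quadratic function; the finite-difference Hessian and the trace/determinant eigenvalue formula for a 2x2 symmetric matrix are standard, but the function itself is hypothetical:

```python
import math

def f(x):
    # Hypothetical quadratic: f(x1, x2) = x1**2 + x1*x2 + 2*x2**2,
    # whose exact Hessian is [[2, 1], [1, 4]]
    return x[0] ** 2 + x[0] * x[1] + 2 * x[1] ** 2

def hessian(f, x, h=1e-4):
    # Entry (i, j) approximates d^2 f / (dx_i dx_j) by central differences.
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(h, h) - shifted(h, -h)
                       - shifted(-h, h) + shifted(-h, -h)) / (4 * h * h)
    return H

H = hessian(f, [1.0, 1.0])

# Symmetry: H[0][1] and H[1][0] agree (both are the mixed partial).
print(abs(H[0][1] - H[1][0]) < 1e-6)      # True

# Positive definiteness: for a 2x2 symmetric matrix the eigenvalues
# follow from the trace and determinant.
tr = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
disc = math.sqrt(tr * tr - 4 * det)
eigenvalues = [(tr - disc) / 2, (tr + disc) / 2]
print(all(e > 0 for e in eigenvalues))    # True
```

For a quadratic function the central-difference Hessian is exact up to floating-point rounding, which makes it a convenient sanity check.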


