Multivariate Optimization – Gradient and Hessian

In a multivariate optimization problem, several variables act as decision variables.

z = f(x_1, x_2, x_3, \ldots, x_n)

So, in this type of problem, z is in general some non-linear function of the decision variables x_1, x_2, x_3, ..., x_n. There are n variables that one can manipulate or choose in order to optimize z. Univariate optimization can be explained with pictures in two dimensions, because the x-direction shows the value of the decision variable and the y-direction shows the value of the function. With two decision variables we need pictures in three dimensions, and with more than two decision variables the function becomes difficult to visualize.

Before explaining the gradient, let us contrast it with the necessary condition of the univariate case. In univariate optimization, the necessary condition for x to be a minimizer of the function f(x) is:

First-order necessary condition: f'(x) = 0

So, the derivative in the single-variable case becomes what we call the gradient in the multivariate case.

In univariate optimization the first-order necessary condition is f'(x) = 0, which can also be written as df/dx = 0. In the multivariate case there are many variables, and therefore many partial derivatives. The gradient of the function f is a vector whose components are the partial derivatives of f with respect to the corresponding variables. So, for example, \partial f/ \partial x_1 is the first component, \partial f/ \partial x_2 is the second component and \partial f/ \partial x_n is the last component. The first-order necessary condition then becomes \nabla f = 0, that is, every component of the gradient must be zero.

 Gradient = \nabla f = \begin{bmatrix} \partial f/ \partial x_1\\ \partial f/ \partial x_2\\ ...\\ ...\\ \partial f/ \partial x_n\\ \end{bmatrix}
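As a minimal sketch of the definition above, the gradient can be computed component by component. The function f(x_1, x_2) = x_1^2 + x_1 x_2 + 3 x_2^2 is a hypothetical example chosen for illustration (it does not come from the article), and the helper names `analytic_gradient` and `numerical_gradient` are likewise assumptions. The sketch compares the hand-derived partial derivatives with a central-difference approximation.

```python
def f(x):
    # Hypothetical example function: f(x1, x2) = x1^2 + x1*x2 + 3*x2^2
    x1, x2 = x
    return x1**2 + x1 * x2 + 3.0 * x2**2


def analytic_gradient(x):
    # Partial derivatives derived by hand:
    #   df/dx1 = 2*x1 + x2
    #   df/dx2 = x1 + 6*x2
    x1, x2 = x
    return [2.0 * x1 + x2, x1 + 6.0 * x2]


def numerical_gradient(func, x, h=1e-6):
    # Central-difference approximation of each partial derivative:
    # perturb one variable at a time and keep the others fixed.
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad


point = [1.0, 2.0]
grad_exact = analytic_gradient(point)
grad_approx = numerical_gradient(f, point)
print("analytic gradient: ", grad_exact)
print("numerical gradient:", grad_approx)
```

For this example function the gradient vanishes at the origin, so (0, 0) satisfies the first-order necessary condition.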

Note: The gradient of a function at a point is orthogonal to the contour (level curve) of the function passing through that point.
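The orthogonality in the note can be checked numerically on a simple case. The sketch below assumes the hypothetical function f(x_1, x_2) = x_1^2 + x_2^2, whose contours are circles centred at the origin. At several points on one contour it takes the dot product of the gradient with the tangent direction of the circle; a value of zero (up to rounding) means the two are orthogonal.

```python
import math


def gradient(x1, x2):
    # Gradient of the hypothetical function f(x1, x2) = x1^2 + x2^2
    return (2.0 * x1, 2.0 * x2)


def contour_point_and_tangent(radius, angle):
    # The contour f = radius^2 is a circle; parametrize it by the angle.
    point = (radius * math.cos(angle), radius * math.sin(angle))
    # Tangent direction is the derivative of the point with respect to
    # the angle, scaled to unit length.
    tangent = (-math.sin(angle), math.cos(angle))
    return point, tangent


dot_products = []
for angle in (0.0, 0.7, 2.0, 4.5):
    point, tangent = contour_point_and_tangent(1.5, angle)
    g = gradient(*point)
    dot_products.append(g[0] * tangent[0] + g[1] * tangent[1])

print(dot_products)  # each entry should be zero up to rounding error
```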

Similarly, in univariate optimization the sufficient condition for a point x with f'(x) = 0 to be a (local) minimizer of the function f(x) is:
Second-order sufficiency condition: f''(x) > 0 or d^2f/dx^2 > 0

In the multivariate case the second derivative is replaced by what we call the Hessian matrix. This is a matrix of dimension n*n whose entry in row i and column j is \partial ^2f/\partial x_i \partial x_j. So, the first entry of the first row is \partial ^2f/ \partial x_1^2, the second entry of the first row is \partial ^2f/\partial x_1 \partial x_2 and so on.

 Hessian = \nabla ^2 f = \begin{bmatrix} \partial ^2f/ \partial x_1^2 & \partial ^2f/\partial x_1 \partial x_2 & ... & \partial ^2f/ \partial x_1 \partial x_n\\ \partial ^2f/\partial x_2 \partial x_1 & \partial ^2f/ \partial x_2^2 & ... & \partial ^2f/ \partial x_2 \partial x_n\\ ... & ... & ... & ...\\ ... & ... & ... & ...\\ \partial ^2f/\partial x_n \partial x_1 & \partial ^2f/\partial x_n \partial x_2 & ... & \partial ^2f/ \partial x_n^2\\ \end{bmatrix}


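  • The Hessian is a symmetric matrix whenever the second partial derivatives of f are continuous.
  • The Hessian matrix is said to be positive definite at a point if all the eigenvalues of the Hessian matrix at that point are positive. A positive definite Hessian at a point where the gradient is zero is the multivariate counterpart of f''(x) > 0.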
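The positive-definiteness test can be sketched for a two-variable case as follows. The function f(x_1, x_2) = x_1^2 + x_1 x_2 + 3 x_2^2 and all helper names are hypothetical choices made for illustration. The Hessian is approximated with finite differences, and its eigenvalues are taken from the closed-form expression for a symmetric 2*2 matrix; for larger n one would typically rely on a library routine such as `numpy.linalg.eigvalsh` instead.

```python
import math


def f(x1, x2):
    # Hypothetical example function: f(x1, x2) = x1^2 + x1*x2 + 3*x2^2
    return x1**2 + x1 * x2 + 3.0 * x2**2


def numerical_hessian_2d(func, x1, x2, h=1e-4):
    # Second-order central differences for the three distinct entries.
    f00 = func(x1, x2)
    d11 = (func(x1 + h, x2) - 2.0 * f00 + func(x1 - h, x2)) / h**2
    d22 = (func(x1, x2 + h) - 2.0 * f00 + func(x1, x2 - h)) / h**2
    d12 = (func(x1 + h, x2 + h) - func(x1 + h, x2 - h)
           - func(x1 - h, x2 + h) + func(x1 - h, x2 - h)) / (4.0 * h**2)
    # The mixed partials are equal, so the matrix is symmetric.
    return [[d11, d12], [d12, d22]]


def symmetric_2x2_eigenvalues(H):
    # Closed form for a symmetric matrix [[a, b], [b, d]].
    a, b, d = H[0][0], H[0][1], H[1][1]
    mean = (a + d) / 2.0
    radius = math.sqrt(((a - d) / 2.0) ** 2 + b**2)
    return mean - radius, mean + radius


def is_positive_definite(H):
    # Positive definite if and only if the smallest eigenvalue is positive.
    return min(symmetric_2x2_eigenvalues(H)) > 0.0


hessian = numerical_hessian_2d(f, 0.0, 0.0)
eig_small, eig_large = symmetric_2x2_eigenvalues(hessian)
print("Hessian:", hessian)
print("eigenvalues:", eig_small, eig_large)
print("positive definite:", is_positive_definite(hessian))
```

For this example the exact Hessian is [[2, 1], [1, 6]] at every point, with eigenvalues 4 - sqrt(5) and 4 + sqrt(5). Both are positive, and the gradient is zero at the origin, so the origin satisfies the second-order sufficiency condition for a minimum.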