We find extrema by eliminating all the points that cannot be extrema and then examining the remaining candidates. A function F(x,y,z) cannot have an extremum at a point where some partial derivative exists and is nonzero. All other points in the domain of F are called critical points, because they are candidates for extrema. So critical points are points in the domain of the function where the derivative is zero or is undefined. All the boundary points of the domain are critical points, because the derivative of a function does not exist at a boundary point of its domain (even though a one-sided derivative may exist).
You have a local minimum at a critical point if the Hessian matrix

H = [fxx fxy fxz]
    [fyx fyy fyz]
    [fzx fzy fzz]

is positive definite, which means that the leading principal minors (i.e. the determinants of the matrices formed by deleting the last n rows and columns, where n varies from 0 to 2) are all positive; and you have a local maximum if the matrix H is negative definite, i.e., if the matrix -H is positive definite.
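As a sanity check, the leading-principal-minor test (Sylvester's criterion) is easy to carry out numerically. Here is a minimal sketch in Python with numpy; the Hessian below is made up purely for illustration:

```python
import numpy as np

def is_positive_definite(H):
    # Sylvester's criterion: all leading principal minors are positive.
    H = np.asarray(H, dtype=float)
    return all(np.linalg.det(H[:k, :k]) > 0 for k in range(1, H.shape[0] + 1))

# Hypothetical Hessian of some F at a critical point:
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])

print(is_positive_definite(H))   # True  -> local minimum
print(is_positive_definite(-H))  # False -> -H not positive definite, so no local maximum
```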
For a nice visual explanation of Lagrange multipliers (including multiple constraints), you can look at Steuard Jensen's tutorial.
In a constrained optimization problem you try to maximize a function on the solution set of an equation. For example, you might wish to maximize the function f(x,y) = 3x + 4y subject to the constraint g(x,y) = x² + y² = 25.
The old-fashioned way to solve this problem would be to solve the constraint equation for y to get a function y(x), and then try to maximize the function h(x)=f(x,y(x)). This works for this problem, but it is ugly because you have to deal with derivatives of square roots. And if g were more complicated you might not be able to solve for y analytically at all.
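For comparison, here is a sketch of that substitution approach using sympy (assumed available), taking the upper branch y = √(25 − x²) of the constraint:

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.sqrt(25 - x**2)              # solve the constraint for y (upper branch)
h = 3*x + 4*y                       # h(x) = f(x, y(x))
crit = sp.solve(sp.diff(h, x), x)   # note the square-root derivatives involved
print(crit)                         # [3]
```

Substituting back gives h(3) = 9 + 16 = 25, and the lower branch would have to be handled separately, which is part of what makes this approach ugly.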
The new-fangled, elegant way is to use Lagrange multipliers. The recipe is simple: to maximize (or minimize) the function f(x,y) subject to the constraint g(x,y)=g(P), you set the gradient of f equal to an (unknown) multiple λ of the gradient of g. So you solve the system
fx = λ gx , fy = λ gy , g(x,y) = g(P)

for the unknowns x, y, and λ.
For the example above, this system is:
3 = 2λx , 4 = 2λy , x² + y² = 25.

This system has two solutions: (x=3, y=4, λ=1/2) and (x=-3, y=-4, λ=-1/2). These are the only points where a minimum or maximum can exist. So if the maximum and minimum exist, all we have to do to find them is to evaluate the function at these critical points and compare the values. How can we know that the maximum and minimum exist? The following theorem is the most generic way to guarantee it:
Theorem (existence of extrema of a continuous function on a closed, bounded set). Let S be a set of points (e.g. the solution set of a constraint equation). If S is closed (i.e. contains all its boundary points) and is bounded (i.e. the points in S all lie within some fixed distance from the origin), then any continuous function f has both a maximum and a minimum on the set S.
In our example, the solution set of the constraint equation is a circle, which is closed and bounded. So f must have a max and a min on this set. The only possible points where f can have an extremum are the two critical points. The values of the function at the two critical points are f(3,4)=25 and f(-3,-4)=-25. So these values must be the max and the min. The point (or set of points) where the global maximum occurs (in this case the point (3,4)) is called the argument of the maximum, or argmax for short, and the point (or set of points) where the global minimum occurs (in this case the point (-3,-4)) is called the argument of the minimum, or argmin for short.
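The whole computation (solve the Lagrange system, then evaluate f at the critical points) can be reproduced symbolically; here is a sketch using sympy (assumed available):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = 3*x + 4*y
g = x**2 + y**2

system = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),  # 3 = 2*lam*x
          sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),  # 4 = 2*lam*y
          sp.Eq(g, 25)]                               # constraint g(x,y) = g(P)
sols = sp.solve(system, [x, y, lam])
values = sorted(f.subs({x: sx, y: sy}) for sx, sy, _ in sols)
print(values)  # [-25, 25]: the min at (-3,-4) and the max at (3,4)
```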
Remark: Many people prefer to write the constraint equation in the form g(x,y)=0, rather than leaving a constant on one side, g(x,y)=g(P). To see why, define the auxiliary function h(x,y,λ) := f(x,y) - λ*g(x,y). Now look for (unconstrained) extrema of h. Setting the gradient equal to zero gives
0 = hx = fx - λ gx , 0 = hy = fy - λ gy , 0 = hλ = -g(x,y),

which is precisely the system of equations that you have to solve to find extrema of f subject to the constraint g=0. So the unconstrained extrema of h are the extrema of f on the zero level-set of the constraint function g.
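In this form the recipe is just "find the critical points of h". A short sympy sketch (assumed available) for the running example, with the constraint rewritten as g(x,y) = x² + y² − 25 = 0:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
h = (3*x + 4*y) - lam * (x**2 + y**2 - 25)  # h = f - lam*g, with g in the g = 0 form

grad_h = [sp.diff(h, v) for v in (x, y, lam)]  # unconstrained critical points of h
sols = sp.solve(grad_h, [x, y, lam])
print(sols)  # the same two critical points as before
```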
Below are two (basically equivalent) ways to see why Lagrange's recipe works. I recommend that you draw pictures and sketches of gradient vectors and level sets and try to understand what I write in terms of these diagrams.
Suppose f(r) has an extremum at the point P subject to the constraint g(r)=g(P).
To say that f(r) has a (local) extremum at P subject to the constraint g(r)=g(P) means that for any test path r(t) that lies in the solution set of the constraint equation and passes through P at time zero, the function f(r(t)) has a (local) extremum at t=0. For such a path, dtf(r(t))=0 at t=0, where dt denotes the derivative with respect to time. In other words:
r(0) = P and g(r(t))=g(P) for all t imply that dtf(r(t))=0.
Differentiating the constraint equation with respect to time, using the chain rule, and evaluating at time zero tells us that
r(0) = P and r'(0)•∇g(P)=0 imply that r'(0)•∇f(P)=0.

That is, the constraint equation restricts allowed test paths to be perpendicular to the gradient of the constraint function, because these are the only directions in which the directional derivative of the constraint function is zero. In summary, where f(r) has an extremum subject to the constraint g(r)=g(P), the directional derivative of f must be zero in every direction in which the directional derivative of g is zero. To finish up the derivation, we use vector projection to write the gradient of f as the sum of a vector parallel to ∇g and a vector perpendicular to ∇g: there exists a scalar λ and a vector u such that
∇f = λ ∇g + u, where u•∇g = 0.

Now let r(t) be a path through P such that r'(0)=u and r(0) = P. (You should convince yourself that such a test path exists.) Then r'(0)•∇g(P)=0, but u•∇f = u•(λ∇g+u) = u•u, which is nonzero if and only if u≠0. So the only points where an extremum can occur are points where u is zero, i.e., where we can write
∇f = λ ∇g.
To understand why Lagrange multipliers work, we first note two guiding ideas: near a point P, the functions f and g are well approximated by their linearizations (linear approximations); and a linear function that is not constant cannot have an extremum, because it increases in some direction and decreases in the opposite direction.
Putting these ideas together gives us a general principle: if the linearization of f, restricted to the solutions of the linearized constraint equation at P, is not constant, then f subject to the constraint equation cannot have an extremum at P. We will show that if the gradient of f is not parallel to the gradient of the constraint function g, then the linearization of f restricted to the linearized constraint equation is not constant. The linear approximations of f and g near the point P are the functions
fL(r) := f(P) + (r-P)•∇f(P), gL(r) := g(P) + (r-P)•∇g(P).

Recall that one can use projection to write any vector (e.g. ∇f) as the sum of components parallel and perpendicular to another vector (e.g. ∇g). That is, we can write ∇f = λ∇g + u for some λ and some u with u•∇g=0. Now consider a point r = P + su (where s is any nonzero value, e.g. 1). Then (r-P)•∇g=0, so r satisfies the linearized constraint equation gL(r)=g(P). But
fL(r) = f(P) + (r-P)•∇f = f(P) + su•∇f = f(P) + su•(λ∇g+u) = f(P) + s u•u,

which equals f(P) if and only if u = 0. In other words, the linearization of f is constant when restricted to the solutions of the linearized constraint equation if and only if the gradient of f and the gradient of the constraint function are parallel, which is what we needed to verify.
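The projection step ∇f = λ∇g + u is easy to carry out numerically. A sketch with numpy at the candidate point P = (3, 4) of the running example:

```python
import numpy as np

P = np.array([3.0, 4.0])
grad_f = np.array([3.0, 4.0])  # ∇f for f = 3x + 4y (constant gradient)
grad_g = 2 * P                 # ∇g = (2x, 2y) for g = x^2 + y^2, at P

lam = (grad_f @ grad_g) / (grad_g @ grad_g)  # coefficient of the parallel component
u = grad_f - lam * grad_g                    # component perpendicular to ∇g

print(lam)                # 0.5, the multiplier found earlier
print(np.allclose(u, 0))  # True: u = 0, so P is a candidate extremum
```

At a point where u came out nonzero, moving along u would change f while staying on the linearized constraint, ruling that point out.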
Exercise: generalize one of the arguments for Lagrange multipliers to work for multiple constraint functions.