We find extrema by eliminating all the points that cannot be extrema and then examining the remaining candidates. A function F(x,y,z) cannot have an extremum at a point where some partial derivative exists and is nonzero. All other points in the domain of F are called critical points, because they are candidates for extrema. So critical points are points in the domain of the function where the derivative is zero or is undefined. All the boundary points of the domain are critical points, because the derivative of a function does not exist at a boundary point of its domain (even though a one-sided derivative may exist).
You have a local minimum at a critical point if the Hessian matrix

H = [fxx fxy fxz]
    [fyx fyy fyz]
    [fzx fzy fzz]

is positive definite, which means that the leading principal minors (i.e. the determinants of the matrices formed by deleting the last n rows and columns, where n varies from 0 to 2) are all positive; and you have a local maximum if the matrix H is negative definite, i.e., if the matrix -H is positive definite.
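As a sanity check, the leading-principal-minor test (Sylvester's criterion) is easy to carry out numerically. Here is a minimal sketch in Python with numpy; the Hessian below is made up purely for illustration:

```python
import numpy as np

def is_positive_definite(H):
    # Sylvester's criterion: all leading principal minors are positive.
    H = np.asarray(H, dtype=float)
    return all(np.linalg.det(H[:k, :k]) > 0 for k in range(1, H.shape[0] + 1))

# Hypothetical Hessian of some F at a critical point:
H = np.array([[2.0, 0.5, 0.0],
              [0.5, 2.0, 0.5],
              [0.0, 0.5, 2.0]])

print(is_positive_definite(H))   # True  -> local minimum
print(is_positive_definite(-H))  # False -> -H not positive definite, so no local maximum
```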
For a nice visual explanation of Lagrange multipliers (including multiple constraints), you can look at Steuard Jensen's tutorial.
In a constrained optimization problem you try to maximize a function on the solution set of an equation. For example, you might wish to maximize the function f(x,y) = 3x + 4y subject to the constraint g(x,y) = x² + y² = 25.
The old-fashioned way to solve this problem would be to solve the constraint equation for y to get a function y(x), and then try to maximize the function h(x)=f(x,y(x)). This works for this problem, but it is ugly because you have to deal with derivatives of square roots. And if g were more complicated you might not be able to solve for y analytically at all.
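For comparison, here is a sketch of that substitution approach using sympy (assumed available), taking the upper branch y = √(25 − x²) of the constraint:

```python
import sympy as sp

x = sp.symbols('x', real=True)
y = sp.sqrt(25 - x**2)              # solve the constraint for y (upper branch)
h = 3*x + 4*y                       # h(x) = f(x, y(x))
crit = sp.solve(sp.diff(h, x), x)   # note the square-root derivatives involved
print(crit)                         # [3]
```

Substituting back gives h(3) = 9 + 16 = 25, and the lower branch would have to be handled separately, which is part of what makes this approach ugly.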
The new-fangled, elegant way is to use Lagrange multipliers. The recipe is simple: to maximize (or minimize) the function f(x,y) subject to the constraint g(x,y)=g(P), you set the gradient of f equal to an (unknown) multiple λ of the gradient of g. So you solve the system
fx = λ gx , fy = λ gy , g(x,y) = g(P)

for the unknowns x, y, and λ.
For the example above, this system is:
3 = 2λx , 4 = 2λy , x² + y² = 25.

This system has two solutions: (x=3, y=4, λ=1/2) and (x=-3, y=-4, λ=-1/2). These are the only points where a minimum or maximum can exist. So if the maximum and minimum exist, all we have to do to find them is to evaluate the function at these critical points and compare the values. How can we know that the maximum and minimum exist? The following theorem is the most generic way to guarantee it:
Theorem (existence of extrema of a continuous function on a closed, bounded set). Let S be a set of points (e.g. the solution set of a constraint equation). If S is closed (i.e. contains all its boundary points) and is bounded (i.e. the points in S all lie within some fixed distance from the origin), then any continuous function f has both a maximum and a minimum on the set S.
In our example, the solution set of the constraint equation is a circle, which is closed and bounded. So f must have a max and a min on this set. The only possible points where f can have an extremum are the two critical points. The values of the function at the two critical points are f(3,4)=25 and f(-3,-4)=-25. So these values must be the max and the min. The point (or set of points) where the global maximum occurs (in this case the point (3,4)) is called the argument of the maximum, or argmax for short, and the point (or set of points) where the global minimum occurs (in this case the point (-3,-4)) is called the argument of the minimum, or argmin for short.
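The whole computation (solve the Lagrange system, then evaluate f at the critical points) can be reproduced symbolically; here is a sketch using sympy (assumed available):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = 3*x + 4*y
g = x**2 + y**2

system = [sp.Eq(sp.diff(f, x), lam * sp.diff(g, x)),  # 3 = 2*lam*x
          sp.Eq(sp.diff(f, y), lam * sp.diff(g, y)),  # 4 = 2*lam*y
          sp.Eq(g, 25)]                               # constraint g(x,y) = g(P)
sols = sp.solve(system, [x, y, lam])
values = sorted(f.subs({x: sx, y: sy}) for sx, sy, _ in sols)
print(values)  # [-25, 25]: the min at (-3,-4) and the max at (3,4)
```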
Remark: Many people prefer to write the constraint equation in the form g(x,y)=0, rather than leaving a constant on one side, g(x,y)=g(P). To see why, define the auxiliary function h(x,y,λ) := f(x,y) - λ*g(x,y). Now look for (unconstrained) extrema of h. Setting the gradient equal to zero gives
0 = hx = fx - λ gx , 0 = hy = fy - λ gy , 0 = hλ = -g(x,y),

which is precisely the system of equations that you have to solve to find extrema of f subject to the constraint g=0. So the unconstrained extrema of h are the extrema of f on the zero level-set of the constraint function g.
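In this form the recipe is just "find the critical points of h". A short sympy sketch (assumed available) for the running example, with the constraint rewritten as g(x,y) = x² + y² − 25 = 0:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
h = (3*x + 4*y) - lam * (x**2 + y**2 - 25)  # h = f - lam*g, with g in the g = 0 form

grad_h = [sp.diff(h, v) for v in (x, y, lam)]  # unconstrained critical points of h
sols = sp.solve(grad_h, [x, y, lam])
print(sols)  # the same two critical points as before
```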
Below are two (basically equivalent) ways to see why Lagrange's recipe works. I recommend that you draw pictures and sketches of gradient vectors and level sets and try to understand what I write in terms of these diagrams.
Suppose f(r) has an extremum at the point P subject to the constraint g(r)=g(P).
To say that f(r) has a (local) extremum at P subject to the constraint g(r)=g(P) means that for any test path r(t) that lies in the solution set of the constraint equation and passes through P at time zero, the function f(r(t)) has a (local) extremum at t=0. For such a path, dtf(r(t))=0 at t=0, where dt denotes the derivative with respect to time. In other words:
r(0) = P and g(r(t))=g(P) for all t imply that dtf(r(t))=0.
Differentiating the constraint equation with respect to time, using the chain rule, and evaluating at time zero tells us that
r(0) = P and r'(0)•∇g(P)=0 imply that r'(0)•∇f(P)=0.

That is, the constraint equation restricts allowed test paths to be perpendicular to the gradient of the constraint function, because these are the only directions in which the directional derivative of the constraint function is zero. In summary, where f(r) has an extremum subject to the constraint g(r)=g(P), the directional derivative of f must be zero in every direction in which the directional derivative of g is zero. To finish up the derivation, we use vector projection to write the gradient of f as the sum of a vector parallel to ∇g and a vector perpendicular to ∇g: there exists a scalar λ and a vector u such that
∇f = λ ∇g + u, where u•∇g = 0.

Now let r(t) be a path through P such that r'(0)=u and r(0) = P. (You should convince yourself that such a test path exists.) Then r'(0)•∇g(P)=0, but u•∇f = u•(λ∇g+u) = u•u, which is nonzero if and only if u≠0. So the only points where an extremum can occur are points where u is zero, i.e., where we can write
∇f = λ ∇g.
To understand why Lagrange multipliers work, we first note two guiding ideas: near a point P, the functions f and g are well approximated by their linearizations (linear approximations); and a linear function that is not constant cannot have an extremum, because it increases in some direction and decreases in the opposite direction.
Putting these ideas together gives us a general principle: if the linearization of f, restricted to the solutions of the linearized constraint equation at P, is not constant, then f subject to the constraint equation cannot have an extremum at P. We will show that if the gradient of f is not parallel to the gradient of the constraint function g, then the linearization of f restricted to the linearized constraint equation is not constant. The linear approximations of f and g near the point P are the functions
fL(r) := f(P) + (r-P)•∇f(P), gL(r) := g(P) + (r-P)•∇g(P).

Recall that one can use projection to write any vector (e.g. ∇f) as the sum of components parallel and perpendicular to another vector (e.g. ∇g). That is, we can write ∇f = λ∇g + u for some λ and some u with u•∇g=0. Now consider a point r = P + su (where s is any nonzero value, e.g. 1). Then (r-P)•∇g=0, so r satisfies the linearized constraint equation gL(r)=g(P). But
fL(r) = f(P) + (r-P)•∇f = f(P) + su•∇f = f(P) + su•(λ∇g+u) = f(P) + s u•u,

which equals f(P) if and only if u = 0. In other words, the linearization of f is constant when restricted to the solutions of the linearized constraint equation if and only if the gradient of f and the gradient of the constraint function are parallel, which is what we needed to verify.
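The projection step ∇f = λ∇g + u is easy to carry out numerically. A sketch with numpy at the candidate point P = (3, 4) of the running example:

```python
import numpy as np

P = np.array([3.0, 4.0])
grad_f = np.array([3.0, 4.0])  # ∇f for f = 3x + 4y (constant gradient)
grad_g = 2 * P                 # ∇g = (2x, 2y) for g = x^2 + y^2, at P

lam = (grad_f @ grad_g) / (grad_g @ grad_g)  # coefficient of the parallel component
u = grad_f - lam * grad_g                    # component perpendicular to ∇g

print(lam)                # 0.5, the multiplier found earlier
print(np.allclose(u, 0))  # True: u = 0, so P is a candidate extremum
```

At a point where u came out nonzero, moving along u would change f while staying on the linearized constraint, ruling that point out.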
Exercise: generalize one of the arguments for Lagrange multipliers to work for multiple constraint functions.