
3.5 Constrained maxima and minima

Problem: Find the nearest point to $(0,0)$ on the curve $x^{2}+8xy+7y^{2}=225$ . In other words, find the point on the curve where $x^{2}+y^{2}$ is smallest.

This is one example of problems of the following type:

Find maxima and minima of a function $f(x,y)$ on the curve defined by $g(x,y)=0$ .

Two possible methods are as follows.

First method. It may be easy to solve $g(x,y)=0$ to express $y$ as $y(x)$ . For example, if $g(x,y)=xy-1=0$ , then $y=1/x$ . In this case, we can just substitute $y=y(x)$ and look for extrema of $f[x,y(x)]$ , a function of $x$ only. However, in most cases, this isn’t easy!

Example of the first method. Find the minimum of $x^{3}+3y^{2}$ on the curve $xy=1$ , $x>0$ .

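Answer: Substituting $y=1/x$ gives $F(x)=x^{3}+3x^{-2}$ for $x>0$ . Then $F^{\prime}(x)=3x^{2}-6x^{-3}=0$ gives $x^{5}=2$ , so $x=2^{1/5}$ and $y=2^{-1/5}$ . Since $F^{\prime\prime}(x)=6x+18x^{-4}>0$ for all $x>0$ , this is a minimum, and the minimum value is $F(2^{1/5})=2^{3/5}+3\cdot 2^{-2/5}=5\cdot 2^{-2/5}\approx 3.79$ .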
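As a numerical cross-check of the first method, the sketch below minimises $F(x)=x^{3}+3/x^{2}$ (the result of substituting $y=1/x$ ) by golden-section search. Golden-section search is our own choice here, not a method from these notes; it is valid because $F^{\prime\prime}(x)=6x+18/x^{4}>0$ for $x>0$ , so $F$ has a single minimum. The bracket $[0.5,3]$ and the function names are assumptions made for illustration.

```python
import math


def F(x):
    """x**3 + 3*y**2 with the constraint y = 1/x substituted in (first method)."""
    return x**3 + 3 / x**2


def golden_section_min(func, lo, hi, tol=1e-10):
    """Minimise a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2  # 1/phi, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c = c, d
            d = a + inv_phi * (b - a)
    x = (a + b) / 2
    return x, func(x)


x_min, f_min = golden_section_min(F, 0.5, 3.0)
print(x_min, 2 ** (1 / 5))       # numerical vs exact minimiser
print(f_min, 5 * 2 ** (-2 / 5))  # numerical vs exact minimum value
```

Near a minimum the function is flat, so comparing function values pins down $x$ only to roughly $10^{-8}$ here, while the minimum value itself is obtained much more accurately.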

Second method. Instead of actually solving $g(x,y)=0$ to give $y=y(x)$ , just imagine that it has been done (it’s at least possible in principle!). This means that $g[x,y(x)]=0$ for all $x$ . Differentiating by the chain rule (with “ $t$ ” equal to $x$ ), we deduce that

g_{x}+g_{y}\;y^{\prime}(x)=0\qquad\mbox{for all }x.

(1)

As we saw above, we are looking for stationary points of $f[x,y(x)]$ . Again by the chain rule, such points satisfy

f_{x}+f_{y}\;y^{\prime}(x)=0.

(2)

To eliminate $y^{\prime}(x)$ , multiply (1) by $f_{y}$ and (2) by $g_{y}$ and subtract. We obtain $f_{y}g_{x}-g_{y}f_{x}=0$ , or:

f_{x}g_{y}=f_{y}g_{x}.

(3)

We solve (3) together with $g(x,y)=0$ to find the points. We don’t need to know $y(x)$ explicitly!
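The sketch below checks condition (3) numerically for the problem at the start of this section: it estimates the partial derivatives by central differences and evaluates $f_{x}g_{y}-f_{y}g_{x}$ . The helper names and the step size `h` are illustrative choices. It also shows why (3) must be solved together with $g(x,y)=0$ : the point $(15,0)$ lies on the curve but does not satisfy (3).

```python
import math


def partial(func, x, y, var, h=1e-5):
    """Central-difference estimate of the partial derivative of func w.r.t. var ('x' or 'y')."""
    if var == "x":
        return (func(x + h, y) - func(x - h, y)) / (2 * h)
    return (func(x, y + h) - func(x, y - h)) / (2 * h)


def residual_3(f, g, x, y):
    """Approximates f_x g_y - f_y g_x; the exact quantity vanishes precisely when (3) holds."""
    fx, fy = partial(f, x, y, "x"), partial(f, x, y, "y")
    gx, gy = partial(g, x, y, "x"), partial(g, x, y, "y")
    return fx * gy - fy * gx


def f(x, y):
    return x**2 + y**2


def g(x, y):
    return x**2 + 8 * x * y + 7 * y**2 - 225


r5 = math.sqrt(5)
print(g(r5, 2 * r5), residual_3(f, g, r5, 2 * r5))  # both (numerically) zero
print(g(15, 0), residual_3(f, g, 15, 0))            # on the curve, but residual near 3600
```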

Aside on geometry

Equation (3) has a simple geometrical interpretation. Assuming $(g_{x}\;\;g_{y})\neq(0\;\;0)$ , it says that $(f_{x}\;\;f_{y})=\lambda(g_{x}\;\;g_{y})$ for some $\lambda\in{\mathbb{R}}$ . In other words, $\nabla f$ and $\nabla g$ are parallel. Hence at the points in question, the curves $g(x,y)=0$ and $f(x,y)=c$ (for the appropriate $c$ ) touch each other. When you think about it, this is what you would expect:

Changing the parameter $c$ moves the curve $f(x,y)=c$ around. In the diagram above, we might start with the value $c^{\prime}$ for $f(x,y)$ , and see how far we can increase or decrease $c^{\prime}$ while the curves $f(x,y)=c^{\prime}$ and $g(x,y)=0$ still intersect. The point is that if these two curves intersect, then there exist $x, y$ satisfying $g(x,y)=0$ and $f(x,y)=c^{\prime}$ ; that is, $c^{\prime}$ is a value of $f(x,y)$ which occurs on the curve $g(x,y)=0$ . Suppose we keep increasing (or decreasing) $c^{\prime}$ until we reach a value $c$ such that, if we carry on moving in the same direction, the curves $f(x,y)=c$ and $g(x,y)=0$ no longer have any nearby points of intersection. This is illustrated by the curve $f(x,y)=c$ in the diagram above. Then we have pushed $c$ as far as it will go; locally, $c$ is an extreme value (either a maximum or a minimum) of $f(x,y)$ on the curve $g(x,y)=0$ . At this value the two curves just touch rather than cross, which is exactly what it means for $f(x,y)=c$ and $g(x,y)=0$ to have a common tangent.

The auxiliary function

Let’s look a little bit more at the formulation $\nabla f=\lambda\nabla g$ . Consider the function:

\Lambda(x,y,\lambda)=f(x,y)-\lambda g(x,y)

Think of $\lambda$ as a variable, and let’s look for stationary points of $\Lambda(x,y,\lambda)$ as $x, y$ and $\lambda$ vary. At such a point all three partial derivatives are zero, hence $f_{x}-\lambda g_{x}=f_{y}-\lambda g_{y}=0$ , and (since $\Lambda_{\lambda}=-g(x,y)$ ) $g(x,y)=0$ . The final equation is just the constraint that we want to impose on $x$ and $y$ from the outset; the first two equations together imply $\nabla f=\lambda\nabla g$ , which is exactly what we want. This indicates a useful method for finding the (constrained) maxima and minima of $f(x,y)$ : introduce the Lagrange multiplier $\lambda$ and look for stationary points of the auxiliary function $\Lambda(x,y,\lambda)$ . In a moment we will see some examples that should make things clearer; first, however, we will see that a very similar method allows us to find constrained maxima and minima for functions of three or more variables.
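Stationary points of $\Lambda$ can also be located numerically. The sketch below applies Newton's method to the system $\Lambda_{x}=\Lambda_{y}=\Lambda_{\lambda}=0$ for the problem at the start of this section, with $f=x^{2}+y^{2}$ and $g=x^{2}+8xy+7y^{2}-225$ . Newton's method, the hand-written matrix of second derivatives, the starting guesses and all function names are our own illustrative choices, not part of the method described in these notes. Like any local iteration, it needs a reasonable starting guess, and it says nothing about whether the point it finds is a maximum or a minimum.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def grad_Lambda(x, y, lam):
    """(Lambda_x, Lambda_y, Lambda_lambda) for Lambda = x^2 + y^2 - lam*(x^2 + 8xy + 7y^2 - 225)."""
    gx, gy = 2 * x + 8 * y, 8 * x + 14 * y
    return [2 * x - lam * gx, 2 * y - lam * gy, -(x * x + 8 * x * y + 7 * y * y - 225)]


def jacobian(x, y, lam):
    """Second partial derivatives of Lambda with respect to (x, y, lam)."""
    gx, gy = 2 * x + 8 * y, 8 * x + 14 * y
    return [
        [2 - 2 * lam, -8 * lam, -gx],
        [-8 * lam, 2 - 14 * lam, -gy],
        [-gx, -gy, 0.0],
    ]


def newton_stationary(start, tol=1e-12, max_iter=50):
    """Newton iteration p <- p - J(p)^{-1} grad(p) for a zero of grad Lambda."""
    p = list(start)
    for _ in range(max_iter):
        step = solve_linear(jacobian(*p), [-v for v in grad_Lambda(*p)])
        p = [pi + si for pi, si in zip(p, step)]
        if max(abs(s) for s in step) < tol:
            break
    return p


x, y, lam = newton_stationary((2.0, 4.0, 0.1))
print(x, y, lam, x * x + y * y)  # close to sqrt(5), 2*sqrt(5), 1/9 and 25
```

Starting from $(-2,-4,0.1)$ instead leads, by the symmetry $(x,y)\mapsto(-x,-y)$ of the problem, to the other stationary point.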

Note that this method identifies stationary points, but gives no test to determine whether they are maxima or minima. Just as for a function of one variable, we can find stationary points which are neither maxima nor minima. To see whether a point is a local maximum or a local minimum, you have to use the special features of any particular problem.

Three variables. To find extreme values of $f(x,y,z)$ subject to the condition $g(x,y,z)=0$ .

In principle, the equation $g(x,y,z)=0$ can be solved to give $z=z(x,y)$ . This means that

g\left[x,y,z(x,y)\right]=0

for all $x, y$ . Taking partial derivatives with respect to $x$ and $y$ by the chain rule, we obtain

g_{x}+g_{z}\,z_{x}=0,\qquad g_{y}+g_{z}\,z_{y}=0.

(4)

We are looking for stationary points of $f[x,y,z(x,y)]$ (note that this is a function of just $x$ and $y$ ). At such points, the partial derivatives with respect to $x$ and $y$ will be 0, in other words:

f_{x}+f_{z}\,z_{x}=0,\qquad f_{y}+f_{z}\,z_{y}=0.

(5)

From the $x$ -equations in (4) and (5), eliminating the second term as before, we obtain $f_{x}g_{z}=f_{z}g_{x}$ . From the $y$ -equations, we obtain $f_{y}g_{z}=f_{z}g_{y}$ . Taken together, these two identities say

\frac{f_{x}}{g_{x}}=\frac{f_{y}}{g_{y}}=\frac{f_{z}}{g_{z}},

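(provided the denominators are nonzero), and we denote this common value by $\lambda$ . If $g_{x}$ or $g_{y}$ happens to vanish, take $\lambda=f_{z}/g_{z}$ instead; this is legitimate because $g_{z}\neq 0$ is what allows us to solve for $z$ in the first place, and the two identities above lead to the same conclusion. We can rewrite this as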

f_{x}=\lambda g_{x},\qquad f_{y}=\lambda g_{y},\qquad f_{z}=\lambda g_{z}.

(6)

or more concisely as $\nabla f=\lambda\nabla g$ .

Just as for functions of two variables, we can find the solutions by looking for the stationary points of the auxiliary function $\Lambda(x,y,z,\lambda)=f(x,y,z)-\lambda g(x,y,z)$ .

Geometric aside. For functions of two variables, we interpreted the statement $\nabla f=\lambda\nabla g$ to mean that the curves $f(x,y)=c$ and $g(x,y)=0$ just touch each other at the stationary point (they have parallel normal vectors there). In three variables there is a similar interpretation: since $\nabla f$ and $\nabla g$ are parallel, the surfaces $g(x,y,z)=0$ and $f(x,y,z)=c$ have the same tangent plane at the stationary point. The value of $\lambda$ itself may or may not be of interest.

Let’s now apply this method of Lagrange multipliers to the example given at the beginning of this section.

Example 3.12

Find the least value of $x^{2}+y^{2}$ on the curve with equation $x^{2}+8xy+7y^{2}-225=0$ .

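The auxiliary function is $\Lambda(x,y,\lambda)=x^{2}+y^{2}-\lambda(x^{2}+8xy+7y^{2}-225)$ . Now find the partial derivatives:

\Lambda_{x}=2x-\lambda(2x+8y),\qquad\Lambda_{y}=2y-\lambda(8x+14y),

and $\Lambda_{\lambda}=-(x^{2}+8xy+7y^{2}-225)$ . So if $(x,y)$ is a stationary point, then

(1-\lambda)x=4\lambda y,\qquad(1-7\lambda)y=4\lambda x.

Multiplying the first equation by $4\lambda$ and the second by $1-\lambda$ gives $(1-\lambda)(1-7\lambda)y=4\lambda(1-\lambda)x=16\lambda^{2}y$ , that is, $y(1-8\lambda-9\lambda^{2})=0$ . Thus $y(9\lambda-1)(\lambda+1)=0$ , and so we have 3 cases: (i) $y=0$ , (ii) $\lambda=1/9$ , and (iii) $\lambda=-1$ .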
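The factor $(9\lambda-1)(\lambda+1)$ can also be seen with a little linear algebra (a standard alternative, not the route taken in these notes). Setting $\Lambda_{x}=2x-\lambda(2x+8y)$ and $\Lambda_{y}=2y-\lambda(8x+14y)$ to zero gives the homogeneous linear system $(1-\lambda)x-4\lambda y=0$ , $-4\lambda x+(1-7\lambda)y=0$ , which has a solution other than $(0,0)$ only if its determinant $(1-\lambda)(1-7\lambda)-16\lambda^{2}$ vanishes. The sketch checks in exact rational arithmetic that this determinant equals $-(9\lambda-1)(\lambda+1)$ ; the function names are hypothetical.

```python
from fractions import Fraction


def det(lam):
    """Determinant of [[1 - lam, -4*lam], [-4*lam, 1 - 7*lam]], i.e. 1 - 8*lam - 9*lam**2."""
    return (1 - lam) * (1 - 7 * lam) - 16 * lam**2


def factored(lam):
    return -(9 * lam - 1) * (lam + 1)


# Two quadratics in lam that agree at three distinct points are identical,
# so these three checks prove det(lam) == factored(lam) for every lam.
for lam in (Fraction(0), Fraction(1), Fraction(-3, 7)):
    assert det(lam) == factored(lam)

print(det(Fraction(1, 9)), det(Fraction(-1)))  # 0 0
```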

Case (i): $y=0$ .

Then, since $\Lambda_{\lambda}=0$ , we have $x^{2}=225$ , and hence $x=\pm 15$ . However, plugging $(15,0)$ or $(-15,0)$ into $\Lambda_{x}=0$ gives $\lambda=1$ , and into $\Lambda_{y}=0$ gives $\lambda=0$ , a contradiction.

Case (ii): $\lambda=1/9$ .

Then $(1-\lambda)x=4\lambda y$ becomes $\frac{8}{9}x=\frac{4}{9}y$ .

So $y=2x$ . Now we can determine possible values of $x$ and $y$ using the final equation $\Lambda_{\lambda}=0$ , that is, $g(x,y)=x^{2}+8xy+7y^{2}-225=0$ . We obtain

x^{2}+16x^{2}+28x^{2}-225=45x^{2}-225=0,

and hence $x=\pm\sqrt{5}$ . Thus we obtain two stationary points for $\lambda=1/9$ : $(\sqrt{5},2\sqrt{5})$ and $(-\sqrt{5},-2\sqrt{5})$ . For both points, $x^{2}+y^{2}=25$ .

Case (iii): $\lambda=-1$ .

Then $(1-\lambda)x=4\lambda y$ becomes $2x=-4y$ .

So $x=-2y$ . Now $g(x,y)=g(-2y,y)=4y^{2}-16y^{2}+7y^{2}-225=-5y^{2}-225$ must vanish,

hence $-5y^{2}=225$ , which has no solution.

We have therefore found all of the stationary points of $\Lambda(x,y,\lambda)$ : the two points $\pm(\sqrt{5},2\sqrt{5})$ . We don’t yet know whether they are maxima or minima. An ad hoc way to decide is as follows: the curve $x^{2}+8xy+7y^{2}-225=0$ passes through the point $(15,0)$ , for which $x^{2}+y^{2}=225>25$ . Thus $25$ must be the minimum (not the maximum) value of $x^{2}+y^{2}$ . From a strict mathematical standpoint, this argument isn’t really good enough! As mentioned above, not all stationary points are maxima or minima: so perhaps there are some points on the curve $x^{2}+8xy+7y^{2}=225$ for which $x^{2}+y^{2}>25$ , and other points for which $x^{2}+y^{2}<25$ ? However, for the purposes of this course, such ad hoc arguments will be considered acceptable. (We don’t want you to get bogged down in details of the nature of the stationary points.)
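For extra reassurance (a numerical check, not a proof), the sketch below samples the curve. A point $r(\cos t,\sin t)$ lies on $x^{2}+8xy+7y^{2}=225$ exactly when $r^{2}q(t)=225$ , where $q(t)=\cos^{2}t+8\cos t\sin t+7\sin^{2}t$ ; so on the curve $x^{2}+y^{2}=225/q(t)$ wherever $q(t)>0$ . The grid size `N` is an arbitrary choice. (In fact $q(t)=4-3\cos 2t+4\sin 2t$ , whose largest value is $4+5=9$ , giving the minimum $225/9=25$ exactly.)

```python
import math

N = 200_000  # number of sample angles in [0, 2*pi)


def q(t):
    """x^2 + 8xy + 7y^2 evaluated at the unit vector (cos t, sin t)."""
    c, s = math.cos(t), math.sin(t)
    return c * c + 8 * c * s + 7 * s * s


# Along the ray at angle t, the curve is met where r^2 = 225 / q(t) (only if q(t) > 0).
squared_distances = []
for k in range(N):
    qt = q(2 * math.pi * k / N)
    if qt > 0:
        squared_distances.append(225 / qt)

best = min(squared_distances)
print(best)  # slightly above 25
```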

For those who are interested, there is (in this case) an easy trick to be really sure that $25$ is a minimum value: suppose there exist $x, y$ such that $x^{2}+8xy+7y^{2}=225$ and $x^{2}+y^{2}=d<25$ . Then $9(x^{2}+y^{2})-(x^{2}+8xy+7y^{2})=9d-225<0$ . But $9(x^{2}+y^{2})-(x^{2}+8xy+7y^{2})=8x^{2}-8xy+2y^{2}=2(2x-y)^{2}\geq 0$ , hence this is impossible.

Example 3.13

Consider an open-top box with side lengths $x, y, z$ and volume $4a^{3}$ . Find the minimum possible surface area of the box.

First of all, let’s write the equations down:

xyz-4a^{3}=0,\;\;\;A(x,y,z)=2xy+2xz+yz

There are three ways to find the minimum value of $A$ .

First method:

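Since $xyz=4a^{3}$ , we can replace $z$ by $4a^{3}/(xy)$ , hence $A(x,y,z)=2xy+8a^{3}/y+4a^{3}/x$ . Now look for the stationary points of $B(x,y)=2xy+8a^{3}/y+4a^{3}/x$ ; we have

B_{x}=2y-\frac{4a^{3}}{x^{2}}=0,\qquad B_{y}=2x-\frac{8a^{3}}{y^{2}}=0,

so $x^{2}y=2a^{3}$ and $xy^{2}=4a^{3}$ . Dividing, $y=2x$ , and then $2x^{3}=2a^{3}$ . Hence there is one stationary point: $(a,2a)$ , corresponding to $z=4a^{3}/(2a^{2})=2a$ . We get a value of $12a^{2}$ for $B(a,2a)$ . To see that this is a minimum value, one could check the second derivatives of $B(x,y)$ .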
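Checking the second derivatives can be done as follows. At $(a,2a)$ we have $B_{xx}=8a^{3}/x^{3}=8$ , $B_{xy}=2$ and $B_{yy}=16a^{3}/y^{3}=2$ , so $B_{xx}>0$ and $B_{xx}B_{yy}-B_{xy}^{2}=12>0$ , and the usual second-derivative test gives a local minimum. The sketch below confirms these numbers in exact rational arithmetic for a few sample values of $a$ ; the sample values and function names are our own.

```python
from fractions import Fraction


def B(x, y, a):
    """Surface area 2xy + 8a^3/y + 4a^3/x after eliminating z = 4a^3/(xy)."""
    return 2 * x * y + 8 * a**3 / y + 4 * a**3 / x


def grad_B(x, y, a):
    """(B_x, B_y)."""
    return (2 * y - 4 * a**3 / x**2, 2 * x - 8 * a**3 / y**2)


def hessian_B(x, y, a):
    """Second partial derivatives (B_xx, B_xy, B_yy)."""
    return 8 * a**3 / x**3, 2, 16 * a**3 / y**3


for a in (Fraction(1), Fraction(3), Fraction(2, 5)):
    x, y = a, 2 * a
    bxx, bxy, byy = hessian_B(x, y, a)
    # Stationary, value 12a^2, B_xx > 0 and positive Hessian determinant.
    print(grad_B(x, y, a) == (0, 0), B(x, y, a) == 12 * a**2, bxx, bxx * byy - bxy**2)
```

Each line prints `True True 8 12`: the Hessian at $(a,2a)$ does not depend on $a$ .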

Second method:

The second method is based on the following useful result:

Proposition: Let $a_{1}$ , $a_{2},\ldots,a_{n}$ be positive numbers. Let

A=\frac{1}{n}(a_{1}+\cdots+a_{n})\quad\mbox{(the average, or ``arithmetic mean")},

G=(a_{1}a_{2}\ldots a_{n})^{1/n}\quad\mbox{(the ``geometric mean").}

Then $G\leq A$ , with equality if and only if $a_{1}=a_{2}=\cdots=a_{n}$ . Equivalently, $a_{1}a_{2}\cdots a_{n}\leq A^{n}$ .

Proof

Index the numbers so that $a_{1}\geq a_{2}\geq\cdots\geq a_{n}$ . Unless they are all equal, we have $a_{1}>A>a_{n}$ . Form a new set of numbers as follows: replace $a_{1}$ and $a_{n}$ by $A$ and $a_{1}+a_{n}-A$ . The new numbers are still positive, since $a_{1}+a_{n}-A>a_{n}>0$ . The sum (hence the arithmetic mean) is unchanged, since $A+(a_{1}+a_{n}-A)=a_{1}+a_{n}$ . However, the product is increased, since

A(a_{1}+a_{n}-A)-a_{1}a_{n}=(a_{1}-A)(A-a_{n})>0.

Order the new set of numbers as $b_{1}\geq b_{2}\geq\cdots\geq b_{n}$ . At least one of them is $A$ . Unless they are all equal, repeat the process with $b_{1}$ and $b_{n}$ . The product is increased again, and at least two of the numbers are now equal to $A$ . Continue: after at most $n-1$ steps, the numbers are all equal to $A$ , so their product is $A^{n}$ . Since every step increased the product, $A^{n}$ is greater than the original product $a_{1}a_{2}\cdots a_{n}$ unless the $a_{i}$ were all equal to begin with. So we have shown that $a_{1}a_{2}\cdots a_{n}\leq A^{n}$ , with equality only when all the $a_{i}$ are equal. $\square$
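The proof is constructive, so its replacement step can be run directly. The sketch below follows it literally with exact fractions: at each step it replaces the largest number by $A$ and the smallest by (largest + smallest $-A$ ), recording the product after each step. The function name is our own; `math.prod` needs Python 3.8 or later.

```python
from fractions import Fraction
from math import prod


def amgm_smoothing(values):
    """Run the replacement step from the proof until all numbers equal the mean.

    Returns the arithmetic mean A and the list of products (initial one first).
    """
    nums = [Fraction(v) for v in values]
    mean = sum(nums) / len(nums)
    products = [prod(nums)]
    while any(v != mean for v in nums):
        nums.sort(reverse=True)  # nums[0] is the largest, nums[-1] the smallest
        largest, smallest = nums[0], nums[-1]
        nums[0], nums[-1] = mean, largest + smallest - mean
        products.append(prod(nums))
    return mean, products


mean, products = amgm_smoothing([1, 2, 6])
print(mean, products)
```

For $1,2,6$ (so $A=3$ ) the products are $12$ , $24$ and $27=3^{3}$ , reached after two steps, in line with the bound of at most $n-1$ steps.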

Consider $2xy$ , $2xz$ , $yz$ . These are positive numbers, hence their arithmetic mean is at least their geometric mean:

\frac{2xy+2xz+yz}{3}\geq(4x^{2}y^{2}z^{2})^{1/3}=(64a^{6})^{1/3}=4a^{2}

Thus $2xy+2xz+yz\geq 12a^{2}$ , and equality holds when $2xy=2xz=yz$ (i.e. for $(x,y,z)=(a,2a,2a)$ ). This is by far the easiest of the three methods! It also has the advantage that we know instantly that $(a,2a,2a)$ is a minimum. However, this method can only be used in certain cases. Keep your eyes peeled for cases where it does work! Similarly, there are some cases where the Cauchy-Schwarz inequality quickly gives us a minimum (constrained) value. The method of Lagrange multipliers will also work, but will take more time and effort.
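A quick random experiment is consistent with the bound $2xy+2xz+yz\geq 12a^{2}$ . The sketch below takes $a=1$ , draws $x$ and $y$ uniformly from an arbitrary range, sets $z=4a^{3}/(xy)$ so that the volume constraint holds, and records the smallest area seen. The sampling range, sample count, seed and function names are assumptions for illustration only; sampling cannot prove a minimum, whereas the argument above does.

```python
import random


def open_box_area(x, y, z):
    """Area 2xy + 2xz + yz of the open-top box, as in the notes."""
    return 2 * x * y + 2 * x * z + y * z


def sample_min_area(a=1.0, trials=50_000, seed=0):
    """Smallest area found over random boxes satisfying xyz = 4a^3."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trials):
        x = rng.uniform(0.1 * a, 5 * a)
        y = rng.uniform(0.1 * a, 5 * a)
        z = 4 * a**3 / (x * y)  # enforce the volume constraint xyz = 4a^3
        best = min(best, open_box_area(x, y, z))
    return best


print(open_box_area(1, 2, 2))  # 12, the bound for a = 1
print(sample_min_area())       # a little above 12
```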

Third method:

We can use Lagrange multipliers to find the minimum value of $2xy+2xz+yz$ . In this case we have the auxiliary function $\Lambda(x,y,z,\lambda)=2xy+2xz+yz-\lambda(xyz-4a^{3})$ . Thus we would need to find the points where the four partial derivatives are zero. These partial derivatives are $\Lambda_{x}=2y+2z-\lambda yz$ , $\Lambda_{y}=2x+z-\lambda xz$ , $\Lambda_{z}=2x+y-\lambda xy$ and $\Lambda_{\lambda}=-(xyz-4a^{3})$ . Without going any further, you can probably already see that this will be more difficult than the first method, and much more arduous than the second! Lagrange multipliers are very important for cases where substitution isn’t possible, and neither the Cauchy-Schwarz inequality nor the theorem on arithmetic and geometric means can be used.
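Once the answer is known, it is easy to confirm that it satisfies the Lagrange conditions. From $\Lambda_{x}=2y+2z-\lambda yz=0$ at $(a,2a,2a)$ we get $8a=4a^{2}\lambda$ , so $\lambda=2/a$ . The sketch checks, in exact arithmetic and for a few sample values of $a$ (our own choice), that $\Lambda_{x}=2y+2z-\lambda yz$ , $\Lambda_{y}=2x+z-\lambda xz$ , $\Lambda_{z}=2x+y-\lambda xy$ and $\Lambda_{\lambda}=-(xyz-4a^{3})$ all vanish at $(a,2a,2a,2/a)$ . This confirms a stationary point; on its own it does not show that the point is a minimum.

```python
from fractions import Fraction


def grad_Lambda(x, y, z, lam, a):
    """Partial derivatives of Lambda = 2xy + 2xz + yz - lam*(xyz - 4a^3)."""
    return (
        2 * y + 2 * z - lam * y * z,  # Lambda_x
        2 * x + z - lam * x * z,      # Lambda_y
        2 * x + y - lam * x * y,      # Lambda_z
        -(x * y * z - 4 * a**3),      # Lambda_lambda
    )


for a in (Fraction(1), Fraction(3), Fraction(1, 2)):
    point = (a, 2 * a, 2 * a, 2 / a)
    print(a, all(d == 0 for d in grad_Lambda(*point, a)))  # True for each a
```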