
3.4 The chain rule and applications

First version of the chain rule. Let $f(x,y)$ be a function of $x$ and $y$. Suppose we move along a path described by the parametrized curve $\gamma(t)=(x(t),y(t))$. What rate of change of $f$ (relative to $t$) do we experience? By (*),

\delta f\approx f_{x}\,\delta x+f_{y}\,\delta y\approx\left(f_{x}\frac{\delta x}{\delta t}+f_{y}\frac{\delta y}{\delta t}\right)\delta t.

Hence, in the limit $\delta t\to 0$,

\frac{df}{dt}=f_{x}\,\frac{dx}{dt}+f_{y}\,\frac{dy}{dt}=\frac{\partial f}{\partial x}\,\frac{dx}{dt}+\frac{\partial f}{\partial y}\,\frac{dy}{dt}=\nabla f\cdot\gamma^{\prime}(t).
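As a numerical sanity check of $F^{\prime}(t)=\nabla f(\gamma(t))\cdot\gamma^{\prime}(t)$, the Python sketch below compares the chain-rule value with a central-difference estimate of $F(t)=f(\gamma(t))$. The function $f(x,y)=x^{2}y+\sin y$, the curve $\gamma(t)=(\cos t,\,t^{2})$, the step size `h` and all function names are illustrative assumptions, not part of the notes.

```python
import math

# Hypothetical choices of f and of the path gamma, for illustration only.
def f(x, y):
    return x**2 * y + math.sin(y)

def grad_f(x, y):
    # (f_x, f_y), computed by hand
    return (2 * x * y, x**2 + math.cos(y))

def gamma(t):
    return (math.cos(t), t**2)

def gamma_prime(t):
    return (-math.sin(t), 2 * t)

def chain_rule_rate(t):
    """dF/dt = grad f(gamma(t)) . gamma'(t)."""
    fx, fy = grad_f(*gamma(t))
    dx, dy = gamma_prime(t)
    return fx * dx + fy * dy

def finite_difference_rate(t, h=1e-5):
    """Central-difference estimate of dF/dt, where F(t) = f(gamma(t))."""
    return (f(*gamma(t + h)) - f(*gamma(t - h))) / (2 * h)

t0 = 0.7
print(chain_rule_rate(t0), finite_difference_rate(t0))
```

The two printed numbers should agree closely; their difference is limited only by the step size and rounding. A check of this kind is a sanity test, not a proof.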

This is the simplest form of the chain rule. The same argument works for three variables $x,\,y,\,z$, with an extra term $f_{z}\,\frac{dz}{dt}$. The formula is valid if all the derivatives involved exist and are continuous.

Note. Another way to say this is: if $F(t)=f(\gamma(t))$, then $F^{\prime}(t)=\nabla f\cdot\gamma^{\prime}(t)$. But we should be a little careful here: $\nabla f$ is a function of $(x,y)\in{\mathbb{R}}^{2}$, not of $t\in{\mathbb{R}}$. More precisely, then, $F^{\prime}(t)=\nabla f(\gamma(t))\cdot\gamma^{\prime}(t)$. For careful proofs this distinction is essential. However, using the same symbol for both (that is, writing $f(t)$ rather than $F(t)$) is standard practice in applied mathematics, where $f$ represents some physical quantity (e.g. temperature, perhaps denoted by $T$).

Application of the chain rule: directional derivatives. Let $f$ be a function of $x,\,y$, and let $\mathbf{n}=(n_{1},\,n_{2})$ be a unit vector. The line through the point $\mathbf{r}_{0}=(x_{0},y_{0})$ in the direction $\mathbf{n}$ is described by $\mathbf{r}=\mathbf{r}_{0}+t\mathbf{n}$. The directional derivative of $f$ at $\mathbf{r}_{0}$ in the direction $\mathbf{n}$ is the rate of change of $f$ as one moves along this line, measured at $t=0$, that is:

\left.\frac{d}{dt}f(\mathbf{r}_{0}+t\mathbf{n})\right|_{t=0}=\left.\frac{d}{dt}f(x_{0}+tn_{1},\,y_{0}+tn_{2})\right|_{t=0}.

By the chain rule, this is

n_{1}\frac{\partial f}{\partial x}+n_{2}\frac{\partial f}{\partial y}=(\nabla f)\cdot\mathbf{n},

with the partial derivatives evaluated at $\mathbf{r}_{0}$.
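The formula $(\nabla f)\cdot\mathbf{n}$ can be compared directly with its definition as a rate of change along the line $\mathbf{r}_{0}+t\mathbf{n}$. In this sketch the function $f(x,y)=xe^{y}$, the point $\mathbf{r}_{0}=(1,0)$ and the unit vector $\mathbf{n}=(3/5,\,4/5)$ are hypothetical choices; there $\nabla f=(1,1)$, so the directional derivative is $3/5+4/5=7/5$.

```python
import math

# Hypothetical example function f(x, y) = x * exp(y) and its gradient.
def f(x, y):
    return x * math.exp(y)

def grad_f(x, y):
    return (math.exp(y), x * math.exp(y))

def directional_derivative(x0, y0, n):
    """(grad f)(r0) . n for a unit vector n = (n1, n2)."""
    fx, fy = grad_f(x0, y0)
    return n[0] * fx + n[1] * fy

def along_line_estimate(x0, y0, n, h=1e-5):
    """Central difference of t -> f(r0 + t n) at t = 0."""
    forward = f(x0 + h * n[0], y0 + h * n[1])
    backward = f(x0 - h * n[0], y0 - h * n[1])
    return (forward - backward) / (2 * h)

n = (3 / 5, 4 / 5)   # a unit vector: (3/5)^2 + (4/5)^2 = 1
print(directional_derivative(1.0, 0.0, n))   # analytically 3/5 + 4/5 = 7/5
print(along_line_estimate(1.0, 0.0, n))
```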


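Since $(\nabla f)\cdot\mathbf{n}=|\nabla f|\,|\mathbf{n}|\cos\theta=|\nabla f|\cos\theta$, where $\theta$ is the angle between $\nabla f$ and $\mathbf{n}$, this is largest (with value $|\nabla f|$) when $\mathbf{n}$ points in the direction of $\nabla f$ (assuming $\nabla f\neq\mathbf{0}$). So $\nabla f$ gives the direction of greatest increase of $f$, and the rate of this increase is $|\nabla f|$. The direction of greatest decrease of $f$ is the opposite direction, $-\nabla f$ (with rate $-|\nabla f|$).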
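To illustrate that the maximum of $(\nabla f)\cdot\mathbf{n}$ over unit vectors is $|\nabla f|$, attained when $\mathbf{n}$ points along $\nabla f$, the following sketch simply scans many unit vectors $\mathbf{n}=(\cos\theta,\sin\theta)$. It assumes the hypothetical $f(x,y)=xe^{y}$ at the point $(1,0)$, where $\nabla f=(1,1)$, so the best angle should be close to $\pi/4$ and the best rate close to $\sqrt{2}$. The brute-force scan is only for illustration; the argument in the text gives the answer directly.

```python
import math

# Hypothetical f(x, y) = x * exp(y); its gradient at (1, 0) is (1, 1).
def grad_f(x, y):
    return (math.exp(y), x * math.exp(y))

def best_direction(x0, y0, samples=3600):
    """Scan unit vectors n = (cos theta, sin theta) and return the angle
    giving the largest directional derivative, together with that value."""
    fx, fy = grad_f(x0, y0)
    best_theta, best_value = 0.0, -math.inf
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        value = fx * math.cos(theta) + fy * math.sin(theta)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value

theta, value = best_direction(1.0, 0.0)
fx, fy = grad_f(1.0, 0.0)
print(theta, math.atan2(fy, fx))   # both close to pi/4
print(value, math.hypot(fx, fy))   # both close to sqrt(2) = |grad f|
```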

Second version of the chain rule. We now describe a more general version of the chain rule. Suppose that $x$ and $y$ are expressed in terms of two other variables $u,\,v$ (for example, $(u,v)$ could be an alternative coordinate system). Substitution expresses $f$ in terms of $u$ and $v$ (again, one should really use a new symbol like $F$), and the problem is to find the partial derivatives $\displaystyle{\frac{\partial f}{\partial u}}$ and $\displaystyle{\frac{\partial f}{\partial v}}$. These mean, of course, the derivatives with respect to $u$ or $v$ when the other is kept constant; the notation is somewhat deficient, because it fails to specify what is being kept constant!

A small change $\delta u$ to $u$, with $v$ kept constant, causes changes

\delta x\approx\frac{\partial x}{\partial u}\,\delta u,\qquad\delta y\approx\frac{\partial y}{\partial u}\,\delta u.

By (*), the resulting change in $f$ is approximately

\frac{\partial f}{\partial x}\,\delta x+\frac{\partial f}{\partial y}\,\delta y\approx\left(\frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u}\right)\delta u.

Dividing by $\delta u$ and letting $\delta u\to 0$, we obtain (as before):

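Proposition 3.10. If all the partial derivatives are continuous, then

\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u},\qquad\frac{\partial f}{\partial v}=\frac{\partial f}{\partial x}\,\frac{\partial x}{\partial v}+\frac{\partial f}{\partial y}\,\frac{\partial y}{\partial v}.

$\Box$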

The two statements can be put together nicely in matrix form:

\begin{pmatrix}\dfrac{\partial f}{\partial u}&\dfrac{\partial f}{\partial v}\end{pmatrix}=\begin{pmatrix}\dfrac{\partial f}{\partial x}&\dfrac{\partial f}{\partial y}\end{pmatrix}\begin{pmatrix}\dfrac{\partial x}{\partial u}&\dfrac{\partial x}{\partial v}\\[2mm]\dfrac{\partial y}{\partial u}&\dfrac{\partial y}{\partial v}\end{pmatrix}.

The matrix on the right is called the Jacobian matrix for $x$ and $y$ in terms of $u$ and $v$ .
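The matrix form can be checked numerically. The sketch below assumes the hypothetical example $f(x,y)=x^{2}y$ with $x=u\cos v,\;y=u\sin v$ (polar coordinates, with $u$ in the role of $r$ and $v$ in the role of $\theta$). It multiplies the row vector $(f_{x}\;\;f_{y})$ by the Jacobian matrix and compares the result with central differences of $f$ written in terms of $u$ and $v$; all names are illustrative.

```python
import math

# Hypothetical example: f(x, y) = x**2 * y, with x = u cos v, y = u sin v.
def f(x, y):
    return x**2 * y

def grad_f(x, y):
    return [2 * x * y, x**2]          # row vector (f_x, f_y)

def xy(u, v):
    return u * math.cos(v), u * math.sin(v)

def jacobian(u, v):
    """Matrix [[x_u, x_v], [y_u, y_v]]."""
    return [[math.cos(v), -u * math.sin(v)],
            [math.sin(v),  u * math.cos(v)]]

def row_times_matrix(row, m):
    # (row . column j) for j = 0, 1
    return [sum(row[i] * m[i][j] for i in range(2)) for j in range(2)]

def F(u, v):
    # f expressed in terms of u and v
    return f(*xy(u, v))

u0, v0, h = 1.3, 0.4, 1e-5
chain = row_times_matrix(grad_f(*xy(u0, v0)), jacobian(u0, v0))
numeric = [(F(u0 + h, v0) - F(u0 - h, v0)) / (2 * h),
           (F(u0, v0 + h) - F(u0, v0 - h)) / (2 * h)]
print(chain)     # (f_u, f_v) from the matrix form
print(numeric)   # central-difference estimates of (f_u, f_v)
```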

Suppose we now invert the process and express $u$ and $v$ in terms of $x$ and $y$ . The previous identity, applied first with $f=u$ , then with $f=v$ , gives

\begin{pmatrix}\dfrac{\partial u}{\partial x}&\dfrac{\partial u}{\partial y}\\[2mm]\dfrac{\partial v}{\partial x}&\dfrac{\partial v}{\partial y}\end{pmatrix}\begin{pmatrix}\dfrac{\partial x}{\partial u}&\dfrac{\partial x}{\partial v}\\[2mm]\dfrac{\partial y}{\partial u}&\dfrac{\partial y}{\partial v}\end{pmatrix}=\begin{pmatrix}\dfrac{\partial u}{\partial u}&\dfrac{\partial u}{\partial v}\\[2mm]\dfrac{\partial v}{\partial u}&\dfrac{\partial v}{\partial v}\end{pmatrix}=\begin{pmatrix}1&0\\0&1\end{pmatrix}.

In other words, the Jacobian matrices are inverses of each other.
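For a non-linear illustration of this inverse relationship, the sketch below uses polar coordinates $x=r\cos\theta,\;y=r\sin\theta$, whose inverse (for $r>0$) is $r=\sqrt{x^{2}+y^{2}}$, $\theta=\operatorname{atan2}(y,x)$. Differentiating the inverse gives $r_{x}=x/r$, $r_{y}=y/r$, $\theta_{x}=-y/r^{2}$, $\theta_{y}=x/r^{2}$. The example and the helper names are assumptions for illustration only; the product of the two Jacobian matrices, taken in the same order as above, should be the identity up to rounding.

```python
import math

# Hypothetical example: polar coordinates x = r cos(theta), y = r sin(theta).
def jacobian_xy_wrt_rtheta(r, theta):
    """[[x_r, x_theta], [y_r, y_theta]]"""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jacobian_rtheta_wrt_xy(x, y):
    """[[r_x, r_y], [theta_x, theta_y]] for r = sqrt(x^2 + y^2), theta = atan2(y, x)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

r, theta = 2.0, 0.9
x, y = r * math.cos(theta), r * math.sin(theta)
product = matmul(jacobian_rtheta_wrt_xy(x, y), jacobian_xy_wrt_rtheta(r, theta))
print(product)   # the 2x2 identity, up to rounding
```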

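Example 3.11. Let $x=7u+2v,\;y=3u+v$. Then $u=x-2y$ and $v=-3x+7y$. The Jacobian matrices are

\begin{pmatrix}\dfrac{\partial x}{\partial u}&\dfrac{\partial x}{\partial v}\\[2mm]\dfrac{\partial y}{\partial u}&\dfrac{\partial y}{\partial v}\end{pmatrix}=\begin{pmatrix}7&2\\3&1\end{pmatrix},\qquad\begin{pmatrix}\dfrac{\partial u}{\partial x}&\dfrac{\partial u}{\partial y}\\[2mm]\dfrac{\partial v}{\partial x}&\dfrac{\partial v}{\partial y}\end{pmatrix}=\begin{pmatrix}1&-2\\-3&7\end{pmatrix},

which are indeed inverse to each other. Note that $\displaystyle{\frac{\partial x}{\partial u}}=7$ is not equal to $1\big/\displaystyle{\frac{\partial u}{\partial x}}=1$!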
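Since the change of variables in Example 3.11 is linear, its Jacobian matrices are constant and can be checked with exact integer arithmetic. This small sketch (helper names are illustrative) multiplies them in both orders and contrasts $\partial x/\partial u$ with $1/(\partial u/\partial x)$.

```python
# Jacobian matrices from Example 3.11 (constant, because the map is linear).
J_xy_uv = [[7, 2],     # [[x_u, x_v],
           [3, 1]]     #  [y_u, y_v]]
J_uv_xy = [[1, -2],    # [[u_x, u_y],
           [-3, 7]]    #  [v_x, v_y]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

print(matmul(J_uv_xy, J_xy_uv))   # [[1, 0], [0, 1]]
print(matmul(J_xy_uv, J_uv_xy))   # [[1, 0], [0, 1]]
print(J_xy_uv[0][0], 1 / J_uv_xy[0][0])   # x_u is 7, while 1/u_x is 1.0
```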

Higher derivatives. By applying the chain rule twice, one can obtain expressions for second-order partial derivatives with respect to new variables. We only consider the special case of the following type: let

x=au+bv,\quad y=cu+dv,

where $a,\,b,\,c,\,d$ are constants. Then, by the chain rule,

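\frac{\partial f}{\partial u}=\frac{\partial f}{\partial x}\,\frac{\partial x}{\partial u}+\frac{\partial f}{\partial y}\,\frac{\partial y}{\partial u}=a\,\frac{\partial f}{\partial x}+c\,\frac{\partial f}{\partial y}.

Write $\displaystyle{\frac{\partial f}{\partial u}=g}$, so that $g=a\,f_{x}+c\,f_{y}$. Then, by the chain rule again,

\frac{\partial^{2}f}{\partial u^{2}}=\frac{\partial g}{\partial u}=a\,\frac{\partial g}{\partial x}+c\,\frac{\partial g}{\partial y}=a\left(a\,f_{xx}+c\,f_{yx}\right)+c\left(a\,f_{xy}+c\,f_{yy}\right)=a^{2}f_{xx}+2ac\,f_{xy}+c^{2}f_{yy}

(using $f_{yx}=f_{xy}$). Of course, $\displaystyle{\frac{\partial^{2}f}{\partial v^{2}}}$ is given by a similar expression with $b$ and $d$ replacing $a$ and $c$: $\displaystyle{\frac{\partial^{2}f}{\partial v^{2}}=b^{2}f_{xx}+2bd\,f_{xy}+d^{2}f_{yy}}$.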
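The formula $\partial^{2}f/\partial u^{2}=a^{2}f_{xx}+2ac\,f_{xy}+c^{2}f_{yy}$, and its analogue for $v$, can be sanity-checked with second differences. The test function $f(x,y)=x^{3}y+\sin(xy)$, its hand-computed second partials, and the constants $a,b,c,d$ below are arbitrary illustrative choices.

```python
import math

# Hypothetical test function and its second partials (computed by hand).
def f(x, y):
    return x**3 * y + math.sin(x * y)

def second_partials(x, y):
    s = math.sin(x * y)
    fxx = 6 * x * y - y**2 * s
    fxy = 3 * x**2 + math.cos(x * y) - x * y * s
    fyy = -x**2 * s
    return fxx, fxy, fyy

a, b, c, d = 2.0, -1.0, 0.5, 3.0   # arbitrary constants: x = au + bv, y = cu + dv

def F(u, v):
    # f expressed in terms of u and v
    return f(a * u + b * v, c * u + d * v)

def formula_Fuu(u, v):
    fxx, fxy, fyy = second_partials(a * u + b * v, c * u + d * v)
    return a**2 * fxx + 2 * a * c * fxy + c**2 * fyy

def formula_Fvv(u, v):
    fxx, fxy, fyy = second_partials(a * u + b * v, c * u + d * v)
    return b**2 * fxx + 2 * b * d * fxy + d**2 * fyy

def numeric_Fuu(u, v, h=1e-4):
    return (F(u + h, v) - 2 * F(u, v) + F(u - h, v)) / h**2

def numeric_Fvv(u, v, h=1e-4):
    return (F(u, v + h) - 2 * F(u, v) + F(u, v - h)) / h**2

u0, v0 = 0.3, 0.2
print(formula_Fuu(u0, v0), numeric_Fuu(u0, v0))
print(formula_Fvv(u0, v0), numeric_Fvv(u0, v0))
```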

Some partial differential equations.

Recall that if $f$ is a function of $x$ and $y$ and $f_{y}=0$ throughout the plane, then $f(x,y)=g(x)$ for some function $g$ of one variable. This is the simplest example of a partial differential equation.

Some more interesting examples follow.

Example. Suppose that $f=f(x,y)$ and $f_{xy}=0$ (throughout the plane). Then $f(x,y)=g(x)+h(y)$ for certain functions $g,\,h$ .

Reason: since $(f_{x})_{y}=0$, we have $f_{x}=\phi(x)$ for some function $\phi$ (by the previous remark, applied to $f_{x}$). Let $g(x)$ be an indefinite integral of $\phi(x)$, so that $g^{\prime}(x)=\phi(x)$. Then

\frac{\partial}{\partial x}\Big[f(x,y)-g(x)\Big]=f_{x}(x,y)-\phi(x)=0

for all $(x,y)$, so, applying the same remark with the roles of $x$ and $y$ exchanged, $f(x,y)-g(x)=h(y)$ for some function $h$.

Remark. Where ordinary differential equations allow the choice of arbitrary constants, partial differential equations give a choice of arbitrary functions.

Example. $\displaystyle{\frac{\partial f}{\partial y}=c\frac{\partial f}{\partial x}}$ , where $f$ is a function of $x$ and $y$ .

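We show that solutions of this equation are of the form $f(x,y)=g(x+cy)$ for some (differentiable) function $g$ of one variable. Note first that any such $f$ certainly satisfies the equation, by a simple application of the ordinary chain rule: $f_{y}=c\,g^{\prime}(x+cy)=c\,f_{x}$. For the converse, assume $c\neq 0$ (if $c=0$ the equation is just $f_{y}=0$, treated above). Introduce new variables $u,\,v$ by $u=x+cy,\;v=x-cy$. Then $x=\frac{1}{2}(u+v)$ and $y=\frac{1}{2c}(u-v)$, so, by the chain rule,

\frac{\partial f}{\partial v}=\frac{\partial f}{\partial x}\,\frac{\partial x}{\partial v}+\frac{\partial f}{\partial y}\,\frac{\partial y}{\partial v}=\frac{1}{2}\,\frac{\partial f}{\partial x}-\frac{1}{2c}\,\frac{\partial f}{\partial y}=\frac{1}{2c}\left(c\,\frac{\partial f}{\partial x}-\frac{\partial f}{\partial y}\right)=0.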

Hence $f$ is a function of $u$ only: $f=g(u)=g(x+cy)$ for some function $g$ .
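A quick numerical illustration: for a hypothetical choice of $g$ and of the constant $c$, the function $f(x,y)=g(x+cy)$ should make $f_{y}-c\,f_{x}$ vanish up to finite-difference error, whereas a function not of this form, such as $xy$, generally does not. The choices of $g$, $c$, the sample points and the step size are all assumptions made for illustration.

```python
import math

c = 2.5                      # arbitrary nonzero constant

def g(s):                    # hypothetical function of one variable
    return math.sin(s) + s**3

def f(x, y):
    return g(x + c * y)

def residual(func, x, y, h=1e-5):
    """Central-difference estimate of func_y - c * func_x at (x, y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fy - c * fx

points = [(1.0, 0.1), (1.2, -0.7), (-2.0, 0.5)]
print([residual(f, x, y) for x, y in points])                     # all close to 0
print([residual(lambda x, y: x * y, x, y) for x, y in points])   # equals x - c*y here, not 0
```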

Example (the wave equation). This is the equation

\frac{\partial^{2}y}{\partial t^{2}}=c^{2}\frac{\partial^{2}y}{\partial x^{2}},

in which $y$ (the displacement) is a function of $x$ (distance along the string) and $t$ (time). We show that solutions are of the form $y=f(x+ct)+g(x-ct)$ , where $f$ and $g$ are functions. (Again, any such $y$ certainly satisfies the equation.)

Introduce new variables $u,\,v$ by: $u=x+ct,\;v=x-ct$ . By the chain rule for second derivatives, we then have

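\frac{\partial^{2}y}{\partial x^{2}}=\frac{\partial^{2}y}{\partial u^{2}}+2\,\frac{\partial^{2}y}{\partial u\,\partial v}+\frac{\partial^{2}y}{\partial v^{2}},

\frac{\partial^{2}y}{\partial t^{2}}=c^{2}\,\frac{\partial^{2}y}{\partial u^{2}}-2c^{2}\,\frac{\partial^{2}y}{\partial u\,\partial v}+c^{2}\,\frac{\partial^{2}y}{\partial v^{2}}.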

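From these two equations and the original equation, we obtain $c^{2}\big(y_{uu}-2y_{uv}+y_{vv}\big)=c^{2}\big(y_{uu}+2y_{uv}+y_{vv}\big)$, that is, $4c^{2}\,\displaystyle{\frac{\partial^{2}y}{\partial u\,\partial v}}=0$. So (assuming $c\neq 0$) $\displaystyle{\frac{\partial^{2}y}{\partial u\,\partial v}=0}$, and by an earlier example $y=f(u)+g(v)=f(x+ct)+g(x-ct)$ for some functions $f,\,g$.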
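As a final sanity check, the sketch below picks two hypothetical profiles, $f(s)=e^{-s^{2}}$ and $g(s)=\sin 2s$, and a wave speed $c=1.5$, forms $y=f(x+ct)+g(x-ct)$, and evaluates $y_{tt}-c^{2}y_{xx}$ with second differences at a few points. The residuals should be small (limited by the step size), consistent with the fact that every such $y$ solves the wave equation. All choices here are illustrative assumptions.

```python
import math

c = 1.5                      # wave speed (arbitrary choice)

def profile_f(s):            # hypothetical profile travelling to the left
    return math.exp(-s**2)

def profile_g(s):            # hypothetical profile travelling to the right
    return math.sin(2 * s)

def y(x, t):
    return profile_f(x + c * t) + profile_g(x - c * t)

def wave_residual(x, t, h=1e-3):
    """Second-difference estimate of y_tt - c**2 * y_xx at (x, t)."""
    ytt = (y(x, t + h) - 2 * y(x, t) + y(x, t - h)) / h**2
    yxx = (y(x + h, t) - 2 * y(x, t) + y(x - h, t)) / h**2
    return ytt - c**2 * yxx

samples = [(0.0, 0.0), (0.4, 1.1), (-1.3, 0.6)]
print([wave_residual(x, t) for x, t in samples])   # all small
```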