3 Functions of two or more variables

3.4 The chain rule and applications

First version of the chain rule. Let f(x,y) be a function of x and y. Suppose we move along a path described by the parametrized curve $\gamma(t) = (x(t), y(t))$. What rate of change of f (relative to t) do we experience? By (*),

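$$\delta f \approx f_x\,\delta x + f_y\,\delta y = \left(f_x\frac{\delta x}{\delta t} + f_y\frac{\delta y}{\delta t}\right)\delta t.$$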

Hence, in the limit,

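$$\frac{df}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = \nabla f\cdot\gamma'(t).$$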
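As a quick numerical sanity check of this rule, the Python sketch below compares $f_x\,dx/dt + f_y\,dy/dt$ with a central finite-difference estimate of $dF/dt$, where $F(t) = f(\gamma(t))$. The choices $f(x,y) = x^2 y$ and $\gamma(t) = (\cos t, \sin t)$, and all the helper names, are hypothetical illustrations, not part of the notes.

```python
import math

# Hypothetical test case (not from the notes): f(x, y) = x^2 y on the unit circle.
def f(x, y):
    return x**2 * y

def f_x(x, y):
    return 2 * x * y        # partial derivative of f with respect to x

def f_y(x, y):
    return x**2             # partial derivative of f with respect to y

def gamma(t):
    """Parametrized curve gamma(t) = (x(t), y(t)) = (cos t, sin t)."""
    return math.cos(t), math.sin(t)

def gamma_prime(t):
    """Velocity (dx/dt, dy/dt) along the curve."""
    return -math.sin(t), math.cos(t)

def chain_rule_rate(t):
    """df/dt from the chain rule: f_x dx/dt + f_y dy/dt, evaluated at gamma(t)."""
    x, y = gamma(t)
    dx, dy = gamma_prime(t)
    return f_x(x, y) * dx + f_y(x, y) * dy

def finite_difference_rate(t, h=1e-6):
    """df/dt estimated directly from F(t) = f(gamma(t)) by a central difference."""
    F = lambda s: f(*gamma(s))
    return (F(t + h) - F(t - h)) / (2 * h)

for t in (0.0, 0.7, 2.0):
    print(t, chain_rule_rate(t), finite_difference_rate(t))
```

The two printed rates should agree closely. The small discrepancy comes only from the finite-difference approximation.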

This is the simplest form of the chain rule. The same holds for a function of three variables x, y, z, with the extra term $f_z\,dz/dt$. It is valid if all the derivatives involved exist and are continuous.

Note. Another way to say this is: if $F(t) = f(\gamma(t))$, then $F'(t) = \nabla f\cdot\gamma'(t)$. But we should be a little careful here: f is a function of $(x,y)\in\mathbb{R}^2$, not of t, so $\nabla f$ must be evaluated at the point $\gamma(t)$. More precisely: $F'(t) = \nabla f(\gamma(t))\cdot\gamma'(t)$. For careful proofs, this distinction is essential. However, the practice of using the same symbol for both (that is, writing f(t) rather than F(t)) is standard in applied mathematics, where f will represent some physical quantity (e.g. temperature, perhaps denoted by T).

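Application of the chain rule: directional derivatives. Let f be a function of x, y. Let $\mathbf{n} = \begin{pmatrix} n_1\\ n_2\end{pmatrix}$ be a unit vector. The line through the point $\mathbf{r}_0 = (x_0, y_0)$ in the direction $\mathbf{n}$ is described by $\mathbf{r} = \mathbf{r}_0 + t\mathbf{n}$. The directional derivative of f at $\mathbf{r}_0$ in the direction $\mathbf{n}$ is the rate of change as one moves along this line, that is:

$$\left.\frac{d}{dt} f(\mathbf{r}_0 + t\mathbf{n})\right|_{t=0} = \left.\frac{d}{dt} f(x_0 + tn_1,\, y_0 + tn_2)\right|_{t=0}.$$

By the chain rule, this is

$$n_1 f_x + n_2 f_y = (\nabla f)\cdot\mathbf{n},$$

with $f_x$ and $f_y$ evaluated at $\mathbf{r}_0$.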

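Since $\mathbf{n}$ is a unit vector, $(\nabla f)\cdot\mathbf{n} = |\nabla f|\cos\theta$, where $\theta$ is the angle between $\mathbf{n}$ and $\nabla f$. So this is largest (with value $|\nabla f|$) when $\mathbf{n}$ points in the direction of $\nabla f$. Thus (when $\nabla f \neq 0$) $\nabla f$ gives the direction of greatest increase of f, and the rate of this increase is $|\nabla f|$. The direction of greatest decrease of f is the opposite direction, $-\nabla f$, with rate $-|\nabla f|$.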
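The sketch below illustrates this numerically for a hypothetical function $f(x,y) = x^2 + 3xy$ at the point $(1, 2)$, where $\nabla f = (8, 3)$. It samples many unit vectors $\mathbf{n}$, evaluates $(\nabla f)\cdot\mathbf{n}$ for each, and compares the largest value with $|\nabla f|$. Sampling only illustrates the maximization; it is not a proof.

```python
import math

# Hypothetical example: f(x, y) = x^2 + 3xy, whose gradient is (2x + 3y, 3x).
def grad_f(x, y):
    return (2 * x + 3 * y, 3 * x)

def directional_derivative(x0, y0, n):
    """(grad f) . n at (x0, y0); n is assumed to be a unit vector."""
    gx, gy = grad_f(x0, y0)
    return gx * n[0] + gy * n[1]

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
norm = math.hypot(gx, gy)                    # |grad f|

# Sample unit vectors n = (cos a, sin a) at 3600 equally spaced angles a.
samples = [(math.cos(2 * math.pi * k / 3600), math.sin(2 * math.pi * k / 3600))
           for k in range(3600)]
best = max(samples, key=lambda n: directional_derivative(x0, y0, n))

print("largest sampled rate:", directional_derivative(x0, y0, best))
print("|grad f|:            ", norm)
print("best sampled n:", best, "unit gradient:", (gx / norm, gy / norm))
```

The best sampled direction lies within one sampling step of $\nabla f/|\nabla f|$. Reversing that unit vector gives the rate $-|\nabla f|$.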

Second version of the chain rule. We now describe a more general version of the chain rule. Suppose that x and y are expressed in terms of two other variables u, v (for example, (u, v) could be an alternative coordinate system). Substitution expresses f in terms of u and v (again, one should really use a new symbol like F), and the problem is to find the partial derivatives $f_u$ and $f_v$ (meaning, of course, the derivative with respect to u or v when the other is kept constant; the notation is somewhat deficient, because it fails to specify what is being kept constant!).

A small change δu to u, with v kept constant, causes changes

$$\delta x \approx x_u\,\delta u,\qquad \delta y \approx y_u\,\delta u.$$

By (*), the resulting change in f is approximately

$$f_x\,\delta x + f_y\,\delta y \approx (f_x x_u + f_y y_u)\,\delta u.$$

So (as before):

Proposition 3.10. If all the partial derivatives are continuous, then

$$f_u = f_x x_u + f_y y_u,$$

and similarly for v.
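Proposition 3.10 can be checked numerically in a familiar case. This sketch assumes the new variables are polar coordinates, $x = r\cos\theta$, $y = r\sin\theta$ (so $u = r$, $v = \theta$), and uses the hypothetical test function $f(x,y) = xy^2$. The chain-rule values of $f_r$ and $f_\theta$ are compared with central differences of f rewritten in terms of r and $\theta$.

```python
import math

# Hypothetical example: f(x, y) = x * y^2 with x = r cos(theta), y = r sin(theta).
def f(x, y):
    return x * y**2

def f_x(x, y):
    return y**2

def f_y(x, y):
    return 2 * x * y

def to_xy(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

def f_r_chain(r, theta):
    """f_r = f_x x_r + f_y y_r, with x_r = cos(theta), y_r = sin(theta)."""
    x, y = to_xy(r, theta)
    return f_x(x, y) * math.cos(theta) + f_y(x, y) * math.sin(theta)

def f_theta_chain(r, theta):
    """f_theta = f_x x_theta + f_y y_theta, with x_theta = -r sin(theta), y_theta = r cos(theta)."""
    x, y = to_xy(r, theta)
    return f_x(x, y) * (-r * math.sin(theta)) + f_y(x, y) * (r * math.cos(theta))

def partial_fd(F, r, theta, which, h=1e-6):
    """Central-difference partial derivative of F(r, theta), holding the other variable fixed."""
    if which == "r":
        return (F(r + h, theta) - F(r - h, theta)) / (2 * h)
    return (F(r, theta + h) - F(r, theta - h)) / (2 * h)

F = lambda r, theta: f(*to_xy(r, theta))   # f expressed in terms of r and theta

r, theta = 1.5, 0.4
print(f_r_chain(r, theta), partial_fd(F, r, theta, "r"))
print(f_theta_chain(r, theta), partial_fd(F, r, theta, "theta"))
```

Here $F = r^3\cos\theta\sin^2\theta$, so both expressions for $f_r$ also equal $3r^2\cos\theta\sin^2\theta$.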

The two statements can be put together nicely in matrix form:

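$$\begin{pmatrix} f_u & f_v\end{pmatrix} = \begin{pmatrix} f_x & f_y\end{pmatrix}\begin{pmatrix} x_u & x_v\\ y_u & y_v\end{pmatrix}.$$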

The matrix on the right is called the Jacobian matrix for x and y in terms of u and v.

Suppose we now invert the process and express u and v in terms of x and y. The previous identity, applied first with f=u, then with f=v, gives

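$$\begin{pmatrix} u_x & u_y\\ v_x & v_y\end{pmatrix}\begin{pmatrix} x_u & x_v\\ y_u & y_v\end{pmatrix} = \begin{pmatrix} u_u & u_v\\ v_u & v_v\end{pmatrix} = \begin{pmatrix} 1 & 0\\ 0 & 1\end{pmatrix}.$$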

In other words, the Jacobian matrices are inverses of each other.
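For polar coordinates (an assumed example, not one from the notes) both Jacobian matrices are explicit: $x_r = \cos\theta$, $x_\theta = -r\sin\theta$, $y_r = \sin\theta$, $y_\theta = r\cos\theta$. From $r = \sqrt{x^2+y^2}$ and $\theta = \operatorname{atan2}(y,x)$ we get $r_x = x/r$, $r_y = y/r$, $\theta_x = -y/r^2$, $\theta_y = x/r^2$. The sketch below multiplies the two matrices at one point. The product should be the identity, up to rounding.

```python
import math

def jacobian_xy_wrt_polar(r, theta):
    """Matrix [[x_r, x_theta], [y_r, y_theta]] for x = r cos(theta), y = r sin(theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jacobian_polar_wrt_xy(x, y):
    """Matrix [[r_x, r_y], [theta_x, theta_y]] for r = sqrt(x^2 + y^2), theta = atan2(y, x)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

r, theta = 2.0, 0.9
x, y = r * math.cos(theta), r * math.sin(theta)
product = matmul(jacobian_polar_wrt_xy(x, y), jacobian_xy_wrt_polar(r, theta))
print(product)   # close to [[1, 0], [0, 1]], up to rounding
```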

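Example 3.11. Let $x = 7u + 2v$, $y = 3u + v$. Then $u = x - 2y$ and $v = -3x + 7y$. The Jacobian matrices are

$$\begin{pmatrix} x_u & x_v\\ y_u & y_v\end{pmatrix} = \begin{pmatrix} 7 & 2\\ 3 & 1\end{pmatrix},\qquad \begin{pmatrix} u_x & u_y\\ v_x & v_y\end{pmatrix} = \begin{pmatrix} 1 & -2\\ -3 & 7\end{pmatrix},$$

which are indeed inverse to each other. Note that $x_u$ is not equal to $1/u_x$ (here $x_u = 7$, while $u_x = 1$)!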
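The claims in Example 3.11 can be confirmed with exact integer arithmetic. The small `matmul` helper below is a hypothetical name written only for this check.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# x = 7u + 2v, y = 3u + v  ->  [[x_u, x_v], [y_u, y_v]]
J_xy = [[7, 2],
        [3, 1]]
# u = x - 2y, v = -3x + 7y  ->  [[u_x, u_y], [v_x, v_y]]
J_uv = [[1, -2],
        [-3, 7]]

print(matmul(J_uv, J_xy))  # [[1, 0], [0, 1]]
print(matmul(J_xy, J_uv))  # [[1, 0], [0, 1]]

# The inverse formulas recover (u, v) from (x, y) for a sample point.
u, v = 4, -5
x, y = 7 * u + 2 * v, 3 * u + v
print((x - 2 * y, -3 * x + 7 * y))  # (4, -5)

# Entries of inverse matrices are not reciprocals of each other.
x_u, u_x = J_xy[0][0], J_uv[0][0]
print(x_u, 1 / u_x)        # 7 1.0
```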

Higher derivatives. By applying the chain rule twice, one can obtain expressions for second-order partial derivatives with respect to new variables. We consider only the special case of a linear change of variables: let

$$x = au + bv,\qquad y = cu + dv,$$

where a,b,c,d are constants. Then, by the chain rule,

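$$f_u = f_x x_u + f_y y_u = a f_x + c f_y.$$

Write $f_u = g$. Then, by the chain rule again,

$$\frac{\partial^2 f}{\partial u^2} = g_u = a g_x + c g_y = a(a f_{xx} + c f_{yx}) + c(a f_{xy} + c f_{yy}) = a^2 f_{xx} + 2ac\, f_{xy} + c^2 f_{yy}$$

(using $f_{yx} = f_{xy}$). Of course, $\partial^2 f/\partial v^2$ is given by a similar expression with b and d replacing a and c:

$$\frac{\partial^2 f}{\partial v^2} = b^2 f_{xx} + 2bd\, f_{xy} + d^2 f_{yy}.$$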
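Here is a numerical check of the second-derivative formula $\partial^2 f/\partial u^2 = a^2 f_{xx} + 2ac\,f_{xy} + c^2 f_{yy}$ and its analogue for v, with b and d in place of a and c. The test function $f(x,y) = x^3 y + y^2$ and the constants a, b, c, d are arbitrary hypothetical choices. The left-hand sides are estimated by central second differences.

```python
# Hypothetical test function f(x, y) = x**3 * y + y**2 and its exact second partials.
def f(x, y):
    return x**3 * y + y**2

def f_xx(x, y):
    return 6 * x * y

def f_xy(x, y):
    return 3 * x**2

def f_yy(x, y):
    return 2.0

# Constants of the linear change of variables x = a u + b v, y = c u + d v (chosen arbitrarily).
a, b, c, d = 2.0, -1.0, 0.5, 3.0

def F(u, v):
    """f written in terms of u and v."""
    return f(a * u + b * v, c * u + d * v)

def second_diff(g, t, h=1e-3):
    """Central second difference: g''(t) ~ (g(t+h) - 2 g(t) + g(t-h)) / h^2."""
    return (g(t + h) - 2 * g(t) + g(t - h)) / h**2

u0, v0 = 0.3, -0.7
x0, y0 = a * u0 + b * v0, c * u0 + d * v0

F_uu_numeric = second_diff(lambda u: F(u, v0), u0)
F_vv_numeric = second_diff(lambda v: F(u0, v), v0)

F_uu_formula = a**2 * f_xx(x0, y0) + 2 * a * c * f_xy(x0, y0) + c**2 * f_yy(x0, y0)
F_vv_formula = b**2 * f_xx(x0, y0) + 2 * b * d * f_xy(x0, y0) + d**2 * f_yy(x0, y0)

print(F_uu_numeric, F_uu_formula)
print(F_vv_numeric, F_vv_formula)
```

Because f is a polynomial, the exact second partials are easy to write down. The only discrepancy is the $O(h^2)$ error of the difference quotient.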

Some partial differential equations.

Recall that if f is a function of x and y and $f_y = 0$ throughout the plane, then $f(x,y) = g(x)$ for some function g of one variable. This is the simplest example of a partial differential equation.

Some more interesting examples follow.

Example. Suppose that $f = f(x,y)$ and $f_{xy} = 0$ (throughout the plane). Then $f(x,y) = g(x) + h(y)$ for certain functions g, h.

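Reason: Since $(f_x)_y = 0$, we have $f_x = \phi(x)$ for some function $\phi$ (by the previous remark, applied to $f_x$). Let g(x) be an indefinite integral of $\phi(x)$, so that $g'(x) = \phi(x)$. Then

$$\frac{\partial}{\partial x}\big[f(x,y) - g(x)\big] = f_x(x,y) - \phi(x) = 0$$

for all x and y, so (by the same remark, with the roles of x and y exchanged) $f(x,y) - g(x) = h(y)$ for some function h.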
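The conclusion can be made concrete. If $f(x,y) = g(x) + h(y)$, then $f(x,y) = f(x,0) + f(0,y) - f(0,0)$, since the constants $g(0)$ and $h(0)$ cancel. The sketch below uses hypothetical test functions. It estimates $f_{xy}$ by central differences and measures how far f is from this split, for one function with $f_{xy} = 0$ and for $f = xy$, which has $f_{xy} = 1$.

```python
import math

def mixed_partial(f, x, y, h=1e-3):
    """Central-difference estimate of f_xy at (x, y)."""
    return (f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

def split_residual(f, x, y):
    """f(x, y) - [f(x, 0) + f(0, y) - f(0, 0)]; zero whenever f(x, y) = g(x) + h(y)."""
    return f(x, y) - (f(x, 0) + f(0, y) - f(0, 0))

f1 = lambda x, y: math.sin(x) + y**3     # of the form g(x) + h(y)
f2 = lambda x, y: x * y                 # f_xy = 1, so not of that form

for name, f in (("sin(x) + y^3", f1), ("x*y", f2)):
    print(name, mixed_partial(f, 0.8, -1.2), split_residual(f, 0.8, -1.2))
```

For the first function both numbers are essentially zero. For $xy$ the mixed partial is about 1 and the residual is about $-0.96$.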

Remark. Where ordinary differential equations allow the choice of arbitrary constants, partial differential equations give a choice of arbitrary functions.

Example. $f_y = c\,f_x$, where f is a function of x and y and c is a constant.

We show that solutions of this equation are of the form $f(x,y) = g(x+cy)$ for some (differentiable) function g of one variable. Note first that any such f certainly satisfies the equation, by a simple application of the ordinary chain rule: $f_x = g'(x+cy)$ and $f_y = c\,g'(x+cy)$. Assume $c \neq 0$ (if $c = 0$ the equation is just $f_y = 0$, treated above). Introduce new variables u, v by $u = x + cy$, $v = x - cy$. Then $x = \tfrac12(u+v)$ and $y = \tfrac{1}{2c}(u-v)$, so, by the chain rule,

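$$f_v = f_x x_v + f_y y_v = \frac12 f_x - \frac{1}{2c} f_y = \frac{1}{2c}\left(c f_x - f_y\right) = 0.$$

Hence f, expressed in terms of u and v, is a function of u only: $f = g(u) = g(x+cy)$ for some function g.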
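A numerical illustration of this example, under the assumed choices $c = 2.5$ and $g(s) = e^{-s^2}$ (both hypothetical). The sketch checks that $f(x,y) = g(x + cy)$ satisfies $f_y = c f_x$. It also checks that, after the substitution $x = \tfrac12(u+v)$, $y = \tfrac{1}{2c}(u-v)$, the derivative with respect to v vanishes.

```python
import math

c = 2.5                              # assumed nonzero constant in f_y = c f_x
g = lambda s: math.exp(-s * s)       # hypothetical one-variable function g

def f(x, y):
    """A solution of the form f(x, y) = g(x + c y)."""
    return g(x + c * y)

def F(u, v):
    """f rewritten in u = x + c y, v = x - c y, using x = (u + v)/2, y = (u - v)/(2c)."""
    return f((u + v) / 2, (u - v) / (2 * c))

def d_first(func, p, q, h=1e-6):
    """Central-difference derivative of func(p, q) with respect to its first argument."""
    return (func(p + h, q) - func(p - h, q)) / (2 * h)

def d_second(func, p, q, h=1e-6):
    """Central-difference derivative of func(p, q) with respect to its second argument."""
    return (func(p, q + h) - func(p, q - h)) / (2 * h)

x0, y0 = 0.4, -0.1
f_x = d_first(f, x0, y0)
f_y = d_second(f, x0, y0)
print(f_y, c * f_x)                  # the two sides of f_y = c f_x

u0, v0 = x0 + c * y0, x0 - c * y0
print(d_second(F, u0, v0))           # F_v, which should be numerically zero
```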

Example (the wave equation). This is the equation

$$\frac{\partial^2 y}{\partial t^2} = c^2\frac{\partial^2 y}{\partial x^2},$$

in which y (the displacement) is a function of x (distance along the string) and t (time), and c is a nonzero constant. We show that solutions are of the form $y = f(x+ct) + g(x-ct)$, where f and g are (twice differentiable) functions of one variable. (Again, any such y certainly satisfies the equation.)

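Introduce new variables u, v by $u = x + ct$, $v = x - ct$. By the chain rule for second derivatives (applied with x and t as the variables of differentiation), we then have

$$\frac{\partial^2 y}{\partial x^2} = \frac{\partial^2 y}{\partial u^2} + 2\frac{\partial^2 y}{\partial u\,\partial v} + \frac{\partial^2 y}{\partial v^2},$$

$$\frac{\partial^2 y}{\partial t^2} = c^2\left(\frac{\partial^2 y}{\partial u^2} - 2\frac{\partial^2 y}{\partial u\,\partial v} + \frac{\partial^2 y}{\partial v^2}\right).$$

Subtracting, $\frac{\partial^2 y}{\partial t^2} - c^2\frac{\partial^2 y}{\partial x^2} = -4c^2\frac{\partial^2 y}{\partial u\,\partial v}$, and by the original equation the left-hand side is zero. Since $c \neq 0$, we obtain $\frac{\partial^2 y}{\partial u\,\partial v} = 0$, so (by an earlier example) $y = f(u) + g(v) = f(x+ct) + g(x-ct)$ for some functions f, g.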
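The sketch below checks this form of the solution numerically. The wave speed $c = 1.5$ and the profiles $f = \sin$ and $g(s) = 1/(1+s^2)$ are hypothetical choices. Second derivatives are estimated by central differences, so the agreement holds only up to a small discretization error. The sketch also checks that, in the variables $u = x + ct$, $v = x - ct$, the mixed partial $\partial^2 y/\partial u\,\partial v$ is numerically zero.

```python
import math

c = 1.5                                   # assumed (nonzero) wave speed
f = math.sin                              # hypothetical profile travelling as f(x + ct)
g = lambda s: 1.0 / (1.0 + s * s)         # hypothetical profile travelling as g(x - ct)

def y(x, t):
    """Candidate solution y = f(x + ct) + g(x - ct)."""
    return f(x + c * t) + g(x - c * t)

def y_xx(x, t, h=1e-3):
    """Central second difference in x."""
    return (y(x + h, t) - 2 * y(x, t) + y(x - h, t)) / h**2

def y_tt(x, t, h=1e-3):
    """Central second difference in t."""
    return (y(x, t + h) - 2 * y(x, t) + y(x, t - h)) / h**2

def Y(u, v):
    """The same function in the variables u = x + ct, v = x - ct."""
    return y((u + v) / 2, (u - v) / (2 * c))

def y_uv(u, v, h=1e-3):
    """Central-difference estimate of the mixed partial of Y."""
    return (Y(u + h, v + h) - Y(u + h, v - h) - Y(u - h, v + h) + Y(u - h, v - h)) / (4 * h * h)

x0, t0 = 0.3, 0.8
print(y_tt(x0, t0), c**2 * y_xx(x0, t0))   # agree up to discretization error
print(y_uv(x0 + c * t0, x0 - c * t0))       # close to zero
```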