13.4 Properties of the Derivative

In elementary calculus, we learn how to differentiate sums, products, quotients, and composite functions. We now generalize these ideas to functions of several variables, paying particular attention to the differentiation of composite functions. The rule for differentiating composites, called the chain rule, takes on a more profound form for functions of several variables than for those of one variable.

If \(f\) is a real-valued function of one variable, written as \(z = f(y)\), and \(y\) is a function of \(x\), written \(y= g(x)\), then \(z\) becomes a function of \(x\) through substitution, namely, \(z = f(g(x))\), and we have the familiar chain rule: \[ \frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx} = f' (g(x)) g' (x). \]

If \(f\) is a real-valued function of three variables \(u,v\), and \(w\), written in the form \(z = f(u,v,w)\), and the variables \(u,v,w\) are each functions of \(x\), namely \(u = g(x)\), \(v= h(x)\), and \(w = k (x)\), then by substituting \(g(x), h(x)\), and \(k (x)\) for \(u,v\), and \(w\), we obtain \(z\) as a function of \(x\colon\, z = f (g (x), h (x), k(x))\). The chain rule in this case reads: \[ \frac{dz}{dx} = \frac{\partial z}{\partial u} \frac{du}{dx} + \frac{\partial z}{\partial v} \frac{dv}{dx} + \frac{\partial z}{\partial w} \frac{dw}{dx}. \]

One of the goals of this section is to explain such formulas in detail.
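
As a quick numerical sanity check of the three-variable formula above, the following Python sketch compares the chain-rule expression for \(dz/dx\) with a central-difference approximation. The particular functions \(f, g, h, k\), the point `x0`, and the helper names are assumptions chosen only for illustration; they do not come from the text.

```python
import math

# Hypothetical choices (not from the text):
#   f(u, v, w) = u*v + w**2,  u = g(x) = sin x,  v = h(x) = x**2,  w = k(x) = e**x.
def f(u, v, w):
    return u * v + w ** 2

def z(x):
    """z as a function of x alone, obtained by substitution."""
    return f(math.sin(x), x ** 2, math.exp(x))

def chain_rule_dz_dx(x):
    """dz/dx = (dz/du)(du/dx) + (dz/dv)(dv/dx) + (dz/dw)(dw/dx)."""
    u, v, w = math.sin(x), x ** 2, math.exp(x)
    dz_du, dz_dv, dz_dw = v, u, 2 * w                    # partials of f
    du_dx, dv_dx, dw_dx = math.cos(x), 2 * x, math.exp(x)
    return dz_du * du_dx + dz_dv * dv_dx + dz_dw * dw_dx

def central_difference(func, x, step=1e-6):
    """Symmetric difference quotient approximating func'(x)."""
    return (func(x + step) - func(x - step)) / (2 * step)

x0 = 0.7  # arbitrary sample point
exact = chain_rule_dz_dx(x0)
approx = central_difference(z, x0)
print(exact, approx)  # the two values should agree to several decimal places
```

Agreement of the two numbers does not prove the formula, but a disagreement would signal a mistake in one of the partial derivatives.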

Sums, Products, Quotients

These rules work just as they do in one-variable calculus.

Theorem 10 Sums, Products, Quotients

  • (i) Constant Multiple Rule. Let \(f\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}^m\) be differentiable at \({\bf x}_0\) and let \(c\) be a real number. Then \(h ({\bf x}) = cf ( {\bf x})\) is differentiable at \({\bf x}_0\) and \[ {\bf D} h ( {\bf x}_0) = c {\bf D} f ( {\bf x}_0) \qquad \hbox{(equality of matrices)}. \]
  • (ii) Sum Rule. Let \(f\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}^m\) and \(g\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}^m\) be differentiable at \({\bf x}_0\). Then \(h ( {\bf x}) = f ( {\bf x})+ g ( {\bf x})\) is differentiable at \({\bf x}_0\) and \[ {\bf D} h ( {\bf x}_0) = {\bf D} f ( {\bf x}_0) + {\bf D} g ( {\bf x}_0) \qquad \hbox{(sum of matrices).} \]
  • (iii) Product Rule. Let \(f\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}\) and \(g\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}\) be differentiable at \({\bf x}_0\) and let \(h ( {\bf x}) = g ({\bf x}) f ({\bf x})\). Then \(h\colon\, U \subset {\mathbb R}^n \rightarrow {\mathbb R}\) is differentiable at \({\bf x}_0\) and \[ {\bf D} h ( {\bf x}_0 ) = g ( {\bf x}_0) {\bf D}f ( {\bf x}_0)+ f ( {\bf x}_0) {\bf D} g ( {\bf x}_0). \] (Note that each side of this equation is a \(1\times n\) matrix; a more general product rule is presented in Exercise \(31\) at the end of this section.)
  • (iv) Quotient Rule. With the same hypotheses as in rule (iii), let \(h ( {\bf x}) = f ({\bf x}) / g ( {\bf x})\) and suppose \(g\) is never zero on \(U\). Then \(h\) is differentiable at \({\bf x}_0\) and \[ {\bf D} h ( {\bf x}_0) = \frac{ g ( {\bf x}_0) {\bf D} f ( {\bf x}_0) - f ( {\bf x}_0) {\bf D} g ( {\bf x}_0)}{ [ g ( {\bf x}_0)]^2} . \]

proof

The proofs of rules (i) through (iv) proceed almost exactly as in the one-variable case, with a slight difference in notation. We shall prove rules (i) and (ii), leaving the proofs of rules (iii) and (iv) as Exercise 27.

  • (i) To show that \({\bf D} h ( {\bf x}_0) = c {\bf D}f ({\bf x}_0)\), we must show that \[ {\mathop {{\rm limit} }_{{\bf x} \to {\bf x}_0}} \ \frac{ \| h ( {\bf x} ) - h ( {\bf x}_0) - c {\bf D} f ( {\bf x}_0) ( {\bf x}- {\bf x}_0) \| }{ \| {\bf x} - {\bf x}_0 \| } =0, \] that is, that \[ {\mathop {{\rm limit} }_{{\bf x} \to {\bf x}_0}} \ \frac{ \| cf( {\bf x}) - cf( {\bf x}_0) - c {\bf D} f ( {\bf x}_0) ( {\bf x}- {\bf x}_0) \| }{ \| {\bf x} - {\bf x}_0 \| } =0, \] [see equation (4) of Section 13.3]. This is certainly true, since \(f\) is differentiable and the constant \(c\) can be factored out [see Theorem 3(i), Section 13.2].
  • (ii) By the triangle inequality, we may write \begin{eqnarray*} &&\frac{ \| h ({\bf x}) - h ( {\bf x}_0) - [{\bf D} f ( {\bf x}_0) + {\bf D} g ({\bf x}_0) ] ( {\bf x}- {\bf x}_0) \| }{ \| {\bf x} - {\bf x}_0 \|} \\[3pt] && =\frac{ \| f({\bf x})-f({\bf x}_0)-[{\bf D} f({\bf x}_0)]({\bf x}-{\bf x}_0)+g({\bf x})-g({\bf x}_0)-[{\bf D}g({\bf x}_0)]({\bf x}-{\bf x}_0) \|}{\| {\bf x}-{\bf x}_0 \| } \\[3pt] && \leq \frac{ \| f({\bf x})-f({\bf x}_0)-[{\bf D} f({\bf x}_0)]({\bf x}-{\bf x}_0) \| }{ \| {\bf x}-{\bf x}_0 \| }+ \frac{ \| g({\bf x})-g({\bf x}_0)-[{\bf D}g({\bf x}_0)] ({\bf x}-{\bf x}_0) \| }{ \| {\bf x}-{\bf x}_0 \| }, \end{eqnarray*} and each term approaches 0 as \({\bf x}\rightarrow {\bf x}_0\). Hence, rule (ii) holds.

example 1

Verify the formula for \({\bf D}h\) in rule (iv) of Theorem 10 with \[ f(x,y,z)=x^2+y^2+z^2 \hbox{ and } g(x,y,z)=x^2+1. \]

solution Here \[ h(x,y,z)=\frac{x^2+y^2+z^2}{x^2+1}, \] so that by direct differentiation \begin{eqnarray*} {\bf D}h(x,y,z) &\!=\!& \bigg[\frac{\partial h}{\partial x},\frac{\partial h}{\partial y}, \frac{\partial h}{\partial z}\bigg] \!=\!\bigg[\frac{(x^2+1)2x-(x^2+y^2+z^2)2x}{(x^2+1)^2}, \frac{2y}{x^2+1},\frac{2z}{x^2+1}\bigg]\\[6pt] &=&\!\bigg[\frac{2x(1-y^2-z^2)}{(x^2+1)^2},\frac{2y}{x^2+1},\frac{2z}{x^2+1}\bigg]. \end{eqnarray*}

By rule (iv), we get \[ {\bf D}h=\frac{g{\bf D} f-f{\bf D}g}{g^2}= \frac{(x^2+1)[2x,2y,2z]-(x^2+y^2+z^2)[2x,0,0]}{(x^2+1)^2}, \] which is the same as what we obtained directly.
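
The agreement claimed in Example 1 can also be checked numerically. The sketch below evaluates both expressions for \({\bf D}h\) at one sample point; the point \((1,2,3)\) and the function names are arbitrary choices made for this illustration.

```python
# Functions from Example 1.
def f(x, y, z):
    return x**2 + y**2 + z**2

def g(x, y, z):
    return x**2 + 1

def dh_quotient_rule(x, y, z):
    """Row matrix Dh = (g Df - f Dg) / g^2, as in rule (iv)."""
    Df = [2 * x, 2 * y, 2 * z]
    Dg = [2 * x, 0, 0]
    fv, gv = f(x, y, z), g(x, y, z)
    return [(gv * a - fv * b) / gv**2 for a, b in zip(Df, Dg)]

def dh_direct(x, y, z):
    """Row matrix obtained in the solution by differentiating h = f/g directly."""
    gv = g(x, y, z)
    return [2 * x * (1 - y**2 - z**2) / gv**2, 2 * y / gv, 2 * z / gv]

point = (1.0, 2.0, 3.0)  # arbitrary sample point (an assumption)
by_rule = dh_quotient_rule(*point)
by_hand = dh_direct(*point)
print(by_rule, by_hand)
```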

Chain Rule

As we mentioned earlier, it is in the differentiation of composite functions that we meet apparently substantial alterations of the formula from one-variable calculus. However, if we use the \({\bf D}\) notation, that is, matrix notation for derivatives, the chain rule for functions of several variables looks similar to the one-variable rule.

Theorem 11 Chain Rule

Let \(U\subset {\mathbb R}^n\) and \(V \subset {\mathbb R}^m\) be open sets. Let \(g\colon\, U\subset {\mathbb R}^n\rightarrow {\mathbb R}^m\) and \(f\colon\, V \subset {\mathbb R}^m\rightarrow {\mathbb R}^p\) be given functions such that \(g\) maps \(U\) into \(V\), so that \(f \circ g\) is defined. Suppose \(g\) is differentiable at \({\bf x}_0\) and \(f\) is differentiable at \({\bf y}_0=g({\bf x}_0)\). Then \(f \circ g\) is differentiable at \({\bf x}_0\) and \begin{equation*} {\bf D}(f\circ g)({\bf x}_0)={\bf D} f({\bf y}_0){\bf D}g({\bf x}_0).\tag{1} \end{equation*}

The right-hand side is the matrix product of \({\bf D} f({\bf y}_0)\) with \({\bf D} g({\bf x}_0)\).

Below we shall give a proof of the chain rule under the additional assumption that the partial derivatives of \(f\) are continuous, building up to the general case by developing two special cases that are themselves important.

The complete proof of Theorem 11, without the additional assumption of continuity, is given first.

proof

According to the definition of the derivative, we must verify that \[\lim_{\mathbf{x} \rightarrow \mathbf{x}_0} \frac{\|f (g (\mathbf{x}) )- f (g (\mathbf{x}_0)) - \mathbf{D} f(\mathbf{y}_0) \mathbf{D} g (\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0 )\|}{\|\mathbf{x} - \mathbf{x}_0\|} = 0. \] First rewrite the numerator and apply the triangle inequality as follows: \begin{align} & \|f(g(\mathbf{x})) - f(g(\mathbf{x}_0)) - \mathbf{D} f(\mathbf{y}_0) \cdot (g( \mathbf{x}) - g (\mathbf{x}_0)) \nonumber \\ & \qquad \qquad + \mathbf{D} f(\mathbf{y}_0) \cdot [g (\mathbf{x}) - g (\mathbf{x}_0) - \mathbf{D} g(\mathbf{x}_0)\cdot (\mathbf{x} - \mathbf{x}_0)]\|\nonumber \\ & \qquad \leq \| f(g(\mathbf{x})) - f(g(\mathbf{x}_0)) -\mathbf{D} f(\mathbf{y}_0) \cdot (g( \mathbf{x}) - g (\mathbf{x}_0))\|\nonumber \\ & \qquad \qquad + \|\mathbf{D}f (\mathbf{y}_0) \cdot [ g (\mathbf{x}) - g (\mathbf{x}_0) - \mathbf{D}g (\mathbf{x}_0) \cdot ( \mathbf{x} - \mathbf{x}_0)]\|. \label{ineq_equation(3)} \tag {3} \end{align} As in the proof of Theorem 8, \(\|\mathbf{D} f (\mathbf{y}_0) \cdot \mathbf{h}\|\leq M\|\mathbf{h}\|\) for some constant \(M\). Thus the right-hand side of inequality (\ref{ineq_equation(3)}) is less than or equal to \begin{align} & \|f (g(\mathbf{x})) - f(g(\mathbf{x}_0)) -\mathbf{D} f(\mathbf{y}_0) \cdot (g(\mathbf{x}) - g (\mathbf{x}_0))\| \nonumber \\ & \qquad +M\|g(\mathbf{x}) - g (\mathbf{x}_0) - \mathbf{D} g (\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0)\|. \label{terms_inequality(4)} \tag {4} \end{align} Since \(g\) is differentiable at \(\mathbf{x}_0\), given \(\varepsilon \gt 0\), there is a \(\delta_1 \gt 0 \) such that \(0 \lt \| \mathbf{x} - \mathbf{x}_0 \|\lt \delta_1 \) implies \[\frac{\|g(\mathbf{x}) - g(\mathbf{x}_0) - \mathbf{D} g(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0 )\|}{\|\mathbf{x} - \mathbf{x}_0 \|} \lt \frac{\varepsilon}{2M}. 
\] This makes the second term in expression (\ref{terms_inequality(4)}) less than \( \varepsilon \|\mathbf{x} - \mathbf{x}_0\|/2\). Let us turn to the first term in expression (\ref{terms_inequality(4)}). By Theorem 8, \[ \|g(\mathbf{x}) - g(\mathbf{x}_0) \| \lt M_1\|\mathbf{x} - \mathbf{x}_0\| \] for a constant \(M_1 \) if \(\mathbf{x}\) is near \( \mathbf{x}_0\), say \(0 \lt \|\mathbf{x} - \mathbf{x}_0\| \lt \delta_2\). Now choose \( \delta _3 \) such that \(0 \lt \|\mathbf{y} - \mathbf{y}_0\|\lt \delta _3 \) implies \[\|f(\mathbf{y}) - f(\mathbf{y}_0) - \mathbf{D} f(\mathbf{y}_0) \cdot (\mathbf{y} - \mathbf{y}_0)\|\lt \frac{\varepsilon\|\mathbf{y} - \mathbf{y}_0\|}{2M_1}. \] Since \(\mathbf{y} = g(\mathbf{x}) \) and \( \mathbf{y}_0 = g (\mathbf{x}_0), \| \mathbf{y} - \mathbf{y}_0 \|\lt \delta_3 \) if \(\|\mathbf{x} - \mathbf{x}_0\|\lt \delta_3 / M_1 \) and \(\|\mathbf{x} - \mathbf{x}_0\|\lt \delta_2\), and so \begin{eqnarray*} \|f (g(\mathbf{x})) & - & f(g(\mathbf{x}_0)) -\mathbf{D} f(\mathbf{y}_0) \cdot (g(\mathbf{x}) - g (\mathbf{x}_0))\| \\ & \leq & \frac{\varepsilon\|g(\mathbf{x}) - g(\mathbf{x}_0)\|}{2M_1} \lt \frac{\varepsilon\|\mathbf{x} - \mathbf{x}_0\|}{2} . \end{eqnarray*} Thus if \(\delta = \min (\delta_1, \delta_2, \delta_3 / M_1)\), expression (\ref{terms_inequality(4)}) is less than \[\frac{\varepsilon\|\mathbf{x} - \mathbf{x}_0\|}{2} + \frac{\varepsilon\|\mathbf{x} - \mathbf{x}_0\|}{2} = \varepsilon \|\mathbf{x} - \mathbf{x}_0\|, \] and so \[ \frac{\|f (g (\mathbf{x}) )- f (g (\mathbf{x}_0)) - \mathbf{D} f(\mathbf{y}_0) \mathbf{D} g (\mathbf{x}_0) (\mathbf{x} - \mathbf{x}_0 )\|}{\|\mathbf{x} - \mathbf{x}_0\|} \lt \varepsilon \] for \(0\lt \|\mathbf{x} - \mathbf{x}_0\| \lt \delta\). This proves the theorem. \( \Box\)

First Special Case of the Chain Rule

Suppose \({\bf c}\colon\, {\mathbb R}\rightarrow {\mathbb R}^3\) is a differentiable path and \(f\colon\, {\mathbb R}^3\rightarrow {\mathbb R}\). Let \(h(t)=f({\bf c}(t))=f(x(t),y(t),z(t)),\) where \({\bf c}(t)=(x(t),y(t),z(t))\). Then \begin{equation*} \frac{{\it dh}}{{\it dt}}= \frac{\partial f}{\partial x}\frac{{\it dx}}{{\it dt}} +\frac{\partial f}{\partial y}\frac{{\it dy}}{{\it dt}} +\frac{\partial f}{\partial z}\frac{{\it dz}}{{\it dt}}.\tag{2} \end{equation*}

That is, \[ \frac{{\it dh}}{{\it dt}}={\nabla} f({\bf c}(t))\,{ \cdot}\,{\bf c}'(t), \] where \({\bf c}'(t)=(x'(t),y'(t),z'(t))\).

This is the special case of Theorem 11 in which we take \({\bf c}=g\) and \(f\) to be real-valued, and \(m=3\). Notice that \[ \nabla f({\bf c}(t))\,{ \cdot}\,{\bf c}'(t)={\bf D} f({\bf c}(t)) {\bf Dc}(t), \] where the product on the left-hand side is the dot product of vectors, while the product on the right-hand side is matrix multiplication, and where we regard \({\bf D} f({\bf c}(t))\) as a row matrix and \({\bf Dc}(t)\) as a column matrix. The vectors \(\nabla f({\bf c}(t))\) and \({\bf c}'(t)\) have the same components as their matrix equivalents; the notational change indicates the switch from matrices to vectors.
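
To see formula (2) at work, the following Python sketch compares \(\nabla f({\bf c}(t))\,{\cdot}\,{\bf c}'(t)\) with a central-difference approximation of \(dh/dt\). The function \(f(x,y,z)=xy+z^2\), the helical path \({\bf c}(t)=(\cos t,\sin t,t)\), and the value `t0` are hypothetical choices made for this illustration only.

```python
import math

def f(x, y, z):
    """Hypothetical real-valued function on R^3."""
    return x * y + z**2

def grad_f(x, y, z):
    """Gradient of f, computed by hand: (df/dx, df/dy, df/dz)."""
    return (y, x, 2 * z)

def c(t):
    """Hypothetical differentiable path in R^3 (a helix)."""
    return (math.cos(t), math.sin(t), t)

def c_prime(t):
    """Velocity vector c'(t) = (x'(t), y'(t), z'(t))."""
    return (-math.sin(t), math.cos(t), 1.0)

def h(t):
    return f(*c(t))

def dh_dt_chain_rule(t):
    """Right-hand side of formula (2): the dot product grad f(c(t)) . c'(t)."""
    return sum(p * v for p, v in zip(grad_f(*c(t)), c_prime(t)))

t0 = 0.4      # arbitrary sample value
step = 1e-6   # finite-difference step
numeric = (h(t0 + step) - h(t0 - step)) / (2 * step)
exact = dh_dt_chain_rule(t0)
print(exact, numeric)
```

For this particular choice, \(h(t)=\cos t\,\sin t+t^2\), so both numbers should be close to \(\cos 2t_0+2t_0\).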

proof of equation (2)

By definition, \[ \frac{dh}{dt}(t_0)=\mathop {{\rm limit}\ }_{t\rightarrow t_0}\frac{h(t)-h(t_0)}{t-t_0}. \]

Adding and subtracting two terms, we write \begin{eqnarray*} \frac{h(t)-h(t_0)}{t-t_0}&=&\frac{f(x(t),y(t),z(t))-f(x(t_0),y(t_0),z(t_0))}{t-t_0}\\[5pt] &=&\frac{f(x(t),y(t),z(t))-f(x(t_0),y(t),z(t))}{t-t_0}\\[5pt] & & +\frac{f(x(t_0),y(t),z(t))-f(x(t_0),y(t_0),z(t))}{t-t_0}\\[5pt] & &+\frac{f(x(t_0),y(t_0),z(t))-f(x(t_0),y(t_0),z(t_0))}{t-t_0}.\\[-15.8pt] \end{eqnarray*}

Now we invoke the mean-value theorem from one-variable calculus, which states: If \(g\colon\, [a,b]\rightarrow {\mathbb R}\) is continuous and is differentiable on the open interval \((a,b)\), then there is a point \(c\) in \((a,b)\) such that \(g(b)-g(a)=g'(c)(b-a)\). Applying this to \(f\) as a function of \(x\), we can assert that for some \(c\) between \(x\) and \(x_0\), \[ f(x,y,z)-f(x_0,y,z)=\bigg[\frac{\partial f}{\partial x}(c,y,z)\bigg] (x-x_0). \]

In this way, we find that \begin{eqnarray*} \frac{h(t)-h(t_0)}{t-t_0}&=&\bigg[\frac{\partial f}{\partial x}(c,y(t),z(t))\bigg] \frac{x(t)-x(t_0)}{t-t_0}+ \bigg[\frac{\partial f}{\partial y}(x(t_0),d,z(t))\bigg] \frac{y(t)-y(t_0)}{t-t_0} \\[6pt] && + \bigg[\frac{\partial f}{\partial z}(x(t_0),y(t_0),e)\bigg] \frac{z(t)-z(t_0)}{t-t_0}, \end{eqnarray*} where \(c,d\), and \(e\) lie between \(x(t)\) and \(x(t_0)\), between \(y(t)\) and \(y(t_0)\), and between \(z(t)\) and \(z(t_0)\), respectively. Taking the limit \(t\rightarrow t_0\), using the continuity of the partials \(\partial f/\partial x, \partial f/\partial y, \partial f/\partial z\), and the fact that \(c,d\), and \(e\) converge to \(x(t_0),y(t_0)\), and \(z(t_0)\), respectively, we obtain formula (2).

Second Special Case of the Chain Rule

Let \(f\colon\, {\mathbb R}^3\rightarrow {\mathbb R}\) and let \(g\colon\, {\mathbb R}^3 \rightarrow {\mathbb R}^3\). Write \[ g(x,y,z)=(u(x,y,z) , v(x,y,z) , w(x,y,z)) \] and define \(h\colon\, {\mathbb R}^3\rightarrow {\mathbb R}\) by setting \[ h(x,y,z)=f(u(x,y,z), v(x,y,z), w(x,y,z)). \]

In this case, the chain rule states that \begin{equation*} \bigg[\begin{array}{ccc} \displaystyle \frac{\partial h}{\partial x} &\displaystyle \frac{\partial h}{\partial y} &\displaystyle \frac{\partial h}{\partial z} \end{array}\bigg] = \bigg[\begin{array}{ccc} \displaystyle \frac{\partial f}{\partial u} &\displaystyle \frac{\partial f}{\partial v} &\displaystyle \frac{\partial f}{\partial w} \end{array}\bigg] \left[ \begin{array}{ccc} \displaystyle \frac{\partial u}{\partial x}&\displaystyle \frac{\partial u}{\partial y}&\displaystyle \frac{\partial u}{\partial z}\\[10pt] \displaystyle \frac{\partial v}{\partial x} &\displaystyle \frac{\partial v}{\partial y}&\displaystyle \frac{\partial v}{\partial z}\\[10pt] \displaystyle \frac{\partial w}{\partial x}&\displaystyle \frac{\partial w}{\partial y}&\displaystyle \frac{\partial w}{\partial z} \end{array}\right]\!.\tag{3} \end{equation*}

In this special case, we have taken \(n=m=3\) and \(p=1\) for concreteness, and \(U={\mathbb R}^3\) and \(V={\mathbb R}^3\) for simplicity, and have written out the matrix product \([{\bf D}f({\bf y}_0)][{\bf D}g({\bf x}_0)]\) explicitly (with the arguments \({\bf x}_0\) and \({\bf y}_0\) suppressed in the matrices).

proof of the second special case of the chain rule

By definition, \(\partial h/ \partial x\) is obtained by differentiating \(h\) with respect to \(x\), holding \(y\) and \(z\) fixed. But then \((u(x,y,z),v(x,y,z),w(x,y,z))\) may be regarded as a vector function of the single variable \(x\). The first special case applies to this situation and, after the variables are renamed, gives \begin{equation*} \frac{\partial h}{\partial x}=\frac{\partial f}{\partial u}\frac{\partial u}{\partial x}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}+\frac{\partial f}{\partial w}\frac{\partial w}{\partial x}.\tag{3'} \end{equation*}

Similarly, \begin{equation*} \frac{\partial h}{\partial y}=\frac{\partial f}{\partial u}\frac{\partial u}{\partial y}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial y}+\frac{\partial f}{\partial w}\frac{\partial w}{\partial y}\tag{3''} \end{equation*} and \begin{equation*} \frac{\partial h}{\partial z}=\frac{\partial f}{\partial u}\frac{\partial u}{\partial z}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial z}+\frac{\partial f}{\partial w}\frac{\partial w}{\partial z}.\tag{3'''} \end{equation*}

In actual calculations, at the last step, we usually express \(\frac{\partial f}{\partial u}\), \(\frac{\partial f}{\partial v}\), and \(\frac{\partial f}{\partial w}\) in terms of \(x,y,z\). These equations are exactly what would be obtained by multiplying out the matrices in equation (3).

proof of theorem 11

The general case in equation (1) may be proved in two steps. First, equation (2) is generalized to \(m\) variables; that is, for \(f(x_1,\ldots, x_m)\) and \({\bf c}(t)=(x_1(t),\ldots ,x_m(t))\), we have \[ \frac{{\it dh}}{{\it dt}}= \sum^m_{i=1}\frac{\partial f}{\partial x_i}\frac{dx_i}{{\it dt}}, \] where \(h(t)=f(x_1(t),\ldots ,x_m(t))\). Second, the result obtained in the first step is used to obtain the formula \[ \frac{\partial h_j}{\partial x_i}=\sum^m_{k=1}\frac{\partial f_j}{\partial y_k}\frac{\partial y_k}{\partial x_i}, \] where \(f=(f_1,\ldots, f_p)\) is a vector function of arguments \(y_1,\ldots,y_m\); \(g(x_1,\ldots ,x_n) = (y_1(x_1,\ldots,x_n),\ldots ,y_m(x_1,\ldots, x_n))\); and \(h_j(x_1,\ldots ,x_n)=f_j(y_1(x_1,\ldots , x_n),\ldots , y_m (x_1,\ldots, x_n))\). (Using the letter \(y\) for both functions and arguments is an abuse of notation, but it can help us remember the formula.) This formula is equivalent to formula (1) after the matrices are multiplied out.

The pattern of the chain rule will become clear once you have worked some additional examples. For instance, \[ \frac{\partial}{\partial x}f(u(x,y),v(x,y),w(x,y),z(x,y))=\frac{\partial f}{\partial u}\frac{\partial u}{\partial x}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}+\frac{\partial f}{\partial w}\frac{\partial w}{\partial x}+\frac{\partial f}{\partial z}\frac{\partial z}{\partial x}, \] with a similar formula for \(\partial f/\partial y\).

The chain rule can help us understand the relationship between the geometry of a mapping \(f\colon\, {\mathbb R}^2\rightarrow {\mathbb R}^2\) and the geometry of curves in \({\mathbb R}^2\). (Similar statements may be made about \({\mathbb R}^3\) or, generally, \({\mathbb R}^n\).) If \({\bf c}(t)\) is a path in the plane, then as we saw in Section 12.1, \({\bf c'}(t)\) represents the tangent (or velocity) vector of the path \({\bf c}(t)\), and this tangent (or velocity) vector is thought of as beginning at \({\bf c}(t)\). Now let \({\bf p}(t)=f({\bf c}(t))\), where \(f\colon\, {\mathbb R}^2\rightarrow {\mathbb R}^2\). The path \({\bf p}\) represents the image of the path \({\bf c}(t)\) under the mapping \(f\). The tangent vector to \({\bf p}\) is given by the chain rule: \[ {\bf p}'(t)={\bf D} f({\bf c}(t))\,{\bf c}'(t). \]

In other words, the derivative matrix of \(f\) maps the tangent (or velocity) vector of a path \({\bf c}\) to the tangent (or velocity) vector of the corresponding image path \({\bf p}\) (see Figure 13.45). Thus, points are mapped by \(f\), while tangent vectors to curves are mapped by the derivative of \(f\), evaluated at the base point of the tangent vector in the domain.

Figure 13.45: Tangent vectors are mapped by the derivative matrix.
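
The statement that tangent vectors are mapped by the derivative matrix can be tried out on a concrete case. In the sketch below, the mapping \(f(x,y)=(x^2-y^2,2xy)\), the unit-circle path \({\bf c}(t)=(\cos t,\sin t)\), and the value `t0` are assumptions chosen for illustration; the code compares the matrix product \({\bf D}f({\bf c}(t))\,{\bf c}'(t)\) with a finite-difference tangent vector of the image path \({\bf p}(t)=f({\bf c}(t))\).

```python
import math

def f(x, y):
    """Hypothetical mapping R^2 -> R^2."""
    return (x**2 - y**2, 2 * x * y)

def Df(x, y):
    """2 x 2 matrix of partial derivatives of f, computed by hand."""
    return [[2 * x, -2 * y], [2 * y, 2 * x]]

def c(t):
    """Hypothetical path: the unit circle."""
    return (math.cos(t), math.sin(t))

def c_prime(t):
    return (-math.sin(t), math.cos(t))

def p(t):
    """Image path p(t) = f(c(t))."""
    return f(*c(t))

def mapped_tangent(t):
    """Matrix-vector product Df(c(t)) c'(t)."""
    J, v = Df(*c(t)), c_prime(t)
    return tuple(sum(J[i][k] * v[k] for k in range(2)) for i in range(2))

t0 = 0.9      # arbitrary sample value
step = 1e-6   # finite-difference step
numeric_tangent = tuple(
    (a - b) / (2 * step) for a, b in zip(p(t0 + step), p(t0 - step))
)
chain_tangent = mapped_tangent(t0)
print(chain_tangent, numeric_tangent)
```

Here \({\bf p}(t)=(\cos 2t,\sin 2t)\), so both vectors should be close to \((-2\sin 2t_0,\,2\cos 2t_0)\).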

example 2

Verify the chain rule in the form of formula \((3')\) for \[ f(u,v,w)=u^2+v^2-w, \] where \[ u(x,y,z)=x^2y, \quad v(x,y,z)=y^2, \quad w(x,y,z) =e^{-xz}. \]

solution Here \begin{eqnarray*} h(x,y,z)&=&f(u(x,y,z), v(x,y,z),w(x,y,z))\\[3pt] &=&(x^2y)^2+y^4-e^{-xz}=x^4y^2+y^4-e^{-xz}. \end{eqnarray*}

Thus, differentiating directly, \[ \frac{\partial h}{\partial x}=4x^3y^2+ze^{-xz}. \]

On the other hand, using the chain rule, \begin{eqnarray*} \frac{\partial h}{\partial x}&=&\frac{\partial f}{\partial u}\frac{\partial u}{\partial x}+\frac{\partial f}{\partial v}\frac{\partial v}{\partial x}+\frac{\partial f}{\partial w}\frac{\partial w}{\partial x}=2u(2xy)+2v \,{\cdot}\, 0+(-1)(-ze^{-xz})\\[3pt] &=& (2x^2y)(2xy)+ze^{-xz}, \end{eqnarray*} which is the same as the preceding equation.
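
The two computations in Example 2 can be compared numerically as well. The sketch below evaluates the direct formula, the chain-rule formula, and a central-difference quotient for \(\partial h/\partial x\) at a sample point; the point \((1, 2, 0.5)\) and the step size are arbitrary assumptions.

```python
import math

def h(x, y, z):
    """h = f(u, v, w) with f, u, v, w as in Example 2."""
    u, v, w = x**2 * y, y**2, math.exp(-x * z)
    return u**2 + v**2 - w

def dh_dx_direct(x, y, z):
    """Result of differentiating h(x, y, z) directly."""
    return 4 * x**3 * y**2 + z * math.exp(-x * z)

def dh_dx_chain(x, y, z):
    """Formula (3'): sum of (partial of f) * (partial of u, v, w in x)."""
    u, v = x**2 * y, y**2
    df_du, df_dv, df_dw = 2 * u, 2 * v, -1.0
    du_dx, dv_dx, dw_dx = 2 * x * y, 0.0, -z * math.exp(-x * z)
    return df_du * du_dx + df_dv * dv_dx + df_dw * dw_dx

x0, y0, z0 = 1.0, 2.0, 0.5   # arbitrary sample point
step = 1e-6                  # finite-difference step
numeric = (h(x0 + step, y0, z0) - h(x0 - step, y0, z0)) / (2 * step)
direct = dh_dx_direct(x0, y0, z0)
chain = dh_dx_chain(x0, y0, z0)
print(direct, chain, numeric)
```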

example 3

Given \(g(x,y)=(x^2+1,y^2)\) and \(f(u,v)= (u+v,u,v^2),\) compute the derivative of \(f\circ g\) at the point \((x,y)= (1,1)\) using the chain rule.

solution The matrices of partial derivatives are \[ {\bf D} f(u,v)=\left[ \begin{array}{c@{\quad}c} \\[-10.5pt] \displaystyle \frac{\partial f_1}{\partial u} &\displaystyle \frac{\partial f_1}{\partial v}\\[10pt] \displaystyle \frac{\partial f_2}{\partial u} &\displaystyle \frac{\partial f_2}{\partial v}\\[10pt] \displaystyle \frac{\partial f_3}{\partial u}&\displaystyle \frac{\partial f_3}{\partial v} \end{array} \right] =\left[ \begin{array}{c@{\quad}c} 1&1\\ 1&0\\ 0&2v \end{array} \right]\!\!\quad\hbox{and}\quad {\bf D}g(x,y)=\bigg[ \begin{array}{c@{\quad}c} 2x&0\\ 0&2y \end{array} \bigg]. \]

When \((x,y)=(1,1)\), note that \(g(x,y)=(u,v)=(2,1)\). Hence, \[ {\bf D} (f\circ g)(1,1)={\bf D} f(2,1){\bf D}g(1,1)=\left[ \begin{array}{c@{\quad}c} 1&1\\ 1&0\\ 0&2 \end{array} \right] \left[ \begin{array}{c@{\quad}c} 2&0\\ 0&2 \end{array} \right] =\left[ \begin{array}{c@{\quad}c} 2&2\\ 2&0\\ 0&4 \end{array} \right] \] is the required derivative.
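
The matrix product in Example 3 is easy to reproduce by machine. The following sketch multiplies \({\bf D}f(g(1,1))\) by \({\bf D}g(1,1)\) and, as an independent check, approximates the derivative matrix of \(f\circ g\) by central differences. The helpers `matmul` and `numerical_jacobian` are names introduced here for illustration, not part of any library.

```python
def g(x, y):
    return (x**2 + 1, y**2)

def f(u, v):
    return (u + v, u, v**2)

def Df(u, v):
    return [[1, 1], [1, 0], [0, 2 * v]]

def Dg(x, y):
    return [[2 * x, 0], [0, 2 * y]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numerical_jacobian(func, point, step=1e-6):
    """Central-difference approximation of the matrix of partial derivatives."""
    columns = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += step
        minus[j] -= step
        columns.append([(a - b) / (2 * step)
                        for a, b in zip(func(*plus), func(*minus))])
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

def composite(x, y):
    return f(*g(x, y))

chain = matmul(Df(*g(1, 1)), Dg(1, 1))  # Df is evaluated at g(1, 1) = (2, 1)
numeric = numerical_jacobian(composite, (1.0, 1.0))
print(chain)    # [[2, 2], [2, 0], [0, 4]]
print(numeric)  # entries close to those of the matrix above
```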

example 4

Let \(f(x,y)\) be given and make the substitution \(x=r\cos \theta, y=r\sin \theta\) (polar coordinates). Write a formula for \(\partial f/\partial \theta\).

solution By the chain rule, \[ \frac{\partial f}{\partial \theta}=\frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta}+\frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta}, \] that is, \[ \frac{\partial f}{\partial \theta}=-r \sin \theta \frac{\partial f}{\partial x}+ r\cos \theta \frac{\partial f}{\partial y}. \]
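
The polar-coordinate formula of Example 4 holds for any differentiable \(f\); the sketch below tests it on one hypothetical choice, \(f(x,y)=x^2y\), at an arbitrary point \((r,\theta)=(2,0.6)\). Both the function and the point are assumptions made for illustration.

```python
import math

def f(x, y):
    """Hypothetical function of (x, y)."""
    return x**2 * y

def df_dx(x, y):
    return 2 * x * y

def df_dy(x, y):
    return x**2

def f_polar(r, theta):
    """f after the substitution x = r cos(theta), y = r sin(theta)."""
    return f(r * math.cos(theta), r * math.sin(theta))

def df_dtheta_formula(r, theta):
    """-r sin(theta) df/dx + r cos(theta) df/dy, as in Example 4."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    return -r * math.sin(theta) * df_dx(x, y) + r * math.cos(theta) * df_dy(x, y)

r0, theta0 = 2.0, 0.6   # arbitrary sample point
step = 1e-6             # finite-difference step
numeric = (f_polar(r0, theta0 + step) - f_polar(r0, theta0 - step)) / (2 * step)
formula = df_dtheta_formula(r0, theta0)
print(formula, numeric)
```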

example 5

Let \(f(x,y) = (\cos y +x^2,e^{x+y})\) and \(g(u,v)=(e^{u^2},u-\sin v)\). (a) Write a formula for \(f\circ g\). (b) Calculate \({\bf D} (f\circ g)(0,0)\) using the chain rule.

solution (a) We have \begin{eqnarray*} (f \circ g)(u,v)&=&f(e^{u^2},u-\sin v)\\[2pt] &=& \big(\cos\, (u-\sin v)+e^{2u^2},e^{e^{u^{2}}\,+\,u-\sin v} \big). \end{eqnarray*}

(b) By the chain rule, \[ {\bf D}(f\circ g)(0,0)=[{\bf D} f(g(0,0))][{\bf D}g(0,0)]=[{\bf D} f(1,0)][{\bf D}g(0,0)]. \]

Now \[ {\bf D}g(0,0)=\left[ \begin{array}{c@{\quad}c} 2ue^{u^2}&0\\ 1&{-}{\cos v} \end{array} \right]_{(u,v)=(0,0)}=\left[\begin{array}{c@{\quad}c} 0&0\\ 1&-1 \end{array} \right] \] and \[ {\bf D} f(1,0)=\left[ \begin{array}{c@{\quad}c} 2x&{-}{\sin y}\\ e^{x+y}&e^{x+y} \end{array} \right]_{(x,y)=(1,0)}=\left[\begin{array}{c@{\quad}c} 2&0\\ e&e \end{array} \right]\!. \] [Remember that \({\bf D} f\) is evaluated at \(g(0,0)\), not at \((0,0)\)!]. Thus, \[ {\bf D}(f\circ g)(0,0)=\left[ \begin{array}{c@{\quad}c} 2&0\\ e&e \end{array} \right]\left[ \begin{array}{c@{\quad}c} 0&0\\ 1&-1 \end{array} \right]=\left[ \begin{array}{c@{\quad}c} 0&0\\ e&-e \end{array} \right]\!. \]
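
The warning in Example 5 about where \({\bf D}f\) is evaluated can be made concrete. The sketch below forms the product twice: once with \({\bf D}f\) evaluated at \(g(0,0)=(1,0)\), as the chain rule requires, and once with \({\bf D}f\) mistakenly evaluated at \((0,0)\). The helper `matmul` and the variable names are introduced here for illustration only.

```python
import math

def g(u, v):
    return (math.exp(u**2), u - math.sin(v))

def Dg(u, v):
    return [[2 * u * math.exp(u**2), 0.0], [1.0, -math.cos(v)]]

def Df(x, y):
    return [[2 * x, -math.sin(y)],
            [math.exp(x + y), math.exp(x + y)]]

def matmul(A, B):
    """Product of matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Chain rule: Df is evaluated at g(0, 0) = (1, 0).
correct = matmul(Df(*g(0.0, 0.0)), Dg(0.0, 0.0))

# Common slip: Df evaluated at (0, 0) instead of g(0, 0).
mistaken = matmul(Df(0.0, 0.0), Dg(0.0, 0.0))

print(correct)   # second row is approximately [e, -e]
print(mistaken)  # second row is [1, -1], which is not the derivative of f o g
```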

example 6

Let \(f\colon\, U\subset {\mathbb R}^{n}\rightarrow {\mathbb R}^{m}\) be differentiable, with \(f=(f_1,\ldots ,f_m)\), and let \(g({\bf x})=\sin\,[f({\bf x})\,{ \cdot}\, f({\bf x})]\). Compute \({\bf D}g({\bf x})\).

solution By the chain rule, \({\bf D}g({\bf x})=\cos\, [f({\bf x})\,{ \cdot}\, f({\bf x})]{\bf D}h({\bf x})\), where \(h({\bf x})= [f({\bf x})\,{\cdot}\, f({\bf x})]=f^2_1({\bf x})+ \cdots +f^2_m({\bf x})\). Then \begin{eqnarray*} {\bf D}h({\bf x})&=&\bigg[ \frac{\partial h}{\partial x_1} \quad\cdots\quad \frac{\partial h}{\partial x_n}\bigg]\\[4pt] &=&\bigg[2 f_1\frac{\partial f_1}{\partial x_1}+\cdots +2 f_m \frac{\partial f_m}{\partial x_1} \quad\cdots\quad 2 f_1\frac{\partial f_1}{\partial x_n}+ \cdots + 2 f_m \frac{\partial f_m}{\partial x_n}\bigg], \end{eqnarray*} which can be written \(2f({\bf x}){\bf D} f({\bf x})\), where we regard \(f\) as a row matrix, \[ f=[f_1\quad\cdots\quad f_m] \ \hbox{and} \ {\bf D} f=\left[ \begin{array}{c@{\quad}c@{\quad}c}\\[-10pt] \displaystyle \frac{\partial f_1}{\partial x_1}&\cdots & \displaystyle \frac{\partial f_1}{\partial x_n}\\ \vdots & &\vdots\\ \displaystyle \frac{\partial f_m}{\partial x_1} &\cdots & \displaystyle \frac{\partial f_m}{\partial x_n} \end{array} \right]\!. \]

Thus, \({\bf D}g({\bf x})=2[\cos\,(f({\bf x}) \,{ \cdot}\, f({\bf x}))]f({\bf x}){\bf D} f({\bf x})\).
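
The closing formula of Example 6 can be tested on a specific \(f\). In the sketch below, \(f\colon\,{\mathbb R}^2\rightarrow{\mathbb R}^3\) with \(f(x_1,x_2)=(x_1,\,x_1x_2,\,x_2^2)\) is a hypothetical choice, as is the sample point; the code compares \(2\cos(f\,{\cdot}\,f)\,f\,{\bf D}f\) with central-difference partial derivatives of \(g\).

```python
import math

def f(x1, x2):
    """Hypothetical differentiable map R^2 -> R^3."""
    return (x1, x1 * x2, x2**2)

def Df(x1, x2):
    """3 x 2 matrix of partial derivatives of f, computed by hand."""
    return [[1.0, 0.0], [x2, x1], [0.0, 2 * x2]]

def g(x1, x2):
    """g(x) = sin(f(x) . f(x))."""
    return math.sin(sum(a * a for a in f(x1, x2)))

def Dg_formula(x1, x2):
    """Row matrix 2 cos(f . f) f Df, with f regarded as a row matrix."""
    fx, J = f(x1, x2), Df(x1, x2)
    dot = sum(a * a for a in fx)
    row = [sum(fx[i] * J[i][j] for i in range(3)) for j in range(2)]  # f Df
    return [2 * math.cos(dot) * entry for entry in row]

x0 = (0.5, 0.3)   # arbitrary sample point
step = 1e-6       # finite-difference step
numeric = [
    (g(x0[0] + step, x0[1]) - g(x0[0] - step, x0[1])) / (2 * step),
    (g(x0[0], x0[1] + step) - g(x0[0], x0[1] - step)) / (2 * step),
]
formula = Dg_formula(*x0)
print(formula, numeric)
```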