Compound functions and their derivatives

In the previous section, we covered expressions for some common derivatives. For instance, we have seen that:

\begin{equation} g(x) = ax \implies \frac{dg}{dx}(x) = a \end{equation}

and

\begin{equation} h(x) = \ln x \implies \frac{dh}{dx}(x) = \frac{1}{x}. \end{equation}

But what if we wanted to compute the derivative of $f(x) = g(x) + h(x)$? Or maybe we would like to differentiate $f(x) = g(x) \cdot h(x)$? Fortunately, we can rely on a few rules that will help us deal with combinations and compositions of functions like the ones above.

\begin{align} \text{Sum rule:}& \qquad \frac{d}{dx}\left(g(x) + h(x)\right) = \frac{dg}{dx}(x) + \frac{dh}{dx}(x)\\[3ex] \text{Product rule:}& \qquad \frac{d}{dx}\left(g(x) \cdot h(x)\right) = \frac{dg}{dx}(x)h(x) + \frac{dh}{dx}(x)g(x)\\[3ex] \text{Chain rule:}& \qquad \frac{d}{dx}g(h(x)) = \frac{dg}{dh}(h(x)) \cdot \frac{dh}{dx}(x) \end{align}

Examples

Let's see how we could use the rules above to compute the derivative of $f(x) = g(x) + h(x)$, with $g(x) = ax$ and $h(x) = \ln x$. By the sum rule, we have:

\begin{align} \frac{df}{dx} & = \frac{dg}{dx}(x) + \frac{dh}{dx}(x) \\ & = \frac{d}{dx}(ax) + \frac{d}{dx}(\ln x) \\ & = a + \frac{1}{x} \end{align}

We can double-check this numerically by applying the definition of the derivative with increasingly small values of $\epsilon$. We should see that our approximations get closer and closer to the value we computed analytically:
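For instance, a forward-difference check might look like the following sketch (the values a = 2 and x = 3 are arbitrary choices for illustration):

```python
import numpy as np

a, x = 2.0, 3.0  # arbitrary test values chosen for this sketch

def f(x):
    return a * x + np.log(x)  # f(x) = g(x) + h(x)

analytic = a + 1 / x  # the derivative we derived above

# Forward-difference approximation with shrinking epsilon
for eps in [1e-1, 1e-3, 1e-5, 1e-7]:
    numeric = (f(x + eps) - f(x)) / eps
    print(f"eps={eps:.0e}  numeric={numeric:.8f}  analytic={analytic:.8f}")
```

As $\epsilon$ shrinks, the printed approximations should approach $a + 1/x \approx 2.33333333$ for these test values.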

As another example, let's compute the derivative of $f(x) = g(x) \cdot h(x)$, with $g(x) = ax$ and $h(x) = \ln x$. Applying the product rule, we get:

\begin{align} \frac{df}{dx} & = \frac{dg}{dx}(x)h(x) + \frac{dh}{dx}(x)g(x) \\ & = \frac{d}{dx}(ax)\cdot\ln x + \frac{d}{dx}(\ln x) \cdot ax \\ & = a\ln x + \frac{ax}{x} \\ & = a\ln x + a \end{align}

Again, we can confirm this numerically:
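Using the same arbitrary test values as before, a sketch of the check might be:

```python
import numpy as np

a, x = 2.0, 3.0  # same arbitrary test values as before

def f(x):
    return a * x * np.log(x)  # f(x) = g(x) * h(x)

analytic = a * np.log(x) + a  # a ln(x) + a, as derived above

for eps in [1e-1, 1e-3, 1e-5, 1e-7]:
    numeric = (f(x + eps) - f(x)) / eps
    print(f"eps={eps:.0e}  numeric={numeric:.8f}  analytic={analytic:.8f}")
```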

As a final example, let's compute the derivative of $f(x) = h(g(x))$, with $g(x) = ax$ and $h(x) = \ln x$. Applying the chain rule, we get:

\begin{align} \frac{df}{dx} & = \frac{dh}{dg}(g(x)) \cdot \frac{dg}{dx}(x) \\ & = \frac{1}{ax} \cdot \frac{d}{dx}(ax) \\ & = \frac{1}{ax} \cdot a \\ & = \frac{1}{x} \end{align}

Again, we can confirm this numerically:
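A sketch of the check, again with the same arbitrary test values:

```python
import numpy as np

a, x = 2.0, 3.0  # same arbitrary test values as before

def f(x):
    return np.log(a * x)  # f(x) = h(g(x)) = ln(ax)

analytic = 1 / x  # chain-rule result; note that a has cancelled out

for eps in [1e-1, 1e-3, 1e-5, 1e-7]:
    numeric = (f(x + eps) - f(x)) / eps
    print(f"eps={eps:.0e}  numeric={numeric:.8f}  analytic={analytic:.8f}")
```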

Let's pause to appreciate what we have learned. We know how to find expressions for the derivatives of large families of functions (linear functions, power functions, logarithms, etc.), and we also know how to deal with functions that are arbitrary combinations and compositions of members of these families. Furthermore, the rules are fairly straightforward. This means that the process of finding the derivative of fairly complex functions can be automated and solved by a computer! Indeed, this is a key part of how neural networks are optimised. However, there is a big difference between the functions we are working with here and the functions we deal with when optimising ML models: the latter are functions of many variables, as opposed to just one. To address this, we will introduce the concept of partial derivatives in the next section.
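As a taste of that automation, here is a minimal sketch using the sympy library (one symbolic-differentiation tool among many; its use here is purely illustrative) to re-derive the product-rule and chain-rule examples above:

```python
import sympy as sp

x, a = sp.symbols('x a', positive=True)

# Product-rule example: f(x) = ax * ln(x)
print(sp.diff(a * x * sp.log(x), x))           # prints: a*log(x) + a

# Chain-rule example: f(x) = ln(ax)
print(sp.simplify(sp.diff(sp.log(a * x), x)))  # prints: 1/x
```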