In the previous section, we covered expressions for some common derivatives. For instance, we have seen that:

$$\frac{d}{dx}\left[x^n\right] = n\,x^{n-1}$$

and

$$\frac{d}{dx}\left[\ln(x)\right] = \frac{1}{x}$$
But what if we wanted to compute the derivative of $x^2 + \ln(x)$? Or maybe we would like to compute the derivative of $x^2\,\ln(x)$? Fortunately, we can rely on a few rules that will help us deal with combinations of functions like the ones above. In particular, the sum rule tells us that

$$\big(g(x) + h(x)\big)' = g'(x) + h'(x),$$

and the product rule tells us that

$$\big(g(x)\,h(x)\big)' = g'(x)\,h(x) + g(x)\,h'(x).$$
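To get a feel for how mechanical these rules are, here is a minimal sketch in Python. The helper names `sum_rule` and `product_rule` are illustrative choices, not part of any library: each takes two functions together with their derivatives and returns the combined function and its derivative.

```python
import math

# A minimal sketch, assuming we represent each function together with its
# derivative as a pair of Python callables. The names sum_rule and
# product_rule are illustrative, not part of any library.

def sum_rule(g, dg, h, dh):
    """Return g + h and its derivative g' + h'."""
    return (lambda x: g(x) + h(x),
            lambda x: dg(x) + dh(x))

def product_rule(g, dg, h, dh):
    """Return g * h and its derivative g' * h + g * h'."""
    return (lambda x: g(x) * h(x),
            lambda x: dg(x) * h(x) + g(x) * dh(x))

# Example: build x^2 + ln(x) and its derivative from the two pieces.
f, df = sum_rule(lambda x: x**2, lambda x: 2 * x,
                 math.log, lambda x: 1 / x)
print(df(2.0))  # 2*2 + 1/2 = 4.5
```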
Let's see how we could use the rules above to compute the derivative of $f(x) = g(x) + h(x)$, with $g(x) = x^2$ and $h(x) = \ln(x)$. By the sum rule, we have:

$$f'(x) = g'(x) + h'(x) = 2x + \frac{1}{x}$$
We can double-check this numerically by applying the definition of the derivative with increasingly small values of $\epsilon$, i.e. by computing $\frac{f(x + \epsilon) - f(x)}{\epsilon}$. We should see that our approximations get closer and closer to the value we computed analytically:
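Here is one way such a check might look in Python; the evaluation point $x = 2$ and the list of step sizes are arbitrary choices made for illustration:

```python
import math

def f(x):
    # f(x) = x^2 + ln(x), the sum-rule example above
    return x**2 + math.log(x)

def f_prime(x):
    # Analytical derivative from the sum rule: 2x + 1/x
    return 2 * x + 1 / x

x = 2.0  # arbitrary evaluation point, chosen for illustration
print(f"analytical value: {f_prime(x):.6f}")
for eps in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    approx = (f(x + eps) - f(x)) / eps  # forward difference quotient
    print(f"eps = {eps:.0e}  ->  approximation: {approx:.6f}")
```

As $\epsilon$ shrinks, the printed approximations should approach the analytical value $f'(2) = 4.5$.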
As another example, let's compute the derivative of $f(x) = g(x)\,h(x)$, with $g(x) = x^2$ and $h(x) = \ln(x)$. By applying the product rule, we get:

$$f'(x) = g'(x)\,h(x) + g(x)\,h'(x) = 2x\ln(x) + x^2 \cdot \frac{1}{x} = 2x\ln(x) + x$$
Again, we can confirm this numerically:
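A sketch of the check, using the same arbitrary evaluation point and step sizes as before:

```python
import math

def f(x):
    # f(x) = x^2 * ln(x), the product-rule example above
    return x**2 * math.log(x)

def f_prime(x):
    # Analytical derivative from the product rule: 2x*ln(x) + x
    return 2 * x * math.log(x) + x

x = 2.0  # arbitrary evaluation point, chosen for illustration
print(f"analytical value: {f_prime(x):.6f}")
for eps in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    approx = (f(x + eps) - f(x)) / eps  # forward difference quotient
    print(f"eps = {eps:.0e}  ->  approximation: {approx:.6f}")
```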
As a final example, let's compute the derivative of $f(x) = g(x)\,h(x)$, with $g(x) = 3x$ and $h(x) = \ln(x)$. By applying the product rule, we get:

$$f'(x) = g'(x)\,h(x) + g(x)\,h'(x) = 3\ln(x) + 3x \cdot \frac{1}{x} = 3\ln(x) + 3$$
Again, we can confirm this numerically:
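One more sketch of the same finite-difference check, applied to this final example:

```python
import math

def f(x):
    # f(x) = 3x * ln(x), the final example above
    return 3 * x * math.log(x)

def f_prime(x):
    # Analytical derivative from the product rule: 3*ln(x) + 3
    return 3 * math.log(x) + 3

x = 2.0  # arbitrary evaluation point, chosen for illustration
print(f"analytical value: {f_prime(x):.6f}")
for eps in [1e-1, 1e-2, 1e-3, 1e-4, 1e-5]:
    approx = (f(x + eps) - f(x)) / eps  # forward difference quotient
    print(f"eps = {eps:.0e}  ->  approximation: {approx:.6f}")
```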
Let's pause to appreciate what we have learned. We know how to find expressions for the derivatives of large families of functions (linear functions, power functions, logarithms, etc.), and we also know how to deal with functions that are arbitrary combinations of functions from these families. Furthermore, the rules are fairly straightforward. This means that the process of finding the derivative of fairly complex functions can be automated and solved by a computer! Indeed, this is a key part of how neural networks are optimised. However, there is a big difference between the functions we are working with here and the functions we deal with when optimising ML models: the latter are functions of many variables, as opposed to just one. To address this, we will introduce the concept of partial derivatives in the next section.
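Before moving on, here is a small illustration of the point that differentiation can be automated. The sketch uses SymPy, one possible choice of computer-algebra library, applied to the example expressions from above:

```python
import sympy as sp

x = sp.Symbol("x", positive=True)

# Symbolically differentiate the expressions from the examples above
print(sp.diff(x**2 + sp.log(x), x))   # 2*x + 1/x
print(sp.diff(x**2 * sp.log(x), x))   # 2*x*log(x) + x
print(sp.diff(3 * x * sp.log(x), x))  # 3*log(x) + 3
```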