So here’s how I introduced the Chain Rule to my AP Calculus class.

Let u = g(x) = 3x – 6.

1.  What is the value of g(x) at x = 5?

2. What is the rate at which g(x) changes when x = 5?

Now let y = f(u) = u^2.

3.  What is the value of f(u) at u = 9?

4. What is the rate at which f(u) changes when u = 9?

Okay, so far this is all pretty routine. Here comes the interesting part: what happens when we compose y = f(u) with u = g(x)? We get y = f(u) = f(g(x)) = (3x – 6)^2.

5.  What is the value of f(g(x)) at x = 5?

6. What is the rate at which f(g(x)) changes when x = 5?

Let’s stop and think about what is going on here.

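1. At x = 5: Rate at which u is changing with respect to x: 3 u’s per x.

2. At u = 9: Rate at which y is changing with respect to u: 18 y’s per u.

Since u = g(5) = 9 when x = 5, both rates describe the same moment: each 1-unit nudge in x produces about 3 units of change in u, and each of those units of u produces about 18 units of change in y. So it stands to reason that:

3. At x = 5: Rate at which y is changing with respect to x: 18(3) = 54 y’s per x.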
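
If you want to sanity-check those three rates numerically, here is a minimal Python sketch. The helper name `rate` and the step size `h` are my own choices for illustration, and a difference quotient only approximates each rate of change.

```python
def g(x):
    return 3 * x - 6

def f(u):
    return u ** 2

def rate(func, at, h=1e-6):
    """Approximate the rate of change of func at `at` with a central difference quotient."""
    return (func(at + h) - func(at - h)) / (2 * h)

du_dx = rate(g, 5)                   # rate of u with respect to x at x = 5
dy_du = rate(f, g(5))                # rate of y with respect to u at u = g(5) = 9
dy_dx = rate(lambda x: f(g(x)), 5)   # rate of the composition at x = 5

# The last two printed values should agree: dy/dx versus (dy/du) * (du/dx).
print(du_dx, dy_du, dy_dx, dy_du * du_dx)
```

Running it should print values very close to 3, 18, 54, and 54.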

Summary 1: At x = 5, dy/dx = 54, which is 18(3), which is 2(9)(3), which is 2*u(5)*3, which is 2*u(5)*u'(5), which is f'(u(5)) * u'(5).

Generalization 1: dy/dx = dy/du * du/dx.

Generalization 2: (f ∘ g)'(a) = f'(g(a)) * g'(a).
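
Generalization 2 can be spot-checked on a pair of functions other than the ones above. This sketch assumes f(u) = sin(u), g(x) = x^2, and a = 1.5 purely as an example (none of these come from the lesson), and `numeric_derivative` is a hypothetical helper that only approximates the derivative.

```python
import math

def numeric_derivative(func, a, h=1e-5):
    """Approximate func'(a) with a central difference quotient."""
    return (func(a + h) - func(a - h)) / (2 * h)

# Assumed example functions and their known derivatives.
def g(x):
    return x ** 2

def g_prime(x):
    return 2 * x

def f(u):
    return math.sin(u)

def f_prime(u):
    return math.cos(u)

a = 1.5
left_side = numeric_derivative(lambda x: f(g(x)), a)  # (f ∘ g)'(a), approximated
right_side = f_prime(g(a)) * g_prime(a)               # f'(g(a)) * g'(a)

print(left_side, right_side)
```

The two printed numbers should match to several decimal places, which is what the formula predicts. Agreement at one point is evidence, not a proof.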

Got any feedback for me?  Light up the comments section.