Local linearization

Learn how to generalize the idea of a tangent plane into a linear approximation of scalar-valued multivariable functions.

Background

The gradient

What we're building to

Local linearization generalizes the idea of tangent planes to any multivariable function. Here, I will just talk about the case of scalar-valued multivariable functions.
The idea is to approximate a function near one of its inputs with a simpler function that has the same value at that input, as well as the same partial derivative values.
Written with vectors, here's what the approximation function looks like:
$L_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0)}_{\text{Constant vector}} \cdot \overbrace{(\mathbf{x} - \mathbf{x}_0)}^{\mathbf{x}\text{ is the variable}}$
This is called the local linearization of $f$ near $\mathbf{x}_0$.

Tangent planes as approximations

In the previous article, I talked about finding the tangent plane to a two-variable function's graph.

The formula for the tangent plane ended up looking like this.

$T(x, y) = f(x_0, y_0) + f_x(x_0, y_0)(x - x_0) + f_y(x_0, y_0)(y - y_0)$

This function $T(x, y)$ often goes by a different name: the "local linearization" of $f$ at the point $(x_0, y_0)$. You can think about this as the simplest function satisfying two properties:

It has the same value as $f$ at the point $(x_0, y_0)$.
It has the same partial derivatives as $f$ at the point $(x_0, y_0)$.

As always in multivariable calculus, it is healthy to contemplate a new concept without relying on graphical intuition. That's not to say you should not try to think visually. Maybe instead think purely about the input space, or think about the relevant transformation rather than the graph.

Fundamentally, a local linearization approximates one function near a point based on the information you can get from its derivative(s) at that point.

In the case of functions with a two-variable input and a scalar (i.e. non-vector) output, this can be visualized as a tangent plane. However, with higher dimensions we don't have this visual luxury, so we are left to think about it just as an approximation.

In real-world applications of multivariable calculus, you almost never care about an actual plane in space. Instead, you might have some complicated function, like, oh, I don't know, air resistance on a parachute as a function of speed and orientation. Dealing with the actual function may be tricky or computationally expensive, so it's helpful to approximate it with something simpler, like a linear function.

What do I mean by "Linear function"?

Consider a function with a multidimensional input.

$f(x_1, x_2, \dots, x_n)$

This function is called linear if in its definition, all the coordinates are just multiplied by constants, with nothing else happening to them. For example, it might look like this:

$f(x_1, x_2, \dots, x_n) = 2x_1 + 3x_2 + \dots - 5x_n$

The full story of linearity goes deeper (hence the existence of the field "Linear algebra"), but for now, this conception will do. Typically, instead of writing out all the variables like this, you would treat the input as a vector:

$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$

And you would define the function using a dot product:

$f(\mathbf{x}) = \begin{bmatrix} 2 \\ 3 \\ \vdots \\ -5 \end{bmatrix} \cdot \mathbf{x}$

For the purposes of this article, and more generally when you talk about local linearization, you are allowed to add in a constant to this expression:

$f(\mathbf{x}) = \underbrace{c}_{\text{Some constant}} + \overbrace{\mathbf{v}}^{\text{Some vector}} \cdot \mathbf{x}$

If you wanted to be pedantic, this is no longer a linear function. It's what's called an "affine" function. But most people would say "whatever, it's basically linear".
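To make the "constant plus dot product" shape concrete, here is a minimal Python sketch. The helper names `dot` and `make_affine`, and the constant 7, are made up for illustration; the vector reuses the coefficients 2, 3 and -5 from the example above, cut down to three inputs.

```python
def dot(u, v):
    """Dot product of two equal-length sequences of numbers."""
    return sum(ui * vi for ui, vi in zip(u, v))


def make_affine(c, v):
    """Return the function x -> c + v . x for a constant c and a constant vector v."""
    def affine(x):
        return c + dot(v, x)
    return affine


# Hypothetical three-input versions of the functions described above.
linear = make_affine(0, [2, 3, -5])   # strictly linear: no constant term
affine = make_affine(7, [2, 3, -5])   # same vector, shifted by the constant 7

print(linear([1, 1, 1]))  # 2 + 3 - 5 = 0
print(affine([1, 1, 1]))  # 7 + 0 = 7
```

Either way, evaluating the function only takes multiplications and additions, which is what makes this family of functions so cheap to work with.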

Local linearization

Now, suppose your function $f(\mathbf{x})$ does not have the luxury of being linear. (The bolded "$\mathbf{x}$" still represents a multidimensional vector.) It might be defined by some crazy expression way more wild than a dot product.

The idea of a local linearization is to approximate this function near some particular input value, $\mathbf{x}_0$, with a function that is linear. Specifically, here's what that new function looks like:

$L_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0)}_{\text{Constant vector}} \cdot \overbrace{(\mathbf{x} - \mathbf{x}_0)}^{\mathbf{x}\text{ is the variable}}$

Notice, by plugging in $\mathbf{x} = \mathbf{x}_0$, you can see that both functions $f$ and $L_f$ will have the same value at the input $\mathbf{x}_0$.
The vector dotted against the variable $\mathbf{x}$ is the gradient of $f$ at the specified input, $\nabla f(\mathbf{x}_0)$. This ensures that both functions $f$ and $L_f$ will have the same gradient at the specified input. In other words, all their partial derivative information will be the same.
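As a rough sketch of how this formula turns into code, the Python below builds $L_f$ from a function and its gradient. The helper `local_linearization` and the sample function $f(x, y) = x^2 y$ are assumptions chosen for illustration, not something taken from this article.

```python
def local_linearization(f, grad_f, x0):
    """Build L_f(x) = f(x0) + grad_f(x0) . (x - x0).

    f      : function taking a list of floats and returning a float
    grad_f : function returning the list of partial derivatives of f
    x0     : the input to linearize around
    """
    f0 = f(x0)        # the constant term
    g0 = grad_f(x0)   # the constant vector

    def L(x):
        return f0 + sum(g * (xi - x0i) for g, xi, x0i in zip(g0, x, x0))

    return L


# Hypothetical sample function: f(x, y) = x^2 * y, whose gradient is (2xy, x^2).
f = lambda p: p[0] ** 2 * p[1]
grad_f = lambda p: [2 * p[0] * p[1], p[0] ** 2]

L = local_linearization(f, grad_f, [1.0, 2.0])
print(L([1.0, 2.0]), f([1.0, 2.0]))  # the two agree at x0
print(L([1.1, 2.1]), f([1.1, 2.1]))  # and stay close near x0
```

The constant and the gradient are computed once, up front. After that, every evaluation of `L` is just a dot product, regardless of how expensive `f` itself is.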

I think the best way to understand this formula is to basically derive it for yourself in the context of a specific function.

Example 1: Finding a local linearization.

Problem: Have yourself a function:

$f(x, y, z) = z e^{x^2 - y^3}$

Find a linear function $L_f(x, y, z)$ such that the value of $L_f$ and all its partial derivatives match those of $f$ at the following point:

$(x_0, y_0, z_0) = (8, 4, 3)$

Step 1: Evaluate $f$ at the chosen point

$f(8, 4, 3) = 3 e^{8^2 - 4^3} = 3 e^{0} = 3$

Step 2: Use this to start writing your function. Which of the following functions will be guaranteed to equal $f$ at the input $(x, y, z) = (8, 4, 3)$?

Choose 1 answer:
(Choice A)
$L_f(x, y, z) = 3 + 8ax + 4by + 3cz$
(Choice B)
$L_f(x, y, z) = 3 + a(x - 8) + b(y - 4) + c(z - 3)$
For both of these, $a$, $b$ and $c$ are all arbitrary constants.

Choice B is the one that works: at $(8, 4, 3)$ every term after the $3$ vanishes, no matter what the constants are.

The partial derivatives of $L_f$, as you have written it so far, are precisely these constants $a$, $b$ and $c$. So to force our function to have the same partial derivative information as $f$ at the point $(8, 4, 3)$, we just need to set these constants equal to the corresponding partial derivatives of $f$ at this point.

Step 3: Compute each partial derivative of $f(x, y, z) = z e^{x^2 - y^3}$

$f_x(x, y, z) = 2xz e^{x^2 - y^3}$
$f_y(x, y, z) = -3y^2 z e^{x^2 - y^3}$
$f_z(x, y, z) = e^{x^2 - y^3}$

Now we evaluate each of these at $(8, 4, 3)$

$f_x(8, 4, 3) = 2 \cdot 8 \cdot 3 \cdot e^{0} = 48$
$f_y(8, 4, 3) = -3 \cdot 4^2 \cdot 3 \cdot e^{0} = -144$
$f_z(8, 4, 3) = e^{0} = 1$

Step 4: Replacing the constants $a$, $b$ and $c$ in the expression of $L_f$ with these partial derivative values, what do you get?

$L_f(x, y, z) = 3 + 48(x - 8) - 144(y - 4) + (z - 3)$

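Now notice what this looks like if you write it with vector notation.

$L_f(\mathbf{x}) = 3 + \begin{bmatrix} 48 \\ -144 \\ 1 \end{bmatrix} \cdot \begin{bmatrix} x - 8 \\ y - 4 \\ z - 3 \end{bmatrix}$

It is just a specific form of the general formula shown above.

$L_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0)}_{\text{Constant vector}} \cdot \overbrace{(\mathbf{x} - \mathbf{x}_0)}^{\mathbf{x}\text{ is the variable}}$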
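If you want to sanity-check an answer to this example, one option is a quick numerical comparison. The Python sketch below assumes the linearization $3 + 48(x - 8) - 144(y - 4) + (z - 3)$, which is what you get by working the steps by hand. The `partial` helper is a hypothetical central-difference estimator, so its output is approximate rather than exact.

```python
import math


def f(x, y, z):
    return z * math.exp(x**2 - y**3)


def L_f(x, y, z):
    # Assumed linearization at (8, 4, 3), with partial derivatives 48, -144 and 1.
    return 3 + 48 * (x - 8) - 144 * (y - 4) + (z - 3)


def partial(g, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of g at point."""
    up = list(point)
    up[i] += h
    down = list(point)
    down[i] -= h
    return (g(*up) - g(*down)) / (2 * h)


p0 = (8.0, 4.0, 3.0)
print(f(*p0), L_f(*p0))  # the two functions agree at the point
for i, name in enumerate("xyz"):
    # Estimated partial derivative of f next to that of L_f.
    print(name, partial(f, p0, i), partial(L_f, p0, i))
```

Matching values and matching partial derivatives at $(8, 4, 3)$ are exactly the two conditions the problem asked for.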

Example 2: Using local linearization for estimation

What follows is by no means a practical application, but working through it will help give a feel for what local linearization is doing.

Problem: Suppose you are on a desert island without a calculator, and you need to estimate $\sqrt{2.01 + \sqrt{0.99 + \sqrt{9.01}}}$. How would you do it?

Solution:

We can view this problem as evaluating a certain three-variable function at the point $(2.01, 0.99, 9.01)$, namely

$f(x, y, z) = \sqrt{x + \sqrt{y + \sqrt{z}}}$

I don't know about you, but I'm not sure how to evaluate square roots by hand. If only this function was linear! Then working it out by hand would only involve adding and multiplying numbers. What we could do is find the local linearization at a nearby point where evaluating $f$ is easier. Then we can get close to the right answer by evaluating the linearization at the point $(2.01, 0.99, 9.01)$.

The point we care about is very close to the much simpler point $(2, 1, 9)$, so we find the local linearization of $f$ near that point. As before, we must find

$f(2, 1, 9)$
All partial derivatives of $f$ at $(2, 1, 9)$

The first of these is

$\begin{aligned} f(2, 1, 9) &= \sqrt{2 + \sqrt{1 + \sqrt{9}}} \\ &= \sqrt{2 + \sqrt{1 + 3}} \\ &= \sqrt{2 + \sqrt{4}} \\ &= \sqrt{2 + 2} \\ &= \sqrt{4} \\ &= 2 \end{aligned}$

Looks like someone chose a few convenient input values, eh?

On to the partial derivatives (heavy sigh). Since the square roots are abundant, let's write out for ourselves the derivative of $\sqrt{x}$:

$\frac{d}{dx} \sqrt{x} = \frac{d}{dx} x^{\frac{1}{2}} = \frac{1}{2} x^{-\frac{1}{2}} = \frac{1}{2\sqrt{x}}$

Okay, here we go. The simplest partial derivative is $f_x$:

$f_x = \frac{\partial}{\partial x} \sqrt{x + \sqrt{y + \sqrt{z}}} = \frac{1}{2\sqrt{x + \sqrt{y + \sqrt{z}}}}$

Since $y$ is nestled in there, $f_y$ requires some chain rule action:

$f_y = \frac{\partial}{\partial y} \sqrt{x + \sqrt{y + \sqrt{z}}} = \frac{1}{2\sqrt{x + \sqrt{y + \sqrt{z}}}} \cdot \frac{1}{2\sqrt{y + \sqrt{z}}}$

Nestled even deeper, that tricky $z$ will require two iterations of the chain rule:

$f_z = \frac{\partial}{\partial z} \sqrt{x + \sqrt{y + \sqrt{z}}} = \frac{1}{2\sqrt{x + \sqrt{y + \sqrt{z}}}} \cdot \frac{1}{2\sqrt{y + \sqrt{z}}} \cdot \frac{1}{2\sqrt{z}}$

Next, evaluate each one of these at $(2, 1, 9)$. This might seem like a lot, but they are all made up of the same three basic components:

$\begin{aligned} \frac{1}{2\sqrt{x + \sqrt{y + \sqrt{z}}}} &= \frac{1}{2\sqrt{2 + \sqrt{1 + \sqrt{9}}}} = \frac{1}{2\sqrt{2 + 2}} = \frac{1}{4} \\ \frac{1}{2\sqrt{y + \sqrt{z}}} &= \frac{1}{2\sqrt{1 + \sqrt{9}}} = \frac{1}{2\sqrt{4}} = \frac{1}{4} \\ \frac{1}{2\sqrt{z}} &= \frac{1}{2\sqrt{9}} = \frac{1}{6} \end{aligned}$

Plugging these values into our expressions for the partial derivatives, we have

$\begin{aligned} f_x(2, 1, 9) &= \frac{1}{4} \\ f_y(2, 1, 9) &= \frac{1}{4} \cdot \frac{1}{4} = \frac{1}{16} \\ f_z(2, 1, 9) &= \frac{1}{4} \cdot \frac{1}{4} \cdot \frac{1}{6} = \frac{1}{96} \end{aligned}$
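Chain-rule bookkeeping like this is easy to get wrong, so a numerical cross-check can be reassuring. The following Python sketch compares the three values above against central-difference estimates. The `partial` helper and the step size `h` are illustrative assumptions, and the estimates only agree with the exact fractions up to a small numerical error.

```python
import math


def f(x, y, z):
    return math.sqrt(x + math.sqrt(y + math.sqrt(z)))


def partial(g, point, i, h=1e-6):
    """Central-difference estimate of the i-th partial derivative of g at point."""
    up = list(point)
    up[i] += h
    down = list(point)
    down[i] -= h
    return (g(*up) - g(*down)) / (2 * h)


p0 = (2.0, 1.0, 9.0)
derived = [1 / 4, 1 / 16, 1 / 96]  # the hand-derived values of f_x, f_y, f_z

for i, (name, value) in enumerate(zip("xyz", derived)):
    # Hand-derived value next to its numerical estimate.
    print(name, value, partial(f, p0, i))
```

Note that the last factor in $f_z$ is $\frac{1}{2\sqrt{9}} = \frac{1}{6}$ rather than $\frac{1}{4}$, which is why the product comes out to $\frac{1}{96}$.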

Unraveling the formula for local linearization, we get

$\begin{aligned} L_f(\mathbf{x}) &= f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) \\ &= f(\mathbf{x}_0) + f_x(\mathbf{x}_0)(x - x_0) + f_y(\mathbf{x}_0)(y - y_0) + f_z(\mathbf{x}_0)(z - z_0) \\ &= 2 + \frac{1}{4}(x - 2) + \frac{1}{16}(y - 1) + \frac{1}{96}(z - 9) \end{aligned}$

Finally, after all this work, we can plug in $(x, y, z) = (2.01, 0.99, 9.01)$ to compute our approximation

$\begin{aligned} & 2 + \frac{1}{4}(2.01 - 2) + \frac{1}{16}(0.99 - 1) + \frac{1}{96}(9.01 - 9) \\ &= 2 + \frac{0.01}{4} + \frac{-0.01}{16} + \frac{0.01}{96} \end{aligned}$

Calculating this by hand still isn't easy, but at least it's doable. When you work it out, the final answer is approximately $2.001979$.

Had we just used a calculator, the answer is

$\sqrt{2.01 + \sqrt{0.99 + \sqrt{9.01}}} \approx 2.001978$

So our approximation is pretty good!
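For anyone not stranded on the island, the comparison is easy to reproduce. This short Python sketch evaluates both the linearization derived above and the original function at $(2.01, 0.99, 9.01)$. The function names are just labels chosen for this illustration.

```python
import math


def f(x, y, z):
    return math.sqrt(x + math.sqrt(y + math.sqrt(z)))


def L_f(x, y, z):
    # Linearization of f at (2, 1, 9), using the partial derivatives 1/4, 1/16 and 1/96.
    return 2 + (x - 2) / 4 + (y - 1) / 16 + (z - 9) / 96


point = (2.01, 0.99, 9.01)
estimate = L_f(*point)
exact = f(*point)

print(f"linearization: {estimate:.6f}")
print(f"direct value:  {exact:.6f}")
print(f"difference:    {abs(estimate - exact):.1e}")
```

The two printed values should agree with the $2.001979$ and $2.001978$ quoted above, differing only around the sixth decimal place.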

Why do we care?

Although it is not common to find yourself estimating square roots on a desert island (at least where I'm from), what is common in the contexts of math and engineering is wrangling with complicated but differentiable functions. The phrase "just linearize it" is tossed around so much that not knowing what it means could be awkward.

Remember, a local linearization approximates one function near a point based on the information you can get from its derivative(s) at that point. Even though you can use a computer to evaluate functions, that's not always enough.

You might need to evaluate it many thousands of times per second, and working it out in full takes too long.
Maybe you don't even have the function explicitly written out, and you just have a few measurements near a point which you wish to extrapolate.
Sometimes what you care about is the inverse function, which can be hard or even impossible to find for the function as a whole, whereas inverting linear functions is relatively straightforward.

Summary

Local linearization generalizes the idea of tangent planes to any multivariable function.
The idea is to approximate a function near one of its inputs with a simpler function that has the same value at that input, as well as the same partial derivative values.
Written with vectors, here's what the approximation function looks like:
$L_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla f(\mathbf{x}_0)}_{\text{Constant vector}} \cdot \overbrace{(\mathbf{x} - \mathbf{x}_0)}^{\mathbf{x}\text{ is the variable}}$
This is called the local linearization of $f$ near $\mathbf{x}_0$.

Want to join the conversation?

Sean
Posted 8 years ago.
Also, in example one, when they show what the plane looks like using vector notation, shouldn't the second vector components be x-8 not x, and y-4, not y and etc?
(19 votes)
- Charles Morelli
  Posted 6 months ago.
  I think what it should say is to dot the gradient of f (evaluated at (8, 4, 3)) with the vector:
  ((x, y, z) - (8, 4, 3))
  i.e.:
  ∇f(x_0) . (x - x_0)
  to fully replicate the generic formula, so in short: yes.
  Well noticed!
  (3 votes)
Geggles
Posted 8 years ago.
At "Example 1", given that f(x, y, z) = e^(x^2-y^3), shouldn't f(4, 8, 3) be 3*e^(4^2-8^3) = 3*e^(-496)≃0 ?
(12 votes)
- Sean
  Posted 8 years ago.
  You have the function the incorrect way around I believe. it is f(8,4,3). NOT f(4,8,3)
  (4 votes)
Eeltaay Ghojoghi
Posted 6 years ago.
Why does local linearization sound similar to taylor series?!
(7 votes)
- Christopher
  Posted 4 years ago.
  Because it is indeed a (first-order) Taylor expansion for the original function! It's just that in this case you're trying to approximate a multivariable function, and so you're using partial derivatives instead of regular derivatives.
  (8 votes)
onionmarktwo
Posted 7 years ago.
Is "the local linearization" the same as "Linear approximation"(https://en.wikipedia.org/wiki/Linear_approximation) ?
(4 votes)
- Basileos
  Posted 5 years ago.
  These are the same. To clarify the equality, compare this article to the formula that follows after quote: "Linear approximations for vector functions of a vector variable are obtained in the same way, with the derivative at a point replaced by the Jacobian matrix."

  (a, b) at Wikipedia is (x0, y0) here.
  (5 votes)
Michelle Zhuang
Posted 5 years ago.
Under Example 1, after the line "Now notice what this looks like if you write it with vector notation," under the explanation, the last matrix should have the terms "x-8, y-4, and z-3.
(5 votes)
Brandon Van Over
Posted 8 years ago.
Can this be edited to include an example involving the linear localization of a vector valued function? Maybe something simple like from $\mathbb{R}^2 \to \mathbb{R}^3$?
(3 votes)
Evgenii Neumerzhitckii
Posted 7 years ago.
Does it have anything to do with approximating a single variable function with Taylor series, when we have just two first terms?
(3 votes)
- Caleb Clark
  Posted 2 years ago.
  Yep! This is essentially the first order Taylor polynomial, but instead of dealing with the single variable case we are dealing with many.
  (1 vote)
sauj123
Posted 8 years ago.
"...we consider all inputs to be part of a vector x..."

Should vector x have components x.y and so on instead of x0, y0 and so on?
(2 votes)
Zechariah Rosenthal
Posted 8 years ago.
Coming from Geggles' comment below, If "Example 1" is changed to have the tangent point be (8,4,3) it becomes a much nicer problem resulting in
Lf(x,y,z) = 3 + 48(x-8) - 144(y-4) + (z-3)
(2 votes)
gschex1112
Posted 7 years ago.
Why doesn't the partial with respect to z in the last example (desert island) have a coefficient of 1/8? Doesn't 1/2*1/2*1/2 = 1/8, not 1/6? Did I do that wrong?
(1 vote)