If you're seeing this message, it means we're having trouble loading external resources on our website.

If you're behind a web filter, please make sure that the domains *.kastatic.org and *.kasandbox.org are unblocked.

Main content

Lagrange multipliers, using tangency to solve constrained optimization

The Lagrange multiplier technique is how we take advantage of the observation made in the last video, that the solution to a constrained optimization problem occurs when the contour lines of the function being maximized are tangent to the constraint curve. Created by Grant Sanderson.

Want to join the conversation?

  • blobby green style avatar for user parsanoori79
    Why didn't grant use use the cross product of two gradient vector in order two show that they are in the same direction ? He could easily say that the cross product of them must be zero .
    (11 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Atharva
    Why not just use the constraint to plug it in the function and then maximize the function?
    like use x^2 = 1 - y^2 and plug it in the upper function to get a function only in y and them maximize it. When does this work?
    (8 votes)
    Default Khan Academy avatar avatar for user
  • piceratops sapling style avatar for user Mahika Goel
    I don't get why the two curves have to be tangential to be maximized--doesn't that assume that as you move farther out, the f value increases? I get that in terms of this specific problem, but what about other functions that increase as you move in?
    (7 votes)
    Default Khan Academy avatar avatar for user
  • piceratops seed style avatar for user Václav Gregor
    What if there was a contour line of the function _f_ that would intersect the constraint curve, wouldn't be tangent to it and its value would be higher than the value of the contour lines of f tangent to the constraint curve. How would we optimize the function in such case ?
    (5 votes)
    Default Khan Academy avatar avatar for user
    • ohnoes default style avatar for user Charles Morelli
      If you think about this, what you are requiring is that the contour line is the top of a constant height ridge - for maximum points (or bottom of a constant depth valley - for minimum points). That is the only way a contour line could cross a constraint such as this circle and not have greater (for the ridge, or lesser for the valley) valued contours that would be tangential.
      Here the gradient of the function would be 0 which would be a special case of the more general idea. Then the Lagrange Multiplier, λ, would have to be 0... let's see what happens then - hopefully Grant will say in the next videos.
      (1 vote)
  • mr pants teal style avatar for user Tony
    If the gradient of G happens to be a square matrix, is it possible to multiply the right side of the gradient of F by the inverse of the matrix gradient G to find Lambda?

    F*G^-1 = Lambda ?
    (3 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Aaron Blanche
    so lambda is the eigenvalue of the matrix satisfying gradF = A*gradG?
    (3 votes)
    Default Khan Academy avatar avatar for user
  • blobby green style avatar for user Hardik Shah
    Why can't we work with tangents directly instead of gradients? More explicitly, I'm proposing to find out the tangent equation to both curves in question and solving for their equality. Would this approach be an issue with more variables?
    (3 votes)
    Default Khan Academy avatar avatar for user
  • aqualine ultimate style avatar for user Navya Mehta
    Is there a name given to the vector of first order partial derivatives shown at ? Why do we use such a vector? Thanks.
    (2 votes)
    Default Khan Academy avatar avatar for user
  • purple pi purple style avatar for user Joe
    Does this work for more than 2 variables?
    (3 votes)
    Default Khan Academy avatar avatar for user
  • old spice man blue style avatar for user mand4796
    So just as a refresher: a gradient is basically the derivative of a set of vectors, right?
    (2 votes)
    Default Khan Academy avatar avatar for user

Video transcript

- [Instructor] In the last video I introduced a constrained optimization problem where we were trying to maximize this function, f of x, y equals x squared times y, but subject to a constraint that your values of x and y have to satisfy x squared plus y squared equals one. And the way we were visualizing this was to look at the x, y plane where this circle here represents our constraint. All of the points that make up this set, x squared plus y squared equals one, and then this curvy line here is one of the contours of f, meaning, we're setting f of x, y equal to some constant. And then I was varying around that constant c. So for high values of c, the contour would look something like this, this is where the value of x squared times y is big. And then for small values of c, the contours would look like this. So all the points on this line would be f of x, y equals like 0.01 in this case, something like that. Then the way to think about maximizing this function is to try to increase that value of c as much as you can without it falling off the circle. And the key observation is that happens when they're tangent. So, you know, you might kind of draw this out in a little sketch and say there's some curve representing your constraint, which in this case would be, you know, where our circle is. And then the curve representing the contour would just kiss that curve, just barely touch it in some way. Now, that's pretty, but in terms of solving the problem, we still have some work to do. And the main tool we're gonna use here is the gradient. So let me go ahead and draw a lot more contour lines than there already are for x squared times y. So this is many of the contour lines, and I'll draw the gradient field, the gradient field of f. So I've made a video about the relationship between the gradient and contour lines. And the upshot of it is that these gradient vectors, every time they pass through a contour line, they're perpendicular to it. And the basic reason for that is if you walk along the contour line, the function isn't changing value, so if you want it to change most rapidly, you know, it kind of makes sense you should walk in the perpendicular direction, so that no component of the walk that you're taking is, you know, useless, is along the line where the function doesn't change. But again, there's a whole video on that that's worthy checking out if this feels unfamiliar. For our purposes, what it means is that when we're considering this point of tangency, the gradient of f at that point is gonna be some vector perpendicular to both the curves at that point. So that little vector represents the gradient of f at this point on the plane. And we can do something very similar to understand the other curve. Right now I've just written it as a constraint, x squared plus y squared equals one. But you know, to give that function a name, let's say we've defined g of x, y to be x squared plus y squared, x squared plus y squared. In that case, this constraint is pretty much just one of the contour lines for the function g, and we can take a look at that. If we go over here and we look at all of the other contour lines for this function g, and it should make sense that they're circles, because this function is x squared plus y squared. And if we took a look at the gradient of g, and we go over and ask about the gradient of g, it has that same property, that every gradient vector, if it passes through a contour line, is perpendicular to it. So over on our drawing here, the gradient vector of g would also be perpendicular to both these curves. And you know, maybe in this case, it's not as long as the gradient of f, or maybe it's longer. There's no reason that it would be the same length, but the important fact is that it's proportional. And the way that we're gonna write this in formulas is to say that the gradient of f evaluated, let's see, evaluated at whatever the maximizing value of x and y are, so we should give that a name probably. Maybe x sub m, y sub m, the specific values of x and y that are gonna be at this point that maximizes the function subject to our constraint. So that's gonna be related to the gradient of g, it's not gonna be quite equal, so I'll leave some room here. Related to the gradient of g, evaluated at that same point, xm, ym. And like I said, they're not equal, they're proportional. So we need to have some kind of proportionality constant in there. You almost always use the variable lambda, and this guy has a fancy name, it's called a Lagrange multiplier. Lagrange, Lagrange was one of those famous French mathematicians. I always get him confused with some of the other French mathematicians at the time like Legendre or Laplace, there's a whole bunch of things. Let's see, multiplier, distracting myself talking here. So Lagrange multiplier. So there's a number of things in multivariable calculus named after Lagrange, and this is one of the big ones. This is a technique that he kind of developed or at the very least popularized. And the core idea is to just set these gradients equal to each other, 'cause that represents when the contour line for one function is tangent to the contour line of another. So this, this is something that we can actually work with. And let's start working it out, right, let's see what this translates to in formulas. So I already have g written here, so let's go ahead and just evaluate what the gradient of g should be. And that's the gradient of x squared plus y squared. And the way that we take our gradient is it's gonna be a vector whose components are all partial derivatives. So the first component is the partial derivative with respect to x. So we treat x as a variable, y looks like a constant. The derivative is two x. The second component the partial derivative with respect to y, so now we're treating y as the variable, x is the constant, so the derivative looks like two y. So that's the gradient of g. Then the gradient of f, gradient of f. It's gonna look like gradient of, let's see, what is x? What is f? It's x squared times y. So x squared times y. We do the same thing. First component partial derivative with respect to x, x looks like a variable, so it's derivative is two times x and then that y looks like a constant when we're up here. But then partial derivative with respect to y, that y looks like a variable, that x squared just looks like a constant sitting in front of it. So that's what we get. And now if we kind of work out this Lagrange mulitplier expression using these two vectors, what we have written, what we're gonna have written is that this vector, two xy x squared is proportional with the proportionality constant lambda to the gradient vector for g. So two x two y. And if you want you can think about this as two separate equations. I mean right now it's one equation with vectors, but really what this is saying is you've got two separate equations, two times xy is equal to lambda, ah, gotta change colors a lot here. Lambda times two x. Hm, gonna be stickler for color. Keep red all of the things associated with g. And then, this second equation is that x squared is equal to lambda times two y. And this might seem like a problem, because we have three unknowns, x, y, and this new lambda that we introduced. Kind of shot ourselves in the foot by giving ourselves a new variable to deal with. But we only have two equations. So in order to solve this, we're gonna need three equations. And the third equation is something that we've known the whole time. It's been part of the original problem. It's the constraint itself, x squared plus y squared equals one. So that, that third equation, x squared plus y squared is equal to one. So these are the three equations that characterize our constrained optimization problem. The bottom one just tells you that we have to be on this unit circle here. Allow me to just highlight it. We have to be on this unit circle. And then these top two tell us what's necessary in order for our contour lines, the contour of f and the contour of g to be perfectly tangent with each other. So, in the next video, I'll go ahead and solve this. At this point, it's pretty much just algebra to deal with, but it's worthy going through. And then in the next couple ones, I'll talk about a way that you can encapsulate all three of these equations into one expression, and also a little bit about the interpretation of this lambda that we introduced. 'Cause it's not actually just a dummy variable, it has a pretty interesting meaning in physical contexts once you're actually dealing with a constrained optimization problem in practice. So I'll see you in the next video.