Question 1

What is the point or the purpose of squaring the error line? Why not cubed, square root or even dot or cross product?  I do not mean in a mathematical sense, but in a practical sense.  What information does the square of the error line give us?

Accepted Answer

There are a couple reasons to square the errors. Squaring the value turns everything positive, effectively putting negative and positive errors on equal footing. In other words, it treats any deviation away from the line of the same absolute size (in the positive or negative direction) as the same.

You can achieve the same result (turning negative numbers into positive ones) by taking the absolute value of the number or by raising the values to any even positive exponent (like 4, 6, etc.). So why was squaring chosen over taking the absolute value? The simplest answer is that working with exponents is mathematically and computationally easier than working with absolute values. This was particularly true back when people did this work solely by hand. Given the power of computers nowadays, that computational "problem" is much less of a problem, and some people argue for (and use) the sum of absolute errors instead of the sum of squared errors. However, those people are in the minority. I will warn you that the general expectation is to use the sum of squared errors as the measure: people have seen it, they understand it, and they know the various tests and statistics built around it. So a person who wanted to use absolute errors instead would possibly have to derive the corresponding results and educate their audience.

You could also argue that using the squared error instead of the absolute error lets you place greater emphasis on values that are relatively farther away from the line. In other words, you are punished more for producing a line that is relatively far from some points, because those errors are squared. A potential problem, however, is that outliers can more easily skew the regression line under this method. That is most likely why you use the smallest positive even number, 2, as your exponent instead of something like the "sum of errors raised to the 4th power": a larger exponent would emphasize the outliers (or near outliers) even more.
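To make the outlier point concrete, here is a minimal Python sketch. The residual values and the helper name `total_loss` are made up for illustration; they do not come from the video. It shows how much of the total error a single large residual accounts for as the exponent grows.

```python
# Hypothetical residuals (observed y minus predicted y); the last one is an outlier.
residuals = [1.0, -1.0, 2.0, -2.0, 10.0]


def total_loss(errors, power):
    """Sum of |error| ** power over all errors."""
    return sum(abs(e) ** power for e in errors)


for power in (1, 2, 4):
    total = total_loss(residuals, power)
    # Fraction of the total loss contributed by the outlier alone.
    outlier_share = abs(residuals[-1]) ** power / total
    print(f"power={power}: total={total:.0f}, outlier share={outlier_share:.1%}")
```

With these numbers the outlier's share of the total rises as the exponent goes from 1 to 2 to 4, which is the trade-off described above.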

Question 2

Can someone explain to me how he got y^2-2y(mx+b)+(mx+b)^2 at 1:13?

Accepted Answer

See 'Square a Binomial' in Algebra.

https://www.khanacademy.org/math/algebra/polynomials/multiplying_polynomials/v/square-a-binomial
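
As a worked version of that identity: treat y as the first term and (mx+b) as the second. The letters p and q below are placeholders introduced here, not notation from the video.

```latex
\begin{aligned}
(p - q)^2 &= p^2 - 2pq + q^2 \\
\bigl(y - (mx + b)\bigr)^2 &= y^2 - 2y(mx + b) + (mx + b)^2
\end{aligned}
```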

Question 3

I don't understand: why y1-(mx1+b)?
Shouldn't it be (mx1+b)-y1?

Accepted Answer

It can be! That is the advantage of using squared error instead of just simply 'linear error'.

Notice that some points end up above the line (where y1-(mx1+b) is positive) and some below it (where y1-(mx1+b) is negative, so (mx1+b)-y1 is positive). To resolve this problem, statisticians square the values, so that every term is positive or zero.

Overall, you can use either version; they both give the same squared error.
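
A quick numeric check in Python, using a made-up line and point (the values of `m`, `b`, `x1`, and `y1` are assumptions chosen only for illustration):

```python
# Hypothetical line y = m*x + b and a single data point (x1, y1).
m, b = 2.0, 1.0
x1, y1 = 3.0, 5.0

error_a = y1 - (m * x1 + b)  # observed minus predicted
error_b = (m * x1 + b) - y1  # predicted minus observed

# The two errors differ only in sign, so their squares are equal.
print(error_a, error_b, error_a ** 2, error_b ** 2)
```

The two orderings differ only by a factor of -1, and (-1)^2 = 1, so the squared error is the same either way.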

Question 4

Okay, so squaring is done in order to have positive values, but what is actually the problem with having both positive and negative errors? I mean, if we need a line that fits the data, the one with an error of 0 or close to 0 runs right through the middle of the data set, right?
The only case where this method doesn't work seems to me to be that of aligned data points, but for other cases the "true" error doesn't seem that bad to me.

Accepted Answer

The sum of the errors from the mean, without squaring, is always zero. Check it for yourself: if you have Excel, put some numbers in a column of cells, calculate their mean, and in the next column subtract the mean from each value. Then add up those errors. The positive and negative errors cancel, so a total of zero does not tell you that the fit is good.
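
The same check can be sketched in Python and extended from the mean to lines. The data points and function names below are invented for illustration. Every line forced through the point (mean of x, mean of y) has raw errors that sum to zero, no matter how bad its slope is, while the squared errors still tell the lines apart.

```python
# Hypothetical data set.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 9.0]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)


def raw_error_sum(m, b):
    """Sum of signed errors y - (m*x + b)."""
    return sum(y - (m * x + b) for x, y in zip(xs, ys))


def squared_error_sum(m, b):
    """Sum of squared errors (y - (m*x + b)) ** 2."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


# Deviations from the mean cancel out (up to floating-point rounding).
print(sum(y - mean_y for y in ys))

# Three very different slopes, each line forced through (mean_x, mean_y).
for m in (-3.0, 0.0, 2.2):
    b = mean_y - m * mean_x
    print(m, raw_error_sum(m, b), squared_error_sum(m, b))
```

The raw sums all come out at (essentially) zero, so they cannot rank the three lines; the squared sums differ, which is why the squared error is the quantity that gets minimized.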

Question 5

Why are all the terms (y1, y2, yn...etc) being added? How will that help us find the minimized squared error to the line?

Accepted Answer

Eventually we will take the derivative of the whole sum (the derivative is the function that gives the slope, if you aren't familiar with calculus) and set it to zero, which lets us solve for the constants m and b that give the minimum possible error.
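
In symbols, the plan looks roughly like the following. The name SE for the total squared error and the index notation are assumptions made here for readability; the video may label things differently.

```latex
\begin{aligned}
SE(m, b) &= \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr)^2 \\
\frac{\partial SE}{\partial m} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - (m x_i + b)\bigr) = 0 \\
\frac{\partial SE}{\partial b} &= -2 \sum_{i=1}^{n} \bigl(y_i - (m x_i + b)\bigr) = 0
\end{aligned}
```

Adding the terms first gives one function of m and b that measures the error over all the points at once, and that single function is what gets minimized.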

Question 6

What video should I go to when I don't understand why he starts putting 2's in front of things and adding extra brackets' worth of stuff? He calls it algebraic equations... it's sooo much fun doing inferential statistics with only a grade 6 education.

Accepted Answer

I assume you mean what he's writing at 0:52? That's algebra (probably 2-3 years beyond your level). He's expanding the quadratic (the thing in parentheses that is squared). I'm not sure where that is explained on KhanAcademy, but: (a+b)^2 = a^2 + 2ab + b^2. Then we can match "a" and "b" against what Sal wrote.

However, if your math is at 6th grade, then you should probably skip any of the videos that say "Proof." Generally the proofs in Statistics will be using math that's 5 or more years beyond that level. Once you learn Calculus (mainly, finding a minimum or maximum via derivatives), I imagine the proof will make perfect sense.

At your level, I would assume that the focus would be on applying Statistical methods (e.g. estimate the mean, compute a confidence interval, etc) instead of deriving anything.

If you're just doing the Stats section of KhanAcademy on your own and you want to understand the proofs better, I'd suggest going over to the Calculus and Algebra sections, as Statistics makes heavy use of both of them (Calculus mainly for the proofs).

Question 7

Does anybody know how he squares the answer so fast at 0:52? What method did he use? Is there a video on it on Khan Academy?

Accepted Answer

https://www.khanacademy.org/math/algebra2/x2ec2f6f830c9fb89:poly-arithmetic/x2ec2f6f830c9fb89:special-products/v/poly-perfect-square

Question 8

Why would I need to do this? (Real life example)

Accepted Answer

Regression is a very common technique in economics to predict the behaviour of the market. So if you ever decide to sell something to a big group of people, you will probably end up using regressions to find the best price for what you want to sell.


Proof (part 1) minimizing squared error to regression line
