Date: 2020-01-01

Linear Regression Loss Function

There are different ways of evaluating the errors. For example, if you predicted that a student's GPA is 3.0, but the student's actual GPA is 1.0, the difference between the actual and predicted GPAs is $1.0 - 3.0 = -2.0$. However, a distance can't be negative, can it? So what can we do?

Well, you can take the absolute difference, which is just $2.0$. Alternatively, you can take the squared difference, which is $2.0^2 = 4.0$. If you can't decide which one to use, you can add them together (it is not the end of the world), which gives $2.0 + 4.0 = 6.0$. Each of these distance calculation techniques (aka distance metrics) results in a differently behaving linear regression model. To avoid ambiguity about the distance between the actual and the predicted value, we use the term residual, which refers to the error regardless of how it is calculated. So let's put all the residual calculation techniques in a table, with their formal names and formulas, where $d$ is the difference between the actual and the predicted value.

| Distance Metric | Formal Name | Nickname | Formula |
|---|---|---|---|
| Absolute | Lasso | L1 | $\lvert d \rvert$ |
| Squared | Ridge | L2 | $d^2$ |
| Both | Elastic Net | EN | $\lvert d \rvert + d^2$ |
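To make the three metrics concrete, here is a minimal Python sketch that applies each one to the GPA example (actual 1.0, predicted 3.0). The function names are hypothetical and chosen only for illustration; they do not come from any library.

```python
def absolute_residual(actual, predicted):
    """Absolute (L1) distance: the magnitude of the difference d."""
    return abs(actual - predicted)


def squared_residual(actual, predicted):
    """Squared (L2) distance: the difference d, squared."""
    return (actual - predicted) ** 2


def combined_residual(actual, predicted):
    """Both (EN) distance: absolute distance plus squared distance."""
    return absolute_residual(actual, predicted) + squared_residual(actual, predicted)


# GPA example from the text: actual GPA 1.0, predicted GPA 3.0, so d = -2.0
actual_gpa, predicted_gpa = 1.0, 3.0

l1 = absolute_residual(actual_gpa, predicted_gpa)
l2 = squared_residual(actual_gpa, predicted_gpa)
en = combined_residual(actual_gpa, predicted_gpa)

print(l1, l2, en)  # 2.0 4.0 6.0
```

Note that all three results are non-negative, which is what lets us treat them as distances even though the raw difference is $-2.0$.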

The function we want to minimize when we are fitting a linear regression model is called the loss function. Here it is the sum of all the squared residuals on the training data, formally called the Residual Sum of Squares (RSS):

$$RSS = \sum_{i=1}^n{\bigg(y_i-\beta_0-\sum_{j=1}^k{\beta_jx_{ij}}\bigg)^2}$$

Notice the similarity between this equation and the MSE equation defined above. MSE is used to evaluate the performance of the model at the end, and it does not depend on how $\hat{y}_i$ (i.e. the predicted value) is calculated. RSS, by contrast, is the quantity minimized at training time: it plugs the linear model $\beta_0 + \sum_{j=1}^k{\beta_jx_{ij}}$ in for the prediction and sums the squared residuals (SS stands for Sum of Squares) over all training data points.
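The RSS formula can be sketched in a few lines of plain Python. This is an illustrative sketch, not a fitting routine: the toy data, the coefficient values, and the function names are all made up for the example. The `mse` helper assumes the usual definition of MSE as the mean of the squared residuals, which may differ in notation from the definition given earlier.

```python
def predict(x_row, beta0, betas):
    """Linear model prediction: beta_0 + sum_j beta_j * x_ij for one data point."""
    return beta0 + sum(b * x for b, x in zip(betas, x_row))


def rss(X, y, beta0, betas):
    """Residual Sum of Squares over all data points."""
    return sum((y_i - predict(x_i, beta0, betas)) ** 2 for x_i, y_i in zip(X, y))


def mse(X, y, beta0, betas):
    """Mean squared error, assumed here to be RSS divided by the number of points."""
    return rss(X, y, beta0, betas) / len(y)


# Hypothetical toy data: one feature per student and the student's GPA
X = [[1.0], [2.0], [3.0]]
y = [2.0, 2.5, 4.0]

# Hypothetical coefficients (not fitted): intercept 1.0, slope 1.0
beta0, betas = 1.0, [1.0]

# Predictions are 2.0, 3.0, 4.0, so the residuals are 0.0, -0.5, 0.0
rss_value = rss(X, y, beta0, betas)
mse_value = mse(X, y, beta0, betas)

print(rss_value)  # 0.25
```

Fitting the model means searching for the `beta0` and `betas` that make `rss` as small as possible on the training data; under the assumed definition, MSE only rescales that value by the number of points.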