# The MSE loss

The mean squared error loss quantifies the error between a target variable $\vy$ and an estimate $\hat{\vy}$ for its value.

This loss function is defined as the mean of the squares of the differences between corresponding components of $\vy$ and $\hat{\vy}$. Let $\sn$ be the length of the vector $\vy$:

$\lmse(\vy, \hat{\vy}) = \frac{1}{\sn} \sum_{i=1}^{\sn} (y_i - \hat{y}_i)^2$.

The sum in the definition above is equal to the squared Euclidean norm of the difference vector $\vy - \hat{\vy}$, so we can rewrite the definition as:

$\lmse(\vy, \hat{\vy}) = \frac{1}{\sn} \normtwo{\vy - \hat{\vy}}^2$.

The difference vector $\vy - \hat{\vy}$ is often called the residual and denoted $\epsilon$. It is the error vector between $\vy$ and the predicted value $\hat{\vy}$:

$\epsilon = \vy - \hat{\vy}$.

Using this vocabulary, the mean squared error loss is the squared norm of the residual, with a correction factor to account for the dimension:

$\lmse(\vy, \hat{\vy}) = \frac{1}{\sn} \normtwo{\epsilon}^2$.
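The equivalence between the component-wise sum and the squared-norm form can be checked numerically. A minimal sketch with NumPy, using made-up example vectors:

```python
import numpy as np

# Hypothetical target vector y and estimate y_hat (n = 3).
y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 1.5, 2.0])

# Component-wise definition: mean of the squared differences.
mse_sum = np.mean((y - y_hat) ** 2)

# Norm form: squared Euclidean norm of the residual, divided by n.
residual = y - y_hat
mse_norm = np.linalg.norm(residual) ** 2 / y.size

print(mse_sum, mse_norm)  # both equal 0.5
```

Both expressions compute the same quantity; the norm form is just the vectorized way of writing the sum.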

This can be visualized on the graph below.

TODO: make the graph.

## The machine-learning notation

In our linear regression articles, we usually note the MSE loss like this:

$\lmse(\trainset, \vw) = \frac{1}{\sn} \sum_{i=1}^{\sn} (y_i - \vw^\top \vx_i)^2$,

where $\trainset$ is the trainset and $\vw$ is some vector of parameters.

In this situation, the vector $\vy$ is the output vector corresponding to our trainset. The estimator $\hat{\vy}$ is the predicted value:

$\hat{\vy} = \mx\vw$,

where $\mx$ is the design matrix.

So we have:

$\lmse(\trainset, \vw) = \lmse(\vy, \mx\vw)$.
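To make the connection concrete, here is a small sketch of the design-matrix form, with a made-up trainset of three samples (the values of `X`, `y`, and `w` are arbitrary illustrations, not from the article):

```python
import numpy as np

# Hypothetical design matrix X: one row per sample,
# with a leading column of ones for the intercept.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.0, 2.0])   # output vector of the trainset
w = np.array([0.5, 0.5])        # some parameter vector

# Predicted value: y_hat = X w.
y_hat = X @ w

# L_mse(D, w) = L_mse(y, X w) = ||y - X w||^2 / n
mse = np.linalg.norm(y - y_hat) ** 2 / y.size
print(mse)  # 1/6 ≈ 0.1667
```

Evaluating the loss on the whole trainset thus reduces to one matrix-vector product followed by a squared norm.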