The mean squared error loss quantifies the error between a target variable $\vec{y}$ and an estimate $\hat{\vec{y}}$ for its value.
This loss function is defined as the mean of the squares of the individual losses between each component of $\vec{y}$ and $\hat{\vec{y}}$. Let $n$ be the length of the vector $\vec{y}$.
$$L_{MSE}(\vec{y}, \hat{\vec{y}}) = \frac{1}{n} \sum_{i=1}^{n} (\vec{y}_i - \hat{\vec{y}}_i)^2$$

The sum in the definition above is equal to the squared Euclidean norm of the difference $\vec{y} - \hat{\vec{y}}$, so we can rewrite the definition as:
$$L_{MSE}(\vec{y}, \hat{\vec{y}}) = \frac{1}{n} \left\| \vec{y} - \hat{\vec{y}} \right\|_2^2$$

The difference vector is often called the residual and denoted $\epsilon$. It is the error vector between $\vec{y}$ and the predicted value $\hat{\vec{y}}$:
$$\epsilon = \vec{y} - \hat{\vec{y}}$$

Using this vocabulary, the mean squared error loss is the squared norm of the residual, with a correction factor to account for the dimension:
$$L_{MSE}(\vec{y}, \hat{\vec{y}}) = \frac{1}{n} \left\| \epsilon \right\|_2^2.$$
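As a quick check of these identities, here is a minimal Python sketch (assuming NumPy; the helper names are ours, not standard API) that computes the loss in all three equivalent forms:

```python
import numpy as np

def mse_componentwise(y, y_hat):
    """MSE as the mean of the squared component-wise differences."""
    n = len(y)
    return sum((y[i] - y_hat[i]) ** 2 for i in range(n)) / n

def mse_norm(y, y_hat):
    """MSE as the squared Euclidean norm of y - y_hat, divided by n."""
    return np.linalg.norm(y - y_hat, ord=2) ** 2 / len(y)

def mse_residual(y, y_hat):
    """MSE as the squared norm of the residual epsilon = y - y_hat."""
    epsilon = y - y_hat
    return (epsilon @ epsilon) / len(y)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])

# All three formulations agree up to floating-point rounding (~0.02 here).
print(mse_componentwise(y, y_hat))
print(mse_norm(y, y_hat))
print(mse_residual(y, y_hat))
```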
The machine-learning notation
In our linear regression articles, we usually write the MSE loss like this:
$$L_{MSE}(S_{train}, \vec{w})$$

where $S_{train}$ is the training set and $\vec{w}$ is some vector of parameters.
In this situation, the vector $\vec{y}$ is the output vector corresponding to our training set. The estimator $\hat{\vec{y}}$ is the predicted value:
$$\hat{\vec{y}} = X\vec{w}$$

where $X$ is the design matrix.
So we have:
$$L_{MSE}(S_{train}, \vec{w}) = L_{MSE}(\vec{y}, X\vec{w}).$$
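To tie the two notations together, here is a small Python sketch of this identity; the toy design matrix, targets, and parameter vector are made up for illustration, and the column of ones for the intercept is an assumed convention:

```python
import numpy as np

def mse_loss(y, y_hat):
    """L_MSE(y, y_hat) = (1/n) * ||y - y_hat||_2^2."""
    residual = y - y_hat
    return (residual @ residual) / len(y)

# Toy training set: one row of the design matrix X per example;
# the first column of ones absorbs the intercept term.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.5, 1.5, 2.5])  # output vector of the training set

w = np.array([0.5, 1.0])       # some parameter vector

# The prediction is y_hat = X w, so
# L_MSE(S_train, w) = L_MSE(y, X w).
print(mse_loss(y, X @ w))  # 0.0 here: this w happens to fit the toy data exactly
```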