The residual $\vec{e}$ is the error vector between the true output vector $\vec{y}$ and its estimate $\hat{\vec{y}}$:
$$\vec{e} = \vec{y} - \hat{\vec{y}}$$

Residuals for linear regressions
When $\hat{\vec{y}}$ is produced by a linear model, the residual represents the aspects of $\vec{y}$ that cannot be explained by the columns of the design matrix $X$:
$$\vec{e} = \vec{y} - X\hat{\vec{w}}$$

Since $\hat{\vec{y}} = X\hat{\vec{w}}$ is a vector in the linear space spanned by the columns of $X$, the residual is a vector outside of this linear space, pointing towards $\vec{y}$.
Note: when the MSE loss is used, the residual is orthogonal to this linear space.
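This orthogonality is easy to verify numerically. The sketch below (a minimal example with NumPy and made-up data; the variable names are illustrative, not from the text) fits a linear model under the MSE loss with `np.linalg.lstsq`, forms the residual, and checks that it is orthogonal to every column of $X$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy design matrix (n=50 samples, p=3 features) and target vector.
X = rng.normal(size=(50, 3))
y = rng.normal(size=50)

# Fit by least squares (MSE loss) and form the residual e = y - X w_hat.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ w_hat

# Under the MSE loss, X^T e = 0: the residual is orthogonal
# to the space spanned by the columns of X.
print(np.allclose(X.T @ e, 0))  # → True
```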
Let $M(X)$ be the column space of $X$. The vector $\hat{\vec{y}}$ lies in this linear subspace while $\vec{y}$ does not. This is illustrated in the picture below.
[Figure: $\vec{y}$, its projection $\hat{\vec{y}}$ onto the column space $M(X)$, and the residual $\vec{e}$ between them]
Residuals for OLS
The parameter vector $\hat{\vec{w}}$ can be expressed using a closed-form formula in the case of an OLS regression:
$$\hat{\vec{w}} = (X^\top X)^{-1} X^\top \vec{y}$$

Plugging this formula into the definition of the residual, we get:
$$\vec{e} = \vec{y} - \underbrace{X(X^\top X)^{-1}X^\top}_{H}\,\vec{y}$$

$H$ is the hat matrix (because it puts a hat on $\vec{y}$). Factoring $\vec{y}$ out of the equality, we get:
$$\vec{e} = (I_n - H)\,\vec{y}$$

As mentioned above, for OLS regressions, the residual is a vector orthogonal to the column space of $X$.
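The hat-matrix identity can also be checked directly. A small NumPy sketch (synthetic data; building $H$ explicitly is fine for illustration, though it is an $n \times n$ matrix and would not be formed in practice) verifies that $(I_n - H)\vec{y}$ matches the residual from the definition, and that this residual is orthogonal to the columns of $X$:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 3))  # toy design matrix, n=20, p=3
y = rng.normal(size=20)
n = X.shape[0]

# Hat matrix H = X (X^T X)^{-1} X^T: it projects y onto the column space of X.
H = X @ np.linalg.inv(X.T @ X) @ X.T

# Residual via the closed form e = (I_n - H) y ...
e = (np.eye(n) - H) @ y

# ... matches the direct definition e = y - X w_hat with the OLS solution.
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(e, y - X @ w_hat))  # → True

# And e is orthogonal to the column space of X.
print(np.allclose(X.T @ e, 0))  # → True
```

In real code the normal-equation inverse is avoided in favor of `np.linalg.lstsq` or a QR decomposition, which are numerically better behaved; the explicit $H$ here only serves to mirror the derivation.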