Linear regression estimates an output value as a linear function of the inputs. Such functions can be written concisely in vector notation. In this article, we define the design matrix and the output vector of a linear regression.
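Let $D$ be our dataset, made of $n$ records $(x_i, y_i)$ where $x_i = (x_{i,1}, \dots, x_{i,p}) \in \mathbb{R}^p$ is an input vector and $y_i \in \mathbb{R}$ is an output value:
$$D = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\}$$
Our goal is to approximate $y_i$ with a linear function of $x_i$. Let $\beta = (\beta_0, \beta_1, \dots, \beta_p)^\top$ denote the parameters of the linear function. We want:
$$y_i \approx \beta_0 + \sum_{j=1}^{p} \beta_j x_{i,j}$$
The first step towards vector notation is to set $x_{i,0} = 1$ for every $i$. The approximation becomes:
$$y_i \approx \beta_0 x_{i,0} + \sum_{j=1}^{p} \beta_j x_{i,j}$$
And we can start the sum at $j = 0$:
$$y_i \approx \sum_{j=0}^{p} \beta_j x_{i,j}$$
If we write $y = (y_1, \dots, y_n)^\top$, we can stack all those approximations for $i = 1, \dots, n$ into a vector equation:
$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \approx \begin{pmatrix} \sum_{j=0}^{p} \beta_j x_{1,j} \\ \sum_{j=0}^{p} \beta_j x_{2,j} \\ \vdots \\ \sum_{j=0}^{p} \beta_j x_{n,j} \end{pmatrix}$$
The right-hand side above is exactly the definition of a matrix product. Let $X$ be the design matrix, which is defined as follows:
$$X = (x_{i,j})_{1 \le i \le n,\; 0 \le j \le p} \in \mathbb{R}^{n \times (p+1)}$$
It is the matrix whose $i$-th row is the vector $(x_{i,0}, x_{i,1}, \dots, x_{i,p}) = (1, x_{i,1}, \dots, x_{i,p})$:
$$X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \end{pmatrix}$$
The approximation can then be written as a matrix product:
$$y \approx X \beta$$
This is the vector notation for a linear regression. The vector $y$ is named the output vector and the matrix $X$ is the design matrix. (And $\beta$ is the vector of parameters.)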
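To make the construction concrete, here is a minimal NumPy sketch. The toy dataset, the parameter values, and the helper name `design_matrix` are assumptions chosen for illustration only; they do not come from a real dataset or library. The sketch builds the design matrix by prepending a column of ones to the inputs, then checks that multiplying the design matrix by the parameter vector gives the same values as computing the linear function record by record.

```python
import numpy as np

# Hypothetical toy dataset: n = 4 records, p = 2 input features each.
inputs = np.array([
    [1.0, 2.0],
    [0.5, -1.0],
    [3.0, 0.0],
    [-2.0, 4.0],
])

# Arbitrary illustrative parameters (beta_0, beta_1, beta_2); not fitted values.
beta = np.array([0.5, 2.0, -1.0])


def design_matrix(inputs):
    """Prepend a column of ones so that row i is (1, x_{i,1}, ..., x_{i,p})."""
    n = inputs.shape[0]
    return np.hstack([np.ones((n, 1)), inputs])


X = design_matrix(inputs)  # shape (n, p + 1)

# Record-by-record form: beta_0 + sum over j of beta_j * x_{i,j}.
per_record = np.array([
    beta[0] + sum(beta[j + 1] * row[j] for j in range(len(row)))
    for row in inputs
])

# Matrix form: a single product of the design matrix with the parameter vector.
matrix_form = X @ beta

# Both forms compute the same approximations.
assert np.allclose(per_record, matrix_form)
```

The column of ones is what lets the intercept be absorbed into the matrix product, so no separate constant term is needed in the matrix form.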