Vector notation for linear regressions

A linear regression attempts to estimate an output value using a linear function of the inputs. Such functions can be expressed concisely using vector notation. In this article, we define the design matrix and the output vector for a linear regression.

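Let $D$ be our dataset, made of $n$ records $(x_i, y_i)$, where $x_i = (x_{i,1}, \dots, x_{i,p}) \in \mathbb{R}^p$ is an input vector and $y_i \in \mathbb{R}$ is an output value:

$$D = \{(x_1, y_1), \dots, (x_n, y_n)\}$$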

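Our goal is to approximate each output $y_i$ with a linear function of the input $x_i$. Let $\theta_0, \theta_1, \dots, \theta_p$ denote the parameters of the linear function. We want:

$$y_i \approx \theta_0 + \sum_{j=1}^{p} \theta_j x_{i,j}$$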

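The first step towards vector notation is to set $x_{i,0} = 1$ for every record, so that the constant term $\theta_0$ is multiplied by a component like all the other parameters. The approximation becomes:

$$y_i \approx \theta_0 x_{i,0} + \sum_{j=1}^{p} \theta_j x_{i,j}$$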

And we can start the sum at $j = 0$:

$$y_i \approx \sum_{j=0}^{p} \theta_j x_{i,j}$$
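
As a quick check of the two steps above, the sketch below evaluates the approximation for a single record in both forms: with a separate constant term, and with a constant component 1 prepended to the input. It uses plain Python lists; the function names `predict_with_intercept` and `predict_augmented` and the numeric values are illustrative assumptions, not part of the derivation itself.

```python
def predict_with_intercept(theta, x):
    """theta_0 plus the sum over j >= 1 of theta_j * x_j (x holds the raw features)."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


def predict_augmented(theta, x):
    """Sum over j >= 0 of theta_j * x_j, after setting the extra component x_0 = 1."""
    x_aug = [1.0] + list(x)  # prepend the constant component x_0 = 1
    return sum(t * v for t, v in zip(theta, x_aug))


theta = [0.5, 2.0, -1.0]  # hypothetical parameters theta_0, theta_1, theta_2
x = [3.0, 4.0]            # hypothetical input vector with two features

a = predict_with_intercept(theta, x)
b = predict_augmented(theta, x)
print(a, b)  # both forms give the same value
```

Both functions return the same number, which is the point of introducing the constant component: the parameter $\theta_0$ no longer needs special treatment.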

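If we write $y = (y_1, \dots, y_n)^\top$, we can stack all those approximations for $i = 1, \dots, n$ into a vector equation:

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \approx \begin{pmatrix} \sum_{j=0}^{p} \theta_j x_{1,j} \\ \sum_{j=0}^{p} \theta_j x_{2,j} \\ \vdots \\ \sum_{j=0}^{p} \theta_j x_{n,j} \end{pmatrix}$$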

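The expression above is exactly the definition of a matrix product: each entry is the sum of the products of one row of components with the parameters. Let $X$ be the design matrix, which is defined as follows:

$$X = \begin{pmatrix} x_{1,0} & x_{1,1} & \cdots & x_{1,p} \\ x_{2,0} & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,0} & x_{n,1} & \cdots & x_{n,p} \end{pmatrix}$$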

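It is the $n \times (p+1)$ matrix whose $i$-th row is the vector $(1, x_{i,1}, \dots, x_{i,p})$, that is, the input vector of the $i$-th record preceded by the constant 1:

$$X = \begin{pmatrix} 1 & x_{1,1} & \cdots & x_{1,p} \\ 1 & x_{2,1} & \cdots & x_{2,p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n,1} & \cdots & x_{n,p} \end{pmatrix}$$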
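
To make the construction concrete, here is a minimal sketch that builds a design matrix from a list of input vectors by prepending the constant 1 to each one. The helper name `design_matrix` and the sample data are assumptions chosen for illustration; the matrix is stored as a plain list of rows.

```python
def design_matrix(inputs):
    """Return the design matrix as a list of rows.

    Each row is one input vector with the constant 1 prepended,
    so n inputs with p features give n rows of length p + 1.
    """
    return [[1.0] + list(x) for x in inputs]


# Hypothetical dataset inputs: three records with two features each.
inputs = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]

X = design_matrix(inputs)
for row in X:
    print(row)
```

With three records of two features each, the result has three rows and three columns, and its first column contains only ones.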

With $\theta = (\theta_0, \theta_1, \dots, \theta_p)^\top$, the approximation can be written as a matrix product:

$$y \approx X\theta$$

This is the vector notation for a linear regression. The vector $y$ is named the output vector and the matrix $X$ is the design matrix. (And $\theta$ is the vector of parameters.)
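
The equivalence between the matrix product and the record-by-record formula can be checked numerically. The sketch below is one way to do it with plain Python lists; the helper names `design_matrix` and `mat_vec`, the parameter values, and the data are all hypothetical.

```python
def design_matrix(inputs):
    """Prepend the constant 1 to every input vector."""
    return [[1.0] + list(x) for x in inputs]


def mat_vec(matrix, vector):
    """Matrix-vector product: entry i is the sum over j of matrix[i][j] * vector[j]."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector)) for row in matrix]


inputs = [[2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]  # hypothetical input vectors
theta = [0.5, 2.0, -1.0]                       # hypothetical parameters

# Vector notation: all approximations at once.
X = design_matrix(inputs)
y_hat = mat_vec(X, theta)

# Record-by-record formula: constant term plus the sum over the features.
per_record = [theta[0] + sum(t * v for t, v in zip(theta[1:], x)) for x in inputs]

print(y_hat)
print(per_record)
```

Both lists hold the same values, one approximation per record. In practice a numerical library would usually compute the product directly; with NumPy arrays, for instance, it is written `X @ theta`.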