Polynomial basis expansion

Nov 05, 2018

Polynomial basis expansion, also called polynomial feature augmentation, is a machine-learning preprocessing step. It consists of adding powers of the input's components to the input vector.

Example

Let $S$ be a dataset for a machine-learning task. As usual, $S$ consists of $N$ pairs of an input vector $x_n$ and an output value $y_n$:

$$S = \{(x_n, y_n)\}_{n \le N}$$

For clarity, suppose the dimensionality is two: $x_n \in \mathbb{R}^2$.

The polynomial augmentation of degree 2 for the input vector $x_n$:

$$x_n = (x_{n,1},\; x_{n,2})$$

is the vector $\Phi_2(x_n)$:

$$\Phi_2(x_n) = (x_{n,1},\; x_{n,2},\; (x_{n,1})^2,\; (x_{n,2})^2)$$

Likewise, the degree 3 augmentation is:

$$\Phi_3(x_n) = (x_{n,1},\; x_{n,2},\; (x_{n,1})^2,\; (x_{n,2})^2,\; (x_{n,1})^3,\; (x_{n,2})^3)$$

and so on…
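To make this concrete, here is a small sketch in Python/NumPy of how such a pure-power augmentation could be implemented (the function name `poly_augment` is mine, not a standard API):

```python
import numpy as np

def poly_augment(x, degree):
    """Stack the component-wise powers x, x**2, ..., x**degree.

    For x = (x_1, x_2) and degree 2 this returns (x_1, x_2, x_1^2, x_2^2),
    matching Phi_2 above (no cross terms such as x_1 * x_2 are added).
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate([x ** k for k in range(1, degree + 1)])

# Degree-3 augmentation of a 2-dimensional input:
print(poly_augment([2.0, 3.0], degree=3))  # [ 2.  3.  4.  9.  8. 27.]
```

Note that, unlike this sketch, common library implementations (for example scikit-learn's `PolynomialFeatures`) also generate the cross terms such as $x_{n,1} x_{n,2}$.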

Polynomial regression and linear models

Polynomial input augmentation enriches the expressive power of linear models. Indeed, a polynomial regression of degree $d$ is simply a linear regression on the augmented inputs.

In other words, to fit the polynomial P(x):

$$P(x) = a_1 x + a_2 x^2 + a_3 x^3$$

to a dataset $S = \{(x_n, y_n)\}_{n \le N}$, it is enough to fit a linear function $f_a$ to the augmented dataset:

$$\Phi_3(S) = \{(\Phi_3(x_n), y_n)\}_{n \le N}$$

This is why a polynomial regression is a linear regression.
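As a minimal sketch of this equivalence (the data and the helper name `fit_poly_regression` are made up for illustration), we can solve the augmented least-squares problem directly with NumPy:

```python
import numpy as np

def fit_poly_regression(x, y, degree):
    """Fit P(x) = a_1 x + ... + a_d x^d as a linear regression on Phi_d(x)."""
    x = np.asarray(x, dtype=float)
    # Augmented design matrix: one column per power of x.
    X = np.column_stack([x ** k for k in range(1, degree + 1)])
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # estimates of a_1, ..., a_d

# Toy check: recover the coefficients of a known cubic from noisy samples.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = 1.0 * x + 2.0 * x**2 + 3.0 * x**3 + 0.01 * rng.standard_normal(x.size)
print(fit_poly_regression(x, y, degree=3))  # close to [1. 2. 3.]
```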

Notation

To keep the notation simple, we often assume that the polynomial input augmentation has been done during preprocessing, and we write $x$ to denote the augmented vector $\Phi_d(x)$.

How much richer does linear regression become?

Very much! By the Stone-Weierstrass theorem (wikipedia link), every continuous function f can be uniformly approximated (as closely as desired) by a polynomial function on a closed interval.

Since our dataset is discrete (and bounded), this means that whatever the relationship $f_{\text{true}}$ between the inputs and the outputs, we can get as close as we want using polynomials (as long as the degree $d$ is big enough).
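As an illustration (the target function below is made up for the demo), fitting least-squares polynomials of increasing degree to a smooth function shows the worst-case error shrinking:

```python
import numpy as np

# Least-squares polynomial approximations of a continuous function on [-1, 1]:
# the maximum error on a dense grid shrinks as the degree grows.
x = np.linspace(-1.0, 1.0, 500)
y = np.sin(3.0 * x)
for d in (1, 3, 5, 9):
    coeffs = np.polyfit(x, y, deg=d)
    max_err = np.max(np.abs(np.polyval(coeffs, x) - y))
    print(d, max_err)
```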

Actually, since the train set is finite, we can always find a polynomial of degree $d = N-1$ that goes through every point in the train set. This means that for $d = N-1$, we can make the training error exactly 0. This is called Lagrange polynomial interpolation (wikipedia link).
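A quick sketch of this fact (the five training points below are made up): with $N$ points, a fit of degree $N-1$ reproduces every training output, up to floating-point error.

```python
import numpy as np

# 5 training points -> a degree-4 polynomial can pass through all of them.
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

coeffs = np.polyfit(x_train, y_train, deg=x_train.size - 1)  # d = N - 1
y_hat = np.polyval(coeffs, x_train)
print(np.max(np.abs(y_hat - y_train)))  # ~0: the training error vanishes
```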

Wait… we can make the training error 0? Yes, but…

So, is it the ultimate machine-learning technique?

No, because…

Polynomial regressions of high degree tend to overfit. If you’re not sure what that means, check out my dedicated article, which is completely written and illustrated using polynomial regressions: overfitting.

[Figure: overfitting and underfitting regression lines]

The regression (design) matrix $X$ that we have to invert grows linearly with the regression's degree $d$: it keeps its $N$ rows but gains one column per power and per input dimension. This causes computational complications.
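For instance (toy shapes only), the width of the augmented design matrix grows with $d$:

```python
import numpy as np

# For N samples of 2-dimensional inputs, the pure-power augmentation
# produces 2 * d columns, so the design matrix widens linearly with d.
N, p = 100, 2
X_raw = np.random.rand(N, p)
for d in (1, 3, 10, 30):
    X_aug = np.hstack([X_raw ** k for k in range(1, d + 1)])
    print(d, X_aug.shape)  # (100, 2), (100, 6), (100, 20), (100, 60)
```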

Numerical errors accumulate when we take powers of a number: large powers of small values are rounded to 0 in floating-point arithmetic. So even if a mathematical solution exists, we might not be able to compute it in practice.
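Two small illustrations of this in double-precision Python/NumPy (toy values only):

```python
import numpy as np

# Large powers of small values underflow to exactly zero...
print(0.1 ** 400)          # 0.0
# ...and tiny contributions are rounded away next to large ones.
print(1.0 + 1e-17 == 1.0)  # True

# The augmented design matrix also becomes severely ill-conditioned as the
# degree grows, making the least-squares solution numerically fragile.
x = np.linspace(0.0, 1.0, 50)
for d in (3, 10, 20):
    X = np.column_stack([x ** k for k in range(1, d + 1)])
    print(d, np.linalg.cond(X))  # condition number blows up with d
```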

How to improve polynomial regressions?