In this article, we define underfitting and overfitting and show some nice ways to visualize them on polynomial regressions.

In short, underfitting and overfitting describe the ability of a machine-learning model to make good predictions on datasets it wasn't trained on.

Underfitting happens when:

- The model is too rigid to learn the true relationship in the data.
- Both the test error and the train error are large.
- The error is dominated by the bias error.

Overfitting happens when:

- The model is not rigid enough and mistakes noise for signal.
- The train error is low but the test error is large.
- The error is dominated by the variance error.

The left-hand side graph shows the underfitting scenario. We can see that the model is so rigid that it doesn't fit the general shape of the data. This is called the bias of the model.

The graph in the middle displays a good fit. The model fits the general shape of the data but does not wiggle too much between data points. This is the good bias-variance equilibrium.

The right-hand side graph shows the overfitting scenario. The model wiggles too much, and the general shape of the data is obscured by this high variance.

The signal and the noise

Let $f$ be the signal function we want to learn.

Let $\varepsilon$ be a random noise term.

The train dataset is $\mathcal{D}_{\text{train}} = \{(x_i, y_i)\}_{i=1}^{n}$. This means that for each observation $i$:

$$y_i = f(x_i) + \varepsilon_i$$

Where:

- $f(x_i)$ is the signal,
- $\varepsilon_i$ is the random noise, drawn independently for each observation.

Learning the signal

The goal of machine-learning is to learn the signal function $f$. To do so, we suppose a model $\hat{f}$ and fit it to the train dataset. While our end goal is to approximate $f$, the only values we can fit $\hat{f}$ against are the noisy observations $y_i$.

To illustrate overfitting, we will use polynomial regression. As the degree of the fitted polynomial increases, the model has more freedom to fit complex signals, but also more freedom to fit the unwanted noise. This is illustrated in the picture below, where:

- the red curve is the real signal,
- the red points are observed values of this signal, polluted by the random noise,
- the blue curve is the regression line learned by a polynomial regression.

High variance

To better visualize the implications of overfitting on the regression curve, we can generate multiple train datasets, each with the same signal curve (in red) but with different random values for the noise:

$$y_i^{(k)} = f(x_i) + \varepsilon_i^{(k)}$$

where $\varepsilon_i^{(k)}$ is drawn independently for each train dataset $k$.

Let's generate a lot of train datasets like those. If we fit a polynomial regression to each train dataset thus generated and we graph all the regression lines (in blue) on the same plot, we can visualize the high variance induced by overfitting. See the picture below.

The Bias-Variance Tradeoff

Since we used a Gaussian noise with mean $0$ and variance $\sigma^2$, the expected test error at a point $x$ decomposes into a bias term, a variance term, and the irreducible noise. Hence:

$$\mathbb{E}\left[(y - \hat{f}(x))^2\right] = \underbrace{\left(f(x) - \mathbb{E}[\hat{f}(x)]\right)^2}_{\text{bias}^2} + \underbrace{\mathrm{Var}\left[\hat{f}(x)\right]}_{\text{variance}} + \sigma^2$$

In the picture below, we graphed the mean of the regression curves in blue. The gray shape around it is the spread of all the regression curves.

We can see that for low-degree polynomials, the blue curve does not fit the red curve: we say they have high bias. But the gray shape has a small width: we say they have low variance.

On the other hand, for high-degree polynomials, the blue curve perfectly matches the red curve: we say they have low bias. But the gray shape has a large width: we say they have high variance.
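
To make the polynomial-regression setup concrete, here is a minimal sketch of one possible implementation. The sine signal, the noise level of 0.3, the number of points, and the use of NumPy's `polyfit` are all assumptions made for illustration; they are not taken from the figures above, just one way to reproduce the under/overfitting behaviour.

```python
import numpy as np

# Assumed setup: a sine wave as the "true" signal f, polluted by Gaussian noise.
rng = np.random.default_rng(0)
n_points = 30
sigma = 0.3

def signal(x):
    """The true signal f(x) that we want to learn."""
    return np.sin(2 * np.pi * x)

x_train = np.sort(rng.uniform(0.0, 1.0, n_points))
y_train = signal(x_train) + rng.normal(0.0, sigma, n_points)

# Fit polynomials of increasing degree: a low degree tends to underfit,
# a moderate degree fits well, a high degree tends to overfit the noise.
x_grid = np.linspace(0.0, 1.0, 200)
for degree in (1, 4, 15):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_error = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_error = np.mean((np.polyval(coeffs, x_grid) - signal(x_grid)) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_error:.3f}, test MSE = {test_error:.3f}")
```

With this setup, the degree-1 fit typically shows large train and test errors (underfitting), while the degree-15 fit drives the train error down at the cost of a larger test error (overfitting).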
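
The high-variance experiment can be simulated the same way: generate many train datasets that share the signal but not the noise, fit one high-degree polynomial per dataset, and overlay all the resulting regression curves. The sketch below reuses the same assumed signal and noise level; the number of datasets and the degree are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n_datasets = 200      # number of simulated train datasets
n_points = 30
sigma = 0.3
degree = 15           # high degree -> high variance

def signal(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, n_points)   # same design points for every dataset
x_grid = np.linspace(0.0, 1.0, 200)

# Each dataset shares the signal curve but gets fresh Gaussian noise.
curves = np.empty((n_datasets, x_grid.size))
for k in range(n_datasets):
    y_k = signal(x_train) + rng.normal(0.0, sigma, n_points)
    coeffs = np.polyfit(x_train, y_k, deg=degree)
    curves[k] = np.polyval(coeffs, x_grid)

# Overlaying all the curves on one plot reveals the spread (the variance),
# e.g. with matplotlib: for c in curves: plt.plot(x_grid, c, color="blue", alpha=0.05)
```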
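
Finally, the bias-variance picture can be checked numerically: average the fitted curves (the blue mean curve) and measure their spread (the width of the gray shape). The sketch below, under the same assumptions as the previous ones, estimates the squared bias and the variance for a few polynomial degrees.

```python
import numpy as np

rng = np.random.default_rng(2)
n_datasets, n_points, sigma = 500, 30, 0.3

def signal(x):
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, n_points)
x_test = np.linspace(0.05, 0.95, 50)   # evaluation grid away from the edges

for degree in (1, 4, 15):
    preds = np.empty((n_datasets, x_test.size))
    for k in range(n_datasets):
        y_k = signal(x_train) + rng.normal(0.0, sigma, n_points)
        coeffs = np.polyfit(x_train, y_k, deg=degree)
        preds[k] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)                        # the blue mean curve
    bias_sq = np.mean((mean_pred - signal(x_test)) ** 2)  # squared bias vs. the red curve
    variance = np.mean(preds.var(axis=0))                 # width of the gray shape
    print(f"degree {degree:2d}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

As in the plots above, the low-degree fits come out with high bias and low variance, and the high-degree fits with low bias and high variance.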