Train error and test error

Nov 01, 2018

The train error is the error committed by a machine-learning model on the dataset it was trained on. The test error is the error committed on a separate dataset, called the test-set, which was not used for training.

Let’s explain this in more detail.

Machine learning setup

The dataset under study, $D$, consists of $n$ pairs of an input vector $x_i$ and an output value $y_i$:

$$D = \{(x_i, y_i)\}_{i=1}^{n}$$

We suppose that there exists an approximate deterministic relationship $f$ between the inputs and the outputs:

$$y_i \approx f(x_i), \quad i = 1, \dots, n$$
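To make the setup concrete, here is a minimal sketch of a synthetic dataset that satisfies $y_i \approx f(x_i)$. It assumes scalar inputs instead of vectors, a hypothetical true relationship $f(x) = 2x + 1$, and small Gaussian noise; `make_dataset` is a hypothetical helper, not part of any library.

```python
import random

def make_dataset(n, seed=0):
    """Build n pairs (x, y) with y close to f(x) = 2x + 1 (a hypothetical f)."""
    rng = random.Random(seed)                 # fixed seed so the data is reproducible
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)            # scalar input, for simplicity
        y = 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)  # f(x) plus small noise: y is only approximately f(x)
        data.append((x, y))
    return data

D = make_dataset(100)
print(len(D))  # 100
```

The noise term is what makes the relationship only approximate: no model can recover $y$ exactly from $x$.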

The goal of a machine-learning model is to learn this relationship using a subset of the dataset.

Train-set and test-set

The dataset is often split into two disjoint parts, named the train-set $D_{\text{train}}$ and the test-set $D_{\text{test}}$:

$$D = D_{\text{train}} \cup D_{\text{test}}$$
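One common way to perform this split is to shuffle the pairs and cut the shuffled list in two. The sketch below assumes a dataset stored as a list of `(x, y)` tuples; `split_dataset`, `test_fraction` and `seed` are hypothetical names chosen for illustration.

```python
import random

def split_dataset(data, test_fraction=0.2, seed=0):
    """Shuffle a copy of data, then cut it into a train-set and a test-set.

    Each pair lands in exactly one part, so the two parts are disjoint
    as long as the original dataset contains no duplicate pairs.
    """
    shuffled = list(data)                      # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed for reproducibility
    n_test = int(len(shuffled) * test_fraction)
    test_set = shuffled[:n_test]
    train_set = shuffled[n_test:]
    return train_set, test_set

# Usage on a toy dataset of 10 distinct pairs.
data = [(float(i), 2.0 * i + 1.0) for i in range(10)]
train_set, test_set = split_dataset(data, test_fraction=0.2)
print(len(train_set), len(test_set))  # 8 2
```

Shuffling before cutting avoids putting, for example, all the smallest inputs in the train-set when the data happens to be sorted.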

Training

To learn the relationship $f$, we choose a class of models $\mathcal{F}$ and select the best model $\hat{f}$ within it. This is done by minimizing a function $J$, named the training objective, on the train-set:

$$\hat{f} = \arg\min_{g \in \mathcal{F}} J(g, D_{\text{train}})$$
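As an illustration, the sketch below assumes the model class $\mathcal{F}$ is the set of affine functions $g(x) = a x + b$ on scalar inputs, and that the training objective $J$ is the mean squared error on the train-set. Under these assumptions the minimizer has a closed form (ordinary least squares). `fit_affine` is a hypothetical helper; other model classes or objectives would need a different procedure, often an iterative optimizer.

```python
def fit_affine(train_set):
    """Return the affine model g(x) = a*x + b that minimizes the mean squared
    error on train_set (ordinary least squares, closed form, scalar inputs)."""
    n = len(train_set)
    mean_x = sum(x for x, _ in train_set) / n
    mean_y = sum(y for _, y in train_set) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in train_set)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in train_set)
    if s_xx == 0:
        raise ValueError("inputs must not all be equal")
    a = s_xy / s_xx                # slope
    b = mean_y - a * mean_x        # intercept

    def model(x):
        return a * x + b

    return model

f_hat = fit_affine([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(f_hat(3.0))  # 7.0
```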

After training, we should have the approximate relationship on the whole dataset $D$, not only on the train-set:

$$y_i \approx \hat{f}(x_i), \quad i = 1, \dots, n$$

The loss function

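To measure progress during training, we must decide how costly each mistake is. This is the role of the loss function $\mathcal{L}$, which measures how well a model $g$ performs on a subset $S \subseteq D$ of the dataset:

$$\mathcal{L}(g, S) = \frac{1}{|S|} \sum_{(x_i, y_i) \in S} \ell\big(g(x_i), y_i\big)$$

where $\ell$ assigns a cost to each individual mistake.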
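A frequent choice for real-valued outputs is the squared error $\ell(\hat{y}, y) = (\hat{y} - y)^2$, which gives the mean squared error below. This is an assumed choice for illustration (classification tasks often use other costs, such as the 0-1 loss); `mse_loss` is a hypothetical name, and the model is any Python callable.

```python
def mse_loss(model, subset):
    """Mean squared error of `model` on `subset`, a list of (x, y) pairs.

    Each mistake costs (model(x) - y) ** 2, so large errors are penalized
    much more heavily than small ones.
    """
    if not subset:
        raise ValueError("subset must contain at least one pair")
    return sum((model(x) - y) ** 2 for x, y in subset) / len(subset)

def double(x):
    return 2.0 * x

print(mse_loss(double, [(1.0, 2.0), (2.0, 5.0)]))  # 0.5
```

Here the first pair is predicted exactly (cost 0) and the second is off by 1 (cost 1), so the average is 0.5.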

The train error

The train error is the loss of the trained model $\hat{f}$ on the same dataset it was trained on, the train-set $D_{\text{train}}$:

$$\mathcal{L}(\hat{f}, D_{\text{train}})$$
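The sketch below computes a train error under the same assumptions as before: an affine model class fitted by least squares, and the mean squared error as loss. The four noisy pairs are made-up toy data, and the helper names are hypothetical.

```python
def fit_affine(train_set):
    # Ordinary least squares for g(x) = a*x + b with scalar inputs.
    n = len(train_set)
    mean_x = sum(x for x, _ in train_set) / n
    mean_y = sum(y for _, y in train_set) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in train_set)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in train_set)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def mse_loss(model, subset):
    # Average squared error of the model over the (x, y) pairs in subset.
    return sum((model(x) - y) ** 2 for x, y in subset) / len(subset)

train_set = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]  # toy data
f_hat = fit_affine(train_set)                 # fitted line: 1.94 * x + 1.09
train_error = mse_loss(f_hat, train_set)      # same data that was used for fitting
print(round(train_error, 4))  # 0.0205
```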

The test error

The test error is the loss of the trained model $\hat{f}$ on the other dataset, the test-set $D_{\text{test}}$:

$$\mathcal{L}(\hat{f}, D_{\text{test}})$$
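Continuing the same toy example (affine model, least squares, mean squared error, made-up data, hypothetical helper names), the only change needed to get the test error is to evaluate the fitted model on pairs that were not used for fitting.

```python
def fit_affine(train_set):
    # Ordinary least squares for g(x) = a*x + b with scalar inputs.
    n = len(train_set)
    mean_x = sum(x for x, _ in train_set) / n
    mean_y = sum(y for _, y in train_set) / n
    s_xx = sum((x - mean_x) ** 2 for x, _ in train_set)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in train_set)
    a = s_xy / s_xx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def mse_loss(model, subset):
    # Average squared error of the model over the (x, y) pairs in subset.
    return sum((model(x) - y) ** 2 for x, y in subset) / len(subset)

train_set = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]  # toy data
test_set = [(4.0, 8.9), (5.0, 11.2)]                          # never seen during fitting

f_hat = fit_affine(train_set)
train_error = mse_loss(f_hat, train_set)
test_error = mse_loss(f_hat, test_set)
print(round(train_error, 4), round(test_error, 4))  # 0.0205 0.0853
```

In this toy run the test error happens to be larger than the train error: the line was chosen to fit the train-set, so its loss there is not an unbiased picture of how it does on new data.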

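For mathematical correctness, so that the test error measures performance on data the model has never seen, it is important that the test-set does not contain any pairs from the train-set:

$$D_{\text{train}} \cap D_{\text{test}} = \emptyset$$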
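A cheap sanity check before reporting a test error is to look for pairs that appear in both sets. The sketch assumes pairs are hashable tuples and compared by exact equality; `leaked_pairs` is a hypothetical helper name.

```python
def leaked_pairs(train_set, test_set):
    """Return the test pairs that also appear in the train-set (should be empty)."""
    train_pairs = set(train_set)               # set lookup keeps the check linear in size
    return [pair for pair in test_set if pair in train_pairs]

train_set = [(0.0, 1.1), (1.0, 2.9)]
print(leaked_pairs(train_set, [(2.0, 5.2)]))              # []
print(leaked_pairs(train_set, [(1.0, 2.9), (2.0, 5.2)]))  # [(1.0, 2.9)]
```

An empty result means the intersection condition above holds for this data.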