The train error is the error committed by a machine-learning model on the dataset it was trained on. The test error is the error committed on a separate dataset called the test-set.
Let’s explain this in more detail.
Machine learning setup
The dataset $S$ under study consists of $N$ pairs, each made of an input vector $\vec{x}_n$ and an output value $y_n$:
$$S = \{(\vec{x}_n, y_n) \mid n \le N\}$$
We suppose that there exists an approximate deterministic relationship $f_{\text{true}}$ between the inputs $\vec{x}_n$ and the outputs $y_n$:
$$\forall n \le N, \quad y_n \approx f_{\text{true}}(\vec{x}_n)$$
The goal of a machine-learning model is to learn this relationship using a subset $S_{\text{train}} \subseteq S$ of the dataset.
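To make the setup concrete, here is a minimal sketch that builds a synthetic dataset of this shape. Everything in it is an assumption made for illustration: the linear `f_true`, the two-dimensional inputs, the Gaussian noise level, and the helper name `make_dataset` do not come from the text above.

```python
import random


def f_true(x):
    """Hypothetical ground-truth relationship on a 2-d input vector (assumed)."""
    return 2.0 * x[0] - 1.0 * x[1] + 0.5


def make_dataset(n_pairs, noise_std=0.1, seed=0):
    """Build S = {(x_n, y_n) | n <= N} where y_n is only approximately f_true(x_n)."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_pairs):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))  # input vector x_n
        y = f_true(x) + rng.gauss(0.0, noise_std)             # noisy output y_n
        dataset.append((x, y))
    return dataset


S = make_dataset(n_pairs=100)
print(len(S), S[0])
```

The added noise is what makes the relationship approximate rather than exact: no model can recover the outputs perfectly from the inputs.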
Train-set and test-set
The dataset $S$ is often split into two disjoint parts, $S_{\text{train}}$ and $S_{\text{test}}$, named the train-set and the test-set respectively:
$$S = S_{\text{train}} \cup S_{\text{test}}$$
Training
To learn the relationship $f_{\text{true}}$, we choose a class of models $F$ and select the best model within it. This is done by minimizing a function $G$, named the training objective, on the train-set:
$$f_{S_{\text{train}}} = \underset{f \in F}{\operatorname{argmin}} \; G(f, S_{\text{train}})$$
After training, we should have the approximate relationship (on the whole dataset $S$):
$$\forall n \le N, \quad y_n \approx f_{S_{\text{train}}}(\vec{x}_n)$$
The loss function
To measure progress during training, we must decide how costly each mistake is. This is the role of the loss function $L$, which measures how well a model $f \in F$ performs on a subset $S_x$ of the dataset $S$:
$$L(f, S_x) \in \mathbb{R}^+$$
The train error
The train error is the loss of the model $f_{S_{\text{train}}}$ on the same dataset $S_{\text{train}}$ it was trained on:
$$\text{train error} = L(f_{S_{\text{train}}}, S_{\text{train}})$$
The test error
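The test error is the loss of the model $f_{S_{\text{train}}}$ on the other dataset $S_{\text{test}}$:
$$\text{test error} = L(f_{S_{\text{train}}}, S_{\text{test}})$$
For mathematical correctness, it is important that the test-set $S_{\text{test}}$ does not contain any pairs from the train-set $S_{\text{train}}$:
$$S_{\text{train}} \cap S_{\text{test}} = \emptyset$$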
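The whole pipeline can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the data is synthetic with an assumed $f_{\text{true}}(x) = 3x + 1$, the class $F$ is the set of lines $x \mapsto ax + b$, the loss $L$ is the mean squared error, and the training objective $G$ is taken to be equal to $L$ (the text above does not require this). The names `make_dataset`, `mean_squared_error`, and `fit_line` are hypothetical helpers.

```python
import random


def make_dataset(n_pairs, seed=0):
    """Synthetic pairs (x, y) with y close to an assumed f_true(x) = 3x + 1."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_pairs):
        x = rng.uniform(-1.0, 1.0)
        y = 3.0 * x + 1.0 + rng.gauss(0.0, 0.2)
        dataset.append((x, y))
    return dataset


def mean_squared_error(model, subset):
    """Loss L(f, S_x): average squared mistake of the model on a subset."""
    return sum((model(x) - y) ** 2 for x, y in subset) / len(subset)


def fit_line(train_set):
    """Minimize the mean squared error over lines a*x + b (closed-form least squares)."""
    n = len(train_set)
    mean_x = sum(x for x, _ in train_set) / n
    mean_y = sum(y for _, y in train_set) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in train_set)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in train_set)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return lambda x: a * x + b


S = make_dataset(200)

# Split by shuffled index so that S_train and S_test share no pair.
indices = list(range(len(S)))
random.Random(1).shuffle(indices)
train_idx, test_idx = indices[:150], indices[150:]
assert set(train_idx).isdisjoint(test_idx)
S_train = [S[i] for i in train_idx]
S_test = [S[i] for i in test_idx]

f_S_train = fit_line(S_train)                           # training uses S_train only
train_error = mean_squared_error(f_S_train, S_train)    # L(f_S_train, S_train)
test_error = mean_squared_error(f_S_train, S_test)      # L(f_S_train, S_test)
print(f"train error = {train_error:.4f}, test error = {test_error:.4f}")
```

Splitting by index rather than by value keeps the two parts disjoint even if the dataset happens to contain duplicate pairs.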