In a classification problem, the dataset labels While in regression, output values are numerical (labels can take at most a finite number of values: Assume that the labels are binary: As for regression, we suppose that there exists an approximate deterministic relationship Our goal is to use a subset Classification using regression We can try to use a regression and then binarize the predicted value: values above a given threshold are set to Let’s generate a dataset maThe use of logistic regression de of On the picture below: The orange marks are datapoints. Datapoints in class We fit a linear least squares line to this dataset. The line is drawn in blue. The orange dashed line shows the threshold value of If the blue line is under the orange line, the point is classified as Everything looks good so far! If the computed value label label The problem with this approach is that the loss function used by the regression is not at all adapted to classification. Even on easy dataset like this one where the separation lies at Let’s generate Polynomial regression The stability is much better using a polynomial regression of degree But this is still suboptimal. While it is true that a small MSE error induces a small misclassification error, it can happen that every example is correctly classified but the MSE error is large. This is because a predicted value of MSE error is still A model with MSE loss might therefore have to work much harder than necessary in order to provide a decent upper bound on the classification error.