In this article we will derive the normal distribution as the probability distribution that models measurement errors. We start with a dart game and follow Herschel’s derivation.

Suppose we are playing a dart game as shown in the figure below. What is the probability that a dart lands at a given position on the board?

We can think of this problem in terms of measurement errors. The goal is to measure a quantity (here: the position of the bull’s eye), but there is noise in the measurement device (here: your aim is not perfect), so we can only obtain values “polluted” by some random noise.

The quality of the device (here: the quality of the player) can be measured by the spread of the measured values. For instance, the figure below illustrates the measurements made by two devices. The device on the left-hand side is more precise (= the player is more skilled) than the one on the right-hand side.
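
To make the idea of "spread" concrete, here is a minimal simulation sketch. It borrows Gaussian noise ahead of the derivation, and the function names, the two `sigma` values and the seeds are illustrative assumptions, not part of the original setup.

```python
import math
import random

def throw_darts(sigma, n, seed):
    """Simulate n throws aimed at the bull's eye at (0, 0).

    Assumption: horizontal and vertical errors are independent Gaussian
    noise with standard deviation sigma (smaller sigma = better player).
    """
    rng = random.Random(seed)
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]

def rms_distance(throws):
    """Root-mean-square distance of the throws from the bull's eye."""
    return math.sqrt(sum(x * x + y * y for x, y in throws) / len(throws))

# Two hypothetical players: the first one is three times more precise.
skilled = throw_darts(sigma=1.0, n=10_000, seed=0)
beginner = throw_darts(sigma=3.0, n=10_000, seed=1)

print(f"skilled player:  RMS distance = {rms_distance(skilled):.2f}")
print(f"beginner player: RMS distance = {rms_distance(beginner):.2f}")
```

The root-mean-square distance is only one possible way to summarize the spread; it is used here because it grows in proportion to `sigma`.
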

Let’s denote by

$$p(x, y)\,dx\,dy$$

the probability that the dart lands on an infinitesimal surface area $dx\,dy$ located at $(x, y)$.

We can express this probability in polar coordinates too:

$$p(x, y)\,dx\,dy = g(r, \theta)\,r\,dr\,d\theta$$

Whatever the quality of the measurement device (or the skills of the player), it is reasonable to expect that small errors are more frequent than big ones. So the probability should decrease when the distance to the center increases. This means $g$ is a decreasing function of $r$.

Also, there is no reason to expect that measurements will land more often on the left of the bull’s eye than on the right. More generally, taking the symmetry into account, every location on a circle centered on the bull’s eye should have the same probability of being hit. This means that $g$ does not depend on the rotation angle $\theta$:

$$g(r, \theta) = g(r)$$

We can arbitrarily choose two orthogonal directions and fix Cartesian axes on the dart board. This is illustrated in the picture below.

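If we further assume that knowing $x$ doesn’t tell us anything about $y$ and vice versa, then we have the following additional condition on the form of $p$:

$$p(x, y) = f(x)\,f(y)$$

where $f$ is the probability density of the error along one axis (the same for both axes, by symmetry).

Since the area element is $dx\,dy = r\,dr\,d\theta$, the density $p(x, y)$ equals $g(r)$ with $r = \sqrt{x^2 + y^2}$, so we get:

$$g\left(\sqrt{x^2 + y^2}\right) = f(x)\,f(y)$$

Setting $y = 0$ gives $g(x) = f(x)\,f(0)$, so this becomes:

$$f(x)\,f(y) = f(0)\,f\left(\sqrt{x^2 + y^2}\right)$$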
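
For readers who want the intermediate steps, here is a sketch of one standard way to solve a functional equation of this shape. It assumes the equation reads $f(x)\,f(y) = f(0)\,f\left(\sqrt{x^2 + y^2}\right)$ and that $f$ is continuous and positive. The helper function $h$ and the constant $a$ are notation introduced for this sketch only.

Setting $y = 0$ for a negative $x$ shows that $f(x) = f(|x|)$, so it is enough to work with $x \ge 0$. Define $h(t) = \ln\frac{f(\sqrt{t})}{f(0)}$ for $t \ge 0$. Dividing the equation by $f(0)^2$ and taking logarithms gives:

$$
\begin{aligned}
h(x^2) + h(y^2) &= h(x^2 + y^2) \\
h(t) &= -a\,t \\
f(x) &= f(0)\,e^{-a x^2} \\
\int_{-\infty}^{\infty} f(x)\,dx = f(0)\sqrt{\frac{\pi}{a}} = 1 \;&\Longrightarrow\; f(0) = \sqrt{\frac{a}{\pi}}
\end{aligned}
$$

The second line uses the fact that the only continuous solutions of the additive equation $h(s) + h(t) = h(s + t)$ are linear. The constant is written as $-a$ with $a > 0$ because the density has to decrease away from the center, which is also what makes the normalization integral finite.
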

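Solving this equation, and requiring that $f$ decreases away from the center and integrates to $1$, we find that:

$$f(x) = \sqrt{\frac{a}{\pi}}\,e^{-a x^2}$$

with one undetermined parameter $a > 0$. If we compute the variance $\sigma^2$ of $f$, we find that $\sigma^2 = \frac{1}{2a}$. So we can rewrite the formula for $f$ using $\sigma$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{x^2}{2\sigma^2}}$$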

This formula is the *Gaussian distribution* or *normal distribution*, noted $\mathcal{N}(0, \sigma^2)$. As explained above, it is the probability distribution of errors when measuring a target value equal to $0$. Here is a plot for two values of $\sigma$:
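
The curve can also be checked numerically. The sketch below assumes the centered density has the usual form $\frac{1}{\sigma\sqrt{2\pi}}e^{-x^2/(2\sigma^2)}$. The function names, the two values of `sigma` and the midpoint-rule integrator are illustrative choices. It checks that the area under the curve is close to 1, that the variance is close to $\sigma^2$, and that the product of two such densities depends only on the distance to the center.

```python
import math

def centered_normal_pdf(x, sigma):
    """Density of the error around a target value of 0 (standard deviation sigma)."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(func, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width

for sigma in (0.5, 2.0):
    f = lambda x, s=sigma: centered_normal_pdf(x, s)
    # Integrate over +/- 10 standard deviations, where practically all the mass lies.
    area = integrate(f, -10 * sigma, 10 * sigma)
    variance = integrate(lambda x: x * x * f(x), -10 * sigma, 10 * sigma)
    print(f"sigma={sigma}: peak={f(0.0):.4f}, area={area:.6f}, variance={variance:.6f}")

# Rotational symmetry: f(x) * f(y) should equal f(0) * f(r) with r the distance to the center.
sigma = 1.0
x, y = 0.6, 0.8
r = math.hypot(x, y)
lhs = centered_normal_pdf(x, sigma) * centered_normal_pdf(y, sigma)
rhs = centered_normal_pdf(0.0, sigma) * centered_normal_pdf(r, sigma)
print("symmetry holds:", math.isclose(lhs, rhs))
```

A smaller `sigma` gives a taller, narrower peak, while the area under the curve stays the same.
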

What if the target value is not $0$? We can simply change the origin of the axis and translate everything by $\mu$. The translated random variable $x' = x + \mu$ will thus represent a measurement of $\mu$ and can be expressed with the normal distribution:

$$x' - \mu \sim \mathcal{N}(0, \sigma^2)$$

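If we replace $x$ by $x - \mu$ in the formula for $f$, we find the general formula for the *normal distribution* $\mathcal{N}(\mu, \sigma^2)$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$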
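
As a quick cross-check, the general formula can be written as a small function and compared with `statistics.NormalDist` from the Python standard library. This is a sketch assuming the density $\frac{1}{\sigma\sqrt{2\pi}}e^{-(x-\mu)^2/(2\sigma^2)}$. The name `normal_pdf` and the values of `mu` and `sigma` are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a measurement x of the target value mu (standard deviation sigma)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

mu, sigma = 5.0, 2.0  # hypothetical target value and device spread
reference = NormalDist(mu, sigma)

for x in (1.0, 5.0, 7.5):
    print(f"x={x}: formula={normal_pdf(x, mu, sigma):.6f}, stdlib={reference.pdf(x):.6f}")

# Translating by mu only shifts the curve: the density at mu + d for target mu
# equals the density at d for target 0.
d = 1.3
print("shift holds:", math.isclose(normal_pdf(mu + d, mu, sigma), normal_pdf(d, 0.0, sigma)))
```

The two columns should agree, and the last line illustrates that moving the target from $0$ to $\mu$ changes the position of the curve but not its shape.
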

But what is the variance $\sigma^2$? It is a parameter that controls the quality of our measurement device. Recall this picture, where the variance is much smaller on the left-hand side than on the right-hand side.

Here is a video summary of this derivation of the *normal distribution* (known as Herschel’s derivation):

## Read next

The *normal distribution* can be seen as a random number generator that generates measurements $x$ of a target value $\mu$ with embedded measurement error $\sigma$.

Given multiple measurements, can we retrieve the target value $\mu$? Check out my next article to find out.