In this article we will derive the normal distribution as the probability distribution that models measurement errors. We start with a dart game and follow Herschel’s derivation.
Suppose we are playing a dart game as shown in the figure below. What is the probability that a dart lands at a given position on the board?
We can think of this problem in terms of measurement errors. The goal is to measure a quantity (here: the position of the bull’s eye), but there is noise in the measurement device (here: your aim is not perfect), so we can only obtain values “polluted” by some random noise.
The quality of the device (here: the skill of the player) can be measured by the spread of the measured values. For instance, the figure below illustrates the measurements made by two devices. The device on the left-hand side has more precision (= the player is more skilled) than the one on the right-hand side.
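To build intuition, here is a small simulation sketch of such a figure. It assumes Gaussian-distributed aim errors (which is precisely the distribution we are about to derive), and the two $\sigma$ values and the number of throws are arbitrary illustrative choices.

```python
# Simulate dart impacts for two players whose aim errors have different spreads.
# The noise model (Gaussian) and the sigma values are illustrative choices.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n_throws = 500

fig, axes = plt.subplots(1, 2, figsize=(10, 5), sharex=True, sharey=True)
for ax, sigma in zip(axes, (0.5, 1.5)):
    x = rng.normal(0.0, sigma, n_throws)  # horizontal error around the bull's eye
    y = rng.normal(0.0, sigma, n_throws)  # vertical error around the bull's eye
    ax.scatter(x, y, s=5)
    ax.scatter([0], [0], color="red", marker="x")  # the bull's eye
    ax.set_title(f"sigma = {sigma}")
    ax.set_aspect("equal")
plt.show()
```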
Let’s denote by $p(x, y)\,dx\,dy$ the probability that the dart lands on an infinitesimal surface area $dx\,dy$ located at $(x, y)$.
We can express this probability in polar coordinates too:
$$q(r, \theta)\,r\,dr\,d\theta = p(x, y)\,dx\,dy$$

(The extra factor $r$ is the Jacobian of the change of variables $x = r\cos\theta$, $y = r\sin\theta$.)

Whatever the quality of the measurement device (or the skills of the player), it is reasonable to expect that small errors are more frequent than big ones. So the probability should decrease when the distance to the center increases. This means that $q$ is a decreasing function of $r$.
Also, there is no reason to expect that measurements will land more often on the left of the bull’s eye than on the right. More generally, taking the symmetry into account, every location on a circle centered on the bull’s eye should have the same probability of being hit. This means that $q$ does not depend on the rotation angle $\theta$:
$$q(r, \theta) = q(r)$$

We can arbitrarily choose two orthogonal directions and fix Cartesian axes on the dart board. This is illustrated in the picture below.
If we further assume that knowing $x$ doesn’t tell us anything about $y$ and vice versa, then we have the following additional condition on the form of $p(x, y)$:
$$p(x, y)\,dx\,dy = f(x)\,dx \cdot f(y)\,dy$$

Since $r = \sqrt{x^2 + y^2}$, we get:
$$q\!\left(\sqrt{x^2 + y^2}\right) = f(x)\,f(y)$$

Setting $y = 0$, this becomes:
$$q(|x|) = f(x)\,f(0) \;\Rightarrow\; q\!\left(\sqrt{x^2 + y^2}\right) = f\!\left(\sqrt{x^2 + y^2}\right) f(0) \;\Rightarrow\; f(x)\,f(y) = f\!\left(\sqrt{x^2 + y^2}\right) f(0)$$

It remains to solve this functional equation for $f$.
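Here is a sketch of the solution, assuming (as is natural here) that $f$ is continuous, even, and a probability density. Writing $g(u) = \ln\frac{f(\sqrt{u})}{f(0)}$ for $u \ge 0$, the functional equation becomes

$$g(x^2) + g(y^2) = g(x^2 + y^2),$$

so $g$ is additive; for a continuous $g$ this forces $g(u) = cu$ for some constant $c$, hence $f(x) = f(0)\,e^{cx^2}$. Since $f$ must decrease with the distance to the center and integrate to 1, we need $c = -\alpha$ with $\alpha > 0$, and the normalization $\int_{-\infty}^{\infty} f(x)\,dx = 1$ gives $f(0) = \sqrt{\alpha/\pi}$.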
Solving the equation therefore gives:

$$f(x) = \sqrt{\frac{\alpha}{\pi}}\, e^{-\alpha x^2} \qquad (\alpha > 0)$$

with one undetermined parameter $\alpha > 0$. If we compute the variance $\sigma^2$ of $f$, we find that $\sigma^2 = \frac{1}{2\alpha}$. So we can rewrite the formula for $f$ using $\sigma^2$:
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}}$$

This formula is the Gaussian distribution, or normal distribution, denoted $\mathcal{N}(0, \sigma^2)$. As explained above, it is the probability distribution of errors when measuring a target value of 0. Here is a plot for two values of $\sigma^2$:
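A plot like this can be generated with the short script below (a minimal sketch: the two $\sigma^2$ values are arbitrary). It also checks numerically that the variance of each density is indeed $\sigma^2$, i.e. that $\sigma^2 = \frac{1}{2\alpha}$.

```python
# Plot the N(0, sigma^2) density for two illustrative values of sigma^2,
# and numerically check that the variance of each density equals sigma^2.
import numpy as np
import matplotlib.pyplot as plt

def normal_pdf(x, sigma2, mu=0.0):
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-8, 8, 4001)
dx = x[1] - x[0]
for sigma2 in (0.5, 2.0):
    density = normal_pdf(x, sigma2)
    variance = np.sum(x**2 * density) * dx  # E[X^2]; E[X] = 0 by symmetry
    print(f"sigma^2 = {sigma2}: numerically integrated variance = {variance:.4f}")
    plt.plot(x, density, label=f"sigma^2 = {sigma2}")

plt.xlabel("x")
plt.ylabel("density")
plt.legend()
plt.show()
```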
What if the target value $\mu$ is not 0? We can simply perform a change of axis and translate everything by $-\mu$. The translated random variable then represents a measurement of the target value 0 with error $\epsilon$, and can be expressed with the normal distribution:
$$Y - \mu = \epsilon \sim \mathcal{N}(0, \sigma^2)$$

Substituting this into the formula for $f$, we find the general formula for the normal distribution:
$$Y \sim \mathcal{N}(\mu, \sigma^2) \iff f_Y(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

But what is the variance $\sigma^2$? It is a parameter that controls the quality of our measurement device. Recall this picture, where on the left-hand side the variance is much smaller than on the right-hand side.
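To make the role of $\sigma^2$ concrete, here is a small sketch that simulates measurements $Y = \mu + \epsilon$ from two devices of different quality (the target value $\mu$ and the two variances are arbitrary illustrative choices):

```python
# Simulate measurements Y = mu + eps with eps ~ N(0, sigma^2) for two devices:
# a precise one (small variance) and a noisy one (large variance).
import numpy as np

rng = np.random.default_rng(42)
mu = 10.0  # the target value being measured

for sigma2 in (0.1, 2.0):
    measurements = rng.normal(loc=mu, scale=np.sqrt(sigma2), size=1000)
    print(f"sigma^2 = {sigma2}: "
          f"sample mean = {measurements.mean():.3f}, "
          f"sample variance = {measurements.var():.3f}")
```

Both devices are centered on $\mu$ on average; only the spread of the measurements differs.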
Here is a video summary of this derivation of the normal distribution (known as Herschel’s derivation):
Read next
The normal distribution can be seen as a random number generator that produces measurements of a target value $\mu$ with an embedded measurement error $\epsilon$.
Given multiple measurements, can we retrieve the target value $\mu$? Check out my next article answering this question here