Introduction to statistical estimators

In this article we define what an estimator is. We focus on the theory used to compare and assess estimators, rather than on how to find one.

Note: estimators are statistics, so I suggest you read our dedicated article on statistics first.


In a typical inference situation, we have a sample of observations at our disposal:

$$x = (x_1, \dots, x_n)$$

We model this sample as an observation of a random variable $X = (X_1, \dots, X_n)$ drawn from some probability distribution that depends on an unknown parameter $\theta$.

Point estimators

The purpose of an estimator is to use the observed sample to estimate the true value of $\theta$.

Since an estimator is a function of the sample, it is a statistic.

Definition: point estimator
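Let $\Theta$ be the range of possible values for $\theta$. A point estimator of $\theta$ is a statistic $\hat{\theta}$ taking values in $\Theta$:

$$\hat{\theta} = \hat{\theta}(X_1, \dots, X_n) \in \Theta$$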

Don’t confuse the notations: $\theta$ is a fixed value, while $\hat{\theta}(X)$ is a random variable and $\hat{\theta}(x)$ is an observation of this random variable.


This definition is very broad, and clearly not every estimator is interesting. Let’s narrow it down.

Definition: consistent estimator
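A point estimator $\hat{\theta}_n$ of $\theta$ is consistent if it converges in probability to $\theta$ when the sample size $n$ increases:

$$\hat{\theta}_n \xrightarrow{P} \theta \quad \text{as } n \to \infty$$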
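
As a rough illustration of consistency, the sketch below uses the sample mean to estimate the mean of a Gaussian and counts how often the estimate lands more than a tolerance away from the true value. The Gaussian model, the tolerance `eps` and the helper name `miss_rate` are assumptions made for this example only.

```python
import random
import statistics

def miss_rate(n, theta=2.0, eps=0.1, trials=500, seed=0):
    """Fraction of trials where |sample mean - theta| > eps for samples of size n."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Draw n observations from N(theta, 1) and estimate theta by the sample mean.
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        if abs(statistics.fmean(sample) - theta) > eps:
            misses += 1
    return misses / trials

if __name__ == "__main__":
    for n in (10, 100, 400):
        print(n, miss_rate(n))
```

For a consistent estimator, the reported fraction should shrink towards zero as `n` grows, whatever fixed tolerance is chosen.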

Precision of an estimator

To measure the precision of an estimator, we can use the mean squared-error:

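Definition: mean squared-error
The mean squared-error (MSE) of an estimator $\hat{\theta}$ is the expected squared distance between the estimate and the true value of the parameter:

$$\mathrm{MSE}(\hat{\theta}) = \mathbb{E}\left[(\hat{\theta} - \theta)^2\right]$$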

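By Markov's inequality, the MSE can be used to bound the concentration of $\hat{\theta}$ around the true value $\theta$. For any $\varepsilon > 0$:

$$P\left(|\hat{\theta} - \theta| \geq \varepsilon\right) \leq \frac{\mathrm{MSE}(\hat{\theta})}{\varepsilon^2}$$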

If $\mathrm{MSE}(\hat{\theta}_n)$ converges towards $0$ when $n$ increases, the estimator is consistent. But we can find consistent estimators for which the MSE does not converge towards $0$.

So, how small can we make the MSE? Before we answer this question, it will be useful to introduce the bias-variance decomposition.

Definition: bias-variance decomposition
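The bias-variance decomposition expresses the MSE in terms of the bias and the variance of the estimator:

$$\mathrm{MSE}(\hat{\theta}) = \mathrm{Bias}(\hat{\theta})^2 + \mathrm{Var}(\hat{\theta}), \quad \text{where } \mathrm{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$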
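
For readers who want the missing step, here is a short derivation sketch. It writes $\hat{\theta}$ for the estimator and $\theta$ for the true parameter, adds and subtracts $\mathbb{E}[\hat{\theta}]$ inside the square, and uses the fact that $\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = 0$ to drop the cross term.

```latex
\begin{aligned}
\mathrm{MSE}(\hat{\theta})
  &= \mathbb{E}\big[(\hat{\theta} - \theta)^2\big] \\
  &= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\big] \\
  &= \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]
     + 2\,(\mathbb{E}[\hat{\theta}] - \theta)\,\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big]
     + (\mathbb{E}[\hat{\theta}] - \theta)^2 \\
  &= \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2
\end{aligned}
```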

This explains why unbiased estimators are so popular: with zero bias, the MSE reduces to the variance. Let’s turn our attention to such estimators.


Definition: unbiased estimator
An estimator $\hat{\theta}$ is unbiased when:

$$\mathbb{E}[\hat{\theta}] = \theta \quad \text{for all } \theta \in \Theta$$

Although unbiased estimators are convenient, always remember that biased low-variance estimators can be preferable to unbiased high-variance ones. Moreover, biased estimators can be consistent if the bias vanishes when $n$ increases.
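
To make this trade-off concrete, the following sketch compares two estimators of the variance of a Gaussian by simulation: the usual unbiased one (dividing by `n - 1`) and the biased one (dividing by `n`). The Gaussian model, the sample size and the function name `simulate_variance_estimators` are assumptions chosen for the illustration.

```python
import random

def simulate_variance_estimators(n=5, sigma2=1.0, trials=20000, seed=0):
    """Monte Carlo bias and MSE of two variance estimators on Gaussian samples."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    estimates = {"unbiased (n-1)": [], "biased (n)": []}
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        estimates["unbiased (n-1)"].append(ss / (n - 1))
        estimates["biased (n)"].append(ss / n)
    summary = {}
    for name, values in estimates.items():
        bias = sum(values) / trials - sigma2
        mse = sum((v - sigma2) ** 2 for v in values) / trials
        summary[name] = {"bias": bias, "mse": mse}
    return summary

if __name__ == "__main__":
    for name, stats in simulate_variance_estimators().items():
        print(name, stats)
```

In this setting the biased estimator underestimates the variance on average, yet its MSE is smaller because the reduction in variance outweighs the squared bias.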

What about the variance term? Can we make it as small as we want?


We do have a lower bound on the variance of unbiased estimators:

Cramér-Rao lower bound
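Given some regularity conditions, any unbiased estimator $\hat{\theta}$ of $\theta$ with finite variance satisfies:

$$\mathrm{Var}(\hat{\theta}) \geq \frac{1}{I(\theta)}$$

Where $I(\theta)$ is the Fisher information of the sample.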

Can we achieve this bound?

$\hat{\theta}$ attains the Cramér-Rao lower bound if and only if the density of $X$ is a one-parameter exponential family with sufficient statistic $\hat{\theta}$.
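
As a small check of this statement, consider $n$ independent Bernoulli($p$) observations, a one-parameter exponential family. The Fisher information of the sample is $n / (p(1 - p))$, so the bound equals $p(1 - p) / n$. The sketch below computes the exact variance of the sample mean from the binomial distribution and compares it with the bound. The Bernoulli model and the function names are assumptions made for this example.

```python
from math import comb

def sample_mean_variance(n, p):
    """Exact variance of the sample mean of n i.i.d. Bernoulli(p) draws."""
    # The number of successes k follows a Binomial(n, p) distribution.
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(w * k / n for k, w in enumerate(pmf))
    return sum(w * (k / n - mean) ** 2 for k, w in enumerate(pmf))

def cramer_rao_bound(n, p):
    """1 / I(p), where I(p) = n / (p (1 - p)) is the Fisher information of the sample."""
    return p * (1 - p) / n

if __name__ == "__main__":
    print(sample_mean_variance(20, 0.3), cramer_rao_bound(20, 0.3))
```

The two values agree up to floating-point error, so in this model the sample mean attains the bound.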

And if we can’t achieve it, how can we improve our estimator? The following theorem tells us that in order to reduce the variance of our estimator, we should throw away irrelevant aspects of the data.

Rao-Blackwell theorem
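Let $\hat{\theta}$ be an unbiased estimator of $\theta$ with finite variance, and let $T$ be a sufficient statistic for $\theta$. Then $\hat{\theta}^* = \mathbb{E}[\hat{\theta} \mid T]$ is also an unbiased estimator of $\theta$ and:

$$\mathrm{Var}(\hat{\theta}^*) \leq \mathrm{Var}(\hat{\theta})$$

Equality is attained when: $\hat{\theta} = \hat{\theta}^*$ almost surely, that is, when $\hat{\theta}$ is already a function of $T$.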
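
A classic illustration, sketched here under assumed settings, uses $n$ independent Bernoulli($p$) observations. The crude estimator that keeps only the first observation is unbiased for $p$, and the sum of the observations is a sufficient statistic. By symmetry, conditioning the first observation on the sum gives the sample mean. The simulation below compares the two estimators; the function name `rao_blackwell_demo` and all parameter values are hypothetical.

```python
import random
import statistics

def rao_blackwell_demo(n=10, p=0.3, trials=20000, seed=0):
    """Compare the crude estimator X_1 with its Rao-Blackwellized version."""
    rng = random.Random(seed)
    crude, improved = [], []
    for _ in range(trials):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        crude.append(sample[0])           # unbiased, but ignores most of the data
        improved.append(sum(sample) / n)  # E[X_1 | sum of the sample] = sum / n
    return {
        "crude": (statistics.fmean(crude), statistics.pvariance(crude)),
        "improved": (statistics.fmean(improved), statistics.pvariance(improved)),
    }

if __name__ == "__main__":
    for name, (mean, var) in rao_blackwell_demo().items():
        print(name, "mean:", mean, "variance:", var)
```

Both estimators average to roughly `p`, but the conditioned one has a much smaller variance, as the theorem predicts.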

Recall that a statistic $T_1$ contains less information than a statistic $T_2$ when there exists a function $h$ such that: $T_1 = h(T_2)$.

The following theorem tells us that the more irrelevant information we throw away, the lower the variance of our estimator:

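Let $\hat{\theta}$ be an unbiased estimator, and $T_1$ and $T_2$ two sufficient statistics. If there exists a function $h$ such that $T_1 = h(T_2)$, then:

$$\mathrm{Var}\left(\mathbb{E}[\hat{\theta} \mid T_1]\right) \leq \mathrm{Var}\left(\mathbb{E}[\hat{\theta} \mid T_2]\right)$$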

So the best we can do is to use a minimal sufficient statistic.

Estimators in practice

Common estimators are:

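  • the maximum likelihood estimator, which maximizes the likelihood $p(x \mid \theta)$;
  • the maximum a posteriori estimator, which maximizes the posterior $p(\theta \mid x)$;
  • the method of moments estimator, which approximates the moments $\mathbb{E}[X_i^k]$ of the distribution with the empirical moments $\frac{1}{n} \sum_{i=1}^n x_i^k$.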
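
As a minimal sketch of how two of these estimators can differ, assume the observations come from a Uniform(0, $\theta$) distribution. The likelihood $\theta^{-n}$ (for $\theta$ at least as large as every observation) is maximized by the largest observation, while matching the first moment $\theta / 2$ to the sample mean gives twice the sample mean. The model and the function names `mle_uniform` and `mom_uniform` are assumptions made for this example.

```python
import statistics

def mle_uniform(sample):
    """Maximum likelihood estimate of theta for Uniform(0, theta): the largest observation."""
    return max(sample)

def mom_uniform(sample):
    """Method of moments estimate: solve theta / 2 = sample mean for theta."""
    return 2 * statistics.fmean(sample)

if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0]
    print(mle_uniform(sample), mom_uniform(sample))
```

On the same sample the two estimates generally disagree, which is one reason the comparison tools discussed in this article are needed.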