The maximum likelihood estimator (MLE)

The maximum likelihood estimator is one of the most widely used estimators in statistics. In this article, we introduce this estimator and study its properties.

In a typical inference task, we have some data $x$ that we wish to understand better. The statistical approach is to model the source of these data as a random variable $X$ whose outcomes are produced with joint probability $p_\theta(x)$, where $\theta \in \Theta$ is an unknown parameter.

Definitions

A maximum likelihood estimator for $\theta$ is an estimator that maximizes the probability of producing the sample we observed.

Definition: likelihood
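The likelihood is the probability $p_\theta(x)$ seen as a function of $\theta$:

$$L(\theta) = p_\theta(x)$$

Definition: MLE
When the likelihood admits a unique global maximum, the MLE is:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta \in \Theta} L(\theta)$$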

In practice, we often maximize the log-likelihood instead of the likelihood. Since $\log$ is a strictly increasing function, both are maximized by the same value of $\theta$.

The log-likelihood is denoted $\ell$:

$$\ell(\theta) = \log L(\theta)$$
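As an illustration of maximizing the log-likelihood, here is a minimal sketch in Python. The model is an assumption chosen for the example and does not come from the article: an i.i.d. exponential sample with unknown rate, whose log-likelihood is $n \log \lambda - \lambda \sum_i x_i$. The helper names `log_likelihood` and `maximize_on_grid` are hypothetical. The sketch compares a brute-force grid maximization with the known closed-form maximizer $n / \sum_i x_i$.

```python
import math
import random


def log_likelihood(rate, n, total):
    """Log-likelihood of an i.i.d. exponential sample of size n with sum `total`."""
    return n * math.log(rate) - rate * total


def maximize_on_grid(func, low, high, steps=100_000):
    """Return the point of a regular grid on [low, high] where func is largest."""
    best_x, best_val = low, func(low)
    for i in range(1, steps + 1):
        x = low + (high - low) * i / steps
        val = func(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


random.seed(0)
true_rate = 2.0  # assumed "unknown" parameter used to simulate the data
sample = [random.expovariate(true_rate) for _ in range(500)]
n, total = len(sample), sum(sample)

# Numerical maximization of the log-likelihood over a grid of candidate rates.
numeric_mle = maximize_on_grid(lambda r: log_likelihood(r, n, total), 0.01, 10.0)

# Closed-form maximizer for the exponential model, for comparison.
closed_form_mle = n / total

print(f"grid MLE: {numeric_mle:.4f}, closed form: {closed_form_mle:.4f}")
```

A grid search is used only because it is easy to read; in practice a proper optimizer or the closed form would be preferable.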

Remarks:

  • the likelihood is not the probability of $\theta$;
  • maximizing the probability of $\theta$ given the data is called “maximum a posteriori estimation”.

Estimator performance

As explained in our primer on estimators, we first want to know if the MLE is consistent.

Consistency

Under some regularity conditions on the density $p_\theta$, the MLE is a consistent estimator, for instance:

  • when and is concave;
  • when and is continuously differentiable;
  • when $p_\theta$ is from a $k$-parameter exponential family.
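Consistency can be observed empirically. The following sketch assumes an exponential model with unknown rate (a one-parameter exponential family, chosen here only for illustration) and uses its closed-form MLE $n / \sum_i x_i$. It estimates the mean absolute error of the MLE for growing sample sizes; the function name `exponential_mle` and the simulation settings are hypothetical.

```python
import random

random.seed(2)
true_rate = 2.0     # assumed true parameter used to simulate the data
replications = 50   # number of simulated samples per sample size


def exponential_mle(sample):
    """Closed-form MLE of the rate of an i.i.d. exponential sample."""
    return len(sample) / sum(sample)


mean_abs_error = {}
for n in (10, 100, 1000, 10000):
    errors = []
    for _ in range(replications):
        sample = [random.expovariate(true_rate) for _ in range(n)]
        errors.append(abs(exponential_mle(sample) - true_rate))
    mean_abs_error[n] = sum(errors) / replications

for n, err in mean_abs_error.items():
    print(f"n = {n:>5}: mean |MLE - true rate| = {err:.4f}")
```

The error should shrink as the sample size grows, which is what consistency predicts.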

Asymptotic performance

Assuming an i.i.d. sample and under sufficient regularity of the distribution $p_\theta$, the MLE has excellent asymptotic properties:

Theorem
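For i.i.d. samples of size $n$ with sufficient regularity and assuming consistency, the asymptotic distribution of the MLE is:

$$\sqrt{n}\,\left(\hat{\theta}_{\text{MLE}} - \theta\right) \xrightarrow{d} \mathcal{N}\left(0, I(\theta)^{-1}\right)$$

Where:

$$I(\theta) = \mathbb{E}_\theta\left[\left(\frac{\partial}{\partial \theta} \log p_\theta(X_1)\right)^2\right]$$

is the Fisher information of a single observation $X_1$.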

So, for large sample sizes $n$, the MLE:

  • is approximately normally distributed;
  • is approximately unbiased;
  • approximately achieves the Cramér-Rao lower bound.
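These asymptotic properties can be checked by simulation. The sketch below assumes an exponential model with unknown rate $\lambda$ (an illustrative choice, not taken from the article), for which the Fisher information of one observation is $1/\lambda^2$ and the MLE is $n / \sum_i x_i$. It simulates many samples and compares the empirical mean and variance of $\sqrt{n}\,(\hat{\lambda} - \lambda)$ with the predicted values $0$ and $\lambda^2$. All names and settings are hypothetical.

```python
import math
import random
import statistics

random.seed(1)
true_rate = 2.0      # assumed true parameter used to simulate the data
n = 400              # size of each simulated sample
replications = 2000  # number of simulated samples


def exponential_mle(sample):
    """Closed-form MLE of the rate of an i.i.d. exponential sample."""
    return len(sample) / sum(sample)


# Collect sqrt(n) * (MLE - true value) over many independent samples.
scaled_errors = []
for _ in range(replications):
    sample = [random.expovariate(true_rate) for _ in range(n)]
    scaled_errors.append(math.sqrt(n) * (exponential_mle(sample) - true_rate))

empirical_mean = statistics.mean(scaled_errors)
empirical_variance = statistics.variance(scaled_errors)

# Inverse Fisher information of one exponential observation: 1 / (1 / rate^2).
inverse_fisher = true_rate ** 2

print(f"empirical mean:     {empirical_mean:.3f} (theory: 0)")
print(f"empirical variance: {empirical_variance:.3f} (theory: {inverse_fisher:.3f})")
```

The agreement is only approximate for finite $n$: the exponential MLE has a small positive bias that vanishes as $n$ grows.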

…What else?

What are those regularity conditions?

  • $\Theta$ is an open subset of $\mathbb{R}$ (so that it always makes sense for an estimator to have a symmetric distribution around $\theta$).
  • The support of $p_\theta$ is independent of $\theta$ (so that we can interchange integration and differentiation).
  • .
  • and .
  • .
  • and such that and:

Other properties

The MLE is equivariant, which is very convenient in practice.

Proposition: Equivariance of the MLE
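MLEs are equivariant: let $g$ be a bijection. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$:

$$\widehat{g(\theta)}_{\text{MLE}} = g\left(\hat{\theta}_{\text{MLE}}\right)$$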
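Equivariance can be verified numerically. The sketch below assumes an exponential model (chosen only for illustration) and the bijection $g(\lambda) = 1/\lambda$, which maps the rate to the mean. It maximizes the log-likelihood once in the rate parametrization and once in the mean parametrization, then checks that the second maximizer matches $g$ applied to the first. The helper `maximize_on_grid` and the grid bounds are hypothetical.

```python
import math
import random


def maximize_on_grid(func, low, high, steps=100_000):
    """Return the point of a regular grid on [low, high] where func is largest."""
    best_x, best_val = low, func(low)
    for i in range(1, steps + 1):
        x = low + (high - low) * i / steps
        val = func(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


random.seed(3)
sample = [random.expovariate(2.0) for _ in range(500)]  # assumed true rate: 2
n, total = len(sample), sum(sample)


def loglik_rate(rate):
    """Exponential log-likelihood written in terms of the rate."""
    return n * math.log(rate) - rate * total


def loglik_mean(mean):
    """The same log-likelihood written in terms of the mean, i.e. rate = 1 / mean."""
    return -n * math.log(mean) - total / mean


mle_rate = maximize_on_grid(loglik_rate, 0.01, 10.0)
mle_mean = maximize_on_grid(loglik_mean, 0.01, 5.0)

# Equivariance: the MLE of g(rate) = 1 / rate should equal g(MLE of rate).
print(f"g(MLE of rate) = {1 / mle_rate:.4f}, MLE of mean = {mle_mean:.4f}")
```

Both values should agree with the sample mean up to the grid resolution, so the reparametrized problem never has to be solved separately.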