
The maximum likelihood estimator (MLE)

Nov 14, 2018

The maximum likelihood estimator is one of the most widely used estimators in statistics. In this article, we introduce this estimator and study its properties.

In a typical inference task, we have some data $(x_1, \dots, x_n)$ that we wish to understand better. The statistical approach is to model the source of these data as a random variable $X = (X_1, \dots, X_n)$ whose outcomes are produced with joint probability $f_X(x \mid \theta)$, where $\theta \in \Theta$ is an unknown parameter.

Definitions

A maximum likelihood estimator for θ is an estimator that maximizes the probability of producing the sample we observed.

Definition: likelihood
The likelihood is the probability $f_X$ seen as a function of $\theta$:
$$L_x(\theta) = f_X(x \mid \theta)$$
Definition: MLE
When the likelihood admits a unique global maximum, the MLE $\hat{\theta}$ is:
$$\hat{\theta} = \operatorname*{argmax}_{t \in \Theta} L_x(t)$$

In practice, we often maximize the log-likelihood instead of the likelihood. Since $\ln$ is strictly increasing, both functions are maximized at the same point.

The log-likelihood is denoted $l$:

$$l_x(\theta) = \ln L_x(\theta) = \ln f_X(x \mid \theta)$$
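
To make this concrete, here is a minimal sketch of the idea, assuming Python with NumPy and SciPy; the exponential model, the seed, and all names are illustrative choices, not part of the original article:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical example: an i.i.d. exponential sample with true scale theta = 2.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000)

def log_likelihood(theta):
    # l_x(theta) = sum of ln f(x_i | theta), with f(x | theta) = exp(-x/theta)/theta
    return -len(x) * np.log(theta) - x.sum() / theta

# Maximize the log-likelihood by minimizing its negative.
res = minimize_scalar(lambda t: -log_likelihood(t), bounds=(1e-6, 100.0), method="bounded")
print(res.x, x.mean())  # both close to 2
```

For this model the maximizer also has a closed form (the sample mean), which gives a convenient check on the numerical answer.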

Remarks:

  • the likelihood is not the probability of θ;
  • maximizing the posterior probability of θ is called “maximum a posteriori estimation”.

Estimator performance

As explained in our primer on estimators, we first want to know if the MLE is consistent.

Consistency

Under some regularity conditions on the density $f_X$, the MLE is a consistent estimator, for instance:

  • when $\theta \in \mathbb{R}^d$ and $L_x(\theta)$ is concave;
  • when $\theta \in \mathbb{R}$ and $L_x(\theta)$ is continuously differentiable;
  • when $f_X(x \mid \theta)$ is from a $k$-parameter exponential family.
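
As an informal illustration of consistency (a sanity check, not a proof), we can reuse the exponential model from the sketch above, where the MLE is the sample mean, and watch the estimate approach the true value as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0  # true scale parameter

# For Exponential(scale=theta) the MLE is the sample mean;
# consistency means theta_hat should converge to theta as n grows.
for n in (10, 100, 10_000, 1_000_000):
    x = rng.exponential(scale=theta, size=n)
    print(f"n={n:>9,}: theta_hat={x.mean():.4f}")
```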

Asymptotic performance

Assuming an i.i.d. sample and under sufficient regularity of the distribution $f_X$, the MLE has excellent asymptotic properties:

Theorem
For i.i.d. samples with sufficient regularity and assuming consistency, the asymptotic distribution of the MLE is:
$$\sqrt{n}\,\left(\hat{\theta}_n - \theta\right) \xrightarrow{d} \mathcal{N}\!\left(0, \frac{1}{I_1(\theta)}\right)$$

Where:

$$I_1(\theta) = \mathbb{E}\!\left[\left(\frac{d}{d\theta}\, l_{x_1}(\theta)\right)^{2}\right]$$

is the Fisher information.

So, for large sample sizes $n$, the MLE:

  • is approximately normally distributed;
  • is approximately unbiased;
  • approximately achieves the Cramér-Rao lower bound.
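
To see the theorem in action, here is a sketch under the same illustrative assumptions as above: for the exponential model, $I_1(\theta) = 1/\theta^2$, so $\sqrt{n}\,(\hat{\theta}_n - \theta)$ should behave like a centered normal with standard deviation $\theta$:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 5_000, 2_000

# Each row is one sample of size n; the MLE of the scale is the row mean.
mles = rng.exponential(scale=theta, size=(reps, n)).mean(axis=1)

# sqrt(n) * (theta_hat - theta) should be approximately N(0, theta^2).
z = np.sqrt(n) * (mles - theta)
print(f"empirical std: {z.std():.3f}   (theory: {theta})")
```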

…What else?

What are those regularity conditions?

  • $\Theta$ is an open subset of $\mathbb{R}$ (so that it always makes sense for an estimator to have a symmetric distribution around $\theta$).
  • The support of $f_X$ is independent of $\theta$ (so that we can interchange integration and differentiation).
  • $L_x \in C^3$.
  • $\mathbb{E}[l'_{x_i}(\theta)] = 0$ and $\mathrm{var}[l'_{x_i}(\theta)] = I_1(\theta) > 0$.
  • $-\mathbb{E}[l''_{x_i}(\theta)] = I_1(\theta) > 0$.
  • $\exists\, m(x) > 0$ and $\delta > 0$ such that $\mathbb{E}_\theta[m(X_i)] < \infty$ and:
$$|t - \theta| < \delta \implies |l'''_x(t)| \le m(x)$$
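
For intuition, the moment conditions can be verified by hand on the exponential model used in the sketches above (a worked example with $f(x \mid \theta) = \theta^{-1} e^{-x/\theta}$, so that $\mathbb{E}[X] = \theta$ and $\mathrm{var}[X] = \theta^2$):

$$l_x(\theta) = -\ln\theta - \frac{x}{\theta}, \qquad l'_x(\theta) = -\frac{1}{\theta} + \frac{x}{\theta^2}, \qquad l''_x(\theta) = \frac{1}{\theta^2} - \frac{2x}{\theta^3}$$

$$\mathbb{E}[l'_X(\theta)] = -\frac{1}{\theta} + \frac{\theta}{\theta^2} = 0, \qquad \mathrm{var}[l'_X(\theta)] = \frac{\mathrm{var}[X]}{\theta^4} = \frac{1}{\theta^2}, \qquad -\mathbb{E}[l''_X(\theta)] = \frac{2\theta}{\theta^3} - \frac{1}{\theta^2} = \frac{1}{\theta^2}$$

Both routes give $I_1(\theta) = 1/\theta^2 > 0$, as required.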

Other properties

The MLE is equivariant, which is very convenient in practice.

Proposition: Equivariance of the MLE
MLEs are equivariant: let $g : \Theta \to \Theta'$ be a bijection. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$:
$$\widehat{g(\theta)} = g(\hat{\theta})$$
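
As an illustration with the same hypothetical exponential model, reparametrized by the rate $\lambda = g(\theta) = 1/\theta$ (a bijection on $(0, \infty)$): maximizing the likelihood directly in $\lambda$ lands on $g(\hat{\theta}) = 1/\bar{x}$, exactly as equivariance predicts.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=10_000)

# Rate parametrization: f(x | lam) = lam * exp(-lam * x), with lam = 1/theta.
def neg_log_likelihood(lam):
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print(res.x, 1 / x.mean())  # the two MLEs agree: lam_hat = g(theta_hat)
```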