In a typical inference task, we have some data $x = (x_1, \dots, x_n)$ that we wish to understand better. The statistical approach is to model the source of these data as a random variable $X$ whose outcomes are produced with joint probability $p(x; \theta)$, where $\theta \in \Theta$ is an unknown parameter.
A maximum likelihood estimator for $\theta$ is an estimator that maximizes the probability of producing the sample we observed.
- Definition: likelihood
- The likelihood $\mathcal{L}$ is the probability $p(x; \theta)$ seen as a function of $\theta$: $\mathcal{L}(\theta) = p(x; \theta)$.
- Definition: MLE
- When the likelihood admits a unique global maximum, the MLE is: $\hat{\theta} = \operatorname*{arg\,max}_{\theta \in \Theta} \mathcal{L}(\theta)$.
In practice, we often maximize the log-likelihood instead of the likelihood. Since $\log$ is an increasing function, this yields an equivalent solution, as the sketch below illustrates numerically.
The log-likelihood is denoted $\ell$: $\ell(\theta) = \log \mathcal{L}(\theta)$.
- the likelihood is not the probability of $\theta$;
- maximizing the probability of $\theta$ is called “maximum a posteriori estimation”.
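To make this concrete, here is a minimal sketch (our own illustration, not part of the original text) that computes the MLE of the rate of an exponential model $p(x; \theta) = \theta e^{-\theta x}$, both in closed form and by numerically maximizing the log-likelihood; the model, seed, and variable names are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / 2.5, size=1_000)  # sample from Exp(rate = 2.5)

# Log-likelihood of the exponential model: l(theta) = n log(theta) - theta * sum(x).
def neg_log_likelihood(theta):
    return -(len(x) * np.log(theta) - theta * x.sum())

# Numerical MLE: minimizing -l(theta) is the same as maximizing l(theta).
result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")

print(result.x)      # numerical MLE of the rate
print(1 / x.mean())  # closed-form MLE for this model: theta_hat = 1 / sample mean
```

Maximizing $\mathcal{L}$ directly would underflow for large $n$ (it is a product of a thousand densities), which is another practical reason to work with $\ell$.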
As explained in our primer on estimators, we first want to know if the MLE is consistent.
Under some regularity conditions on the density $p(x; \theta)$, the MLE is a consistent estimator, for instance:
- when $\Theta$ is convex and the log-likelihood $\ell$ is concave;
- when $\Theta$ is compact and $\theta \mapsto p(x; \theta)$ is continuously differentiable;
- when $p(x; \theta)$ is from a $k$-parameter exponential family.
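As a quick numerical illustration of consistency (our own sketch, reusing the exponential model from above, where the MLE is $\hat{\theta}_n = 1/\bar{x}_n$), the estimation error shrinks as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(1)
true_rate = 2.5

# The exponential MLE has the closed form theta_hat = 1 / sample mean.
for n in [10, 100, 10_000, 1_000_000]:
    x = rng.exponential(scale=1 / true_rate, size=n)
    theta_hat = 1 / x.mean()
    print(f"n = {n:>9,}   theta_hat = {theta_hat:.4f}   error = {abs(theta_hat - true_rate):.4f}")
```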
Assuming an i.i.d. sample and under sufficient regularity of the distribution $p(x; \theta)$, the MLE has excellent asymptotic properties:
- For i.i.d. samples with sufficient regularity and assuming consistency, the asymptotic distribution of the MLE is: $\sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{d} \mathcal{N}\!\left(0, I(\theta^*)^{-1}\right)$, where $\theta^*$ is the true parameter.
$I$ is the Fisher information: $I(\theta) = \mathbb{E}_\theta\!\left[\left(\partial_\theta \log p(X; \theta)\right)^2\right]$.
So, for large sample sizes $n$:
- it is approximately normally distributed;
- approximately unbiased;
- and its variance approximately attains the Cramér-Rao lower bound (see the simulation below).
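These three points can be checked by simulation. In this sketch (our own example; for the exponential model the Fisher information is $I(\theta) = 1/\theta^2$, so the asymptotic standard deviation of the MLE is $\theta^*/\sqrt{n}$), we compare the empirical spread of the MLE over many replications to the asymptotic prediction:

```python
import numpy as np

rng = np.random.default_rng(2)
true_rate, n, n_reps = 2.5, 1_000, 5_000

# n_reps independent samples of size n; one MLE of the rate per sample.
samples = rng.exponential(scale=1 / true_rate, size=(n_reps, n))
mle = 1 / samples.mean(axis=1)

print(mle.mean())              # close to true_rate: approximately unbiased
print(mle.std())               # empirical standard deviation of the MLE
print(true_rate / np.sqrt(n))  # asymptotic (Cramér-Rao) prediction: theta / sqrt(n)
```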
What are those regularity conditions?
- $\Theta$ is an open subset of $\mathbb{R}^k$ (so that it always makes sense for an estimator to have a symmetric distribution around $\theta^*$).
- The support of $p(\cdot\,; \theta)$ is independent of $\theta$ (so that we can interchange integration and differentiation).
- $\theta \mapsto \log p(x; \theta)$ is three times continuously differentiable and $0 < I(\theta) < \infty$.
- There exist $\delta > 0$ and a function $M(x)$ such that $\left|\partial_\theta^3 \log p(x; \theta)\right| \le M(x)$ for all $\theta$ with $|\theta - \theta^*| < \delta$ and: $\mathbb{E}_{\theta^*}\!\left[M(X)\right] < \infty$.
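As a worked example (ours, not from the text), these conditions are easy to verify for the exponential model $p(x; \theta) = \theta e^{-\theta x}$ on $(0, \infty)$: the parameter space $\Theta = (0, \infty)$ is open, the support does not depend on $\theta$, and $$\log p(x; \theta) = \log \theta - \theta x, \qquad \partial_\theta \log p = \tfrac{1}{\theta} - x, \qquad \partial_\theta^2 \log p = -\tfrac{1}{\theta^2},$$ so $I(\theta) = 1/\theta^2 \in (0, \infty)$; finally, $\partial_\theta^3 \log p = 2/\theta^3$ does not depend on $x$, so it is trivially dominated by a constant $M$ on any neighborhood of $\theta^*$.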
The MLE is equivariant, which is very convenient in practice.
- Proposition: Equivariance of the MLE
- MLEs are equivariant: let $g$ be a bijection. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$: $\widehat{g(\theta)} = g(\hat{\theta})$.
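A short numerical illustration of equivariance (our own sketch, with $g(\theta) = 1/\theta$ mapping the exponential rate to the mean $\mu$):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=1 / 2.5, size=100_000)

rate_mle = 1 / x.mean()  # MLE of the rate theta
mean_mle = x.mean()      # MLE of the mean mu = g(theta) = 1 / theta

# Equivariance: the MLE of g(theta) is g applied to the MLE of theta.
print(mean_mle, 1 / rate_mle)  # equal up to floating-point rounding
```

Here the MLE of the mean could be derived directly (it is the sample mean), but equivariance gives it for free from the MLE of the rate.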