# The maximum likelihood estimator (MLE)

The maximum likelihood estimator is one of the most widely used estimators in statistics. In this article, we introduce this estimator and study its properties.

In a typical inference task, we have some data $(\sx_1, \dotsc, \sx_{\sn})$ that we wish to understand better. The statistical approach is to model the source of these data as a random variable $\rvx = (\rx_1, \dotsc, \rx_{\sn})$ whose outcomes are produced with joint probability $f_{\rvx}(\vx \mid \theta)$, where $\theta \in \Theta$ is an unknown parameter.

## Definitions

A maximum likelihood estimator for $\theta$ is an estimator that maximizes the probability of producing the sample we observed.

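Definition: likelihood
The likelihood is the probability $f_\rvx$ seen as a function of $\theta$:

$$L_{\vx}(\theta) = f_{\rvx}(\vx \mid \theta)$$

Definition: MLE
When the likelihood admits a unique global maximum, the MLE $\hat{\theta}$ is:

$$\hat{\theta} = \underset{\theta \in \Theta}{\arg\max}\; L_{\vx}(\theta)$$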

In practice, we often maximize the log-likelihood instead of the likelihood. Since $\ln$ is strictly increasing, both functions are maximized at the same $\theta$.

The log-likelihood is denoted $l$:

$$l_{\vx}(\theta) = \ln L_{\vx}(\theta) = \ln f_{\rvx}(\vx \mid \theta)$$

Remarks:

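• the likelihood is not the probability of $\theta$: it is a function of $\theta$ and need not sum or integrate to 1 over $\Theta$;
• maximizing the posterior probability of $\theta$ given the data, which requires a prior on $\theta$, is called “maximum a posteriori estimation”.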
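
To make these definitions concrete, here is a minimal sketch that maximizes a log-likelihood numerically and compares the result with the closed-form MLE. The exponential model, the true rate of 2.0, the seed, and the helper names (`log_likelihood`, `golden_section_max`) are all assumptions chosen for illustration, not part of the article's setup.

```python
import math
import random

def log_likelihood(rate, sample):
    """Log-likelihood of an i.i.d. Exponential(rate) sample:
    l(rate) = n * ln(rate) - rate * sum(x)."""
    return len(sample) * math.log(rate) - rate * sum(sample)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

random.seed(0)
true_rate = 2.0  # illustrative value, assumed for this sketch
sample = [random.expovariate(true_rate) for _ in range(1_000)]

# Numerical maximizer of the log-likelihood over a bracket assumed to contain it.
numeric_mle = golden_section_max(lambda r: log_likelihood(r, sample), 1e-6, 50.0)

# Setting l'(rate) = n / rate - sum(x) = 0 gives the closed form n / sum(x).
closed_form_mle = len(sample) / sum(sample)

print(f"numeric MLE:     {numeric_mle:.6f}")
print(f"closed-form MLE: {closed_form_mle:.6f}")
```

Golden-section search is used here only because this log-likelihood is concave in the rate, hence unimodal; for other models a different optimizer may be more appropriate.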

## Estimator performance

As explained in our primer on estimators, we first want to know if the MLE is consistent.

### Consistency

Under some regularity conditions on the density $f_{\rvx}$, the MLE is a consistent estimator, for instance:

• when $\theta \in \realvset{\sd}$ and $L_{\vx}(\theta)$ is concave;
• when $\theta \in \realset$ and $L_{\vx}(\theta)$ is continuously differentiable;
• when $f_{\rvx}(\vx \mid \theta)$ is from a $k$-parameter exponential family.
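
Consistency can be illustrated, though not proved, by simulation. The sketch below assumes an exponential model (a one-parameter exponential family) with an illustrative true rate of 2.0 and looks at the estimation error for growing sample sizes; the sample sizes, the seed, and the helper name `exponential_mle` are arbitrary choices.

```python
import random

def exponential_mle(sample):
    """Closed-form MLE of the rate of an exponential distribution: n / sum(x)."""
    return len(sample) / sum(sample)

random.seed(1)
true_rate = 2.0  # illustrative value, assumed for this sketch

errors = {}
for n in (100, 10_000, 200_000):
    sample = [random.expovariate(true_rate) for _ in range(n)]
    errors[n] = abs(exponential_mle(sample) - true_rate)
    print(f"n = {n:>7}: |MLE - true rate| = {errors[n]:.5f}")
```

A single run like this only shows one realization; the error typically shrinks as the sample size grows, but it need not decrease monotonically.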

### Asymptotic performance

Assuming an i.i.d. sample and under sufficient regularity of the distribution $f_{\rvx}$, the MLE has excellent asymptotic properties:

Theorem
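For i.i.d. samples with sufficient regularity and assuming consistency, the asymptotic distribution of the MLE is:

$$\sqrt{\sn}\,(\hat{\theta} - \theta) \xrightarrow{d} \mathcal{N}\left(0, \frac{1}{\mathcal{I}_1(\theta)}\right)$$

Where:

$$\mathcal{I}_1(\theta) = \var[l'_{\sx_{\si}}(\theta)] = -\expectation[l''_{\sx_{\si}}(\theta)]$$

is the Fisher information of a single observation.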

So, for large sample sizes $\sn$:

• it is approximately normally distributed;
• it is approximately unbiased;
• it approximately achieves the Cramér-Rao lower bound.
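
These three properties can be checked empirically. The following sketch assumes an exponential model with rate 2.0, for which the Fisher information of one observation is $1/\text{rate}^2$, so the rescaled error $\sqrt{n}(\hat{\theta} - \theta)$ should have mean near 0 and variance near $\text{rate}^2$. The model, the sample size, the number of replications, and the seed are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(2)
true_rate = 2.0       # illustrative value, assumed for this sketch
n = 400               # sample size of each replication
replications = 2_000  # number of independent simulated samples

scaled_errors = []
for _ in range(replications):
    total = sum(random.expovariate(true_rate) for _ in range(n))
    mle = n / total  # closed-form MLE of the exponential rate
    scaled_errors.append(math.sqrt(n) * (mle - true_rate))

empirical_mean = statistics.fmean(scaled_errors)
empirical_var = statistics.variance(scaled_errors)

# Inverse Fisher information of one observation: 1 / I_1(rate) = rate^2.
cramer_rao_var = true_rate ** 2

# Under normality, about 95% of the rescaled errors fall within 1.96 standard deviations.
coverage = sum(abs(e) <= 1.96 * true_rate for e in scaled_errors) / replications

print(f"mean of rescaled errors:     {empirical_mean:.3f}")
print(f"variance of rescaled errors: {empirical_var:.3f} (limit: {cramer_rao_var:.3f})")
print(f"fraction within 1.96 sd:     {coverage:.3f}")
```

For finite $n$ the agreement is only approximate: this particular MLE has a small positive bias that vanishes as $n$ grows.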

…What else?

What are those regularity conditions?

• $\Theta$ is an open subset of $\realset$ (so that it always makes sense for an estimator to have a symmetric distribution around $\theta$).
• The support of $f_{\rvx}$ is independent of $\theta$ (so that we can interchange integration and differentiation).
• $L_{\vx} \in \mathcal{C}^3$.
• $\expectation[l'_{\sx_{\si}}(\theta)] = 0$ and $\var[l'_{\sx_{\si}}(\theta)] = \mathcal{I}_1(\theta) > 0$.
• $-\expectation[l''_{\sx_{\si}}(\theta)] = \mathcal{I}_1(\theta) > 0$.
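• $\exists\fm(\sx)>0$ and $\delta>0$ such that $\expectation_{\theta_0}[\fm(\sx)] < \infty$ and:

$$\left|l'''_{\sx}(\theta)\right| \leq \fm(\sx) \quad \text{for all } \theta \text{ with } |\theta - \theta_0| < \delta$$

where $\theta_0$ denotes the true value of the parameter.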
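
The moment conditions on the score in the list above can be checked by Monte Carlo for a specific model. The sketch below assumes an exponential density $f(x \mid \text{rate}) = \text{rate} \cdot e^{-\text{rate} \cdot x}$ with rate 2.0, for which $l'(\text{rate}) = 1/\text{rate} - x$ and $l''(\text{rate}) = -1/\text{rate}^2$; the model, the seed, and the function names are assumptions made for illustration.

```python
import random
import statistics

def score(rate, x):
    """First derivative in rate of ln f(x | rate) = ln(rate) - rate * x."""
    return 1.0 / rate - x

def second_derivative(rate, x):
    """Second derivative in rate of ln f(x | rate); it does not depend on x here."""
    return -1.0 / rate ** 2

random.seed(3)
true_rate = 2.0  # illustrative value, assumed for this sketch
draws = [random.expovariate(true_rate) for _ in range(200_000)]

scores = [score(true_rate, x) for x in draws]
mean_score = statistics.fmean(scores)

# Unbiased sample variance of the score, computed directly.
var_score = sum((s - mean_score) ** 2 for s in scores) / (len(scores) - 1)

# Fisher information from the curvature: -E[l''].
fisher_from_curvature = -statistics.fmean(second_derivative(true_rate, x) for x in draws)

print(f"E[l']   ~ {mean_score:.4f} (expected 0)")
print(f"Var[l'] ~ {var_score:.4f}")
print(f"-E[l''] = {fisher_from_curvature:.4f}")
```

Both estimates of the Fisher information should be close to $1/\text{rate}^2 = 0.25$ for this choice of rate.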

### Other properties

The MLE is equivariant, which is very convenient in practice.

Proposition: Equivariance of the MLE
MLEs are equivariant: let $g: \Theta \to \Theta'$ be a bijection. If $\hat{\theta}$ is the MLE of $\theta$, then $g(\hat{\theta})$ is the MLE of $g(\theta)$:

$$\widehat{g(\theta)} = g(\hat{\theta})$$
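
As a quick numerical illustration of equivariance, the sketch below assumes an exponential model and the bijection $g(\text{rate}) = 1/\text{rate}$, which maps the rate to the mean. It maximizes the log-likelihood once in each parameterization and checks that the two maximizers are related by $g$. The model, the search brackets, the seed, and the helper names are illustrative assumptions.

```python
import math
import random

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

def loglik_rate(rate, sample):
    """Exponential log-likelihood parameterized by the rate."""
    return len(sample) * math.log(rate) - rate * sum(sample)

def loglik_mean(mean, sample):
    """Same log-likelihood, reparameterized by the mean = 1 / rate."""
    return -len(sample) * math.log(mean) - sum(sample) / mean

random.seed(4)
sample = [random.expovariate(2.0) for _ in range(1_000)]  # rate 2.0 is an assumed value

rate_mle = golden_section_max(lambda r: loglik_rate(r, sample), 1e-3, 50.0)
mean_mle = golden_section_max(lambda m: loglik_mean(m, sample), 1e-3, 50.0)

print(f"MLE of the rate:        {rate_mle:.6f}")
print(f"g(MLE of the rate):     {1.0 / rate_mle:.6f}")
print(f"MLE of the mean (g):    {mean_mle:.6f}")
```

In practice this means we can estimate whichever parameterization is most convenient and transform the result afterwards, rather than re-deriving the estimator.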