To understand what a generalized linear model does, let’s look back at linear models.

## Typical linear model setup

In the typical setup for a linear model, we have a random input vector $X \in \mathbb{R}^d$ and a random output variable $Y \in \mathbb{R}$ whose mean is a linear function of $X$.

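For instance, a linear least squares regression amounts to:

$$Y \mid X \sim \mathcal{N}(\theta^\top X, \sigma^2)$$

Similarly, a linear regression with MAE loss amounts to:

$$Y \mid X \sim \mathrm{Laplace}(\theta^\top X, b)$$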

Hence, with traditional linear models, we attempt to predict the mean of $Y$. When this mean depends non-linearly on $X$, we can try to predict another parameter instead and then use a non-linear function to transform this parameter into the mean.

## Exponential family

A family of distributions for which this approach works particularly well is the exponential family. This is good news, because the most familiar distributions belong to this family (normal, exponential, Bernoulli, Poisson, geometric, etc.).

A density belongs to the exponential family if it can be written as:

$$p(y; \eta) = h(y) \exp\left(\eta^\top T(y) - A(\eta)\right)$$

The parameter $\eta$ is called the natural parameter of the density and $A(\eta)$ is named the cumulant function.

This family of distributions is even more interesting since $T(y)$ is a sufficient statistic!
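As a quick sanity check, here is a minimal sketch that evaluates a density of the form $h(y) \exp(\eta\, T(y) - A(\eta))$ for a scalar natural parameter and compares it with the textbook Poisson formula. The function names are hypothetical, and the Poisson parameters used ($\eta = \log \lambda$, $T(y) = y$, $A(\eta) = e^{\eta}$, $h(y) = 1/y!$) are the standard ones rather than something taken from the text above.

```python
import math

def exp_family_density(y, eta, h, T, A):
    """Evaluate h(y) * exp(eta * T(y) - A(eta)) for a scalar natural parameter."""
    return h(y) * math.exp(eta * T(y) - A(eta))

def poisson_pmf_exp_family(y, lam):
    # Poisson(lam) written in exponential-family form:
    # eta = log(lam), T(y) = y, A(eta) = exp(eta), h(y) = 1 / y!
    eta = math.log(lam)
    return exp_family_density(
        y,
        eta,
        h=lambda k: 1.0 / math.factorial(k),
        T=lambda k: k,
        A=math.exp,
    )

def poisson_pmf_direct(y, lam):
    # Textbook formula: lam^y * exp(-lam) / y!
    return lam ** y * math.exp(-lam) / math.factorial(y)

# The two columns should agree up to floating-point rounding
for k in range(5):
    print(k, poisson_pmf_exp_family(k, 2.5), poisson_pmf_direct(k, 2.5))
```

Passing `h`, `T` and `A` as functions keeps the generic formula separate from the distribution-specific pieces, so trying another member of the family only requires swapping those three arguments.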

The aim of generalized linear models is to estimate the natural parameter $\eta$ on the basis of the dataset. Before we dive into more details, let’s convince ourselves that exponential families are THE thing.

## The link function

Estimating $\eta$ using the sufficient statistic $T(Y)$ is made possible through the link function $g$ such that:

$$\eta = g(\mu)$$

for $\mu = \mathbb{E}[T(Y)]$.

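This link function always exists for a non-degenerate distribution: $\mathbb{E}[T(Y)] = A'(\eta)$ and $A''(\eta) = \operatorname{Var}(T(Y)) > 0$, so the map $\eta \mapsto \mathbb{E}[T(Y)]$ is strictly increasing and can be inverted (stated here for a scalar natural parameter, with $A$ the cumulant function).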
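To make the link function concrete, the sketch below takes the Bernoulli distribution as an assumed example, with the standard cumulant $A(\eta) = \log(1 + e^{\eta})$ and $T(y) = y$. It checks numerically that the derivative of the cumulant gives the mean (the sigmoid of $\eta$) and that the logit function maps this mean back to $\eta$. All function names are hypothetical.

```python
import math

def bernoulli_cumulant(eta):
    # A(eta) = log(1 + exp(eta)) for the Bernoulli distribution
    return math.log1p(math.exp(eta))

def numerical_derivative(f, x, eps=1e-6):
    # Central finite-difference approximation of f'(x)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def sigmoid(eta):
    # Closed form of A'(eta) for the Bernoulli cumulant
    return 1.0 / (1.0 + math.exp(-eta))

def logit(mu):
    # Link function g for the Bernoulli distribution: eta = g(mu)
    return math.log(mu / (1.0 - mu))

eta = 0.7
mu = numerical_derivative(bernoulli_cumulant, eta)  # approximates E[Y] = A'(eta)
print(mu, sigmoid(eta))      # the two values agree up to finite-difference error
print(logit(sigmoid(eta)))   # recovers eta up to rounding
```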

## Usual distributions are members of the exponential family

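The Bernoulli distribution $\mathcal{B}(p)$ is a member of the exponential family with parameters:

$$\eta = \log\frac{p}{1-p}, \quad T(y) = y, \quad A(\eta) = \log\left(1 + e^{\eta}\right), \quad h(y) = 1$$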

A generalized linear model for this distribution is called a logistic regression.

The distribution is a member of the exponential family with parameters:

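The normal distribution $\mathcal{N}(\mu, \sigma^2)$ is a member of the exponential family with parameters:

$$\eta = \begin{pmatrix} \dfrac{\mu}{\sigma^2} \\ -\dfrac{1}{2\sigma^2} \end{pmatrix}, \quad T(y) = \begin{pmatrix} y \\ y^2 \end{pmatrix}, \quad A(\eta) = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2}\log(-2\eta_2), \quad h(y) = \frac{1}{\sqrt{2\pi}}$$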
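The normal case is the least obvious of the three, so a numerical check may help. The sketch below assumes the standard two-parameter form, with $\eta = (\mu/\sigma^2, -1/(2\sigma^2))$, $T(y) = (y, y^2)$, $A(\eta) = -\eta_1^2/(4\eta_2) - \tfrac{1}{2}\log(-2\eta_2)$ and $h(y) = 1/\sqrt{2\pi}$, and compares it with the usual Gaussian density. Function names are hypothetical.

```python
import math

def normal_pdf_direct(y, mu, sigma):
    # Usual Gaussian density with mean mu and standard deviation sigma
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def normal_pdf_exp_family(y, mu, sigma):
    # Natural parameters: eta = (mu / sigma^2, -1 / (2 sigma^2))
    eta1 = mu / sigma ** 2
    eta2 = -1.0 / (2 * sigma ** 2)
    # Base measure h(y) = 1 / sqrt(2 pi); sufficient statistic T(y) = (y, y^2)
    h = 1.0 / math.sqrt(2 * math.pi)
    # Cumulant A(eta) = -eta1^2 / (4 eta2) - log(-2 eta2) / 2
    A = -eta1 ** 2 / (4 * eta2) - 0.5 * math.log(-2 * eta2)
    return h * math.exp(eta1 * y + eta2 * y ** 2 - A)

# Both columns should match up to floating-point rounding
for y in (-1.0, 0.0, 0.5, 2.0):
    print(y, normal_pdf_direct(y, 1.0, 1.5), normal_pdf_exp_family(y, 1.0, 1.5))
```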

## Generalized linear models

### Setup

Let $X \in \mathbb{R}^d$ be a random vector and $Y$ a random variable. Assume our dataset is made of $n$ i.i.d. samples from $(X, Y)$:

$$\mathcal{D} = \{(x_1, y_1), \dots, (x_n, y_n)\}$$

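We suppose the distribution of $Y$ given $X = x$ is a member of the exponential family:

$$p(y \mid x; \theta) = h(y) \exp\left(\eta\, T(y) - A(\eta)\right)$$

where the natural parameter $\eta$ depends linearly on $x$:

$$\eta = \theta^\top x$$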

### Loss function

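We will use maximum likelihood estimation for $\theta$. Our goal is thus to maximize the likelihood:

$$L(\theta) = \prod_{i=1}^n p(y_i \mid x_i; \theta) = \prod_{i=1}^n h(y_i) \exp\left(\theta^\top x_i\, T(y_i) - A(\theta^\top x_i)\right)$$

This amounts to maximizing the log-likelihood. In other words, dropping the terms $\log h(y_i)$ that do not depend on $\theta$, our loss function is:

$$\ell(\theta) = \sum_{i=1}^n \left[ A(\theta^\top x_i) - \theta^\top x_i\, T(y_i) \right]$$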

This loss function is convex.
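A short derivation shows why, assuming the loss is written $\ell(\theta) = \sum_{i=1}^n \left[ A(\theta^\top x_i) - \theta^\top x_i\, T(y_i) \right]$ with a scalar natural parameter and a twice-differentiable cumulant function $A$. The term that is linear in $\theta$ vanishes after two differentiations, which leaves:

$$\nabla^2_\theta \ell(\theta) = \sum_{i=1}^n A''(\theta^\top x_i)\, x_i x_i^\top$$

For an exponential family, $A''(\eta) = \operatorname{Var}(T(Y)) \geq 0$, so each term is a positive semi-definite matrix. The Hessian is therefore positive semi-definite everywhere, which is exactly convexity of $\ell$.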

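Let us denote by $\mathbf{X}$ the matrix whose $i$-th row is the vector $x_i^\top$:

$$\mathbf{X} = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix}$$

The gradient is:

$$\nabla_\theta \ell(\theta) = \mathbf{X}^\top \left( A'(\mathbf{X}\theta) - T(\mathbf{y}) \right)$$

where $A'$ and $T$ are applied element-wise and $\mathbf{y} = (y_1, \dots, y_n)^\top$.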

This loss function can then be minimized using gradient descent to find $\theta$.
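The procedure can be sketched in a few lines of plain Python. The example below is an illustration under stated assumptions, not a reference implementation: it picks the Bernoulli case (so $A(\eta) = \log(1 + e^{\eta})$, $A'$ is the sigmoid and $T(y) = y$), generates a synthetic dataset, and uses an arbitrary fixed step size and iteration count. All names and numeric values are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(theta, x):
    return sum(t * xj for t, xj in zip(theta, x))

def loss(theta, xs, ys):
    # Negative log-likelihood up to a constant:
    # sum_i A(theta . x_i) - (theta . x_i) * y_i, with A(eta) = log(1 + exp(eta))
    total = 0.0
    for x, y in zip(xs, ys):
        eta = dot(theta, x)
        total += math.log1p(math.exp(eta)) - eta * y
    return total

def gradient(theta, xs, ys):
    # sum_i (A'(theta . x_i) - y_i) * x_i, where A' is the sigmoid
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        residual = sigmoid(dot(theta, x)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad

def gradient_descent(xs, ys, step=0.5, n_iter=1000):
    # step and n_iter are arbitrary choices for this small example
    theta = [0.0] * len(xs[0])
    for _ in range(n_iter):
        grad = gradient(theta, xs, ys)
        # Divide by the sample count so the step size does not depend on dataset size
        theta = [t - step * g / len(xs) for t, g in zip(theta, grad)]
    return theta

# Synthetic data (assumption): the first feature is a constant 1 acting as an intercept
random.seed(0)
true_theta = [-0.5, 2.0]
xs = [[1.0, random.uniform(-2, 2)] for _ in range(200)]
ys = [1 if random.random() < sigmoid(dot(true_theta, x)) else 0 for x in xs]

theta_hat = gradient_descent(xs, ys)
print("estimated theta:", theta_hat)
print("loss at zero:", loss([0.0, 0.0], xs, ys), "loss at estimate:", loss(theta_hat, xs, ys))
```

With this choice of distribution the procedure is a logistic regression fitted by gradient descent. Switching to another member of the family only changes the cumulant used in `loss` and its derivative used in `gradient`.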