Gradient descent is an optimization algorithm that tries to find the minimum of a function by following its gradient.

## The gradient

The gradient of a differentiable function is a vector that points in the direction of greatest rate of increase of .

Mathematically, it is defined by the partial derivatives:

## Gradient descent

Since the gradient is directed along the steepest ascent, it’s opposite is directed along the steepest descent. The gradient descent algorithm aims at finding a function’s minimum by taking successive steps directed along :

where is the size of the step at iteration .

```
def gradient_descent(f, x_0, max_steps, step_size):
x = x_0
for step in range(max_steps):
x = x - step_size * gradient(of=f, at=x)
return x
```

## Pitfalls

As illustrated on the picture above, the gradient descent algorithm might get stuck in a local minimum. Since it always follows the steepest descent, there is no way to get away from a local minimum once it has been reached.

## Illustration

The contour plot below displays the contours line of a function (think of it as seeing the surface from above). Click anywhere on it to start a gradient descent.