What is gradient descent (GD)?

Oct 16, 2018

Gradient descent is an optimization algorithm that tries to find the minimum of a function by taking successive steps against its gradient.

The gradient

The gradient $\nabla f$ of a differentiable function $f$ is a vector that points in the direction of greatest rate of increase of $f$.

[Figure: Gradient]

Mathematically, it is defined by the partial derivatives:

$$\nabla f(v) = \left( \frac{\partial f(v)}{\partial v_1}, \ldots, \frac{\partial f(v)}{\partial v_n} \right)$$
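For intuition, here is a minimal sketch of how these partial derivatives can be approximated numerically with central finite differences (the `numerical_gradient` helper below is illustrative, not part of any library):

import numpy as np

def numerical_gradient(f, v, h=1e-6):
    """Approximate each partial derivative of f at v with a central difference."""
    grad = np.zeros_like(v, dtype=float)
    for i in range(len(v)):
        offset = np.zeros_like(v, dtype=float)
        offset[i] = h
        grad[i] = (f(v + offset) - f(v - offset)) / (2 * h)
    return grad

# For f(v) = v_1^2 + v_2^2 the exact gradient is (2*v_1, 2*v_2).
f = lambda v: v[0] ** 2 + v[1] ** 2
print(numerical_gradient(f, np.array([1.0, 2.0])))  # ~ [2. 4.]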

Gradient descent

Since the gradient is directed along the steepest ascent, its opposite $-\nabla f$ is directed along the steepest descent. The gradient descent algorithm aims at finding a function's minimum by taking successive steps along $-\nabla f$:

$$x_{n+1} = x_n - s_n \nabla f(x_n)$$

where $s_n$ is the size of the step at iteration $n$.
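For example, take $f(x) = x^2$, whose gradient is $\nabla f(x) = 2x$, with a constant step size $s_n = 0.1$ and a starting point $x_0 = 5$. The first iterations give $x_1 = 5 - 0.1 \cdot 10 = 4$ and $x_2 = 4 - 0.1 \cdot 8 = 3.2$; each step multiplies $x$ by $0.8$, so the sequence shrinks geometrically toward the minimum at $x = 0$.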

[Figure: Gradient descent]

def gradient_descent(gradient, x_0, max_steps, step_size):
    """Minimize a function given `gradient`, a callable returning its gradient."""
    x = x_0
    for _ in range(max_steps):
        # Step against the gradient, i.e. along the steepest descent.
        x = x - step_size * gradient(x)
    return x
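As a quick check (the numbers are only illustrative), the $f(x) = x^2$ example from above can be run through this function:

# Minimize f(x) = x^2; its gradient is 2x and its minimum is at x = 0.
x_min = gradient_descent(gradient=lambda x: 2 * x, x_0=5.0,
                         max_steps=100, step_size=0.1)
print(x_min)  # ~ 1e-9, essentially 0

The step size matters: too large and the iterates overshoot and may diverge (here anything above $1$ would), too small and convergence is needlessly slow.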

Pitfalls

As illustrated in the picture above, the gradient descent algorithm might get stuck in a local minimum. Since it always follows the steepest descent, it has no way to escape a local minimum once it has been reached.
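To make this concrete, here is a hypothetical one-dimensional example (the function and all constants are mine, chosen for illustration): $f(x) = (x^2 - 1)^2 + 0.3x$ has a shallow local minimum near $x \approx 0.96$ and a lower global minimum near $x \approx -1.04$, and which one the descent reaches depends entirely on the starting point:

# f(x) = (x^2 - 1)^2 + 0.3*x, so its gradient is f'(x) = 4*x*(x**2 - 1) + 0.3.
grad = lambda x: 4 * x * (x ** 2 - 1) + 0.3

print(gradient_descent(grad, x_0=2.0, max_steps=200, step_size=0.01))   # ~ 0.96: stuck in the local minimum
print(gradient_descent(grad, x_0=-2.0, max_steps=200, step_size=0.01))  # ~ -1.04: the global minimum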

Illustration

The contour plot below displays the contour lines of a function $f$ (think of it as seeing the surface from above). Click anywhere on it to start a gradient descent.