A glimpse of the maths behind Optimization Algorithms

Thibaut
14 min read · May 11, 2021
Photo by Reina Kousaka, Unsplash licence.

Ok, let’s start…

(∩`-´)⊃━☆゚.*・。゚

import numpy as np
from scipy.optimize import minimize

End…

No? Ok… ¯\_(ツ)_/¯ We need to dive a bit deeper into the maths…

The goal of this article is to provide a basic understanding of the maths behind common types of optimization algorithms. It is based both on my notes from the DSTI and on other sources such as papers and blog posts; links are provided throughout.

We will start from vanilla fixed step gradient descent, for purely educational purposes, and pile up concepts until we reach the widely used Adam optimizer. By the end, we will touch on some other promising algorithms.
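
To give a flavour of that starting point, here is a minimal sketch of the fixed step update xₖ₊₁ = xₖ − α ∇J(xₖ) (notations are defined just below); the toy cost function, the step size α and the iteration count are my own illustrative choices, not taken from what follows.

import numpy as np

def fixed_step_gradient_descent(grad_J, x0, alpha=0.1, n_iter=100):
    # Vanilla gradient descent with a fixed step: x_{k+1} = x_k - alpha * grad_J(x_k)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = x - alpha * grad_J(x)
    return x

# Toy cost J(x) = ||x||², with gradient 2x and known minimum x* = 0 (illustrative choice)
x_approx = fixed_step_gradient_descent(lambda x: 2.0 * x, x0=[3.0, -1.5])
print(x_approx)  # close to [0., 0.]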

Notations:

  • J: ℝⁿ → ℝ is the cost function; we are looking for its minimum.
  • x = (x₁, …, xₙ) is a vector of ℝⁿ.
  • x* is our notation for the point where the minimum is reached.
  • xₖ is the value of x at the kᵗʰ iteration of gradient descent.

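To make these notations concrete, here is a small illustrative snippet; the quadratic cost below is my own toy example, chosen only because its minimum x* is known in closed form.

import numpy as np

c = np.array([1.0, -2.0, 0.5])   # arbitrary constants for this toy example

def J(x):
    # Toy cost function J: ℝⁿ → ℝ, here J(x) = ||x - c||²
    return float(np.sum((x - c) ** 2))

def grad_J(x):
    # Gradient of J; it vanishes exactly at the minimum x*
    return 2.0 * (x - c)

x_star = c               # x*, the point where this particular J is minimal
x_0 = np.zeros(3)        # x₀, a starting iterate for a gradient descent
print(J(x_0), J(x_star))    # 5.25 0.0 : J decreases from x₀ to x*
print(grad_J(x_star))       # [0. 0. 0.] : the gradient is zero at the minimum
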
If you want a refresher on mathematical results such as the Euler equality, you may find one in this article: “A cheatsheet of mathematical formulas for foundations of Continuous Optimization”.

1/ Minimization without constraints
