To understand the algorithms behind ML, it is important to have a solid grounding in numerical analysis.
First, a good summary of gradient descent methods is given in this paper:
An overview of gradient descent optimization algorithms – arXiv
Then, Michel Bierlaire of EPFL in Switzerland wrote a good book on optimization and has a YouTube channel that gives a very good introduction to optimization methods for ML.
Michel Bierlaire EPFL web page
YouTube channel: Michel_Bierlaire