From Gradient Descent to Stochastic Gradient Descent (SGD)
RMSProp and Adam
Learning to learn by gradient descent by gradient descent
Learning to optimize
References
Learning to learn by gradient descent by gradient descent. Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas. arXiv:1606.04474 [cs] (2016)
Learning to Optimize. Ke Li, Jitendra Malik. arXiv:1606.01885 [cs, math, stat] (2016)
Learning to Optimize Neural Nets. Ke Li, Jitendra Malik. arXiv:1703.00441 [cs, math, stat] (2017)