Learning to learn and to optimize

  1. From Gradient Descent to Stochastic Gradient Descent (SGD)
  2. RMSProp and Adam
  3. Learning to learn by gradient descent by gradient descent
  4. Learning to optimize

References

  • Learning to learn by gradient descent by gradient descent, Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, Nando de Freitas. arXiv:1606.04474 [cs] (2016)
  • Learning to Optimize, Ke Li, Jitendra Malik. arXiv:1606.01885 [cs, math, stat] (2016)
  • Learning to Optimize Neural Nets, Ke Li, Jitendra Malik. arXiv:1703.00441 [cs, math, stat] (2017)