Subsampled Nonmonotone Spectral Gradient Methods

Stefania Bellavia 1, Nataša Krklec Jerinkić 2, and Greta Malaspina 2
  • 1 Department of Industrial Engineering, Italy
  • 2 Department of Mathematics and Informatics, Faculty of Sciences, Serbia

Abstract

This paper deals with subsampled spectral gradient methods for minimizing finite sums. Subsampled function and gradient approximations are employed in order to reduce the overall computational cost of classical spectral gradient methods. Global convergence is enforced by a nonmonotone line search procedure and is proved under the assumption that functions and gradients are approximated with increasing accuracy. R-linear convergence and worst-case iteration complexity are investigated in the case of a strongly convex objective function. Numerical results on well-known binary classification problems are given to show the effectiveness of this framework and to analyze the effect of different spectral coefficient approximations arising from the variable-sample nature of the procedure.
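To make the scheme described above concrete, the following Python sketch illustrates one possible subsampled spectral gradient iteration with a nonmonotone line search. It is not the authors' exact algorithm: the BB1-type spectral coefficient, the Li-Fukushima-style Armijo condition with a summable tolerance eps_k, the geometric sample-size growth, and the callables f_i and grad_i (average loss and gradient over a subsample) are illustrative assumptions.

import numpy as np

def subsampled_spectral_gradient(f_i, grad_i, x0, N, max_iter=100,
                                 s0=8, growth=1.5, eta=1e-4, beta=0.5,
                                 lam_min=1e-10, lam_max=1e10, seed=0):
    # f_i(x, idx) and grad_i(x, idx): average loss / gradient over the
    # subsample idx of {0, ..., N-1} (hypothetical user-supplied callables).
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    sample_size = float(s0)
    lam = 1.0                              # spectral coefficient (inverse step length)
    x_old = g_old = None
    for k in range(max_iter):
        # Increasing accuracy: the subsample grows geometrically up to the full set.
        idx = rng.choice(N, size=min(int(sample_size), N), replace=False)
        fk = f_i(x, idx)
        g = grad_i(x, idx)
        # BB1-type spectral coefficient from consecutive subsampled gradients.
        if x_old is not None:
            s, y = x - x_old, g - g_old
            if s @ y > 0:
                lam = (s @ y) / (s @ s)
        lam = min(max(lam, lam_min), lam_max)
        d = -g / lam                       # spectral gradient direction
        # Nonmonotone line search: allow a small extra increase eps_k
        # (summable over k) on top of the Armijo-type decrease.
        eps_k = 1.0 / (k + 1) ** 2
        t = 1.0
        while f_i(x + t * d, idx) > fk + eta * t * (g @ d) + eps_k and t > 1e-10:
            t *= beta
        x_old, g_old = x, g
        x = x + t * d
        sample_size = min(float(N), sample_size * growth)
    return x

For a binary classification problem of the kind mentioned in the abstract, f_i and grad_i would, for example, evaluate the logistic loss and its gradient averaged over the current subsample.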
