Impact of Training Set Batch Size on the Performance of Convolutional Neural Networks for Diverse Datasets

Open access

Abstract

This paper considers the problem of improving the performance of convolutional neural networks (CNNs). One parameter of the training set, the batch size, is investigated, with the goal of determining how it affects performance. To obtain consistent results, two diverse datasets are used: MNIST and CIFAR-10. MNIST is simple and CIFAR-10 is complex, although both have 10 classes. To achieve acceptable testing results, a different CNN architecture is selected for each dataset, with two convolutional layers for MNIST and five for CIFAR-10. The assumption that recognition accuracy depends on the batch size is confirmed: the larger the batch size, the higher the recognition accuracy. A second assumption, that the type of the batch size value affects CNN performance, is not confirmed.
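As a rough illustration of what the batch size controls, the sketch below splits a training set into mini-batches and counts the weight updates performed in one epoch. It is not the authors' code. The function names and the batch sizes 32, 100 and 256 are assumptions chosen only for illustration; the training-set sizes are the standard ones for MNIST (60,000 images) and CIFAR-10 (50,000 images).

```python
import math

# Standard training-set sizes of the two datasets named in the abstract.
TRAIN_SIZES = {"MNIST": 60_000, "CIFAR-10": 50_000}


def minibatch_bounds(n_samples, batch_size):
    """Yield (start, stop) index pairs that split n_samples into mini-batches.

    The last mini-batch is shorter when batch_size does not divide n_samples.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one full pass over the training set."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return math.ceil(n_samples / batch_size)


if __name__ == "__main__":
    # Hypothetical batch sizes, not taken from the paper.
    for name, n_samples in TRAIN_SIZES.items():
        for batch_size in (32, 100, 256):
            print(name, batch_size, updates_per_epoch(n_samples, batch_size))
```

A larger batch size means fewer, less noisy gradient updates per epoch, which is one reason the batch size can influence the accuracy reached within a fixed number of epochs.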

Information Technology and Management Science

The Journal of Riga Technical University