Gradient descent is one of the most popular methods for training feedforward neural networks. In practice, gradient-based training of such networks is most commonly implemented in one of two modes: batch or incremental. Furthermore, because generalization is an important property and quality criterion of a trained network, pruning algorithms that add regularization terms have been widely used as an efficient way to achieve good generalization. In this paper, we review the convergence properties and other performance aspects of recently studied training approaches based on different penalty terms. In addition, we present the smoothing approximation techniques used when the penalty term is non-differentiable at the origin.
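To make the ideas in the abstract concrete, the following is a minimal sketch, not the method of any specific reviewed paper. It assumes a linear model in place of a full feedforward network, a squared-error loss, and an L1-type penalty smoothed as sqrt(w^2 + eps^2), which is one common generic smoothing of |w| near the origin. The function names (`smooth_abs`, `batch_step`, `incremental_epoch`) and the parameters `lr`, `lam`, and `eps` are illustrative assumptions.

```python
import numpy as np

def smooth_abs(w, eps=1e-2):
    # Smooth surrogate for |w|: differentiable everywhere, including w = 0.
    # (Assumed generic smoothing; the reviewed papers may use other forms.)
    return np.sqrt(w * w + eps * eps)

def smooth_abs_grad(w, eps=1e-2):
    # Derivative of the surrogate; equals 0 at w = 0 and tends to sign(w)
    # as |w| grows relative to eps.
    return w / np.sqrt(w * w + eps * eps)

def batch_step(w, X, y, lr=0.1, lam=0.01, eps=1e-2):
    # Batch mode: one weight update using the gradient averaged over
    # all training samples, plus the gradient of the smoothed penalty.
    residual = X @ w - y
    grad_loss = X.T @ residual / len(y)
    return w - lr * (grad_loss + lam * smooth_abs_grad(w, eps))

def incremental_epoch(w, X, y, lr=0.01, lam=0.01, eps=1e-2):
    # Incremental (online) mode: the weights are updated after each
    # individual sample, here visited in a fixed order.
    for xi, yi in zip(X, y):
        grad_loss = (xi @ w - yi) * xi
        w = w - lr * (grad_loss + lam * smooth_abs_grad(w, eps))
    return w
```

The smoothing parameter `eps` trades off fidelity to |w| against the curvature of the penalty near zero: a smaller `eps` tracks |w| more closely but makes the penalty gradient change faster around the origin, which in turn constrains the usable learning rate.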