A primal sub-gradient method for structured classification with the averaged sum loss

eISSN: 2083-8492
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Applied Mathematics

Published Online: Dec 20, 2014

Page range: 917 - 930

Received: Nov 05, 2013

DOI: https://doi.org/10.2478/amcs-2014-0067

© by Dejan Mančev

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.