## Abstract

Learning vector quantization (LVQ), originally introduced by Kohonen on an intuitive basis, is one of the most powerful approaches for prototype-based classification of vector data. Prototype adaptation relies on attraction and repulsion of the prototypes during learning, which gives both the learning process and the classification decision an easy geometric interpretation. Although deep learning architectures and support vector classifiers frequently achieve comparable or even better results, LVQ models are smart alternatives with low complexity and computational cost, which makes them attractive for many industrial applications such as intelligent sensor systems or advanced driver assistance systems.

Nowadays, the mathematical theory developed for LVQ provides sufficient justification for the algorithm, making it an appealing alternative to other approaches such as support vector machines and deep learning techniques.

This review article reports current developments and extensions of LVQ, starting from the generalized LVQ (GLVQ), which is known as the most powerful cost-function-based realization of the original LVQ. The cost function minimized in GLVQ is a soft approximation of the standard classification error, which allows gradient descent learning techniques. The GLVQ variants considered in this contribution cover many aspects, such as border-sensitive learning, the application of non-Euclidean metrics like kernel distances or divergences, relevance learning, and the optimization of advanced statistical classification quality measures beyond accuracy, including sensitivity and specificity or the area under the ROC curve.
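
As an illustration of the cost-function view, the following is a minimal sketch of one stochastic gradient step for the GLVQ classifier function mu(x) = (d+ - d-) / (d+ + d-), where d+ and d- are the distances to the closest prototype with the correct and with an incorrect label. It assumes the squared Euclidean distance and a logistic squashing function; the function names, the learning rate, and the toy data are illustrative assumptions and are not taken from the reviewed articles.

```python
import math


def sq_dist(x, w):
    """Squared Euclidean distance between data vector x and prototype w."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def sigmoid(z):
    """Logistic function used here as the soft step in the cost."""
    return 1.0 / (1.0 + math.exp(-z))


def glvq_step(x, y, prototypes, labels, lr=0.1):
    """Perform one gradient descent step on sigmoid(mu(x)) for sample (x, y).

    Updates `prototypes` in place and returns mu(x) computed before the
    update. mu(x) < 0 means x is currently classified correctly.
    Assumes at least one prototype of class y and one of another class,
    and that x does not coincide with both winning prototypes.
    """
    dists = [sq_dist(x, w) for w in prototypes]
    # Closest prototype with the correct label and with an incorrect label.
    plus = min((i for i, c in enumerate(labels) if c == y),
               key=lambda i: dists[i])
    minus = min((i for i, c in enumerate(labels) if c != y),
                key=lambda i: dists[i])
    d_plus, d_minus = dists[plus], dists[minus]

    mu = (d_plus - d_minus) / (d_plus + d_minus)
    s = sigmoid(mu)
    f_prime = s * (1.0 - s)  # derivative of the logistic function at mu

    denom = (d_plus + d_minus) ** 2
    g_plus = f_prime * 2.0 * d_minus / denom   # f' * d mu / d d+
    g_minus = f_prime * 2.0 * d_plus / denom   # f' * |d mu / d d-|

    # The gradient of the squared distance w.r.t. w is -2 (x - w), so
    # descent attracts the correct prototype and repels the incorrect one.
    prototypes[plus] = [wi + lr * g_plus * 2.0 * (xi - wi)
                        for xi, wi in zip(x, prototypes[plus])]
    prototypes[minus] = [wi - lr * g_minus * 2.0 * (xi - wi)
                         for xi, wi in zip(x, prototypes[minus])]
    return mu


# Toy usage: two prototypes, one per class, and a single training sample.
protos = [[0.0, 0.0], [1.0, 1.0]]
proto_labels = [0, 1]
mu_value = glvq_step([0.2, 0.1], 0, protos, proto_labels, lr=0.1)
```

The sign of mu(x) encodes the classification decision, so summing a squashed mu(x) over the training set gives a differentiable surrogate of the error count; the two update lines make the attraction and repulsion of the original LVQ heuristic explicit.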

Following these topics, the paper highlights the basic motivation for these variants and extensions, together with the mathematical prerequisites and treatments needed to integrate them into the standard GLVQ scheme, and compares them to other machine learning approaches. For detailed descriptions and the underlying mathematical theory, the reader is referred to the respective original articles.

Thus, the intention of the paper is to provide a comprehensive overview of the state of the art, serving both as a starting point in the search for an appropriate LVQ variant for a given classification problem and as a reference to recently developed variants and improvements of the basic GLVQ scheme.
