This paper addresses multi-class classification of composite (piecewise-regular) objects such as speech signals and complex images. We propose a mathematical model that represents a composite object as a sequence of independent segments, each of which is a random sample of independent, identically distributed feature vectors. Using this model and a statistical approach, we reduce the classification task to composite hypothesis testing of segment homogeneity. Several nearest-neighbor criteria are implemented, and for some of them well-known special cases (e.g., the Kullback–Leibler minimum information discrimination principle and the probabilistic neural network) are highlighted. Experiments show that the proposed approach improves accuracy compared with contemporary classifiers.
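The segment-based nearest-neighbor idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, for illustration only, that each segment is summarized by a diagonal Gaussian, that the query and every reference object have the same number of already aligned segments, and that the object-level distance is the sum of segment-wise Kullback–Leibler divergences (a sum is a natural choice when segments are treated as independent). All function names are hypothetical.

```python
import numpy as np


def fit_diag_gaussian(segment, eps=1e-6):
    """Estimate the mean and per-feature variance of one segment.

    `segment` is a 2-D array whose rows are feature vectors.
    `eps` keeps variances strictly positive (an assumption of this sketch).
    """
    segment = np.asarray(segment, dtype=float)
    return segment.mean(axis=0), segment.var(axis=0) + eps


def kl_diag_gaussian(p, q):
    """KL(p || q) for two diagonal Gaussians given as (mean, variance) pairs."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )


def object_distance(query_segments, reference_segments):
    """Sum of segment-wise KL divergences between two composite objects.

    Assumes both objects contain the same number of aligned segments.
    """
    if len(query_segments) != len(reference_segments):
        raise ValueError("objects must have the same number of segments")
    return sum(
        kl_diag_gaussian(fit_diag_gaussian(q), fit_diag_gaussian(r))
        for q, r in zip(query_segments, reference_segments)
    )


def classify(query_segments, references):
    """Return the label of the nearest reference object.

    `references` is a list of (label, segments) pairs.
    """
    best_label, _ = min(
        references, key=lambda item: object_distance(query_segments, item[1])
    )
    return best_label
```

Under these assumptions, a query is assigned to the class of the reference object whose segments are, in total, closest to the query's segments in the Kullback–Leibler sense. Other segment models or distance criteria could be substituted in `kl_diag_gaussian` without changing the surrounding structure.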