Pipelined language model construction for Polish speech recognition

Open access


The aim of the work described in this article is to develop and experimentally evaluate a consistent method of Language Model (LM) construction for Polish speech recognition. The proposed method takes into account the features and specific problems encountered in practical applications of speech recognition in the Polish language: rich inflection, a loose word order and the tendency for short word deletion. The LM is created in five stages. Each successive stage takes the model prepared at the previous stage and modifies or extends it so as to improve its properties. At the first stage, typical methods of LM smoothing are used to create the initial model. The four most frequently used methods of LM construction are considered here. At the second stage, the model is extended to take into account words indirectly co-occurring in the corpus. At the third stage, LM modifications are aimed at reducing short word deletion errors, which occur frequently in Polish speech recognition. The fourth stage extends the model by inserting words that were not observed in the corpus. Finally, the model is modified so as to ensure highly accurate recognition of very important utterances. The performance of the applied methods is tested in four language domains.
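The staged construction described above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the class `BigramLM`, the functions `add_unseen_words` and `run_pipeline`, and the toy corpus are hypothetical names introduced only for illustration. Additive (add-k) smoothing stands in for the smoothing step of the first stage; the abstract does not name the four methods that were actually used. Only the first and fourth stages are mimicked, and the fourth one is reduced to a plain vocabulary extension.

```python
from collections import Counter


class BigramLM:
    """Toy bigram model with additive (add-k) smoothing. Illustrative only."""

    def __init__(self, sentences, k=1.0):
        self.k = k
        self.vocab = set()
        self.history_counts = Counter()  # how often a word occurs as a history
        self.bigram_counts = Counter()   # counts of (previous word, word) pairs
        for sentence in sentences:
            tokens = ["<s>"] + list(sentence) + ["</s>"]
            self.vocab.update(tokens)
            for prev, word in zip(tokens, tokens[1:]):
                self.history_counts[prev] += 1
                self.bigram_counts[(prev, word)] += 1

    def prob(self, prev, word):
        """P(word | prev) with add-k smoothing over the current vocabulary."""
        numerator = self.bigram_counts[(prev, word)] + self.k
        denominator = self.history_counts[prev] + self.k * len(self.vocab)
        return numerator / denominator


def add_unseen_words(words):
    """Build a stage that inserts words not observed in the corpus (assumed form)."""
    def stage(model):
        model.vocab.update(words)
        return model
    return stage


def run_pipeline(model, stages):
    """Each stage takes the model from the previous stage and returns a modified one."""
    for stage in stages:
        model = stage(model)
    return model


# Hypothetical toy corpus of tokenized Polish sentences.
corpus = [["ala", "ma", "kota"], ["ala", "ma", "psa"]]
initial = BigramLM(corpus)                 # stage 1: smoothed initial model
p_before = initial.prob("ma", "kota")      # probability before the extension
model = run_pipeline(initial, [add_unseen_words(["rybę"])])
p_after = model.prob("ma", "kota")         # probability after the extension
```

In this sketch the inserted word receives a non-zero probability in every context, and the probabilities of observed words shrink slightly, because the same probability mass is spread over a larger vocabulary. The remaining stages (indirect co-occurrence, short word handling, important utterances) would be further functions of the same form passed to `run_pipeline`.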

International Journal of Applied Mathematics and Computer Science

Journal of the University of Zielona Góra


