GMM-based speaker age and gender classification in Czech and Slovak

Open access


The paper describes an experiment using Gaussian mixture models (GMMs) for automatic classification of speaker age and gender. It analyses and compares how the number of mixture components and the type of speech features influence GMM-based gender/age classification. The dependence of the computational complexity on the number of mixture components is also analysed. Finally, the GMM classification accuracy is compared with the results of conventional listening tests, and the objective and subjective evaluations are found to be in agreement.
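The classification scheme summarised above can be sketched as follows, assuming one diagonal-covariance GMM is trained per class (for example, per gender or per age group) and a test utterance is assigned to the class whose model yields the highest average frame log-likelihood. `DiagGMM`, `classify`, `synth`, the synthetic 2-D data and every parameter value are hypothetical illustrations; the paper's actual speech features, training toolkit and settings are not reproduced here.

```python
import math
import random


class DiagGMM:
    """Diagonal-covariance Gaussian mixture model trained with plain EM (illustrative sketch)."""

    def __init__(self, n_mix, seed=0, var_floor=1e-3):
        self.n_mix = n_mix                # number of mixture components M
        self.rng = random.Random(seed)    # private RNG for reproducible initialisation
        self.var_floor = var_floor        # lower bound that keeps variances from collapsing

    @staticmethod
    def _logsumexp(vals):
        top = max(vals)
        return top + math.log(sum(math.exp(v - top) for v in vals))

    @staticmethod
    def _log_gauss(x, mean, var):
        # log N(x; mean, diag(var)), summed over the feature dimensions
        return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, mean, var))

    def _component_logs(self, x):
        # log w_k + log N(x; mu_k, Sigma_k) for every component k
        return [math.log(w) + self._log_gauss(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, data, n_iter=20):
        n, dim = len(data), len(data[0])
        # Initialise means on randomly chosen frames and all variances on the global variance
        self.means = [list(x) for x in self.rng.sample(data, self.n_mix)]
        mu = [sum(x[d] for x in data) / n for d in range(dim)]
        var = [max(sum((x[d] - mu[d]) ** 2 for x in data) / n, self.var_floor)
               for d in range(dim)]
        self.vars = [list(var) for _ in range(self.n_mix)]
        self.weights = [1.0 / self.n_mix] * self.n_mix
        for _ in range(n_iter):
            # E-step: posterior probability (responsibility) of each component per frame
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = self._logsumexp(logs)
                resp.append([math.exp(lp - norm) for lp in logs])
            # M-step: re-estimate weights, means and floored variances
            for k in range(self.n_mix):
                nk = sum(r[k] for r in resp) + 1e-10
                self.weights[k] = nk / n
                self.means[k] = [sum(r[k] * x[d] for r, x in zip(resp, data)) / nk
                                 for d in range(dim)]
                self.vars[k] = [max(sum(r[k] * (x[d] - self.means[k][d]) ** 2
                                        for r, x in zip(resp, data)) / nk, self.var_floor)
                                for d in range(dim)]
        return self

    def avg_log_likelihood(self, frames):
        # Mean per-frame log-likelihood of an utterance under this model
        return sum(self._logsumexp(self._component_logs(x)) for x in frames) / len(frames)


def classify(frames, models):
    # Assign the utterance to the class whose GMM scores it highest
    return max(models, key=lambda label: models[label].avg_log_likelihood(frames))


# Synthetic 2-D "feature frames" standing in for real speech features (hypothetical data)
data_rng = random.Random(1)


def synth(center, n):
    return [[data_rng.gauss(c, 1.0) for c in center] for _ in range(n)]


train = {"male": synth((0.0, 0.0), 200), "female": synth((4.0, 4.0), 200)}
models = {label: DiagGMM(n_mix=4, seed=0).fit(frames) for label, frames in train.items()}
test_male, test_female = synth((0.0, 0.0), 50), synth((4.0, 4.0), 50)
print(classify(test_male, models), classify(test_female, models))  # prints: male female
```

The number of mixture components is the `n_mix` argument. Scoring a T-frame utterance against an M-component model with D-dimensional diagonal covariances evaluates T·M Gaussians of D terms each, and every EM iteration in `fit` has the same O(T·M·D) order, so in this sketch the computational cost grows linearly with M.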
