Speaker Identification Using Data-Driven Score Classification

Open access


We present a comparative evaluation of different classification algorithms for a fusion engine used in a speaker identity selection task. The fusion engine combines the scores from a number of classifiers, each of which uses the GMM-UBM approach to match speaker identity. The performance of the evaluated classification algorithms was examined in both text-dependent and text-independent operation modes. The experimental results indicated a significant improvement in speaker identification accuracy of approximately 7% and 14.5% for the text-dependent and text-independent scenarios, respectively. Based on these findings, we suggest using fusion with a discriminative algorithm, such as a Support Vector Machine, in real-world speaker identification applications where the text-independent scenario predominates.
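The abstract does not spell out how the fusion engine is built, so the following is only a minimal Python sketch, under our own assumptions, of data-driven score fusion with a discriminative classifier. We assume each of K subsystems (standing in for the GMM-UBM classifiers) outputs one score per enrolled speaker, turn every (utterance, candidate speaker) pair into a binary example, train a linear SVM with the Pegasos stochastic sub-gradient method to learn the fusion weights, and select the candidate with the highest fused score. All function names, the synthetic score generator, and the parameters (`lam`, `epochs`, noise level, score gaps) are hypothetical and are not the authors' implementation; the paper's SVM may use a different kernel or multiclass scheme.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def candidate_features(score_matrix: Sequence[Sequence[float]]) -> List[Vector]:
    """score_matrix[k][n] is the score subsystem k gives candidate speaker n.
    Returns one feature vector per candidate: its K subsystem scores plus a
    constant 1.0 so the linear model can learn a bias term."""
    n_candidates = len(score_matrix[0])
    return [[row[n] for row in score_matrix] + [1.0] for n in range(n_candidates)]


def build_training_set(trials) -> Tuple[List[Vector], List[int]]:
    """trials: list of (score_matrix, true_speaker_index).
    Each (trial, candidate) pair becomes one binary example:
    +1 if the candidate is the true speaker, -1 otherwise."""
    X, y = [], []
    for scores, true_idx in trials:
        for n, feat in enumerate(candidate_features(scores)):
            X.append(feat)
            y.append(1 if n == true_idx else -1)
    return X, y


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0) -> Vector:
    """Linear SVM trained with Pegasos (stochastic sub-gradient descent on the
    L2-regularised hinge loss). Returns the weight vector, bias included."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, X[i]) < 1.0  # hinge loss is active
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def identify(w: Vector, score_matrix) -> int:
    """Return the index of the candidate with the largest fused SVM score."""
    decisions = [dot(w, f) for f in candidate_features(score_matrix)]
    return max(range(len(decisions)), key=decisions.__getitem__)


def synthetic_trials(n_trials, n_speakers=5, gaps=(2.0, 1.0, 0.0), seed=1):
    """Hypothetical data: subsystem k adds gaps[k] to the true speaker's mean
    score; every score gets Gaussian noise (sd 0.5). Subsystem 2 is uninformative."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        true_idx = rng.randrange(n_speakers)
        scores = [[rng.gauss(gap if n == true_idx else 0.0, 0.5)
                   for n in range(n_speakers)] for gap in gaps]
        trials.append((scores, true_idx))
    return trials


if __name__ == "__main__":
    train = synthetic_trials(100, seed=1)
    test = synthetic_trials(200, seed=2)
    w = train_linear_svm(*build_training_set(train))
    acc = sum(identify(w, s) == t for s, t in test) / len(test)
    print("fusion weights:", [round(v, 2) for v in w])
    print("identification accuracy:", acc)
```

Because the bias weight is added equally to every candidate, it does not change the arg max used for identification; it would only matter if the fused score were also thresholded, for example to reject speakers outside the enrolled set.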
