Real Time Recognition Of Speakers From Internet Audio Stream

Open access

Abstract

In this paper we present an automatic speaker recognition technique that operates on lossy (encoded) speech streams from Internet radio. We show how the audio encoder settings (e.g., bitrate) influence the quality of the speaker models. Each speaker was modeled with a Gaussian mixture model (GMM). Both speaker recognition and the subsequent analysis were performed on short utterances to enable real-time processing. The neighborhoods of the speaker models were analyzed with the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from the Internet services of Polish radio. The presented software was developed in the MATLAB environment.
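
The GMM-based identification of speakers from short utterances can be illustrated with a minimal Python/NumPy sketch. This is not the authors' MATLAB implementation. It assumes diagonal covariance matrices, trains one model per speaker with plain expectation-maximization (EM), and scores each short utterance by its average per-frame log-likelihood, choosing the best-scoring speaker. The feature frames are synthetic stand-ins for cepstral feature vectors (the feature type is an assumption here). The function names, the number of mixture components, and the utterance length and hop (100 and 50 frames) are hypothetical choices.

```python
import numpy as np


def component_log_probs(x, weights, means, variances):
    """log(w_k * N(x_t | mu_k, diag(var_k))) for every frame t and component k (T x K)."""
    diff = x[:, None, :] - means[None, :, :]                    # T x K x D
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)   # K
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)    # T x K
    return np.log(weights)[None, :] - 0.5 * (log_det[None, :] + maha)


def train_diag_gmm(x, n_components=4, n_iter=50, seed=0, var_floor=1e-3):
    """Fit a diagonal-covariance GMM to feature frames x (T x D) with plain EM."""
    rng = np.random.default_rng(seed)
    t = x.shape[0]
    # Initialise means from random frames and variances from the global variance.
    means = x[rng.choice(t, n_components, replace=False)].copy()
    variances = np.tile(x.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = component_log_probs(x, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and (floored) diagonal variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ (x ** 2)) / nk[:, None] - means ** 2, var_floor)
    return weights, means, variances


def avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of an utterance under one speaker model."""
    return float(np.mean(np.logaddexp.reduce(component_log_probs(frames, *gmm), axis=1)))


def identify(frames, models):
    """Return the best-scoring speaker name together with all scores."""
    scores = {name: avg_log_likelihood(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


def short_utterances(frames, length=100, hop=50):
    """Cut a frame stream into overlapping short utterances for low-latency scoring."""
    for start in range(0, len(frames) - length + 1, hop):
        yield frames[start:start + length]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic 12-dimensional frames standing in for feature vectors of two speakers.
    train = {"A": rng.normal(0.0, 1.0, (2000, 12)),
             "B": rng.normal(1.5, 1.0, (2000, 12))}
    models = {name: train_diag_gmm(frames) for name, frames in train.items()}
    stream = rng.normal(1.5, 1.0, (400, 12))  # unseen frames drawn like speaker "B"
    print([identify(utt, models)[0] for utt in short_utterances(stream)])
```

Scoring by the average rather than the summed log-likelihood keeps scores comparable across utterances of different lengths; with short utterances, latency is bounded by the window length rather than by the length of the whole broadcast.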

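The ISOMAP analysis of speaker-model neighborhoods can be sketched as follows. The sketch assumes that a symmetric matrix of pairwise distances between the speaker models is already available; the abstract does not state how the distance between two GMMs is computed, so that step is left out. The code follows the standard ISOMAP recipe: build a k-nearest-neighbor graph, approximate geodesic distances by all-pairs shortest paths (Floyd-Warshall), then apply classical multidimensional scaling (MDS). The function name and the `n_neighbors` and `n_dims` defaults are hypothetical.

```python
import numpy as np


def isomap(dist, n_neighbors=3, n_dims=2):
    """Embed items described by a symmetric distance matrix using ISOMAP."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # 1. k-nearest-neighbour graph; missing edges are infinitely long.
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    for i in range(n):
        # argsort puts item i itself (distance 0) first, so skip position 0.
        nearest = np.argsort(dist[i])[1:n_neighbors + 1]
        graph[i, nearest] = dist[i, nearest]
        graph[nearest, i] = dist[i, nearest]  # keep the graph symmetric
    # 2. Geodesic distances: Floyd-Warshall relaxation through each node k.
    geo = graph.copy()
    for k in range(n):
        geo = np.minimum(geo, geo[:, [k]] + geo[[k], :])
    if np.isinf(geo).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # 3. Classical MDS: double-centre the squared distances, keep top eigenvectors.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (geo ** 2) @ j
    eigval, eigvec = np.linalg.eigh(b)
    order = np.argsort(eigval)[::-1][:n_dims]
    return eigvec[:, order] * np.sqrt(np.maximum(eigval[order], 0.0))
```

In the setting described above, each row and column of `dist` would correspond to one speaker model, so a 2-D embedding shows which speakers lie close to each other. Floyd-Warshall costs O(n^3), which is negligible for the handful of speakers in a single debate.
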