Evaluation of speaker de-identification based on voice gender and age conversion

Open access


This paper covers two tasks. The first is the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second is to verify whether a classifier based on Gaussian mixture models (GMM) can detect the original Czech and Slovak speakers after voice de-identification has been applied. The experiments confirm that the developed gender and age conversion works for all selected types of de-identification, and that it can be objectively evaluated by the GMM-based open-set classifier. The original-speaker detection accuracy was also compared for sentences uttered by German and English speakers, showing the language independence of the proposed method.
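To illustrate what a GMM-based open-set decision could look like, the following is a minimal sketch and not the authors' implementation. It assumes that each enrolled speaker is already represented by a trained diagonal-covariance GMM, that the utterance is given as a list of feature vectors, and that an utterance is rejected as "unknown" when the best average log-likelihood falls below a threshold. The function names, the toy model parameters, and the threshold value are all hypothetical; the abstract does not specify the features, the number of mixtures, or the decision rule.

```python
import math


def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            # log of w * N(x | mu, diag(var)), summed over feature dimensions
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # log-sum-exp over mixture components for numerical stability
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total / len(frames)


def identify_open_set(frames, speaker_models, threshold):
    """Return (best speaker name or None if rejected, score per speaker)."""
    scores = {
        name: diag_gmm_loglik(frames, *params)
        for name, params in speaker_models.items()
    }
    best = max(scores, key=scores.get)
    # Open-set rule (assumption): accept only if the best score is high enough.
    return (best if scores[best] >= threshold else None), scores


# Hypothetical toy models: (weights, means, variances) per speaker.
models = {
    "speaker_A": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_B": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}

near_a = [[0.1, -0.2], [0.9, 1.1]]        # frames close to speaker_A's model
far_away = [[20.0, 20.0], [21.0, 19.0]]   # frames far from every model

label_a, _ = identify_open_set(near_a, models, threshold=-10.0)
label_unknown, _ = identify_open_set(far_away, models, threshold=-10.0)
```

In this toy setup, frames near the first model are attributed to `speaker_A`, while frames far from every model are rejected. In an evaluation like the one described here, the same scoring would be applied to de-identified utterances to check whether the original speaker is still detected.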



