Evaluation of speaker de-identification based on voice gender and age conversion

Jiří Přibil 1, Anna Přibilová 2, and Jindřich Matoušek 3
  • 1 Institute of Measurement Science, Slovak Academy of Sciences, Bratislava, Slovakia
  • 2 Slovak University of Technology in Bratislava, Faculty of Electrical Engineering and Information Technology, Bratislava, Slovakia
  • 3 Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia, Plzeň, Czech Republic


Two basic tasks are covered in this paper. The first consists of the design and practical testing of a new voice de-identification method that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second task is to verify whether a classifier based on Gaussian mixture models (GMM) can detect the original Czech and Slovak speakers after voice de-identification has been applied. The experiments confirm that the developed gender and age conversion works for all selected types of de-identification, and that its effect can be objectively evaluated by the GMM-based open-set classifier. Original-speaker detection accuracy was also compared for sentences uttered by German and English speakers, showing that the proposed method is language independent.
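The abstract names two techniques without giving their details, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. For the de-identification side, it assumes the multi-segmental frequency scale transformation can be modeled as a continuous piecewise-linear warp of the frequency axis, with one hypothetical scale factor per segment, applied to a magnitude spectrum (prosody modification is not shown). For the evaluation side, it assumes open-set detection works by averaging per-frame log-likelihoods under each enrolled speaker's diagonal-covariance GMM and rejecting the best match when its score falls below a threshold. The function names, segment boundaries, factors, and threshold rule are all illustrative assumptions.

```python
import math


def warp_frequency(f, boundaries, factors):
    """Continuous piecewise-linear warp of frequency f (Hz).

    Assumption: boundaries are ascending segment edges in Hz starting at 0
    and ending at fs/2; factors holds one scale factor per segment
    (>1 stretches that band upward, <1 compresses it downward).
    """
    out = 0.0
    for lo, hi, k in zip(boundaries[:-1], boundaries[1:], factors):
        if f <= lo:
            break
        out += k * (min(f, hi) - lo)  # contribution of this segment
    return out


def inverse_warp(g, boundaries, factors):
    """Source frequency that the warp maps onto g, or None if g is outside its image."""
    lo_out = 0.0
    for lo, hi, k in zip(boundaries[:-1], boundaries[1:], factors):
        hi_out = lo_out + k * (hi - lo)  # warped position of this segment's upper edge
        if g <= hi_out + 1e-9:
            return lo + (g - lo_out) / k
        lo_out = hi_out
    return None


def warp_spectrum(mag, fs, boundaries, factors):
    """Move magnitude-spectrum content along the warped axis (bins span 0..fs/2)."""
    n, nyq = len(mag), fs / 2
    out = []
    for j in range(n):
        f = inverse_warp(j * nyq / (n - 1), boundaries, factors)
        if f is None:
            out.append(0.0)  # no source frequency lands on this bin
            continue
        pos = f * (n - 1) / nyq  # fractional source bin index
        i = min(int(pos), n - 2)
        t = min(max(pos - i, 0.0), 1.0)
        out.append((1 - t) * mag[i] + t * mag[i + 1])  # linear interpolation
    return out


def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    top = max(comps)  # log-sum-exp over components for numerical stability
    return top + math.log(sum(math.exp(c - top) for c in comps))


def open_set_identify(frames, models, threshold):
    """Return (best speaker or None, scores); None means 'not an enrolled speaker'.

    models maps a speaker name to (weights, means, variances).
    """
    scores = {name: sum(gmm_loglik(x, *m) for x in frames) / len(frames)
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


if __name__ == "__main__":
    # Toy values only: 1-D features and two single-component speaker models
    models = {"A": ([1.0], [[0.0]], [[1.0]]), "B": ([1.0], [[5.0]], [[1.0]])}
    print(open_set_identify([[0.1], [-0.2], [0.0]], models, threshold=-5.0)[0])  # A
    print(warp_frequency(2000.0, [0.0, 1000.0, 4000.0, 8000.0], [1.0, 1.2, 1.1]))
```

Keeping the warp continuous and monotone (all factors positive) preserves the ordering of spectral peaks such as formants while shifting them. In a real evaluation, the GMMs would be trained on spectral or prosodic feature vectors of the original speakers and then scored on the de-identified utterances.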

