Open Access

Storytelling Voice Conversion: Evaluation Experiment Using Gaussian Mixture Models

About this article


[1] LEE, H. J. : Fairy Tale Storytelling System: Using Both Prosody and Text for Emotional Speech Synthesis, In: Convergence and Hybrid Information Technology (Lee, G., Howard, D., Ślȩzak, D., Hong, Y.S., eds.), Communications in Computer and Information Science, vol. 310, Springer, Berlin Heidelberg, 2012, pp. 317–324. DOI: 10.1007/978-3-642-32692-9_41.

[2] ALCANTARA, J. A.—LU, L. P.—MAGNO, J. K.—SORIANO, Z.—ONG, E.—RESURRECCION, R. : Emotional Narration of Children’s Stories, In: Theory and Practice of Computation (Nishizaki, S.Y., Numao, M., Caro, J., Suarez, M.T., eds.), Proceedings in Information and Communication Technology, vol. 5, Springer, Japan, 2012, pp. 1–14. DOI: 10.1007/978-4-431-54106-6_1.

[3] DOUKHAN, D.—ROSSET, S.—RILLIARD, A.—D’ALESSANDRO, C.—ADDA-DECKER, M. : Text and Speech Corpora for Text-to-Speech Synthesis of Tales, In: Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, 2012, pp. 1003–1010.

[4] MAENO, Y.—NOSE, T.—KOBAYASHI, T.—KORIYAMA, T.—IJIMA, Y.—NAKAJIMA, H.—MIZUNO, H.—YOSHIOKA, O. : Prosodic Variation Enhancement Using Unsupervised Context Labeling for HMM-based Expressive Speech Synthesis, Speech Communication 57 (2014), 144–154. DOI: 10.1016/j.specom.2013.09.014.

[5] PŘIBIL, J.—PŘIBILOVÁ, A. : Czech TTS Engine for Braille Pen Device Based on Pocket PC Platform, Proc. of the 16th Conference Electronic Speech Signal Processing ESSP 05 joined with the 15th Czech-German Workshop Speech Processing (Vích, R., ed.), 2005, pp. 402–408.

[6] PŘIBILOVÁ, A.—PŘIBIL, J. : Spectrum Modification for Emotional Speech Synthesis, In: Multimodal Signals: Cognitive and Algorithmic Issues (Esposito, A., Hussain, A., Marinaro, M., Martone, R., eds.), LNAI 5398, Springer-Verlag, Berlin Heidelberg, 2009, pp. 232–241. DOI: 10.1007/978-3-642-00525-1_23.

[7] PŘIBIL, J.—PŘIBILOVÁ, A. : Application of Expressive Speech in TTS System with Cepstral Description, In: Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction (Esposito, A., Bourbakis, N., Avouris, N., Hatzilygeroudis, I., eds.), LNAI 5042, Springer-Verlag, Berlin Heidelberg, 2008, pp. 201–213.

[8] BLAUERT, J.—JEKOSCH, U. : A Layer Model of Sound Quality, Journal of the Audio Engineering Society 60 (2012), 4–12.

[9] LEGÁT, M.—MATOUŠEK, J. : Design of the Test Stimuli for the Evaluation of Concatenation Cost Functions, In: Text, Speech and Dialogue 2009 (Matoušek, V. et al., eds.), LNCS 5729, Springer, Heidelberg, 2009, pp. 339–346. DOI: 10.1007/978-3-642-04208-9_47.

[10] BELLO, C.—RIBAS, D.—CALVO, J. R.—FERRER, C. A. : From Speech Quality Measures to Speaker Recognition Performance, In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications (Bayro-Corrochano, E., Hancock, E., eds.), LNCS 8827, Springer International Publishing, Switzerland, 2014, pp. 199–206.

[11] ROMPORTL, J.—MATOUŠEK, J. : Formal Prosodic Structures and Their Application in NLP, In: Text, Speech and Dialogue 2005 (Matoušek, V. et al., eds.), LNCS 3658, Springer-Verlag, Berlin Heidelberg, 2005, pp. 371–378. DOI: 10.1007/11551874_48.

[12] JEONG, Y. : Joint Speaker and Environment Adaptation Using TensorVoice for Robust Speech Recognition, Speech Communication 58 (2014), 1–10. DOI: 10.1016/j.specom.2013.10.001.

[13] REYNOLDS, D. A.—ROSE, R. C. : Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models, IEEE Transactions on Speech and Audio Processing 3 (1995), 72–83. DOI: 10.1109/89.365379.

[14] MUHAMMAD, G.—ALGHATHBAR, K. : Environment Recognition for Digital Audio Forensics Using MPEG-7 and Mel Cepstral Features, Journal of Electrical Engineering 62 No. 4 (2011), 199–205. DOI: 10.2478/v10187-011-0032-0.

[15] PISHRAVIAN, A.—SAHAF, M. R. A. : Application of Independent Component Analysis for Speech-Music Separation Using an Efficient Score Function Estimation, Journal of Electrical Engineering 63 No. 6 (2012), 380–385. DOI: 10.2478/v10187-012-0056-0.

[16] PŘIBIL, J.—PŘIBILOVÁ, A. : Emotional Style Conversion in the TTS System with Cepstral Description, In: Verbal and Nonverbal Communication Behaviours (Esposito, A., Faundez-Zanuy, M., Keller, E., Marinaro, M., eds.), LNAI 4775, Springer-Verlag, Berlin Heidelberg New York, 2007, pp. 65–73. DOI: 10.1007/978-3-540-76442-7_6.

[17] VÍCH, R.—PŘIBIL, J.—SMÉKAL, Z. : New Cepstral Zero-Pole Vocal Tract Models for TTS Synthesis, Proc. of IEEE Region 8 EUROCON’2001, vol. 2, 2001, pp. 458–462.

[18] MADHU, N. : Note on Measures for Spectral Flatness, Electronics Letters 45 No. 23 (2009), 1195–1196. DOI: 10.1049/el.2009.1977.

[19] SHAH, N. H. : Numerical Methods with C++ Programming, Prentice-Hall of India Learning Private Limited, New Delhi, 2009.

[20] HOSSEINZADEH, D.—KRISHNAN, S. : On the Use of Complementary Spectral Features for Speaker Recognition, EURASIP Journal on Advances in Signal Processing (2008), Article ID 258184. DOI: 10.1155/2008/258184.

[21] SOUSA, R.—FERREIRA, A.—ALKU, P. : The Harmonic and Noise Information of the Glottal Pulses in Speech, Biomedical Signal Processing and Control 10 (2014), 137–143. DOI: 10.1016/j.bspc.2013.12.004.

[22] LECLERC, I.—DAJANI, H. R.—GIGUERE, C. : Differences in Shimmer Across Formant Regions, Journal of Voice 27 No. 6 (2013), 685–690. DOI: 10.1016/j.jvoice.2013.05.002, PMID: 24070592.

[23] PŘIBIL, J.—PŘIBILOVÁ, A.—ĎURAČKOVÁ, D. : Evaluation of Spectral and Prosodic Features of Speech Affected by Orthodontic Appliances Using the GMM Classifier, Journal of Electrical Engineering 65 (2014), 30–36. DOI: 10.2478/jee-2014-0004.

[24] PŘIBIL, J.—PŘIBILOVÁ, A. : Determination of Formant Features in Czech and Slovak for GMM Emotional Speech Classifier, Radioengineering 22 (2013), 52–59.

[25] NABNEY, I. T. : Netlab Pattern Analysis Toolbox, © 1996–2001. Retrieved 16 February 2012 from http://www.mathworks.com/matlabcentral/fileexchange/2654-netlab.

[26] PŘIBIL, J.—PŘIBILOVÁ, A.—MATOUŠEK, J. : Experiment with Evaluation of Quality of the Synthetic Speech by the GMM Classifier, In: Text, Speech and Dialogue, Proc. of the 16th International Conference TSD 2013, Plzeň, Czech Republic, September 2013 (Habernal, I., Matoušek, V., eds.), LNAI 8082, Springer-Verlag, Berlin Heidelberg, 2013, pp. 241–248. DOI: 10.1007/978-3-642-40585-3_31.

[27] DILEEP, A. D.—SEKHAR, C. CH. : Class-Specific GMM Based Intermediate Matching Kernel for Classification of Varying Length Patterns of Long Duration Speech Using Support Vector Machines, Speech Communication 57 (2014), 126–143. DOI: 10.1016/j.specom.2013.09.010.

[28] ZHAO, J.—JIANG, Q. : Probabilistic PCA for t-Distributions, Neurocomputing 69 No. 16-18 (2006), 2217–2226. DOI: 10.1016/j.neucom.2005.07.011.

eISSN:
1339-309X
Language:
English
Frequency:
6 times per year
Journal Subjects:
Engineering, Introductions and Overviews, other