A New Approach to Parametric Modeling of Glottal Flow


Abstract

Glottal waveform models have long been used to improve the quality of speech synthesis. This paper presents a new approach to modeling glottal flow. The model is based on three control volumes that sequentially strike a one-mass, two-spring system and thereby generate a glottal pulse. The first, second and third control volumes represent the opening, closing and closed phases of the vocal folds, respectively. The masses of the three control volumes and the size of the first one are the four parameters that define the shape, pitch and amplitude of the glottal pulse. The model may be viewed as a parametric approach governed by second-order differential equations rather than by analytical functions, and it is very flexible for designing a glottal pulse. Comparing the glottal pulse generated by the present model with those generated by the Rosenberg, Liljencrants-Fant (LF) and mucosal wave propagation models shows that it appropriately represents the opening, closing and closed phases of vocal fold oscillation, which supports the validity of the model. The numerical solution of the present model has been found to be highly efficient compared with its analytical solution and with two other well-known parametric models, Rosenberg++ and LF. The accuracy of the numerical solution has been verified against the analytical solution. It was observed that the accuracy improves as the size of the first control volume increases, and may decrease, though only insignificantly, as the mass of any of the control volumes increases. Two experiments with the present model support its successful implementation as a voice source in speech synthesis. Thus, the model is an efficient, accurate and realistic choice of voice source for real-time speech production.
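The abstract does not state the model's equations, so the following is only a minimal sketch under stated assumptions of the kind of numerical-versus-analytical accuracy check it describes. It assumes a hypothetical single mass `m` on two springs `k1`, `k2` acting in parallel, starting at rest position with an initial velocity `v0` that stands in for the momentum delivered by an impacting control volume. The function names and parameter values are illustrative and are not taken from the paper; this is not the authors' three-control-volume model.

```python
import math


def simulate_mass_two_springs(m, k1, k2, v0, t_end, dt):
    """Integrate m*x'' = -(k1 + k2)*x with semi-implicit (symplectic) Euler.

    Hypothetical setup (assumption): x(0) = 0 and x'(0) = v0, standing in
    for the velocity imparted when a control volume strikes the mass.
    Returns lists of sample times and displacements.
    """
    k = k1 + k2  # two springs acting in parallel on one mass
    x, v = 0.0, v0
    n_steps = int(round(t_end / dt))
    times, xs = [0.0], [x]
    for i in range(1, n_steps + 1):
        v -= (k / m) * x * dt  # update velocity from the spring force first
        x += v * dt            # then advance position with the new velocity
        times.append(i * dt)
        xs.append(x)
    return times, xs


def analytical_displacement(m, k1, k2, v0, t):
    """Closed-form solution of the same ODE: x(t) = (v0 / omega) * sin(omega * t)."""
    omega = math.sqrt((k1 + k2) / m)
    return (v0 / omega) * math.sin(omega * t)


def max_abs_error(m, k1, k2, v0, t_end, dt):
    """Largest deviation between the numerical and analytical displacements."""
    times, xs = simulate_mass_two_springs(m, k1, k2, v0, t_end, dt)
    return max(abs(x - analytical_displacement(m, k1, k2, v0, t))
               for t, x in zip(times, xs))


if __name__ == "__main__":
    # Illustrative values only (not from the paper): omega ~ 894 rad/s, f ~ 142 Hz.
    m, k1, k2, v0 = 1e-4, 40.0, 40.0, 0.5
    t_end = 0.01  # 10 ms, roughly 1.4 oscillation periods
    for dt in (1e-5, 1e-6, 1e-7):
        err = max_abs_error(m, k1, k2, v0, t_end, dt)
        print(f"dt={dt:.0e}  max error={err:.3e}")
```

The first-order semi-implicit Euler scheme is used here for simplicity and because it stays stable for oscillatory systems; its maximum error shrinks roughly in proportion to the step size `dt`, so refining the step and comparing against the closed form gives a direct accuracy check.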

References

Ananthapadmanabha T.V. (1984), Acoustic analysis of voice source dynamics, Speech Transmission Laboratory-Quarterly Progress and Status Report, 2, 3, 1-24.

Berry D.A., Titze I.R. (1996), Normal modes in a continuum model of vocal fold tissues, Journal of the Acoustical Society of America, 100, 5, 3345-3354.

de Vries M.P., Schutte H.K., Verkerke G.J. (1999), Determination of parameters for lumped parameter model of the vocal fold using a finite-element method approach, Journal of the Acoustical Society of America, 106, 6, 3620-3628.

Drioli C. (2002), A flow waveform adaptive mechanical glottal model, Speech, Music and Hearing Quarterly Progress and Status Report, 43, 69-79.

Fant G. (1979a), Glottal source and excitation analysis, Speech Transmission Laboratory-Quarterly Progress and Status Report, 20, 1, 85-107.

Fant G. (1979b), Vocal source analysis - a progress report, Speech Transmission Laboratory-Quarterly Progress and Status Report, 20, 3-4, 31-54.

Fant G. (1982b), The voice source - acoustic modeling, Speech Transmission Laboratory-Quarterly Progress and Status Report, 23, 4, 28-48.

Fant G., Liljencrants J., Guang Lin Q. (1985), A four-parameter model of glottal flow, Speech Transmission Laboratory-Quarterly Progress and Status Report, 26, 4, 1-13.

Flanagan J.L., Landgraf L.L. (1968), Self-oscillating source for vocal tract synthesizers, IEEE Transactions on Audio and Electroacoustics, 16, 1, 57-64.

Fujisaki H., Ljungqvist M. (1986), Proposal and evaluation of models for the glottal source waveform, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1605-1608, Tokyo.

Gunter H.E. (2003), A mechanical model of vocal-fold collision with high spatial and temporal resolution, Journal of the Acoustical Society of America, 113, 2, 994-1000.

Hedelin P. (1984), A glottal LPC-vocoder, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 161-164, San Diego.

Herzel H., Knudsen C. (1995), Bifurcations in a vocal fold model, Nonlinear Dynamics, 7, 53-64.

Ishizaka K., Flanagan J.L. (1972), Synthesis of voiced sounds from a two-mass model of the vocal cords, Bell System Technical Journal, 51, 6, 1233-1268.

Klatt D.H., Klatt L.C. (1990), Analysis, synthesis, and perception of voice quality variations among female and male talkers, Journal of the Acoustical Society of America, 87, 2, 820-856.

Kob M. (2004), Aachen University, Department of Phoniatrics, Pedaudiology, and Communication Disorders, Aachen, Germany, http://www.kunst.psychosomatik.ukaachen.de/go/show?ID=3668027&DV=1&COMP=page&ALTNAVID=1251124&ALTNAVDV=1

Liljencrants J. (1991), A translating and rotating mass model of the vocal folds, Speech Transmission Laboratory-Quarterly Progress and Status Report, 32, 1, 1-18.

Lous N.J.C., Hofmans G.C.J., Veldhuis R.N.J., Hirschberg A. (1998), A symmetrical two-mass vocal-fold model coupled to vocal tract and trachea, with application to prosthesis design, Acta Acustica (united with Acustica), 84, 1135-1150.

Pelorson X., Hirschberg A., van Hassel R.R., Wijnands A.P.J. (1994), Theoretical and experimental study of quasisteady-flow separation within the glottis during phonation. Application to a modified two-mass model, Journal of the Acoustical Society of America, 96, 6, 3416-3431.

Price P.J. (1989), Male and female voice source characteristics: Inverse filtering results, Speech Communication, 8, 261-277.

Qi Y.Y., Bi N. (1994), Simplified approximation of the 4-parameter LF model of voice source, Journal of the Acoustical Society of America, 96, 1182-1185.

Rosenberg A. (1971), Effect of glottal pulse shape on the quality of natural vowels, Journal of the Acoustical Society of America, 49, 2, 583-590.

Rothenberg M., Carlson R., Granström B., Lindqvist-Gauffin J. (1975), A three-parameter voice source for speech synthesis, In G. Fant (Ed.) Proceedings of the Speech Communication Seminar, vol. 2, pp. 235-243, Stockholm.

Schoentgen J. (1993), Modelling the glottal pulse with a self-excited threshold autoregressive model, Proceedings of Eurospeech'93, pp. 107-110, Berlin.

Story B.H., Titze I.R. (1995), Voice simulation with a body-cover model of the vocal folds, Journal of the Acoustical Society of America, 97, 2, 1249-1260.

Veldhuis R. (1998), A computationally efficient alternative for the Liljencrants-Fant model and its perceptual evaluation, Journal of the Acoustical Society of America, 103, 1, 566-571.

Archives of Acoustics

The Journal of the Institute of Fundamental Technological Research of the Polish Academy of Sciences