Duration and speed of speech events: A selection of methods


The study of speech timing, i.e. the duration and speed or tempo of speech events, has grown in importance over the past twenty years. This growth is tied in particular to increased demands for accuracy, intelligibility and naturalness in speech technology, to applications in language teaching and testing, and to the study of speech timing patterns in language typology. However, the methods used in such studies are very diverse, and so far there is no accessible overview of them. Since the field is too broad for an exhaustive account, we have made two choices: first, to provide a framework of paradigmatic (classificatory), syntagmatic (compositional) and functional (discourse-oriented) dimensions for duration analysis; and second, to provide worked examples of a selection of methods associated primarily with these three dimensions. Some of the methods covered are established state-of-the-art approaches (e.g. the paradigmatic Classification and Regression Trees (CART) analysis); others are discussed in a critical light (e.g. so-called ‘rhythm metrics’). A set of syntagmatic approaches applies to the tokenisation and tree parsing of duration hierarchies, based on speech annotations, and a functional approach describes duration distributions in relation to sociolinguistic variables. Several of the methods are supported by a new web-based software tool for analysing annotated speech data, the Time Group Analyser.
