Automatic bird song and syllable segmentation with an open-source deep-learning object detection method – a case study in the Collared Flycatcher (Ficedula albicollis)

Sándor Zsebők 1, Máté Ferenc Nagy-Egri 2, Gergely Gábor Barnaföldi 2, Miklós Laczi 1,3, Gergely Nagy 1, Éva Vaskuti 1, and László Zsolt Garamszegi 1,4,5
  • 1 Behavioural Ecology Group, Department of Systematic Zoology and Ecology, Eötvös Loránd University, 1117, Budapest, Hungary
  • 2 Wigner Research Centre for Physics, 1121, Budapest, Hungary
  • 3 8744, Orosztony, Hungary
  • 4 MTA-ELTE, Theoretical Biology and Evolutionary Ecology Research Group, Department of Plant Systematics, Ecology and Theoretical Biology, Eötvös Loránd University, 1117, Budapest, Hungary
  • 5 Evolutionary Ecology Group, Centre for Ecological Research, Institute of Ecology and Botany, Hungary


Bioacoustic analyses of animal sounds produce enormous amounts of digitized acoustic data, and effective automatic processing is needed to extract the information contained in the recordings. Our research focuses on the song of the Collared Flycatcher (Ficedula albicollis), and we are interested in the evolution of acoustic signals. Over the last 20 years, we have collected hundreds of hours of bird song recordings in the natural environment, and there is a constant need to process these recordings automatically. In this study, we chose an open-source, deep-learning image detection system to (1) find the species-specific songs of the Collared Flycatcher in the recordings and (2) detect the small, discrete elements within the song, the so-called syllables. For these tasks, we first transformed the acoustic data into spectrogram images, then trained two deep-learning models separately on our manually segmented database. The resulting models detect the songs with an intersection over union higher than 0.8 and the syllables with an intersection over union higher than 0.7. We expect this technique to require an order of magnitude less human effort in acoustic processing than the manual method used previously. With the new technique, we can address new biological questions that require large amounts of acoustic data.
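
The detection scores quoted in the abstract compare a predicted box on the spectrogram with a manually drawn one. As a minimal sketch of how such a score can be computed, assuming each box is described by a start time, end time, lower frequency and upper frequency (the `Box` type and the `iou` function below are illustrative names, not taken from the study's own code):

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical time-frequency box on a spectrogram."""
    t_start: float  # seconds
    t_end: float    # seconds
    f_low: float    # Hz
    f_high: float   # Hz


def iou(a: Box, b: Box) -> float:
    """Return the overlap score of two axis-aligned boxes (0 to 1)."""
    # Overlap along each axis; clamped at zero when the boxes are disjoint.
    overlap_t = max(0.0, min(a.t_end, b.t_end) - max(a.t_start, b.t_start))
    overlap_f = max(0.0, min(a.f_high, b.f_high) - max(a.f_low, b.f_low))
    intersection = overlap_t * overlap_f

    area_a = (a.t_end - a.t_start) * (a.f_high - a.f_low)
    area_b = (b.t_end - b.t_start) * (b.f_high - b.f_low)
    union = area_a + area_b - intersection

    # Degenerate boxes with zero area give a score of 0 instead of dividing by zero.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    manual = Box(t_start=0.0, t_end=2.0, f_low=3000.0, f_high=8000.0)
    predicted = Box(t_start=0.2, t_end=2.1, f_low=3000.0, f_high=8000.0)
    print(f"IoU = {iou(manual, predicted):.3f}")
```

A score of 1 means the predicted and manual boxes coincide exactly, and 0 means they do not overlap at all. The example values in the `__main__` block are made up for illustration and are not data from the study.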
