Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

Ghulam Muhammad 1  and Khalid Alghathbar 2
  • 1 Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, PO Box: 51178, Riyadh 11543, Saudi Arabia
  • 2 Center of Excellence in Information Assurance, King Saud University, Riyadh, Saudi Arabia

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it has received less research. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, which is a forensics scenario. In this paper, we perform several experiments focusing on the problems of environment recognition from audio, particularly for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. To improve accuracy, we propose combining a full set of MPEG-7 audio features with mel frequency cepstral coefficients (MFCCs). In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound, even in the presence of a large amount of foreground human speech.
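
As an illustration of the feature combination described above, the following is a minimal sketch of how MFCCs for one audio frame might be computed and concatenated with MPEG-7 descriptors. It is not the authors' implementation. The frame length, sample rate, number of mel filters, number of coefficients, and the helper names (`mfcc_frame`, `fuse`) are all assumptions made for this example, and the MPEG-7 descriptor vector is a hypothetical placeholder standing in for the output of an external MPEG-7 extractor.

```python
import math

def hz_to_mel(f):
    # Common mel-scale approximation (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n)
                 for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank

def mfcc_frame(frame, sample_rate, num_filters=20, num_coeffs=12):
    """MFCCs for one frame: log mel-filter energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Floor the energies so the logarithm is always defined.
    log_e = [math.log(max(sum(w * p for w, p in zip(filt, spec)), 1e-10))
             for filt in bank]
    # Skip the 0th coefficient, which mostly reflects overall energy.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_e))
            for c in range(1, num_coeffs + 1)]

def fuse(mfccs, mpeg7_descriptors):
    """Concatenate both feature sets into a single vector for a classifier."""
    return list(mfccs) + list(mpeg7_descriptors)

# Synthetic 440 Hz tone standing in for one frame of real audio.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
mfccs = mfcc_frame(frame, sample_rate)

# Hypothetical placeholder: real values would come from an MPEG-7 extractor.
mpeg7_placeholder = [0.0, 0.0, 0.0, 0.0]
features = fuse(mfccs, mpeg7_placeholder)
print(len(features))
```

Plain concatenation is only one way to combine the two feature sets. In practice, the fused vector would typically be normalized per dimension before classification, since the two sets can differ widely in scale.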

