Environment Recognition for Digital Audio Forensics Using MPEG-7 and Mel Cepstral Features
Environment recognition from digital audio for forensic applications is a growing area of interest. However, compared with other branches of audio forensics, it has received relatively little research attention. In particular, little work has addressed recognizing the environment in recordings that also contain foreground speech, which is a realistic forensic scenario. In this paper, we perform several experiments on environment recognition from audio, with a particular focus on forensic applications. The experimental results show that the task is easier when the audio contains only environmental sound than when it contains both foreground speech and background environmental sound. To improve accuracy, we propose combining the full set of MPEG-7 audio features with mel-frequency cepstral coefficients (MFCCs). In the experiments, the proposed approach significantly increases environment recognition accuracy, even when the recordings contain a large amount of foreground speech.
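To make the idea of combining MPEG-7 descriptors with MFCCs concrete, the following is a minimal, hypothetical sketch of per-frame feature concatenation in plain Python. It computes 13 MFCCs and appends three simplified MPEG-7-style descriptors (an AudioPower-like mean square, and AudioSpectrumCentroid and AudioSpectrumSpread on a log2 scale relative to 1 kHz). The sample rate, frame length, hop size, filter count, HTK-style mel formula, and all function names are assumptions for illustration only. This is not the paper's configuration, and it does not reproduce the full MPEG-7 feature set the paper uses.

```python
import math


def hz_to_mel(f):
    # HTK-style mel scale (one common convention, assumed here)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale, one row per filter."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sr / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sr)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):          # rising edge
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge
            row[k] = (right - k) / max(right - center, 1)
        bank.append(row)
    return bank


def dct_ii(x, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
            for k in range(n_out)]


def frame_features(frame, sr, bank, n_mfcc=13):
    """Concatenate MFCCs with three simplified MPEG-7-style descriptors."""
    spec = power_spectrum(frame)
    n = len(frame)
    # MFCCs: log mel-band energies followed by a DCT
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    mfcc = dct_ii([math.log(max(e, 1e-10)) for e in energies], n_mfcc)
    # Centroid and spread in octaves relative to 1 kHz; the standard's
    # merging of low-frequency bins is omitted in this simplification
    freqs = [k * sr / n for k in range(len(spec))]
    pairs = [(math.log2(f / 1000.0), p) for f, p in zip(freqs, spec) if f > 0]
    total = sum(p for _, p in pairs) or 1e-10
    centroid = sum(o * p for o, p in pairs) / total
    spread = math.sqrt(sum((o - centroid) ** 2 * p for o, p in pairs) / total)
    power = sum(x * x for x in frame) / n  # AudioPower-like mean square
    return mfcc + [power, centroid, spread]


def extract(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_mfcc=13):
    """Return one combined feature vector per full frame of the signal."""
    bank = mel_filterbank(n_filters, frame_len, sr)
    return [frame_features(signal[s:s + frame_len], sr, bank, n_mfcc)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

For example, `extract([math.sin(2 * math.pi * 1000 * t / 8000) for t in range(800)])` yields five 16-dimensional vectors. Their centroid entry lies close to 0, because the MPEG-7 log-frequency scale is referenced to 1 kHz. A classifier for environment recognition would then be trained on such vectors. That stage is not shown here.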
G. Muhammad and K. Alghathbar, Journal of Electrical Engineering, 2011, 62 (4), 199–205.