

Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

Environment recognition from digital audio for forensic applications is a growing area of interest, yet it is less researched than other branches of audio forensics. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, a scenario that arises in forensics. In this paper, we perform several experiments on environment recognition from audio, with a focus on forensic applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. To improve accuracy, we propose a full set of MPEG-7 audio features combined with mel frequency cepstral coefficients (MFCCs). In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound, even in the presence of a large amount of foreground human speech.
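As a rough illustration of what combining two feature sets can look like, the sketch below concatenates per-file MPEG-7 and MFCC vectors and standardizes each column. The abstract does not say how the features are combined, so concatenation followed by z-score scaling is an assumption here, and the function names and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Constant columns have zero standard deviation; divide by 1.0 instead.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(mpeg7_rows, mfcc_rows):
    """Concatenate per-file MPEG-7 and MFCC vectors, then standardize.

    Assumption: simple concatenation; the paper's actual scheme is not stated.
    """
    if len(mpeg7_rows) != len(mfcc_rows):
        raise ValueError("both feature sets must describe the same files")
    combined = [list(a) + list(b) for a, b in zip(mpeg7_rows, mfcc_rows)]
    return zscore_columns(combined)


# Hypothetical toy values: 3 files, 2 MPEG-7 descriptors and 3 MFCCs each.
mpeg7 = [[0.2, 10.0], [0.4, 12.0], [0.6, 14.0]]
mfcc = [[-3.0, 1.5, 0.1], [-2.0, 1.0, 0.1], [-1.0, 0.5, 0.1]]
fused = fuse_features(mpeg7, mfcc)
```

Standardizing after concatenation keeps descriptors with large numeric ranges from dominating distance-based classifiers.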


Abstract

Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) algorithm and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words containing positionally and/or contextually conditioned allophones was created for that purpose. A group of 16 native and non-native speakers was audio-video recorded pronouncing each word, and the speech of seven native speakers and phonology experts was selected for analysis. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes ‘dark’ (velarized) allophonic realizations (which occur before a consonant or at the end of the word before silence) and 52 ‘clear’ allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors originating from the MPEG-7 standard, dedicated time-based parameters, and modified MFCC features proposed by the authors. ANNs, the kNN algorithm and the SOM were then employed to automatically detect the two types of allophones. Various sets of features were tested to achieve the best performance of the automatic methods. In the final experiment, a selected set of features was used for automatic evaluation of the pronunciation of dark /l/ by non-native speakers.
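For readers unfamiliar with the kNN step, a minimal sketch of a majority-vote k-nearest neighbor classifier follows. It is a generic illustration, not the authors' implementation: the Euclidean distance, the value of k, the function name, and the two-dimensional toy vectors are all assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    if not 0 < k <= len(train_vectors):
        raise ValueError("k must be between 1 and the training-set size")
    # Rank training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train_vectors)),
                   key=lambda i: dist(train_vectors[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors, for illustration only.
train_x = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25],
           [0.9, 0.8], [0.8, 0.9], [0.85, 0.75]]
train_y = ["dark", "dark", "dark", "clear", "clear", "clear"]
prediction = knn_predict(train_x, train_y, [0.2, 0.2], k=3)
```

In practice each vector would hold the parametrized descriptors of one segmented allophone rather than two toy values.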
