Two Methods of Automatic Evaluation of Speech Signal Enhancement Recorded in the Open-Air MRI Environment

Open access

Abstract

The paper focuses on two methods for evaluating how successfully speech signals recorded in an open-air magnetic resonance imager during phonation have been enhanced for 3D human vocal tract modeling. The first approach provides a comparison based on statistical analysis using ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The experiments confirmed that the proposed ANOVA-based and GMM-based approaches to automatic evaluation of speech quality are functional and produce results fully comparable with the standard evaluation based on listening tests.
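As a rough illustration of the two evaluation ideas named in the abstract, the sketch below computes a one-way ANOVA F statistic over groups of feature values and picks the class whose Gaussian mixture gives the highest log-likelihood. It is not the authors' implementation: the abstract does not specify the speech features, the number of mixture components, or the training procedure, so the one-dimensional features, the function names, and all numeric values here are assumptions chosen only for demonstration.

```python
import math


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of 1-D feature frames under a Gaussian mixture."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v))
                / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances))
        total += math.log(p)
    return total


def classify(frames, models):
    """Pick the class whose mixture gives the highest log-likelihood."""
    return max(models,
               key=lambda name: gmm_log_likelihood(frames, *models[name]))


# Hypothetical feature values for three processing conditions.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0],
                          [2.0, 3.0, 4.0],
                          [5.0, 6.0, 7.0]])

# Hypothetical, already-trained two-component mixtures:
# (weights, means, variances) per class.
models = {
    "noisy": ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0]),
    "enhanced": ([0.5, 0.5], [4.0, 5.0], [1.0, 1.0]),
}
label = classify([4.2, 4.8, 5.1], models)
```

In practice the F statistic would be compared against a critical value (or converted to a p-value) at a chosen significance level, and the mixture parameters would be estimated from training data rather than written by hand.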

