In the development of the voice conversion and personification of the text-to-speech (TTS) systems, it is very necessary to have feedback information about the users’ opinion on the resulting synthetic speech quality. Therefore, the main aim of the experiments described in this paper was to find out whether the classifier based on Gaussian mixture models (GMM) could be applied for evaluation of different storytelling voices created by transformation of the sentences generated by the Czech and Slovak TTS system. We suppose that it is possible to combine this GMM-based statistical evaluation with the classical one in the form of listening tests or it can replace them. The results obtained in this way were in good correlation with the results of the conventional listening test, so they confirm practical usability of the developed GMM classifier. With the help of the performed analysis, the optimal setting of the initial parameters and the structure of the input feature set for recognition of the storytelling voices was finally determined.
The paper focuses on investigation of influence of the volume inserted in the scanning area of the magnetic resonance imaging (MRI) device working with a low magnetic field generated by a pair of permanent magnets on vibration and acoustic noise. In addition, its aim is to evaluate the influence of different types of used scan sequences, different settings of slice orientation and scan parameters on the energy and spectral properties of vibration and noise generated by the gradient coil system of the MRI device. Two basic measurements were performed within this work: mapping of sound pressure levels in the MRI device vicinity and parallel acquisition of vibration signals by sensors mounted on the lower and upper parts of the MRI gradient system. The paper next analyzes changes in properties of the vibration signals for the examined person lying in the scanning area compared with the situation of using only the testing phantom. Spectral characteristics of the recorded vibration signals are then analyzed statistically, and compared visually and numerically. The obtained results of the detailed analysis will be used for improvement of noise suppression algorithms applied to a speech signal recorded simultaneously with scanning of the human vocal tract for its 3D modeling.
The paper focuses on two methods of evaluation of successfulness of speech signal enhancement recorded in the open-air magnetic resonance imager during phonation for the 3D human vocal tract modeling. The first approach enables to obtain a comparison based on statistical analysis by ANOVA and hypothesis tests. The second method is based on classification by Gaussian mixture models (GMM). The performed experiments have confirmed that the proposed ANOVA and GMM classifiers for automatic evaluation of the speech quality are functional and produce fully comparable results with the standard evaluation based on the listening test method.
The contribution describes the effect of the fixed and removable orthodontic appliances on spectral properties of emotional speech. Spectral changes were analyzed and evaluated by spectrograms and mean Welch’s periodograms. This alternative approach to the standard listening test enables to obtain objective comparison based on statistical analysis by ANOVA and hypothesis tests. Obtained results of analysis performed on short sentences of a female speaker in four emotional states (joyous, sad, angry, and neutral) show that, first of all, the removable orthodontic appliance affects the spectrograms of produced speech.
The paper describes an experiment with using the Gaussian mixture models (GMM) for automatic classification of the speaker age and gender. It analyses and compares the influence of different number of mixtures and different types of speech features used for GMM gender/age classification. Dependence of the computational complexity on the number of used mixtures is also analysed. Finally, the GMM classification accuracy is compared with the output of the conventional listening tests. The results of these objective and subjective evaluations are in correspondence.
Quality of speech synthesis is a crucial issue in comparison of various text-to-speech (TTS) systems. We proposed a system for automatic evaluation of speech quality by statistical analysis of temporal features (time duration, phrasing, and time structuring of an analysed sentence) together with standard spectral and prosodic features. This system was successfully tested on sentences produced by a unit selection speech synthesizer with a male as well as a female voice using two different approaches to prosody manipulation. Experiments have shown that for correct, sharp, and stable results all three types of speech features (spectral, prosodic, and temporal) are necessary. Furthermore, the number of used statistical parameters has a significant impact on the correctness and precision of the evaluated results. It was also demonstrated that the stability of the whole evaluation process is improved by enlarging the used speech material. Finally, the functionality of the proposed system was verified by comparison of the results with those of the standard listening test.
Two basic tasks are covered in this paper. The first one consists in the design and practical testing of a new method for voice de-identification that changes the apparent age and/or gender of a speaker by multi-segmental frequency scale transformation combined with prosody modification. The second task is aimed at verification of applicability of a classifier based on Gaussian mixture models (GMM) to detect the original Czech and Slovak speakers after applied voice deidentification. The performed experiments confirm functionality of the developed gender and age conversion for all selected types of de-identification which can be objectively evaluated by the GMM-based open-set classifier. The original speaker detection accuracy was compared also for sentences uttered by German and English speakers showing language independence of the proposed method.
The paper describes our experiment with using the Gaussian mixture models (GMM) for classification of speech uttered by a person wearing orthodontic appliances. For the GMM classification, the input feature vectors comprise the basic and the complementary spectral properties as well as the supra-segmental parameters. Dependence of classification correctness on the number of the parameters in the input feature vector and on the computation complexity is also evaluated. In addition, an influence of the initial setting of the parameters for GMM training process was analyzed. Obtained recognition results are compared visually in the form of graphs as well as numerically in the form of tables and confusion matrices for tested sentences uttered using three configurations of orthodontic appliances.
Circular samples are the frequent objects of "in-vitro" investigation using imaging method based on magnetic resonance principles. The goal of our investigation is imaging of thin planar layers without using the slide selection procedure, thus only 2D imaging or imaging of selected layers of samples in circular vessels, eppendorf tubes,.. compulsorily using procedure "slide selection". In spite of that the standard imaging methods was used, some specificity arise when mathematical modeling of these procedure is introduced. In the paper several mathematical models were presented that were compared with real experimental results. Circular magnetic samples were placed into the homogenous magnetic field of a low field imager based on nuclear magnetic resonance. For experimental verification an MRI 0.178 Tesla ESAOTE Opera imager was used.