Performance Evaluation of Deep Neural Networks Applied to Speech Recognition: RNN, LSTM and GRU

Deep Neural Networks (DNNs) are neural networks with many hidden layers. DNNs are becoming popular in automatic speech recognition tasks, which combine a good acoustic model with a language model. Standard feedforward neural networks cannot handle speech data well since they do not have a way to feed information from a later layer back to an earlier layer. Thus, Recurrent Neural Networks (RNNs) have been introduced to take temporal dependencies into account. However, the shortcoming of RNNs is that they cannot handle long-term dependencies because of the vanishing/exploding gradient problem. Therefore, Long Short-Term Memory (LSTM) networks were introduced, which are a special case of RNNs that take long-term dependencies in speech into account in addition to short-term dependencies. Similarly, Gated Recurrent Unit (GRU) networks are an improvement on LSTM networks that also take long-term dependencies into consideration. Thus, in this paper, we evaluate RNN, LSTM, and GRU networks to compare their performance on a reduced TED-LIUM speech data set. The results show that LSTM achieves the best word error rates; however, GRU optimization is faster while achieving word error rates close to those of LSTM.
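The abstract compares the three networks by word error rate (WER) but does not say how it was computed. As a minimal sketch, assuming the standard definition (word-level edit distance between the reference and the recognized transcript, divided by the number of reference words), WER can be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation normalization).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, the reference "the cat sat on the mat" against the hypothesis "the cat sat on mat" has one deleted word out of six, so the WER is 1/6. Because insertions count as errors, the value can exceed 1 when the hypothesis is much longer than the reference.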

eISSN: 2083-2567
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Databases and Data Mining

Published Online: Aug 30, 2019

Page range: 235 - 245

Received: Sep 29, 2018

Accepted: Mar 10, 2019

DOI: https://doi.org/10.2478/jaiscr-2019-0006

Keywords
Spectrogram, Connectionist Temporal Classification, TED-LIUM data set

© 2019 Apeksha Shewalkar et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
