An Optimized Parallel Implementation of Non-Iteratively Trained Recurrent Neural Networks


[1] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994. doi:10.1109/72.279181

[2] Stephen A Billings. Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons, 2013. doi:10.1002/9781118535561

[3] Armando Blanco, Miguel Delgado, and Maria C Pegalajar. A real-coded genetic algorithm for training recurrent neural networks. Neural Networks, 14(1):93–105, 2001. doi:10.1016/S0893-6080(00)00081-2

[4] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.

[5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.

[6] Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In Advances in Neural Information Processing Systems, pages 577–585, 2015.

[7] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.

[8] Jerome T Connor, R Douglas Martin, and Les E Atlas. Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks, 5(2):240–254, 1994. doi:10.1109/72.279188

[9] Jeffrey L Elman. Finding structure in time. Cognitive Science, 14(2):179–211, 1990. doi:10.1207/s15516709cog1402_1

[10] Ömer Faruk Ertugrul. Forecasting electricity load by a novel recurrent extreme learning machines approach. International Journal of Electrical Power & Energy Systems, 78:429–435, 2016. doi:10.1016/j.ijepes.2015.12.006

[11] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.

[12] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid speech recognition with deep bidirectional LSTM. In 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pages 273–278. IEEE, 2013. doi:10.1109/ASRU.2013.6707742

[13] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649. IEEE, 2013. doi:10.1109/ICASSP.2013.6638947

[14] Qing He, Tianfeng Shang, Fuzhen Zhuang, and Zhongzhi Shi. Parallel extreme learning machine for regression based on MapReduce. Neurocomputing, 102:52–58, 2013. doi:10.1016/j.neucom.2012.01.040

[15] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997. doi:10.1162/neco.1997.9.8.1735

[16] Guang-Bin Huang, Qin-Yu Zhu, and Chee-Kheong Siew. Extreme learning machine: a new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, volume 2, pages 985–990. IEEE, 2004.

[17] Shan Huang, Botao Wang, Junhao Qiu, Jitao Yao, Guoren Wang, and Ge Yu. Parallel ensemble of online sequential extreme learning machine based on MapReduce. Neurocomputing, 174:352–367, 2016. doi:10.1016/j.neucom.2015.04.105

[18] Weikuan Jia, Dean Zhao, Yuanjie Zheng, and Sujuan Hou. A novel optimized GA–Elman neural network algorithm. Neural Computing and Applications, 31(2):449–459, 2019. doi:10.1007/s00521-017-3076-7

[19] Michael I Jordan. Serial order: A parallel distributed processing approach. In Advances in Psychology, volume 121, pages 471–495. Elsevier, 1997. doi:10.1016/S0166-4115(97)80111-2

[20] Viacheslav Khomenko, Oleg Shyshkov, Olga Radyvonenko, and Kostiantyn Bokhan. Accelerating recurrent neural network training using sequence bucketing and multi-GPU data parallelization. In 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), pages 100–103. IEEE, 2016. doi:10.1109/DSMP.2016.7583516

[21] Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: A LLVM-based Python JIT compiler. In Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, pages 1–6. ACM, 2015.

[22] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015. doi:10.1038/nature14539

[23] Jun Liu, Amir Shahroudy, Dong Xu, and Gang Wang. Spatio-temporal LSTM with trust gates for 3D human action recognition. In European Conference on Computer Vision, pages 816–833. Springer, 2016. doi:10.1007/978-3-319-46487-9_50

[24] Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, and Alex C Kot. Skeleton-based human action recognition with global context-aware attention LSTM networks. IEEE Transactions on Image Processing, 27(4):1586–1599, 2017. doi:10.1109/TIP.2017.2785279

[25] James Martens and Ilya Sutskever. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040, 2011.

[26] Travis Oliphant. Guide to NumPy, January 2006. doi:10.1142/S1793048006000136

[27] Peng Ouyang, Shouyi Yin, and Shaojun Wei. A fast and power efficient architecture to parallelize LSTM-based RNN for cognitive intelligence applications. In Proceedings of the 54th Annual Design Automation Conference 2017, pages 1–6. ACM, 2017. doi:10.1145/3061639.3062187

[28] Yoh-Han Pao, Gwang-Hoon Park, and Dejan J Sobajic. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing, 6(2):163–180, 1994. doi:10.1016/0925-2312(94)90053-1

[29] Jin-Man Park and Jong-Hwan Kim. Online recurrent extreme learning machine and its application to time-series prediction. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 1983–1990. IEEE, 2017. doi:10.1109/IJCNN.2017.7966094

[30] Yara Rizk and Mariette Awad. On extreme learning machines in sequential and time series prediction: A non-iterative and approximate training algorithm for recurrent neural networks. Neurocomputing, 325:1–19, 2019. doi:10.1016/j.neucom.2018.09.012

[31] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural Networks, 61:85–117, 2015. doi:10.1016/j.neunet.2014.09.003

[32] Wouter F Schmidt, Martin A Kraaijveld, and Robert PW Duin. Feedforward neural networks with random weights. In Proceedings of the 11th IAPR International Conference on Pattern Recognition, Vol. II: Conference B: Pattern Recognition Methodology and Systems, pages 1–4. IEEE, 1992.

[33] Xavier Sierra-Canto, Francisco Madera-Ramirez, and Victor Uc-Cetina. Parallel training of a back-propagation neural network using CUDA. In 2010 Ninth International Conference on Machine Learning and Applications, pages 307–312. IEEE, 2010. doi:10.1109/ICMLA.2010.52

[34] Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, and Shiyue Zhang. Memory visualization for gated recurrent neural networks in speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2736–2740. IEEE, 2017. doi:10.1109/ICASSP.2017.7952654

[35] Hubert AB Te Braake and Gerrit Van Straten. Random activation weight neural net (RAWN) for fast non-iterative training. Engineering Applications of Artificial Intelligence, 8(1):71–80, 1995. doi:10.1016/0952-1976(94)00056-S

[36] Mark Van Heeswijk, Yoan Miche, Erkki Oja, and Amaury Lendasse. GPU-accelerated and parallelized ELM ensembles for large-scale regression. Neurocomputing, 74(16):2430–2437, 2011. doi:10.1016/j.neucom.2010.11.034

[37] Botao Wang, Shan Huang, Junhao Qiu, Yu Liu, and Guoren Wang. Parallel online sequential extreme learning machine based on MapReduce. Neurocomputing, 149:224–232, 2015. doi:10.1016/j.neucom.2014.03.076

[38] Shang Wang, Yifan Bai, and Gennady Pekhimenko. Scaling back-propagation by parallel scan algorithm. arXiv preprint arXiv:1907.10134, 2019.

[39] Xiaoyu Wang and Yong Huang. Convergence study in extended Kalman filter-based training of recurrent neural networks. IEEE Transactions on Neural Networks, 22(4):588–600, 2011. doi:10.1109/TNN.2011.2109737

[40] Paul J Werbos. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990. doi:10.1109/5.58337

[41] Ronald J Williams and David Zipser. Gradient-based learning algorithms for recurrent networks and their computational complexity. In Backpropagation: Theory, Architectures, and Applications, page 433, 1995.

[42] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.

[43] Feng Zhang, Jidong Zhai, Marc Snir, Hai Jin, Hironori Kasahara, and Mateo Valero. Guest editorial: Special issue on network and parallel computing for emerging architectures and applications. International Journal of Parallel Programming, 2019. doi:10.1007/s10766-019-00634-1

[44] Shunlu Zhang, Pavan Gunupudi, and Qi-Jun Zhang. Parallel back-propagation neural network training technique using CUDA on multiple GPUs. In IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization (NEMO), pages 1–3. IEEE, 2015. doi:10.1109/NEMO.2015.7415056
