Hand Gesture Recognition in Video Sequences Using Deep Convolutional and Recurrent Neural Networks

  • 1 Electrical and Computer Engineering Department, University of Kashan, Kashan
  • 2 Department of Computer Science, Aalto University, Helsinki, Finland


Deep learning is a relatively new branch of machine learning that is widely used in artificial intelligence applications, including signal processing and computer vision. The present research investigates the use of deep learning for the hand gesture recognition (HGR) problem and proposes two models based on deep learning architectures. The first model comprises a convolutional neural network (CNN) and a recurrent neural network with long short-term memory (RNN-LSTM). It achieves an accuracy of up to 82 % when fed the colour channel and up to 89 % when fed the depth channel. The second model comprises two parallel convolutional neural networks, joined by a merge layer, and a recurrent neural network with long short-term memory, fed with RGB-D data. This second model achieves an accuracy of up to 93 %.
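The data flow of the second model can be illustrated with a minimal sketch. This is not the authors' implementation: a single linear layer with ReLU stands in for each CNN, the merge layer is assumed to be a simple concatenation, the weights are random and untrained, and all names and sizes (`frame_features`, `lstm_step`, `classify_sequence`, the feature and hidden dimensions) are hypothetical. It only shows how two parallel per-frame feature extractors, a merge step, and an LSTM over the frame sequence fit together.

```python
import math
import random


def frame_features(frame, weights):
    """Stand-in for a per-frame CNN: linear layer + ReLU on a flattened frame."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in weights]


def lstm_step(x, h, c, W):
    """One LSTM time step; W maps a gate name to weights over [x, h]."""
    z = x + h  # concatenate current input and previous hidden state

    def gate(name, act):
        return [act(sum(w * v for w, v in zip(row, z))) for row in W[name]]

    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    i, f, o = gate("i", sigmoid), gate("f", sigmoid), gate("o", sigmoid)
    g = gate("g", math.tanh)
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def make_params(rng, rgb_dim, depth_dim, feat, hidden, classes):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    return {
        "rgb": rand_matrix(rng, feat, rgb_dim),
        "depth": rand_matrix(rng, feat, depth_dim),
        "lstm": {g: rand_matrix(rng, hidden, 2 * feat + hidden) for g in "ifog"},
        "out": rand_matrix(rng, classes, hidden),
    }


def classify_sequence(rgb_frames, depth_frames, params):
    """Two parallel streams -> merge (concatenation) -> LSTM -> softmax."""
    hidden = len(params["lstm"]["i"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for rgb, depth in zip(rgb_frames, depth_frames):
        merged = (frame_features(rgb, params["rgb"])
                  + frame_features(depth, params["depth"]))
        h, c = lstm_step(merged, h, c, params["lstm"])
    logits = [sum(w * v for w, v in zip(row, h)) for row in params["out"]]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
T, RGB_DIM, DEPTH_DIM = 5, 12, 4  # toy sequence length and frame sizes
params = make_params(rng, RGB_DIM, DEPTH_DIM, feat=6, hidden=8, classes=3)
rgb_frames = [[rng.random() for _ in range(RGB_DIM)] for _ in range(T)]
depth_frames = [[rng.random() for _ in range(DEPTH_DIM)] for _ in range(T)]
probs = classify_sequence(rgb_frames, depth_frames, params)
print(probs)  # one probability per (hypothetical) gesture class
```

Dropping one of the two streams from `classify_sequence` gives the shape of the first model, which is fed either the colour channel or the depth channel alone.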



