Mining Automatically Estimated Poses from Video Recordings of Top Athletes

R. Lienhart, M. Einfalt, and D. Zecha
Multimedia Computing and Computer Vision Lab, Computer Science Department, University of Augsburg, Germany


Human pose detection systems based on state-of-the-art DNNs are about to be extended, adapted, and re-trained to fit the application domains of specific sports. As a result, large amounts of noisy pose data will soon be available from videos recorded on a regular and frequent basis. This work is among the first to develop mining algorithms for the expected abundance of noisy, annotation-free pose data from video recordings in individual sports. Using swimming as an example of a sport with dominant cyclic motion, we show how to determine, without supervision, time-continuous cycle speeds and temporally striking poses, and how to measure cycle stability over time. Compared to manual annotations, the average error in cycle length estimation across all strokes is 0.43 frames at 50 fps. Additionally, we use long jump as an example of a sport with a rigid phase-based motion to present a technique that automatically partitions the temporal sequences of estimated poses into their respective phases with a mAP of 0.89. This enables the extraction of performance-relevant, pose-based metrics currently used by national professional sports associations. Experimental results demonstrate the effectiveness of our mining algorithms, which can also be applied to other cycle-based or phase-based types of sport.
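To make the notion of an unsupervised cycle length estimate concrete, the following minimal sketch recovers the period of a synthetic cyclic signal with a plain autocorrelation search. It is not the method of this work: autocorrelation is used here only as a generic, well-known stand-in, and the signal, the function name `estimate_cycle_length`, and the constants `TRUE_PERIOD`, `min_lag`, and `max_lag` are all hypothetical. The last lines convert the reported average error of 0.43 frames at 50 fps into milliseconds.

```python
import math

def estimate_cycle_length(signal, min_lag, max_lag):
    """Return the lag (in frames) with the highest normalized autocorrelation."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energy = sum(c * c for c in centered)
    best_lag, best_score = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` frames.
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

FPS = 50          # frame rate stated in the abstract
TRUE_PERIOD = 60  # hypothetical cycle length in frames for the synthetic signal

# Synthetic stand-in for one pose coordinate over time: a fundamental
# oscillation plus a weaker third harmonic (assumed, not real pose data).
signal = [
    math.sin(2 * math.pi * t / TRUE_PERIOD)
    + 0.3 * math.sin(2 * math.pi * 3 * t / TRUE_PERIOD)
    for t in range(600)
]

cycle_frames = estimate_cycle_length(signal, min_lag=20, max_lag=120)
cycle_seconds = cycle_frames / FPS

# The reported average error of 0.43 frames at 50 fps, expressed in milliseconds.
error_ms = 0.43 / FPS * 1000

print(cycle_frames, cycle_seconds, round(error_ms, 1))
```

On this synthetic input the search returns a cycle of 60 frames, or 1.2 s at 50 fps, and the reported error of 0.43 frames corresponds to 8.6 ms. Real pose estimates are noisy and the cycle speed drifts over time, so a practical system would need a windowed or otherwise time-continuous estimate rather than one global lag.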



