Modelling Match Outcome in Australian Football: Improved accuracy with large databases

Open access


Mathematical models that explain match outcome from the values of technical performance indicators (PIs) can be used to identify the most important aspects of technical performance in team field sports. The purpose of this study was to evaluate several methodological opportunities to enhance the accuracy of this type of modelling. Specifically, we evaluated 1) the potential benefits of modelling match outcome using more seasons and PIs than previous reports, 2) methods for identifying eras in which technical performance characteristics were stable, and 3) the application of a novel feature selection method. Ninety-one PIs across sixteen Australian Football League (AFL) seasons were analysed. Change-point and segmented regression analyses were used to identify eras; they produced similar but non-identical outcomes. An ensemble feature selection method identified the 45 most valuable PIs for modelling. Using a larger number of seasons for model development led to an improvement in the classification accuracy of the models compared with previous studies (88.8% vs 78.9%). This study demonstrates the potential benefits of large databases when creating models of match outcome, and the pitfalls of determining whether there are eras in a longitudinal database.
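Of the two era-detection approaches named above, change-point analysis is the simpler to illustrate. The following is a minimal sketch, assuming one common variant (a cumulative-sum, or CUSUM, statistic with a bootstrap confidence estimate) applied to a single PI summarised as one mean value per season; it is not necessarily the procedure used in the study. The function names and the sixteen-season series are hypothetical, invented for illustration rather than taken from AFL data.

```python
import random
from typing import List, Sequence


def cusum(series: Sequence[float]) -> List[float]:
    """Return S_0..S_n, where S_k sums the first k deviations from the series mean."""
    mean = sum(series) / len(series)
    sums = [0.0]
    for x in series:
        sums.append(sums[-1] + (x - mean))
    return sums


def cusum_range(series: Sequence[float]) -> float:
    """Spread of the CUSUM curve; a large spread suggests a shift in level."""
    s = cusum(series)
    return max(s) - min(s)


def change_point(series: Sequence[float]) -> int:
    """Index of the first value after the estimated shift.

    |S_k| peaks at k, the number of values that precede the shift.
    """
    s = cusum(series)
    return max(range(len(s)), key=lambda k: abs(s[k]))


def bootstrap_confidence(series: Sequence[float], n_boot: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled copies whose CUSUM range is smaller than the observed one."""
    rng = random.Random(seed)
    observed = cusum_range(series)
    shuffled = list(series)
    below = 0
    for _ in range(n_boot):
        rng.shuffle(shuffled)  # destroys any temporal ordering, keeps the values
        if cusum_range(shuffled) < observed:
            below += 1
    return below / n_boot


# Hypothetical per-season means of one PI (illustrative values only, not study data).
seasons = list(range(2001, 2017))
values = [49.5, 50.2, 50.8, 49.9, 50.4, 50.1, 49.7, 50.3,
          59.8, 60.5, 60.1, 59.6, 60.9, 60.2, 59.9, 60.4]

k = change_point(values)
print("first season of new era:", seasons[k])
print("bootstrap confidence:", bootstrap_confidence(values))
```

In this made-up series the sketch places the change at 2009, the first season of the higher segment. Repeating the test within each resulting segment would locate further change points; segmented regression instead fits separate linear trends on either side of an estimated breakpoint, so the two approaches need not agree on where an era begins.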
