Open Access

Predicting Win-Loss outcomes in MLB regular season games – A comparative study using data mining methods

Dec 17, 2016

eISSN: 1684-4769
Language: English
Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Databases and Data Mining, other, Sports and Recreation, Physical Education