Wen-Sheng Chiu, Kuo-Wei Lin, Yen-Chieh Wen


The development of athletes depends on two aspects: nature and nurture. The former is the player's own talent and aptitude, while the latter is training, which consumes human, material and financial resources. Take professional baseball players as an example. When up-and-coming players are first discovered, focused training can be matched to their talents and to the starting requirements of the professional baseball league. Doing so effectively raises the players' value and helps them find a better career path. This can form a virtuous circle: the teams get quality players, and the players get better results. That is, for players with the potential to become starters, strengthening training on their shortcomings avoids unnecessary training and the large expenses behind it, and greatly reduces career risk, so that players have more security in their short careers and all parties benefit. This study uses season game data from Major League Baseball teams. Through feature selection, a data mining technique, it analyzes the main relationships and key differences between starting players and bench players at second base and shortstop on National League teams. On-base percentage and speed are found to be important ability indicators for a starting infield position; beyond these, second basemen are distinguished by offense and shortstops by fielding. These findings are verified by comparison with the opinions of experts and commentators.
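The abstract does not state which feature selection method was used, so the following is only a minimal sketch of one common filter approach, ranking features by information gain against a starter/bench label. The feature names, the discretized "high"/"low" values, and the four player records are invented for illustration and are not the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, feature_names, label_name):
    """Return (feature, information gain) pairs, highest gain first."""
    labels = [row[label_name] for row in rows]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in feature_names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records; values are made up purely to exercise the code.
players = [
    {"obp": "high", "speed": "high", "power": "low", "role": "starter"},
    {"obp": "high", "speed": "low", "power": "high", "role": "starter"},
    {"obp": "low", "speed": "low", "power": "high", "role": "bench"},
    {"obp": "low", "speed": "high", "power": "low", "role": "bench"},
]

ranking = rank_features(players, ["obp", "speed", "power"], "role")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In practice the continuous statistics would first need to be discretized, and a filter ranking like this is often followed by a cross-validated classifier to check that the selected features actually separate the two groups.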





Keywords: data mining, feature selection, sports forecast, major league








Copyright (c) 2019 Wen-Sheng Chiu, Kuo-Wei Lin, Yen-Chieh Wen

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2015 - 2023. European Journal of Physical Education and Sport Science (ISSN 2501 - 1235) is a registered trademark of Open Access Publishing Group. All rights reserved.
