A comparison of subset selection and adaptive basis function construction for polynomial regression model building
Subset selection in polynomial regression model building assumes that a fixed, predefined full set of basis functions contains a subset that describes the target relation sufficiently well. In most cases, however, the necessary set of basis functions is not known in advance and has to be guessed, a potentially non-trivial and lengthy trial-and-error process. In our previous research we considered a different approach to polynomial regression model building: letting the model building method itself construct the basis functions it needs, so that a model of arbitrary complexity can be created without being restricted to the basis functions of a predefined full model. We call this approach Adaptive Basis Function Construction (ABFC). In the present paper we compare the two approaches, subset selection and ABFC, both theoretically and empirically in terms of their underlying principles, computational complexity, and predictive performance. In the empirical evaluations, ABFC is additionally compared to two other well-known regression modelling methods: Locally Weighted Polynomials and Multivariate Adaptive Regression Splines.
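To make the subset selection side of the comparison concrete, the following is a minimal sketch and not the procedure evaluated in the paper. It assumes a greedy forward search over a predefined full set of monomials up to a chosen degree, scored by a small-sample corrected AIC; the search strategy, the criterion, and all function names are illustrative assumptions.

```python
import itertools
import numpy as np

def full_basis_exponents(n_inputs, max_degree):
    """Predefined full set: exponent tuples with total degree 1..max_degree.
    The intercept is handled separately in design_matrix."""
    return [e for e in itertools.product(range(max_degree + 1), repeat=n_inputs)
            if 0 < sum(e) <= max_degree]

def design_matrix(X, exponents):
    """Column of ones plus one column per basis function prod_j x_j**e_j."""
    cols = [np.ones(X.shape[0])]
    cols += [np.prod(X ** np.array(e), axis=1) for e in exponents]
    return np.column_stack(cols)

def aicc(X, y, exponents):
    """Corrected AIC of the least-squares fit (assumed selection criterion)."""
    n = len(y)
    A = design_matrix(X, exponents)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = max(float(np.sum((y - A @ coef) ** 2)), 1e-300)  # avoid log(0)
    k = A.shape[1] + 1  # regression coefficients plus the noise variance
    if n - k - 1 <= 0:
        return np.inf
    return n * np.log(rss / n) + 2 * k + 2 * k * (k + 1) / (n - k - 1)

def forward_subset_selection(X, y, max_degree):
    """Greedily add the basis function that lowers the criterion the most;
    stop when no candidate from the fixed full set improves it."""
    candidates = full_basis_exponents(X.shape[1], max_degree)
    selected = []
    best = aicc(X, y, selected)
    while True:
        scored = [(aicc(X, y, selected + [c]), c)
                  for c in candidates if c not in selected]
        if not scored:
            break
        score, choice = min(scored)
        if score >= best:
            break
        selected.append(choice)
        best = score
    return selected, best
```

The sketch also shows the limitation described above: the search can only return terms that are already in the predefined full set. If the target relation contains the interaction x1*x2 but the full set was built with `max_degree=1`, no subset can represent it, and the full set has to be guessed again.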