Feature Selection Using Particle Swarm Optimization in Text Categorization

Open access

Abstract

Feature selection, the procedure of selecting a subset of the original features, is an essential step in classification systems and one of the major challenges in text categorization. The high dimensionality of the feature space increases the complexity of the text categorization process, so feature selection plays a key role in it. This paper presents a novel feature selection method based on particle swarm optimization (PSO) to improve the performance of text categorization. PSO is inspired by the social behavior of fish schooling and bird flocking. The proposed method has very low complexity because it uses a simple classifier. Its performance is compared with that of other methods on the Reuters-21578 data set, and the experimental results demonstrate the superiority of the proposed method.
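The abstract does not specify the particle encoding, the fitness function, or which simple classifier is used. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It uses a common binary PSO variant in which each particle is a 0/1 mask over terms and a sigmoid maps each velocity component to the probability that the corresponding bit is 1. Fitness is taken to be the validation accuracy of a nearest-centroid classifier minus a small penalty on the fraction of selected terms. The toy corpus, the helper names (`make_toy_corpus`, `nearest_centroid_accuracy`, `binary_pso`), `ALPHA`, and all parameter values are hypothetical. The sketch does not use Reuters-21578.

```python
import math
import random


def make_toy_corpus(n_docs=120, n_terms=30, n_informative=5, seed=0):
    """Hypothetical toy data: two classes; only the first n_informative
    terms differ in frequency between classes, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_docs):
        label = i % 2
        doc = []
        for t in range(n_terms):
            rate = (3.0 if label == 1 else 0.5) if t < n_informative else 1.5
            # Crude term count: number of successes in 6 Bernoulli trials.
            doc.append(sum(rng.random() < rate / 6 for _ in range(6)))
        X.append(doc)
        y.append(label)
    return X, y


def nearest_centroid_accuracy(mask, X_train, y_train, X_val, y_val):
    """Accuracy of a nearest-centroid classifier restricted to masked terms."""
    idx = [d for d, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty subset cannot classify anything
    classes = sorted(set(y_train))
    centroids = {}
    for c in classes:
        rows = [x for x, lab in zip(X_train, y_train) if lab == c]
        centroids[c] = [sum(r[d] for r in rows) / len(rows) for d in idx]
    correct = 0
    for x, lab in zip(X_val, y_val):
        feats = [x[d] for d in idx]
        pred = min(classes, key=lambda c: sum(
            (a - b) ** 2 for a, b in zip(feats, centroids[c])))
        correct += pred == lab
    return correct / len(y_val)


def binary_pso(fitness, n_dims, n_particles=20, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=1):
    """Binary PSO: maximizes fitness(mask) over 0/1 masks of length n_dims."""
    rng = random.Random(seed)
    X = [[rng.randint(0, 1) for _ in range(n_dims)] for _ in range(n_particles)]
    V = [[rng.uniform(-1, 1) for _ in range(n_dims)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_dims):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                v = (w * V[i][d]
                     + c1 * r1 * (pbest[i][d] - X[i][d])
                     + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid turns the velocity into the probability that bit d is 1.
                X[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-V[i][d])) else 0
            f = fitness(X[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = X[i][:], f
    return gbest, gbest_fit


X, y = make_toy_corpus()
X_train, y_train = X[:80], y[:80]
X_val, y_val = X[80:], y[80:]
ALPHA = 0.01  # hypothetical weight of the subset-size penalty


def fitness(mask):
    acc = nearest_centroid_accuracy(mask, X_train, y_train, X_val, y_val)
    return acc - ALPHA * sum(mask) / len(mask)


best_mask, best_fit = binary_pso(fitness, n_dims=len(X[0]))
print("selected terms:", [d for d, b in enumerate(best_mask) if b])
print("fitness:", round(best_fit, 3))
```

The penalty term trades accuracy against subset size. With `ALPHA = 0` the swarm has no incentive to drop uninformative terms. Because the classifier is only called inside `fitness`, swapping in a different cheap classifier requires no change to the PSO loop.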
