Feature Selection Using Particle Swarm Optimization in Text Categorization

Open access


Feature selection, the procedure of choosing a subset of the original features, is a key step in classification systems and one of the major challenges in text categorization. The high dimensionality of the feature space, which plays a key role in text categorization, increases the complexity of the process. This paper presents a novel feature selection method based on particle swarm optimization to improve the performance of text categorization. Particle swarm optimization is inspired by the social behavior of fish schooling and bird flocking. The complexity of the proposed method is very low because it uses a simple classifier. The performance of the proposed method is compared with that of other methods on the Reuters-21578 data set. Experimental results show the superiority of the proposed method.
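
To make the idea concrete, the following is a minimal sketch of a generic binary particle swarm optimization applied to feature-subset search. It is not the method proposed in the paper, whose details are not given in the abstract. The function names, the parameter values (`w`, `c1`, `c2`, `v_max`), and the toy fitness function are all assumptions made for illustration. In a text categorization setting the fitness would typically be the performance of a classifier trained on the selected terms.

```python
import math
import random


def toy_fitness(mask, relevant=(0, 3, 5), penalty=0.05):
    """Hypothetical fitness: reward selected 'relevant' features and
    penalize every other selected feature. Stands in for classifier
    performance on the selected subset."""
    hits = sum(mask[i] for i in relevant)
    return hits / len(relevant) - penalty * (sum(mask) - hits)


def binary_pso_select(fitness, n_features, n_particles=10, n_iters=30,
                      w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Generic binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    # Random initial masks and velocities.
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_particles)]
    # Personal bests and the global best seen so far.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus attraction to personal and global bests.
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                vel[i][d] = v
                # Sigmoid of the velocity is the probability of a 1 bit.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


if __name__ == "__main__":
    mask, fit = binary_pso_select(toy_fitness, n_features=8)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("fitness:", fit)
```

The search is stochastic, so the returned subset depends on the seed and is not guaranteed to be optimal. The sketch only guarantees that the reported fitness never decreases across iterations.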

Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology

