Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm

Open access


Traditional clustering algorithms are no longer suitable for data mining applications that use large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for clustering data sets of ordinary size, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. Firstly, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Secondly, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars in that subset. Thirdly, the inverse weighted clustering algorithm is performed on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, each data point is assigned to a cluster according to its similarity to the global exemplars. Results show that the proposed clustering method significantly reduces clustering time and produces clustering results that are more accurate than those of the AP, KAP, and HAP algorithms.
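The four steps described in the abstract can be sketched roughly as follows. This is only an illustration of the pipeline's shape, not the authors' implementation. It assumes random fragmentation for the first step and negative squared Euclidean distance as the similarity measure. A plain k-medoids routine stands in for both KAP and the inverse weighted clustering algorithm, neither of which is reproduced here. All function names (`random_fragmentation`, `k_medoids`, `cluster_large`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance; similarity is assumed to be its negative."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_fragmentation(points, n_subsets, rng):
    """Step 1: shuffle the data and split it into roughly equal subsets."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


def k_medoids(points, k, rng, n_iter=20):
    """Pick k exemplars that are actual data points.

    Stand-in only: the paper uses KAP for local exemplars and inverse
    weighted clustering for global exemplars, not k-medoids.
    """
    medoids = rng.sample(points, k)
    for _ in range(n_iter):
        # Assign every point to its nearest (most similar) medoid.
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, medoids[j]))
            clusters[nearest].append(p)
        # Move each medoid to the member that minimizes total distance.
        new_medoids = []
        for j, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[j])
                continue
            best = min(members, key=lambda c: sum(sq_dist(c, m) for m in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


def cluster_large(points, k, n_subsets, seed=0):
    """Run the four-step pipeline; returns (global_exemplars, labels)."""
    rng = random.Random(seed)
    # Step 1: divide the data set into subsets.
    subsets = random_fragmentation(points, n_subsets, rng)
    # Step 2: select k local exemplars in each subset (KAP in the paper).
    local = [m for subset in subsets for m in k_medoids(subset, k, rng)]
    # Step 3: select k global exemplars from the local ones
    # (inverse weighted clustering in the paper).
    global_exemplars = k_medoids(local, k, rng)
    # Step 4: assign each point to its most similar global exemplar.
    labels = [
        min(range(k), key=lambda j: sq_dist(p, global_exemplars[j]))
        for p in points
    ]
    return global_exemplars, labels
```

Under these assumptions, the exemplar search in steps 2 and 3 only ever runs on a single subset or on the small pool of local exemplars, and only the final assignment pass touches every data point; this is the structural reason the abstract gives for expecting lower clustering time than running AP on the whole data set.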

Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology
