Clustering Large-Scale Data Based On Modified Affinity Propagation Algorithm

Open access


Traditional clustering algorithms are no longer suitable for data mining applications that work with large-scale data. Many large-scale data clustering algorithms have been proposed in recent years, but most of them do not achieve high-quality clustering. Although Affinity Propagation (AP) is effective and accurate for ordinary data clustering, it is not effective for large-scale data. This paper proposes two methods for large-scale data clustering that depend on a modified version of the AP algorithm. The proposed methods are designed to ensure both low time complexity and good clustering accuracy. Firstly, the data set is divided into several subsets using one of two methods: random fragmentation or K-means. Secondly, each subset is clustered into K clusters using the K-Affinity Propagation (KAP) algorithm to select local cluster exemplars in that subset. Thirdly, the inverse weighted clustering algorithm is performed on all local cluster exemplars to select well-suited global exemplars for the whole data set. Finally, every data point is assigned to a cluster according to its similarity to each global exemplar. Results show that the proposed clustering method significantly reduces clustering time and produces better clustering results, more effectively and accurately than the AP, KAP, and HAP algorithms.
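The four steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no pseudocode, so a simple k-medoids pass (the hypothetical helper `k_exemplars`) stands in for both KAP and the inverse weighted clustering algorithm, only the random fragmentation variant is shown, and similarity is assumed to be the negative squared Euclidean distance commonly used with AP. All function names and the toy data are assumptions made for illustration.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def similarity(a, b):
    # Assumed similarity: negative squared Euclidean distance.
    return -sq_dist(a, b)

def nearest(p, exemplars):
    # Index of the exemplar most similar to point p.
    return max(range(len(exemplars)), key=lambda j: similarity(p, exemplars[j]))

def random_fragmentation(points, n_subsets, rng):
    # Step 1: shuffle, then deal the points into n_subsets subsets.
    shuffled = points[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]

def k_exemplars(points, k, n_iter=20):
    # Stand-in for KAP (step 2) and inverse weighted clustering (step 3):
    # a plain k-medoids pass that returns k exemplars drawn from `points`.
    exemplars = [points[0]]
    while len(exemplars) < k:
        # Farthest-point initialisation.
        exemplars.append(
            max(points, key=lambda p: min(sq_dist(p, e) for e in exemplars))
        )
    for _ in range(n_iter):
        clusters = [[] for _ in exemplars]
        for p in points:
            clusters[nearest(p, exemplars)].append(p)
        # New exemplar: the member with the highest total similarity
        # to the other members of its cluster.
        updated = [
            max(c, key=lambda m: sum(similarity(m, q) for q in c)) if c else e
            for c, e in zip(clusters, exemplars)
        ]
        if updated == exemplars:
            break
        exemplars = updated
    return exemplars

def cluster_large(points, k, n_subsets, seed=0):
    rng = random.Random(seed)
    subsets = random_fragmentation(points, n_subsets, rng)        # step 1
    local = [e for s in subsets for e in k_exemplars(s, k)]       # step 2
    global_exemplars = k_exemplars(local, k)                      # step 3
    labels = [nearest(p, global_exemplars) for p in points]       # step 4
    return global_exemplars, labels

# Toy data: two well-separated 2-D blobs of 100 points each.
gen = random.Random(1)
data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(100)]
data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(100)]
exemplars, labels = cluster_large(data, k=2, n_subsets=4)
```

The point of the structure is that the expensive exemplar search only ever runs on one subset or on the small pool of local exemplars, never on the full data set; the full data set is touched once more in the final assignment, which is linear in the number of points.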


