DSMK-Means: Density-Based Split-and-Merge K-Means Clustering Algorithm

Open access

Abstract

Clustering is widely used to explore and understand large collections of data. K-means is one of the most popular clustering methods because it is easy to use and simple to implement. This paper introduces the Density-based Split-and-Merge K-means clustering algorithm (DSMK-means), developed to address the stability problems of the standard K-means algorithm and to improve clustering performance on datasets that contain clusters of different, complex shapes as well as noise or outliers. Based on a series of experiments, the paper concludes that the developed DSMK-means algorithm achieves more accurate results than the other algorithms it was compared with, particularly because it can process datasets containing clusters of different shapes and densities, or datasets with outliers and noise.
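The stability problem of standard K-means mentioned above can be illustrated with a small sketch. The code below is not DSMK-means; it is a plain Lloyd-style K-means written for illustration, and the function names (`sq_dist`, `kmeans`) and the four-point dataset are assumptions that do not come from the paper. Run from two different sets of starting centroids on the same data, it converges to two different partitions with very different sums of squared errors (SSE).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
        for p in points
    ]


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style K-means from the given starting centroids.

    Illustrative only: no density estimation, splitting, or merging.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


# Hypothetical toy data: corners of a wide rectangle (two natural clusters).
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one centroid on each short side (left / right).
_, labels_a, sse_a = kmeans(points, [(0, 0), (4, 0)])

# Start B: both centroids in the middle (bottom / top).
_, labels_b, sse_b = kmeans(points, [(2, 0), (2, 1)])

print(labels_a, sse_a)
print(labels_b, sse_b)
```

Both runs satisfy the K-means convergence criterion, yet start B stops at a much poorer local optimum that pairs points across the long side of the rectangle. Sensitivity of this kind to initialization is one common reading of the "stability problems" that the abstract refers to.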


Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology

