DSMK-means: "Density-Based Split-and-Merge K-means Clustering Algorithm"

Raed T. Aldahdooh 1  and Wesam Ashour 1
  • 1 Computer Engineering Dept., Islamic University of Gaza (IUG), Gaza, Palestine

Abstract

Clustering is widely used to explore and understand large collections of data. The K-means clustering method is one of the most popular approaches because it is easy to use and simple to implement. This paper introduces the Density-based Split-and-Merge K-means clustering algorithm (DSMK-means), developed to address the stability problems of the standard K-means algorithm and to improve clustering performance on datasets that contain clusters of different, complex shapes as well as noise or outliers. Based on a large set of experiments, the paper concludes that DSMK-means produces more accurate results than the other algorithms it was compared against, particularly on datasets containing clusters with different shapes or densities, or containing outliers and noise.


