Impact of Learners’ Quality and Diversity in Collaborative Clustering

Open access

Abstract

Collaborative clustering is a data mining task in which several clustering algorithms analyze different aspects of the same data. Its aim is to reveal the common underlying structure of data spread across multiple data sites. Each collaborator shares some information about the segmentation (structure) of its local data and improves its own clustering using the information provided by the other learners. This paper analyses the impact of the quality and the diversity of the potential learners on the quality of the collaboration for topological collaborative clustering algorithms based on Self-Organizing Maps (SOM). Experimental analysis on real data sets shows that the diversity between learners affects the quality of the collaboration. It also shows that some internal quality indexes are good estimators of the quality gain due to the collaboration.
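The notion of diversity between learners can be illustrated with a minimal sketch. The abstract does not say which diversity measure the paper uses, so the code below makes an assumption: diversity between two learners is taken as one minus the Rand index of their partitions of the same objects. The function names (rand_index, diversity, mean_pairwise_diversity) are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two
    objects in the same cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # zero or one object: the partitions trivially agree
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


def diversity(labels_a, labels_b):
    """Assumed diversity measure: 1 - Rand index (0 means identical partitions)."""
    return 1.0 - rand_index(labels_a, labels_b)


def mean_pairwise_diversity(partitions):
    """Average diversity over all pairs of learners' partitions."""
    pairs = list(combinations(partitions, 2))
    if not pairs:
        raise ValueError("need at least two partitions")
    return sum(diversity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Three hypothetical learners clustering the same four objects.
    learners = [
        [0, 0, 1, 1],
        [1, 1, 0, 0],  # same partition as the first, with labels swapped
        [0, 1, 0, 1],
    ]
    print(diversity(learners[0], learners[1]))
    print(mean_pairwise_diversity(learners))
```

Because the Rand index compares co-membership of pairs rather than raw labels, two learners that produce the same grouping under different cluster names have zero diversity, which is the behaviour one would want before deciding whether an exchange between them can bring new information.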

Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology
