The paper presents a novel spectral algorithm EVSA (eigenvector structure analysis), which uses eigenvalues and eigenvectors of the adjacency matrix in order to discover clusters. Based on matrix perturbation theory and properties of graph spectra we show that the adjacency matrix can be more suitable for partitioning than other Laplacian matrices. The main problem concerning the use of the adjacency matrix is the selection of the appropriate eigenvectors. We thus propose an approach based on analysis of the adjacency matrix spectrum and eigenvector pairwise correlations. Formulated rules and heuristics allow choosing the right eigenvectors representing clusters, i.e., automatically establishing the number of groups. The algorithm requires only one parameter-the number of nearest neighbors. Unlike many other spectral methods, our solution does not need an additional clustering algorithm for final partitioning. We evaluate the proposed approach using real-world datasets of different sizes. Its performance is competitive to other both standard and new solutions, which require the number of clusters to be given as an input parameter.
If the inline PDF is not rendering correctly, you can download the PDF file here.
Azran A. and Ghahramani Z. (2006). Spectral methods for automatic multiscale data clustering in A. Fitzgibbon et al. (Eds.) Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition IEEE Computer Society Washington DC pp. 190-197.
Dua D. and Karra Taniskidou E. (2017). UCI Machine Learning Repository University of California Irvine CA http://archive.ics.uci.edu/ml.
Brigham R. and Dutton R. (1984). Bounds on graph spectra Journal of Combinatorial Theory: Series B 37(3): 228-234.
Brito M. Chavez E. Quiroz A. and Yukich J. (1997). Connectivity of the mutual k-nearest-neighbor graph Statistics and Probability Letters 35(1): 33-42.
Cheeger J. (1970). A lower bound for the smallest eigenvalue of the Laplacian in R. Gunning (Ed.) Problems in Analysis Princeton University Princeton NJ pp. 195-199.
Chung F. (1997). Spectral Graph Theory American Mathematical Society Providence RI.
Fiedler M. (1975). A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory Czechoslovak Mathematical Journal 25(4): 619-633.
Hull J. (1994). A database for handwritten text recognition research IEEE Transactions on Pattern Analysis and Machine Intelligence 16(5): 550-554.
Jia H. Ding S. Xu X. and Nie R. (2014). The latest research progress on spectral clustering Neural Computing and Applications 124(7-8): 1477-1486.
Lin M.Wang F. and Zhang C. (2015). Large-scale eigenvector approximation Pattern Recognition 48(5): 1904-1912.
Liu A. Poon L. Liu T.F. and Zhang N. (2014). Latent tree models for rounding in spectral clustering Neurocomputing 144: 448-462.
Lovász L. and Vesztergombi K. (2002). Geometric representations of graphs in G. Halász et al. (Eds.) Paul Erdös and His Mathematics Bolyai Society Mathematical Studies Budapest pp. 471-498.
Lucińska M. and Wierzchoń S.T. (2014). Spectral clustering based on analysis of eigenvector properties in K. Saeed and V. Snasel (Eds.) Proceedings of Computer Information Systems and Industrial Management Applications Springer Berlin/Heidelberg pp. 43-54.
Manning C. Raghavan P. and Schütze H. (2008). An Introduction to Information Retrieval 1st Edn. Cambridge University Press Cambridge.
Meyer C. (2000). Matrix Analysis and Applied Linear Algebra 1st Edn. SIAM Philadelphia PA.
Mohar B. (1989). Isoperimetric numbers of graphs Journal of Combinatorial Theory 47(3): 274-291.
Ng A. Jordan M. and Weiss Y. (2001). On spectral clustering: Analysis and an algorithm in T. Dietterich et al. (Eds.) Advances in Neural Information Processing Systems MIT Press Cambridge MA pp. 849-856.
Schwarz G. (1978). Estimating the dimension of a model Annals of Statistics 6(2): 460-464.
Shi J. and Malik J. (2000). Normalized cuts and image segmentation IEEE Transactions on Pattern Analysis and Machine Intelligence 22(8): 888-905.
Shi T. Belkin M. and Yu B. (2009). Data spectroscopy: Eigenspaces of convolution operators and clustering Annals of Statistics 37(6b): 3960-3984.
Shmueli Y. Wolf G. and Averbuch A. (2012). Updating kernel methods in spectral decomposition by affinity perturbations Linear Algebra and Its Applications 437(6): 1356-1365.
Stewart G.W. and Sun J. (1990). Matrix Perturbation Theory 1st Edn. Academic Press New York NY.
von Luxburg U. (2007). A tutorial on spectral clustering Statistics and Computing 17(4): 395-416.
Wu L. Ying X. Wu X. and Zhou Z.-H. (2011). Line orthogonality in adjacency eigenspace with application to community partition in T. Walsh (Ed.) Proceedings of the 22nd International Joint Conference on Artificial Intelligence AAAI Press Menlo Park CA pp. 2349-2354.
Xiang T. and Gong S. (2008). Spectral clustering with eigenvector selection Pattern Recognition 41(3): 1012-1029.
Yann L. and Corinna C. (2009). TheMNIST Database of Handwritten Digits http://yann.lecun.com/exdb/mnist/.
Zelnik-Manor L. and Perona P. (2004). Self-tuning spectral clustering in L.K. Saul et al. (Eds.) Advances in Neural Information Processing Systems MIT Press Cambridge MA pp. 1601-1608.
Zheng X. and Lin X. (2004). Automatic determination of intrinsic cluster number family in spectral clustering using random walk on graph Proceedings of the International Conference on Image Processing Singapore Singapore pp. 3471-3474.