Random projections and Hotelling’s T² statistics for change detection in high-dimensional data streams

Open access

A method of change (or anomaly) detection in high-dimensional discrete-time processes, based on a multivariate Hotelling control chart, is presented. Normal random projections are used for dimensionality reduction. We describe the diagnostic properties of the Hotelling control chart applied to data projected onto a random subspace of R^n, and we examine the random projection method on artificial noisy image sequences.
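As a rough illustration of the idea summarized above, the following Python sketch projects high-dimensional observations onto a random k-dimensional subspace with a normal (Gaussian) matrix and computes a Hotelling-type T² statistic on the projected data. It is a minimal sketch under stated assumptions, not the procedure from the paper: the function names `fit_projected_chart` and `t2_statistic`, the 1/sqrt(k) scaling of the projection matrix, the synthetic white-noise "images", the chosen dimensions, and the use of an empirical quantile of the reference T² values as the control limit are all illustrative choices.

```python
import numpy as np


def fit_projected_chart(X_ref, k, rng, alpha=0.01):
    """Fit a Hotelling-type chart on k-dimensional random projections of X_ref.

    X_ref has shape (m, n): m in-control reference observations of dimension n.
    """
    m, n = X_ref.shape
    # Normal random projection: k x n matrix with i.i.d. N(0, 1/k) entries
    # (the 1/sqrt(k) scaling is an assumption made for this sketch).
    S = rng.standard_normal((k, n)) / np.sqrt(k)
    Z = X_ref @ S.T                          # projected reference data, (m, k)
    mean = Z.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(Z, rowvar=False))
    # Assumption: use the empirical (1 - alpha) quantile of the reference
    # T^2 values as the control limit instead of a distribution-based limit.
    D = Z - mean
    t2_ref = np.einsum("ij,jk,ik->i", D, cov_inv, D)
    limit = float(np.quantile(t2_ref, 1.0 - alpha))
    return S, mean, cov_inv, limit


def t2_statistic(X, S, mean, cov_inv):
    """Return T^2 = d^T C^{-1} d for each row of X, with d = S x - mean."""
    D = np.atleast_2d(X) @ S.T - mean
    return np.einsum("ij,jk,ik->i", D, cov_inv, D)


# Synthetic example: 20 x 20 noise "images" flattened to n = 400 pixels.
rng = np.random.default_rng(0)
n, k, m = 400, 10, 300
X_ref = rng.standard_normal((m, n))          # in-control reference sample
S, mean, cov_inv, limit = fit_projected_chart(X_ref, k, rng)

# Fresh in-control observations: the fraction above the limit should be small.
X_new = rng.standard_normal((200, n))
false_alarm_rate = float(np.mean(t2_statistic(X_new, S, mean, cov_inv) > limit))

# An observation whose mean is shifted by 5 in every pixel.
x_shifted = rng.standard_normal(n) + 5.0
t2_shifted = float(t2_statistic(x_shifted, S, mean, cov_inv)[0])

print("control limit:", limit)
print("false alarm rate on in-control data:", false_alarm_rate)
print("T^2 of shifted observation:", t2_shifted)
```

In this sketch only the k × k projected covariance matrix has to be estimated and inverted, which is the practical appeal of reducing the dimension before charting. How well a given shift is detected depends on how much of it falls into the randomly chosen subspace.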


