A New Approach For Discovering Top-K Sequential Patterns Based On The Variety Of Items

Shigeaki Sakurai 1  and Minoru Nishizawa 2
  • 1 Big Data Cloud Technology Center, Toshiba Corporation Cloud & Solutions Company, 72-34, Horikawa-cho, Saiwai-ku, Kawasaki 212-8585, Japan
  • 2 Advanced IT Research and Development Center, Toshiba Solutions Company, 3-22, Katamachi, Fuchu, Tokyo 183-8512, Japan


This paper proposes a method that discovers a variety of sequential patterns from sequential data. Sequential data is a set of sequences, and each sequence is a row of item sets. Many previous methods discover frequent sequential patterns from such data. However, these patterns tend to resemble one another because they are composed of a limited set of items, and they do not always correspond to the interests of analysts. This paper therefore tackles the problem of discovering varied sequential patterns. The proposed method identifies redundant sequential patterns by evaluating the variety of their items and deletes them using three kinds of deletion processes. It can discover varied sequential patterns within an upper bound on the number of patterns specified by the analyst. The method is applied to synthetic sequential data characterized by the number of items, the kinds of items, and the sequence length, and its effect is verified through numerical experiments.
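To make the setting concrete, the following Python sketch represents sequential data as a list of sequences of item sets, computes the support of a candidate pattern, and keeps at most k patterns while favoring those that contribute items not yet covered. The abstract does not specify the paper's variety measure or its three deletion processes, so the greedy rule below is only an illustrative stand-in, and every name in it (`contains`, `support`, `items_of`, `select_varied`) is hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

ItemSet = FrozenSet[str]
Pattern = Tuple[ItemSet, ...]


def contains(sequence: Sequence[ItemSet], pattern: Pattern) -> bool:
    """Return True if the pattern's item sets occur in order in the sequence,
    each as a subset of a distinct item set of the sequence."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(data: Sequence[Sequence[ItemSet]], pattern: Pattern) -> float:
    """Fraction of sequences in the data that contain the pattern."""
    return sum(contains(s, pattern) for s in data) / len(data)


def items_of(pattern: Pattern) -> FrozenSet[str]:
    """All distinct items that appear anywhere in the pattern."""
    return frozenset().union(*pattern)


def select_varied(candidates: List[Tuple[Pattern, float]], k: int):
    """Assumed greedy rule (not the paper's procedure): repeatedly keep the
    pattern that adds the most unseen items, breaking ties by support, and
    stop once k patterns are kept or no pattern adds a new item."""
    chosen, seen = [], set()
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: (len(items_of(c[0]) - seen), c[1]))
        if chosen and not (items_of(best[0]) - seen):
            break  # the remaining patterns only reuse items already covered
        chosen.append(best)
        seen |= items_of(best[0])
        remaining.remove(best)
    return chosen


# Toy data: three sequences, each a row of item sets.
data = [
    [frozenset({"a", "b"}), frozenset({"c"})],
    [frozenset({"a"}), frozenset({"c"})],
    [frozenset({"d"}), frozenset({"e"})],
]
p1 = (frozenset({"a"}), frozenset({"c"}))
p2 = (frozenset({"a"}),)
p3 = (frozenset({"d"}), frozenset({"e"}))
candidates = [(p, support(data, p)) for p in (p1, p2, p3)]
selected = [p for p, _ in select_varied(candidates, k=3)]
```

In this toy run, `p2` is dropped even though it is as frequent as `p1`, because its only item is already covered by `p1`. That is the kind of redundancy the abstract describes, although the paper's own criteria may differ.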


