Probabilistic Sequence Mining – Evaluation and Extension of ProMFS Algorithm for Real-Time Problems

Open access

Abstract

Sequential pattern mining is an extensively studied data mining method. A newer and less documented approach estimates the statistical characteristics of a sequence in order to build model sequences, which can then be used to speed up sequence mining. This paper proposes extensive modifications to one such algorithm, ProMFS (a probabilistic algorithm for mining frequent sequences), that notably increase its processing speed by significantly reducing its computational complexity. The new version of the algorithm is evaluated on real-life and artificial data sets and shown to be useful in real-time applications and problems.

International Journal of Electronics and Telecommunications

The Journal of Committee of Electronics and Telecommunications of Polish Academy of Sciences

