In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for the study of temporal “events” or, more specifically, of questions regarding the temporal distribution of event occurrences and its dependence on covariates of the data sources. To make this method applicable in the setting of data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazards model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely, the analysis of earthquake data and of Twitter data. To explain the frequency of events by the spatial location of the data source, both studies use the location as covariates of the sources.
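The sliding-window idea can be illustrated with a minimal sketch. It is not the model proposed in the paper: it assumes, purely for illustration, that each source emits events at a constant hazard rate exp(beta · x), where x is the source's covariate vector, and it refits beta by gradient ascent on the Poisson log-likelihood of the events currently inside the window. The class name `SlidingWindowHazard`, its parameters, and the toy data are all hypothetical.

```python
import math
from collections import deque


class SlidingWindowHazard:
    """Constant-hazard model with log-linear covariates, fit on a sliding window.

    Assumption (not taken from the paper): source s emits events as a Poisson
    process with rate exp(beta . x_s), and every source has been observed for
    the full window length.
    """

    def __init__(self, covariates, window, step_size=0.02):
        self.x = covariates              # dict: source id -> list of floats
        self.window = window             # window length in time units
        self.step_size = step_size       # gradient-ascent step size
        dim = len(next(iter(covariates.values())))
        self.beta = [0.0] * dim          # model parameters, warm-started across refits
        self.events = deque()            # (time, source id), oldest first

    def hazard(self, source):
        """Current estimated event rate of a source."""
        return math.exp(sum(b * v for b, v in zip(self.beta, self.x[source])))

    def observe(self, time, source):
        """Add an event (events must arrive in time order) and evict expired ones."""
        self.events.append((time, source))
        while self.events and self.events[0][0] <= time - self.window:
            self.events.popleft()

    def refit(self, n_steps=2000):
        """Gradient ascent on the log-likelihood of the events in the window."""
        counts = {s: 0 for s in self.x}
        for _, s in self.events:
            counts[s] += 1
        for _ in range(n_steps):
            grad = [0.0] * len(self.beta)
            for s, xs in self.x.items():
                # Observed minus expected number of events for this source.
                resid = counts[s] - self.window * self.hazard(s)
                for j, v in enumerate(xs):
                    grad[j] += resid * v
            self.beta = [b + self.step_size * g / len(self.x)
                         for b, g in zip(self.beta, grad)]


# Toy stream: two sources, covariates = [intercept, location indicator].
model = SlidingWindowHazard({"a": [1.0, 0.0], "b": [1.0, 1.0]}, window=10.0)
stream = ([(0.0, "a")]
          + [(0.5 * k, "b") for k in range(1, 21)]
          + [(2.0 * k, "a") for k in range(1, 6)])
for t, s in sorted(stream):
    model.observe(t, s)
model.refit()
print(round(model.hazard("a"), 3), round(model.hazard("b"), 3))
```

In this toy stream the event at time 0.0 has left the window by the time the last event arrives, so the fit uses 5 events from source `a` and 20 from source `b` over a window of length 10. Calling `refit` again after further `observe` calls continues from the previous `beta`, which is one simple way to let the parameters track the current window.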