In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for studying temporal “events” or, more specifically, for answering questions about how the occurrence of events is distributed over time and how it depends on covariates of the data sources. To make this method applicable to data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazards model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely the analysis of earthquake data and of Twitter data. To explain the frequency of events in terms of the spatial location of the data source, both studies use the location of each source as its covariates.
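The sliding-window idea described above can be illustrated with a minimal sketch. It is not the model proposed in the paper: for illustration only, it assumes a constant baseline hazard within the window, so that each source i emits events at rate exp(b0 + beta · x_i), and it assumes every source is observed for the full window length. The class name `SlidingWindowHazard`, its parameters (`window`, `lr`, `steps`), and the toy data are all hypothetical.

```python
from collections import deque
import math


class SlidingWindowHazard:
    """Toy proportional-hazards model: rate_i = exp(b0 + beta . x_i).

    Assumption (not from the paper): the baseline hazard is constant
    inside the current window, so events per source are Poisson.
    """

    def __init__(self, n_features, window, lr=0.05):
        self.window = window            # window length in time units
        self.lr = lr                    # gradient step size
        self.b0 = 0.0                   # log of the baseline hazard
        self.beta = [0.0] * n_features  # covariate coefficients
        self.events = deque()           # (timestamp, source_id), oldest first

    def observe(self, t, source_id):
        """Record an event at time t and evict events outside the window."""
        self.events.append((t, source_id))
        while self.events and self.events[0][0] <= t - self.window:
            self.events.popleft()

    def rate(self, x):
        """Event rate (hazard) for a source with covariate vector x."""
        return math.exp(self.b0 + sum(b * v for b, v in zip(self.beta, x)))

    def update(self, covariates, steps=1):
        """Gradient ascent on the Poisson log-likelihood of the window.

        covariates maps source_id -> covariate vector (list of floats).
        """
        counts = {s: 0 for s in covariates}
        for _, s in self.events:
            if s in counts:
                counts[s] += 1
        n = len(covariates)
        for _ in range(steps):
            g0 = 0.0
            g = [0.0] * len(self.beta)
            for s, x in covariates.items():
                # observed events minus events expected under the model
                resid = counts[s] - self.window * self.rate(x)
                g0 += resid
                for j, v in enumerate(x):
                    g[j] += resid * v
            self.b0 += self.lr * g0 / n
            self.beta = [b + self.lr * gj / n for b, gj in zip(self.beta, g)]


# Toy usage: two sources with a single binary "location" covariate.
covariates = {"A": [0.0], "B": [1.0]}
model = SlidingWindowHazard(n_features=1, window=10.0)
stream = [(-20, "B"), (1, "A"), (2, "B"), (3, "B"), (4, "B"),
          (5, "A"), (6, "B"), (7, "B"), (8, "B")]
for t, source in stream:
    model.observe(t, source)
model.update(covariates, steps=2000)
```

In this toy stream the event at time -20 falls out of the window, and source B produces three times as many events as source A within it, so the fitted coefficient should approach log 3. In a streaming setting one would call `update` with a small number of steps after each new event or batch, so that the parameters track the current window rather than being refit from scratch.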