MapReduce and Semantics Enabled Event Detection using Social Media

Open access

Abstract

Social media is playing an increasingly important role in reporting major events around the world. However, detecting events from social media is challenging because of the sheer volume of the data and the complex semantics of the language being processed. This paper proposes MASEED (MapReduce and Semantics Enabled Event Detection), a novel event detection framework that addresses the following problems: 1) traditional data mining paradigms do not scale to big data; 2) data preprocessing requires significant human effort; 3) domain knowledge must be acquired before detection; 4) semantic interpretation of events is overlooked; 5) detection scenarios are limited to specific domains. We overcome these challenges by embedding semantic analysis into temporal analysis to capture the salient aspects of social media data, and by parallelizing the detection of potential events using the MapReduce methodology. We evaluate the performance of our method on real Twitter data. The results demonstrate that the proposed system outperforms most state-of-the-art methods in terms of accuracy and efficiency.

Journal of Artificial Intelligence and Soft Computing Research

The Journal of Polish Neural Network Society, the University of Social Sciences in Lodz & Czestochowa University of Technology
