Decision-Making Enhancement in a Big Data Environment: Application of the K-Means Algorithm to Mixed Data

eISSN: 2083-2567
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Computer Sciences, Databases and Data Mining, Artificial Intelligence

Published Online: Aug 30, 2019

Page range: 293–302

Received: May 08, 2019

Accepted: Jul 25, 2019

DOI: https://doi.org/10.2478/jaiscr-2019-0010

Keywords: Big data, mixed data, Hadoop, K-means, decision making

© 2019 Oded Koren et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
