Big Data systems manage and process huge volumes of data that are continuously generated by various technologies in a myriad of formats. Big Data advocates (and preachers) have claimed that Big Data technologies such as NoSQL, Hadoop and in-memory data stores outperform classical relational/SQL Database Management Systems. This paper compares the data processing performance of two systems, one from the SQL camp (PostgreSQL/Postgres XL) and one from the Big Data camp (Hadoop/Hive), on a distributed five-node cluster deployed in the cloud. Instead of established benchmarks (YCSB, TPC), the study uses a series of R modules devised to generate random non-aggregate queries on different subschemas (of increasing data size) of the TPC-H database. The overall performance of the two systems was compared first. Subsequently, a number of models were developed to relate performance to the system and to various query parameters, such as the number of attributes in the SELECT and WHERE clauses, the number of joins and the number of processed rows.
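The authors implemented their query generator as R modules, which are not reproduced here. As a rough illustration of the idea only, the following Python sketch builds random non-aggregate SELECT queries over a small, hypothetical subset of the TPC-H schema and records the query parameters named above (number of SELECT attributes, number of WHERE predicates, number of joins). The column subset, the join chain, the predicate forms, and the names `random_query` and `predicate` are assumptions made for this sketch, not the paper's actual design.

```python
import random

# Hypothetical subset of the TPC-H schema: table -> {column: type}.
# Only a few columns per table are listed to keep the sketch short.
SCHEMA = {
    "region": {"r_regionkey": "int", "r_name": "text"},
    "nation": {"n_nationkey": "int", "n_name": "text", "n_regionkey": "int"},
    "customer": {"c_custkey": "int", "c_name": "text", "c_nationkey": "int",
                 "c_acctbal": "num", "c_mktsegment": "text"},
    "orders": {"o_orderkey": "int", "o_custkey": "int", "o_totalprice": "num",
               "o_orderstatus": "text", "o_orderpriority": "text"},
}

# Assumed join chain following TPC-H foreign keys; neighbours are joinable.
CHAIN = ["region", "nation", "customer", "orders"]
JOIN_COND = {
    ("region", "nation"): "nation.n_regionkey = region.r_regionkey",
    ("nation", "customer"): "customer.c_nationkey = nation.n_nationkey",
    ("customer", "orders"): "orders.o_custkey = customer.c_custkey",
}


def predicate(rng, table, column, ctype):
    """Build one simple (hypothetical) WHERE predicate for a column."""
    if ctype == "text":
        letter = rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        return f"{table}.{column} LIKE '{letter}%'"
    return f"{table}.{column} > {rng.randint(0, 1000)}"


def random_query(rng, max_joins=3, max_select=5, max_where=3):
    """Return (sql, params) for one random non-aggregate SELECT query."""
    # Pick how many joins, then a contiguous slice of the join chain.
    n_joins = rng.randint(0, min(max_joins, len(CHAIN) - 1))
    start = rng.randint(0, len(CHAIN) - 1 - n_joins)
    tables = CHAIN[start:start + n_joins + 1]

    # All (table, column, type) triples available in the chosen subschema.
    columns = [(t, c, ty) for t in tables for c, ty in SCHEMA[t].items()]
    n_select = rng.randint(1, min(max_select, len(columns)))
    n_where = rng.randint(0, min(max_where, len(columns)))
    selected = rng.sample(columns, n_select)
    filtered = rng.sample(columns, n_where)

    # Join conditions come first; n_where counts only the filter predicates.
    conditions = [JOIN_COND[(a, b)] for a, b in zip(tables, tables[1:])]
    conditions += [predicate(rng, t, c, ty) for t, c, ty in filtered]

    sql = "SELECT " + ", ".join(f"{t}.{c}" for t, c, _ in selected)
    sql += " FROM " + ", ".join(tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    params = {"n_select": n_select, "n_where": n_where, "n_joins": n_joins}
    return sql, params


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    for _ in range(3):
        sql, params = random_query(rng)
        print(params)
        print(sql + ";")
```

Because each generated query carries its own parameter record, execution times collected on each system could later be modelled against `n_select`, `n_where` and `n_joins`. The number of processed rows depends on the data and would have to be measured at run time; this sketch does not produce it.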
Cogean, D. I., Fotache, M., and Greavu-Serban, V., 2013. NoSQL in Higher Education. A Case Study. In C. Boja, L. Batagan, M. Doinea, C. Ciurea, P. Pocatilu, A. Ion, R. Magos, L. Cotfas, A. Velicanu, C. Amancei, M. Andreica, and A. Zamfiroiu (Eds.), International Conference on Informatics in Economy (pp. 352-360). Bucharest: Bucharest Univ Economic Studies-Ase.
Cooper, B. F., Silberstein, A., Tam, E., Ramakrishnan, R., and Sears, R., 2010. Benchmarking cloud serving systems with YCSB. Paper presented at the 1st ACM symposium on Cloud computing (published in the Proceedings), Indianapolis, Indiana, USA. doi: http://dx.doi.org/10.1145/1807128.1807152
Faraway, J., 2015. Linear Models with R (2nd ed.). Boca Raton, FL: CRC Press.
Fotache, M., and Hrubaru, I., 2016. Big Data Technology on Medium-Sized Data. Preliminary Results for Non-Aggregate Queries. In C. Boja, M. Doinea, C. Ciurea, P. Pocatilu, L. Batagan, A. Velicanu, M. E. Popescu, I. Manafi, A. Zamfiroiu, and M. Zurini (Eds.), International Conference on Informatics in Economy IE 2016: Education, Research & Business Technologies (pp. 273-278). Bucharest: Bucharest Univ Economic Studies-Ase.
Fotache, M., Strimbei, C., Hrubaru, I., and Cogean, D. I., 2014. Scratching Big Data Surface: Comparing Simple Queries in PostgreSQL and MongoDB. Paper presented at the 13th International Conference on Informatics in Economy - IE 2014 (published in the Proceedings), Bucharest, Romania.
Fox, J., 2003. Effect Displays in R for Generalised Linear Models. Journal of Statistical Software, 8(15), 1-27.
Fox, J., 2016. Applied Regression Analysis and Generalized Linear Models (3rd ed.). Thousand Oaks, CA: Sage.
Fox, J., and Weisberg, S., 2011. An R Companion to Applied Regression (2nd ed.). Thousand Oaks, CA: Sage.
Giraudoux, P., 2016. pgirmess: Data Analysis in Ecology. R package version 1.6.5. Retrieved from https://CRAN.R-project.org/package=pgirmess
Gross, J., and Ligges, U., 2015. nortest: Tests for Normality. R package version 1.0-4. Retrieved from https://CRAN.R-project.org/package=nortest
Hothorn, T., and Hornik, K., 2015. exactRankTests: Exact Distributions for Rank and Permutation Tests. R package version 0.8-28. Retrieved from https://cran.r-project.org/package=exactRankTests
Hrubaru, I., and Fotache, M., 2015. On a Hadoop Cliche: Physical and Logical Models Separation. In C. Boja, M. Doinea, C. Ciurea, P. Pocatilu, L. Batagan, A. Ion, V. Diaconita, M. Andreica, C. Delcea, A. Zamfiroiu, M. Zurini, and O. Popescu (Eds.), Proceedings of the 14th International Conference on Informatics in Economy (pp. 357-363). Bucharest: Bucharest Univ Economic Studies-Ase.
Jacobs, A., 2009. The pathologies of big data. Communications of the ACM, 52(8), 36-44.
Kowalczyk, M., and Buxmann, P., 2014. Big Data and Information Processing in Organizational Decision Processes. Business & Information Systems Engineering, 6(5), 267-278. doi: http://dx.doi.org/10.1007/s12599-014-0341-5
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., EISPACK authors, Heisterkamp, S., . . . R-core team, 2016. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-128. Retrieved from http://CRAN.R-project.org/package=nlme
PostgresXL, 2016. Postgres XL Overview. Retrieved 10 September 2016 from http://www.postgresxl.org/overview/
Sakr, S., Liu, A., and Fayoumi, A. G., 2013. The family of mapreduce and large-scale data processing systems. ACM Computing Surveys, 46(1), 1-44.
Thusoo, A., Sarma, J. S., Jain, N., Shao, Z., Chakka, P., Anthony, S., . . . Murthy, R., 2009. Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow., 2(2), 1626-1629. doi: http://dx.doi.org/10.14778/1687553.1687609
Trancoso, P., 2015. Moving to memoryland: in-memory computation for existing applications. Paper presented at the Proceedings of the 12th ACM International Conference on Computing Frontiers, Ischia, Italy. doi: http://dx.doi.org/10.1145/2742854.2742874