Open Access

Parallelizing user-defined functions in the ETL workflow using orchestration style sheets

International Journal of Applied Mathematics and Computer Science
Exploring Complex and Big Data (special section, pp. 7-91), Johann Gamper, Robert Wrembel (Eds.)

Ali, S.M.F. (2018). Next-generation ETL framework to address the challenges posed by big data, Workshop Proceedings of the EDBT/ICDT Joint Conference, Vienna, Austria.

Ali, S.M.F. and Wrembel, R. (2017). From conceptual design to performance optimization of ETL workflows: Current state of research and open problems, The VLDB Journal 26(6): 1–25. DOI: 10.1007/s00778-017-0477-2.

Aßmann, U. (2003). Invasive software composition, Invasive Software Composition, Springer, Berlin/Heidelberg, pp. 107–145. DOI: 10.1007/978-3-662-05082-8_4.

Battré, D., Ewen, S., Hueske, F., Kao, O., Markl, V. and Warneke, D. (2010). Nephele/PACTs: A programming model and execution framework for web-scale analytical processing, Proceedings of the Symposium on Cloud Computing, Indianapolis, IN, USA, pp. 119–130. DOI: 10.1145/1807128.1807148.

Chaiken, R., Jenkins, B., Larson, P.-Å., Ramsey, B., Shakib, D., Weaver, S. and Zhou, J. (2008). Scope: Easy and efficient parallel processing of massive data sets, Proceedings of the VLDB Endowment 1(2): 1265–1276. DOI: 10.14778/1454159.1454166.

Cloudera (2016). Example: Sentiment analysis using MapReduce custom counters, https://www.cloudera.com/documentation/other/tutorial/CDH5/topics/ht_example_4_sentiment_analysis.html.

Dagum, L. and Menon, R. (1998). OpenMP: An industry standard API for shared-memory programming, IEEE Computational Science and Engineering 5(1): 46–55. DOI: 10.1109/99.660313.

Dean, J. and Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters, Communications of the ACM 51(1): 107–113. DOI: 10.1145/1327452.1327492.

Ekman, T. and Hedin, G. (2007). The JastAdd system modular extensible compiler construction, Science of Computer Programming 69(1–3): 14–26. DOI: 10.1016/j.scico.2007.02.003.

Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A. and Jacobsen, H.-A. (2013). Bigbench: Towards an industry standard benchmark for big data analytics, Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, New York, NY, USA, pp. 1197–1208. DOI: 10.1145/2463676.2463712.

González-Vélez, H. and Kontagora, M. (2011). Performance evaluation of MapReduce using full virtualisation on a departmental cloud, International Journal of Applied Mathematics and Computer Science 21(2): 275–284, DOI: 10.2478/v10006-011-0020-3.

Große, P., May, N. and Lehner, W. (2014). A study of partitioning and parallel UDF execution with the SAP HANA database, Proceedings of the 26th International Conference on Scientific and Statistical Database Management, Aalborg, Denmark, p. 36. DOI: 10.1145/2618243.2618274.

Hedin, G. (2000). Reference attributed grammars, Informatica (Slovenia) 24(3): 301–317.

Karagiannis, A., Vassiliadis, P. and Simitsis, A. (2013). Scheduling strategies for efficient ETL execution, Information Systems 38(6): 927–945. DOI: 10.1016/j.is.2012.12.001.

Karol, S. (2015). Well-formed and Scalable Invasive Software Composition, PhD dissertation, Technische Universität Dresden, Dresden.

Kiczales, G., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.-M. and Irwin, J. (1997). Aspect-oriented programming, in M. Akşit and S. Matsuoka (Eds.), European Conference on Object-oriented Programming, Springer, Berlin/Heidelberg, pp. 220–242. DOI: 10.1007/BFb0053381.

Kumar, N. and Kumar, P.S. (2010). An efficient heuristic for logical optimization of ETL workflows, International Workshop on Business Intelligence for the Real-Time Enterprise, Singapore, Singapore, pp. 68–83. DOI: 10.1007/978-3-642-22970-1_6.

Liu, X., Thomsen, C. and Pedersen, T.B. (2013). ETLMR: A highly scalable dimensional ETL framework based on MapReduce, in A. Hameurlain et al. (Eds.), Transactions on Large-Scale Data- and Knowledge-Centered Systems VIII, Springer, Berlin/Heidelberg, pp. 1–31. DOI: 10.1007/978-3-642-37574-3_1.

Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. and McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Baltimore, MD, USA, pp. 55–60. DOI: 10.3115/v1/P14-5010.

Mey, J., Karol, S., Aßmann, U., Huismann, I., Stiller, J. and Fröhlich, J. (2016). Using semantics-aware composition and weaving for multi-variant progressive parallelization, Procedia Computer Science 80: 1554–1565. DOI: 10.1016/j.procs.2016.05.482.

Nambiar, R.O. and Poess, M. (2006). The making of TPC-DS, Proceedings of the 32nd International Conference on Very Large Data Bases, Seoul, Korea, pp. 1049–1058.

Simitsis, A., Vassiliadis, P. and Sellis, T. (2005). State-space optimization of ETL workflows, IEEE Transactions on Knowledge and Data Engineering 17(10): 1404–1419. DOI: 10.1109/TKDE.2005.169.

Simitsis, A., Wilkinson, K., Dayal, U. and Castellanos, M. (2010). Optimizing ETL workflows for fault-tolerance, IEEE 26th International Conference on Data Engineering (ICDE), Long Beach, CA, USA, pp. 385–396. DOI: 10.1109/ICDE.2010.5447816.

Thomsen, C. and Pedersen, T.B. (2011). Easy and effective parallel programmable ETL, Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, New York, NY, USA, pp. 37–44. DOI: 10.1145/2064676.2064684.

Tziovara, V., Vassiliadis, P. and Simitsis, A. (2007). Deciding the physical implementation of ETL workflows, Proceedings of the International Workshop on Data Warehousing and OLAP, New York, NY, USA, pp. 49–56. DOI: 10.1145/1317331.1317341.

Vassiliadis, P., Simitsis, A. and Baikousi, E. (2009). A taxonomy of ETL activities, Proceedings of the ACM 12th International Workshop on Data Warehousing and OLAP, New York, NY, USA, pp. 25–32. DOI: 10.1145/1651291.1651297.

Weinberg, A.I. and Last, M. (2017). Interpretable decision-tree induction in a big data parallel framework, International Journal of Applied Mathematics and Computer Science 27(4): 737–748, DOI: 10.1515/amcs-2017-0051.

eISSN: 2083-8492
Language: English
Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Applied Mathematics