Data Organisation and Process Design Based on Functional Modularity for a Standard Production Process

Open access

Abstract

We propose using the principles of functional modularity to cope with the essential complexity of statistical production processes. Moving towards the international statistical production standards (GSBPM and GSIM), we propose a data organisation and a process design that combine the object-oriented and functional computing paradigms. The data organisation comprises a standardised key-value pair abstract data model in which keys are constructed from the structural statistical metadata of the production system. The process design makes extensive use of the principles of functional modularity (modularity, data abstraction, hierarchy, and layering) to design production steps. We provide a proof of concept focusing on an optimisation approach to selective editing, applied to real survey data under standard production conditions at the Spanish National Statistics Institute. Several R packages implementing these ideas have been prototyped. We also share practical issues that arose during the implementation.

Journal of Official Statistics

The Journal of Statistics Sweden
