Automated and Manual Data Editing: A View on Process Design and Methodology

Open access

Abstract

Data editing is arguably one of the most resource-intensive processes at national statistical institutes (NSIs). Forced by ever-increasing budget pressure, NSIs keep searching for more efficient forms of data editing. Efficiency gains can be obtained by selective editing, that is, limiting manual editing to influential errors, and by automating the editing process as much as possible. In our view, an optimal mix of these two strategies should be aimed for. In this article we present a decomposition of the overall editing process into a number of different tasks and give an up-to-date overview of the possibilities for automatic editing in terms of these tasks. During the design of an editing process, this decomposition may be helpful in deciding which tasks can be done automatically and for which tasks (additional) manual editing is required. Such decisions can be made a priori, based on the specific nature of the task, or by empirical evaluation, which is illustrated by examples. The decomposition into tasks, or statistical functions, also naturally leads to reusable components, resulting in efficiency gains in process design.
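As a rough illustration of the selective-editing idea mentioned in the abstract, the sketch below routes records either to manual review or to automatic editing according to a score. The score used here (weighted absolute deviation from an anticipated value, relative to an estimated total) is one common form and is an assumption of this example, not necessarily the one used in the article. The `Record` fields, the function names, the threshold, and the data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    unit_id: str
    weight: float       # survey weight (hypothetical field)
    reported: float     # value reported by the respondent
    anticipated: float  # reference value, e.g. the previous period's edited value


def score(record, estimated_total):
    """Weighted absolute deviation from the anticipated value,
    expressed relative to the estimated total (assumed score form)."""
    return record.weight * abs(record.reported - record.anticipated) / estimated_total


def split_for_editing(records, threshold):
    """Send records whose score exceeds the threshold to manual editing;
    all other records are left to automatic editing."""
    estimated_total = sum(r.weight * r.anticipated for r in records)
    manual, automatic = [], []
    for r in records:
        if score(r, estimated_total) > threshold:
            manual.append(r)
        else:
            automatic.append(r)
    return manual, automatic


# Made-up example data: unit B looks like a thousand-fold unit error.
records = [
    Record("A", 10.0, 105.0, 100.0),
    Record("B", 10.0, 98000.0, 98.0),
    Record("C", 50.0, 20.0, 21.0),
]
manual, automatic = split_for_editing(records, threshold=0.05)
print([r.unit_id for r in manual])     # units flagged for manual review
print([r.unit_id for r in automatic])  # units left to automatic editing
```

In practice the threshold would presumably be tuned empirically, for instance by checking how much the published estimates change when low-scoring records are left to automatic editing.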


Journal of Official Statistics

The Journal of Statistics Sweden

