The aim of selective editing is to identify observations affected by influential errors. A score function based on the impact of the potential error on the target estimates can be used to prioritize observations for careful review. We assume a Gaussian model for the true data and an “intermittent” error mechanism in which a proportion of the data is contaminated by an additive Gaussian error. In this setting, scores can be related to the expected value of the errors affecting the data. Consequently, a set of units can be selected such that the expected residual error remaining in the data is below a prespecified threshold. In economic surveys, where the variables analyzed are typically positive, it is more realistic to apply the method to the logarithms of the data rather than to the data in their original scale. The method is illustrated through an experimental study on real business survey data in which contamination is simulated according to error mechanisms frequently encountered in economic surveys.
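The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single variable on its original scale, no covariates, and model parameters that are already known (in practice they would have to be estimated). The function names `expected_errors` and `select_units`, the parameters `mu`, `var_true`, `var_err`, `p_err`, and the choice to express scores relative to the predicted total are all hypothetical and introduced only for illustration.

```python
import math

def expected_errors(y, mu, var_true, var_err, p_err):
    """Expected error E[y - y_true | y] under an assumed contamination model:
    y_true ~ N(mu, var_true); with probability p_err an additive error
    N(0, var_err) is present, otherwise y is observed without error."""
    var_cont = var_true + var_err          # variance of a contaminated observation
    shrink = var_err / var_cont            # share of (y - mu) attributed to the error
    out = []
    for yi in y:
        d2 = (yi - mu) ** 2
        # Log posterior odds of "contaminated" versus "clean" given yi.
        log_odds = (math.log(p_err / (1.0 - p_err))
                    - 0.5 * math.log(var_cont / var_true)
                    + 0.5 * d2 * (1.0 / var_true - 1.0 / var_cont))
        tau = 1.0 / (1.0 + math.exp(-log_odds))   # posterior probability of error
        out.append(tau * shrink * (yi - mu))
    return out

def select_units(y, exp_err, threshold):
    """Select units by decreasing score until the sum of the scores of the
    unselected units (a bound on the relative expected residual error of
    the total) falls below the threshold."""
    predicted_total = sum(yi - ei for yi, ei in zip(y, exp_err))
    scores = [abs(e) / abs(predicted_total) for e in exp_err]
    order = sorted(range(len(y)), key=lambda i: scores[i], reverse=True)
    residual = sum(scores)
    selected = []
    for i in order:
        if residual < threshold:
            break
        selected.append(i)
        residual -= scores[i]
    return selected, scores

# Toy data (made up for illustration): two values lie far from the bulk.
y = [9.5, 10.2, 11.0, 8.7, 10.4, 25.0, 9.9, -3.0]
exp_err = expected_errors(y, mu=10.0, var_true=1.0, var_err=99.0, p_err=0.1)
selected, scores = select_units(y, exp_err, threshold=0.01)
print(selected)
```

With these toy values the two outlying observations are the ones flagged for review, while the remaining units contribute only a small expected residual error. For positive variables, the same calculation would be carried out on the logarithms of the data, in which case the expected errors would need to be mapped back to the original scale before scores are formed; that step is not shown here.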