Arnout van Delden, Boris Lorenc, Peter Struijs and Li-Chun Zhang
Arnout van Delden, Sander Scholtus and Joep Burger
Publications in official statistics are increasingly based on a combination of sources. Although combining data sources may result in nearly complete coverage of the target population, the outcomes are not error-free. Estimating the effect of nonsampling errors on the accuracy of mixed-source statistics is crucial for decision making, but it is not straightforward. Here we simulate the effect of classification errors on the accuracy of turnover-level estimates in car-trade industries. We combine an audit sample, the dynamics in the business register, and expert knowledge to estimate a transition matrix of classification-error probabilities. Bias and variance of the turnover estimates caused by classification errors are estimated by a bootstrap resampling approach. In addition, we study the extent to which manual selective editing at micro level can improve the accuracy. Our analyses reveal which industries do not meet preset quality criteria. Surprisingly, more selective editing can result in less accurate estimates for specific industries, and a fixed allocation of editing effort over industries is more effective than an allocation in proportion to the accuracy and population size of each industry. We discuss how to develop a practical method that can be implemented in production to estimate the accuracy of register-based estimates.
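The bootstrap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the observed industry codes serve as the reference classification, that each replicate redraws every unit's code from the row of a transition matrix belonging to its observed code, and that bias and variance are taken over the replicate totals. The function name, the industry labels, the turnover values and the probabilities are all hypothetical.

```python
import random
from statistics import mean, pvariance


def bootstrap_classification_error(units, transition, n_rep=500, seed=0):
    """Estimate bias and variance of class totals under classification error.

    units      -- list of (observed_class, turnover) pairs (hypothetical data)
    transition -- transition[c][k] is the assumed probability that a unit
                  observed in class c is assigned to class k in a replicate
    """
    rng = random.Random(seed)
    classes = sorted(transition)

    # Totals under the observed classification, used as the reference.
    observed = {c: 0.0 for c in classes}
    for cls, turnover in units:
        observed[cls] += turnover

    replicates = {c: [] for c in classes}
    for _ in range(n_rep):
        totals = {c: 0.0 for c in classes}
        for cls, turnover in units:
            weights = [transition[cls][k] for k in classes]
            drawn = rng.choices(classes, weights=weights, k=1)[0]
            totals[drawn] += turnover
        for c in classes:
            replicates[c].append(totals[c])

    return {
        c: {
            "bias": mean(replicates[c]) - observed[c],
            "variance": pvariance(replicates[c]),
        }
        for c in classes
    }


# Hypothetical example: two industries, five units.
units = [("A", 100.0), ("A", 250.0), ("B", 40.0), ("B", 60.0), ("B", 900.0)]
transition = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.2, "B": 0.8},
}
result = bootstrap_classification_error(units, transition)
for industry, stats in result.items():
    print(industry, round(stats["bias"], 1), round(stats["variance"], 1))
```

Because every unit keeps its turnover and only changes class, the estimated biases sum to approximately zero over the classes: turnover lost by one industry is gained by another.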
Joep Burger, Arnout van Delden and Sander Scholtus
For policymakers and other users of official statistics, it is crucial to distinguish real differences underlying statistical outcomes from noise caused by various error sources in the statistical process. This has become more difficult as official statistics are increasingly based upon a mix of sources that typically do not involve probability sampling. In this article, we apply a resampling method to assess the sensitivity of mixed-source statistics to source-specific classification errors. Classification errors can be seen as coverage errors within a stratum. The method can be used to compare relative accuracies between strata and releases; it can assist in deciding how to allocate resources optimally in the statistical process; and it can be applied in evaluating potential estimators. A case study on short-term business statistics shows that bias occurs especially for those strata that deviate strongly from the mean value in other strata. It also suggests that shifting classification resources from small and medium-sized enterprises to large enterprises has virtually no net effect on accuracy, because the gain in precision is offset by the creation of bias. The resampling method can be extended to include other types of nonsampling error.
Arnout van Delden, Jan van der Laan and Annemarie Prins
Administrative data sources are increasingly used by National Statistical Institutes to compile statistics. These sources may be based on decentralised autonomous administrations, for instance municipalities that deliver data on their inhabitants. One issue that may arise when using these decentralised administrative data is that categorical variables are underreported by some of the data suppliers, for instance to avoid administrative burden. Under certain conditions overreporting may also occur.
When statistical output on changes is estimated from decentralised administrative data, the question may arise whether those changes are affected by shifts in reporting frequencies. For instance, in a case study on hospital data, the values from certain data suppliers may have been affected by changes in reporting frequencies. We present an automatic procedure to detect suspicious data suppliers in decentralised administrative data in which shifts in reporting behaviour are likely to have affected the estimated output. The procedure is based on a predictive mean matching approach, where part of the original data values are replaced by imputed values obtained from a selected reference group. The method is successfully applied to a case study with administrative hospital data.
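The predictive mean matching step mentioned above can be sketched generically as follows. This is an illustration under stated assumptions, not the procedure of the article: it uses a single numeric predictor and a simple linear regression fitted on the reference group, whereas the case study concerns categorical variables. The function names, the parameter `k` and the data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(donor_x, donor_y, recipient_x, k=3, seed=0):
    """Impute a value for each recipient by predictive mean matching.

    The model is fitted on the reference group (donors). Each recipient
    receives the observed value of a donor drawn at random from the k donors
    whose predicted means are closest to the recipient's predicted mean.
    """
    rng = random.Random(seed)
    intercept, slope = fit_line(donor_x, donor_y)
    donor_pred = [intercept + slope * x for x in donor_x]

    imputed = []
    for x in recipient_x:
        pred = intercept + slope * x
        nearest = sorted(
            range(len(donor_x)), key=lambda i: abs(donor_pred[i] - pred)
        )[:k]
        imputed.append(donor_y[rng.choice(nearest)])
    return imputed


# Hypothetical reference group and two units whose values are replaced.
donor_x = [1.0, 2.0, 3.0, 4.0, 5.0]
donor_y = [2.1, 3.9, 6.2, 8.1, 9.8]
imputed = pmm_impute(donor_x, donor_y, recipient_x=[2.5, 4.5])
print(imputed)
```

Because imputed values are always observed donor values, they stay within the range of real data. One plausible use, in line with the abstract, is to compare the output computed from a supplier's original values with the output computed from the imputed values and to flag suppliers for which the two differ markedly.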