Detecting Reporting Errors in Data from Decentralised Autonomous Administrations with an Application to Hospital Data

Open access


Administrative data sources are increasingly used by National Statistical Institutes to compile statistics. These sources may be based on decentralised autonomous administrations, for instance municipalities that deliver data on their inhabitants. One issue that may arise when using these decentralised administrative data is that categorical variables are underreported by some of the data suppliers, for instance to avoid administrative burden. Under certain conditions overreporting may also occur.

When changes over time are estimated from decentralised administrative data, the question arises whether those changes are affected by shifts in reporting frequencies. In a case study on hospital data, for instance, the values from certain data suppliers may have been affected by such shifts. We present an automatic procedure to detect suspicious data suppliers, that is, suppliers whose shifts in reporting behaviour are likely to have affected the estimated output. The procedure is based on a predictive mean matching approach, in which part of the original data values is replaced by imputed values obtained from a selected reference group. The method is successfully applied to a case study with administrative hospital data.
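
To make the idea of predictive mean matching concrete, the following is a minimal sketch and not the authors' procedure. It assumes a single covariate, a linear prediction model, and a deterministic nearest-donor rule; the function names, the toy data, and the choice of patient age as covariate are all hypothetical. Each record of a supplier under scrutiny receives the observed value of the reference-group record whose predicted mean is closest, and the supplier's observed reporting rate can then be compared with the imputed rate.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept and slope for a single covariate."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def pmm_impute(ref_x, ref_y, target_x):
    """Give each target record the observed value of its nearest reference donor.

    'Nearest' is measured on the predicted mean from a model fitted
    on the reference group only (simplification: one donor, no random draw).
    """
    intercept, slope = fit_line(ref_x, ref_y)
    ref_pred = [intercept + slope * x for x in ref_x]
    imputed = []
    for x in target_x:
        pred = intercept + slope * x
        donor = min(range(len(ref_x)), key=lambda i: abs(ref_pred[i] - pred))
        imputed.append(ref_y[donor])
    return imputed

# Hypothetical toy data: covariate = patient age,
# outcome = 1 if a comorbidity was recorded, 0 otherwise.
ref_age = [30, 40, 50, 60, 70, 80]   # reference group (assumed reliable)
ref_flag = [0, 0, 0, 1, 1, 1]
supplier_age = [62, 71, 79, 52]      # supplier under scrutiny
supplier_flag = [0, 0, 1, 0]

imputed = pmm_impute(ref_age, ref_flag, supplier_age)
observed_rate = mean(supplier_flag)
imputed_rate = mean(imputed)
print(observed_rate, imputed_rate)
```

In this toy setting a large gap between the observed and the imputed rate would mark the supplier as a candidate for closer inspection. A realistic application would use more covariates, a model suited to categorical outcomes, and random draws among several close donors.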

Journal of Official Statistics

The Journal of Statistics Sweden
