Administrative data sources are increasingly used by National Statistical Institutes to compile statistics. These sources may be based on decentralised autonomous administrations, for instance municipalities that deliver data on their inhabitants. One issue that may arise when using these decentralised administrative data is that categorical variables are underreported by some of the data suppliers, for instance to avoid administrative burden. Under certain conditions overreporting may also occur.
When statistical output on changes is estimated from decentralised administrative data, the question arises whether those changes are affected by shifts in reporting frequencies. In a case study on hospital data, for instance, the values from certain data suppliers may have been affected by such shifts. We present an automatic procedure for detecting suspicious data suppliers in decentralised administrative data, that is, suppliers for which shifts in reporting behaviour are likely to have affected the estimated output. The procedure is based on a predictive mean matching approach, in which some of the original data values are replaced by imputed values obtained from a selected reference group. The method is successfully applied to a case study with administrative hospital data.
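To make the idea of predictive mean matching concrete, the following is a minimal, generic sketch and not the procedure of the paper, whose model, reference-group selection and detection rule are not specified in this abstract. It assumes a single numeric covariate, a simple linear model for the predicted mean, and a random draw among the `k` nearest donors; the names `fit_line`, `pmm_impute` and `demo`, and all simulated numbers, are hypothetical.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x with one covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    return my - b * mx, b


def pmm_impute(x_ref, y_ref, x_target, k=5, seed=0):
    """Impute a value for each target record by predictive mean matching.

    The model is fitted on the reference group only. For each target
    record, the k reference records with the closest predicted mean are
    the candidate donors, and the observed value of one randomly drawn
    donor is used as the imputed value.
    """
    rng = random.Random(seed)
    a, b = fit_line(x_ref, y_ref)
    pred_ref = [a + b * xi for xi in x_ref]
    k = min(k, len(y_ref))
    imputed = []
    for xt in x_target:
        pred_t = a + b * xt
        nearest = sorted(range(len(pred_ref)),
                         key=lambda i: abs(pred_ref[i] - pred_t))[:k]
        imputed.append(y_ref[rng.choice(nearest)])
    return imputed


def demo():
    """Simulated illustration: a supplier that reports only half of its cases."""
    rng = random.Random(1)
    # Reference group: the indicator becomes more likely as the covariate grows.
    x_ref = [rng.uniform(20, 90) for _ in range(500)]
    y_ref = [1 if rng.random() < x / 200 else 0 for x in x_ref]
    # Supplier under study: same mechanism, but half of the true cases are dropped.
    x_sup = [rng.uniform(20, 90) for _ in range(200)]
    y_true = [1 if rng.random() < x / 200 else 0 for x in x_sup]
    y_reported = [y if rng.random() < 0.5 else 0 for y in y_true]
    y_imputed = pmm_impute(x_ref, y_ref, x_sup, k=5, seed=2)
    print("reported frequency:", mean(y_reported))
    print("imputed frequency: ", mean(y_imputed))


if __name__ == "__main__":
    demo()
```

In this sketch, a large gap between a supplier's reported frequency and the frequency obtained from the imputed values would be a reason to inspect that supplier more closely. How such a gap should be thresholded, and how the reference group should be chosen, is left open here.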