The quantity and quality of administrative information available to National Statistical Institutes have been increasing steadily over the past several years. However, different administrative sources are not expected to have the same population coverage, so estimating the true population size from the combined data poses methodological challenges that set the problem apart from the classical capture-recapture setting. In this article, we consider two specific aspects of this problem: (1) misclassification of units, which produces lists with both overcoverage and undercoverage; and (2) lists that focus on a specific subpopulation, leaving part of the population with zero probability of being captured. We propose an approach that employs a class of capture-recapture methods based on Latent Class models. We assess the proposed approach in a simulation study and then apply it to five empirical data sources to estimate the number of active local units of Italian enterprises in 2011.
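For readers unfamiliar with latent class capture-recapture, the following is a minimal sketch of the *basic* model only: units belong to one of a few unobserved classes, lists capture units independently within a class, and the unobservable "captured by no list" cell is imputed inside an EM loop to obtain the population size. It deliberately omits the two extensions this article is about (overcoverage from misclassified units and units with zero capture probability), so it should not be read as the proposed method. The function names, the starting values, and the synthetic two-class parameters are hypothetical choices made for illustration.

```python
import itertools

import numpy as np


def latent_class_capture_recapture(counts, n_classes=2, n_iter=10000, tol=1e-8):
    """Fit a basic latent class capture-recapture model by EM (illustrative sketch).

    counts maps each observed capture profile (a tuple of K zeros/ones) to the
    number of units with that profile. The all-zero profile cannot be observed.
    Returns (N_hat, pi, p): the estimated population size, the class weights,
    and p[c, k] = probability that list k captures a unit in latent class c.
    """
    K = len(next(iter(counts)))
    # All 2^K profiles; itertools.product puts the all-zero profile first.
    profiles = np.array(list(itertools.product([0, 1], repeat=K)))
    zero = np.all(profiles == 0, axis=1)
    n = np.array([counts.get(tuple(int(v) for v in y), 0.0) for y in profiles], dtype=float)
    n[zero] = 0.0
    n_obs = n.sum()

    # Deterministic, asymmetric start so the classes can separate.
    pi = np.full(n_classes, 1.0 / n_classes)
    p = np.array([[0.2 + 0.6 * (c + 1) / (n_classes + 1)] * K for c in range(n_classes)])

    for _ in range(n_iter):
        # P(y | c) under local independence within each class, shape (2^K, C).
        lik = np.prod(np.where(profiles[:, None, :] == 1, p[None], 1.0 - p[None]), axis=2)
        joint = lik * pi[None, :]
        prob_y = joint.sum(axis=1)
        p0 = prob_y[zero][0]

        # E-step: impute the unobserved cell, then split every cell across classes
        # in proportion to the posterior class probabilities P(c | y).
        n_full = n.copy()
        n_full[zero] = n_obs * p0 / (1.0 - p0)
        w = n_full[:, None] * joint / prob_y[:, None]

        # M-step: closed-form updates of class weights and capture probabilities.
        N_hat = n_full.sum()
        class_size = w.sum(axis=0)
        pi_new = class_size / N_hat
        p_new = (w.T @ profiles) / class_size[:, None]

        step = max(np.abs(pi_new - pi).max(), np.abs(p_new - p).max())
        pi, p = pi_new, p_new
        if step < tol:
            break
    return N_hat, pi, p


def expected_counts(N, pi, p):
    """Expected counts N * P(y) for every observable profile under a known model."""
    counts = {}
    for y in itertools.product([0, 1], repeat=p.shape[1]):
        if any(y):
            lik = np.prod(np.where(np.array(y) == 1, p, 1.0 - p), axis=1)
            counts[y] = N * float(pi @ lik)
    return counts


if __name__ == "__main__":
    # Hypothetical population: an easy-to-capture class and a hard-to-capture class.
    true_pi = np.array([0.4, 0.6])
    true_p = np.array([[0.8, 0.7, 0.75, 0.6, 0.7],
                       [0.3, 0.4, 0.25, 0.35, 0.3]])
    counts = expected_counts(10000.0, true_pi, true_p)
    N_hat, pi_hat, p_hat = latent_class_capture_recapture(counts)
    print(f"observed units: {sum(counts.values()):.0f}, estimated N: {N_hat:.0f}")
```

Feeding the model its own expected counts is an idealized check: the true parameters are a fixed point of the EM iteration, so the estimate should land near the true size of 10,000, whereas a single-class (independence) model would typically underestimate it because of the capture heterogeneity. With real lists, EM for latent class models can stop at local maxima, so several starting values are commonly tried.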