The quantity and quality of administrative information available to National Statistical Institutes have increased steadily over the past several years. However, different sources of administrative data are not expected to have the same population coverage, so estimating the true population size from the combined set of data poses several methodological challenges that set the problem apart from a classical capture-recapture setting. In this article, we consider two specific aspects of this problem: (1) misclassification of the units, leading to lists with both overcoverage and undercoverage; and (2) lists focusing on a specific subpopulation, leaving a proportion of the population with null probability of being captured. We propose an approach to this problem that employs a class of capture-recapture methods based on Latent Class models. We assess the proposed approach via a simulation study, then apply the method to five sources of empirical data to estimate the number of active local units of Italian enterprises in 2011.