Population Size Estimation Using Multiple Incomplete Lists with Overcoverage

Open access


The quantity and quality of administrative information available to National Statistical Institutes have increased steadily in recent years. However, different sources of administrative data cannot be expected to cover the same population, so estimating the true population size from the combined data poses several methodological challenges that set the problem apart from a classical capture-recapture setting. In this article, we consider two specific aspects of this problem: (1) misclassification of units, leading to lists with both overcoverage and undercoverage; and (2) lists focusing on a specific subpopulation, leaving a proportion of the population with null probability of being captured. We propose an approach to this problem that employs a class of capture-recapture methods based on Latent Class models. We assess the proposed approach via a simulation study, then apply the method to five sources of empirical data to estimate the number of active local units of Italian enterprises in 2011.
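As a point of comparison for the classical setting mentioned above, the following minimal sketch shows the standard two-list (Lincoln-Petersen) estimator and how overcoverage in one list inflates it. It is not the Latent Class approach proposed in the article; the function name and all counts are hypothetical and chosen only for illustration.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-list estimate of population size: n1 * n2 / m.

    n1, n2: number of units on list A and on list B
    m:      number of units found on both lists
    Assumes independent lists with no erroneous (out-of-scope) records.
    """
    if m <= 0:
        raise ValueError("the two lists must share at least one unit")
    return n1 * n2 / m


# Hypothetical counts for a true population of 1000 units:
# list A holds 500 of them, list B holds 400, and 200 appear on both.
clean = lincoln_petersen(n1=500, n2=400, m=200)

# Same setting, but list A also contains 100 out-of-scope records
# (overcoverage) that cannot appear on list B, so the overlap is unchanged.
inflated = lincoln_petersen(n1=600, n2=400, m=200)

print(clean, inflated)
```

With these illustrative counts, the estimate rises from 1000 to 1200 once the out-of-scope records enter list A, which is the kind of bias that motivates modelling overcoverage explicitly.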


Journal of Official Statistics

The Journal of Statistics Sweden


