Data integration is now common practice in official statistics and involves an increasing number of sources. When multiple sources are used, one objective is to estimate the unknown size of the population, and capture-recapture methods are applied for this purpose. Standard capture-recapture methods rest on a number of strong assumptions, including the absence of errors in the integration procedures. However, this assumption is unlikely to hold, particularly when the integrated sources were not originally collected for statistical purposes, and linkage errors (false links and missing links) may occur. This article tackles the problem of adjusting population size estimates in the presence of linkage errors across multiple lists. Under the assumption of homogeneous linkage error probabilities, a solution is proposed for a realistic and practical multiple-list linkage procedure.
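As a minimal illustration of why linkage errors matter for population size estimation, the sketch below uses the classical two-list Lincoln-Petersen estimator. It is not the multiple-list adjustment proposed in this article, and the function name and all counts are hypothetical, chosen only to show the direction of the bias when true links are missed.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list capture-recapture estimate: N_hat = n1 * n2 / m.

    n1, n2 -- number of records in list 1 and list 2
    m      -- number of records linked across the two lists
    """
    if m <= 0:
        raise ValueError("at least one linked record is required")
    return n1 * n2 / m


# Hypothetical counts, for illustration only.
n1, n2 = 1000, 800
true_matches = 400

# Estimate when linkage is error-free.
n_hat = lincoln_petersen(n1, n2, true_matches)

# Assumed scenario: 5% of the true links are missed and no false links occur,
# so only 380 matches are observed.
observed_matches = 380
n_hat_missed = lincoln_petersen(n1, n2, observed_matches)

print(f"error-free linkage: {n_hat:.0f}")
print(f"with missed links:  {n_hat_missed:.0f}")
```

Because missed links lower the observed match count, the unadjusted estimate is pushed upward; false links act in the opposite direction by inflating the match count.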