Coverage Evaluation on Probabilistically Linked Data

Open access


The capture-recapture method is a well-known approach to estimating the unknown size of a population. Administrative data provide independent counts of a population and can be combined to apply the capture-recapture method, since each administrative source, considered separately, is affected by over- or undercoverage. The standard Petersen approach rests on strong assumptions, including perfect record linkage between lists. In practice, record linkage results can be affected by errors. Ding and Fienberg (1994) proposed a simple method for obtaining population total estimates that are unbiased with respect to linkage errors. In this article, we propose an extension of the Ding and Fienberg model that relaxes its conditions. The procedures are illustrated by estimating the total number of road casualties on the basis of a probabilistic record linkage between two administrative data sources. Moreover, a simulation study provides evidence that the adjusted estimator always performs better than the Petersen estimator.
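To make the role of linkage error concrete, the following minimal Python sketch computes the Petersen estimate for two lists and shows how missed links inflate it. It is an illustration under stated assumptions, not the Ding and Fienberg (1994) estimator and not the extension proposed in the article: the function names (`petersen`, `adjusted_petersen`, `simulate_two_lists`) and the parameter `link_sensitivity` are hypothetical, the two lists are assumed independent, and the only linkage error considered is missed matches occurring with a known probability (no false matches).

```python
import random


def petersen(n1, n2, m):
    """Petersen estimate of population size from two list counts and their matches."""
    if m <= 0:
        raise ValueError("at least one matched record is required")
    return n1 * n2 / m


def adjusted_petersen(n1, n2, m_linked, link_sensitivity):
    """Illustrative correction (an assumption of this sketch, not the paper's model).

    Assumes every true match is linked independently with known probability
    `link_sensitivity` and that no false matches occur, so the expected number
    of true matches is m_linked / link_sensitivity.
    """
    if not 0 < link_sensitivity <= 1:
        raise ValueError("link_sensitivity must be in (0, 1]")
    return petersen(n1, n2, m_linked / link_sensitivity)


def simulate_two_lists(pop_size, p1, p2, link_sensitivity, seed=0):
    """Simulate two independent lists and an imperfect linkage between them.

    Returns (n1, n2, true_matches, linked_matches).
    """
    rng = random.Random(seed)
    n1 = n2 = true_matches = linked_matches = 0
    for _ in range(pop_size):
        in_list1 = rng.random() < p1
        in_list2 = rng.random() < p2
        n1 += in_list1
        n2 += in_list2
        if in_list1 and in_list2:
            true_matches += 1
            # A true match is found by the linkage only with some probability.
            if rng.random() < link_sensitivity:
                linked_matches += 1
    return n1, n2, true_matches, linked_matches


if __name__ == "__main__":
    n1, n2, m_true, m_linked = simulate_two_lists(10_000, 0.7, 0.6, 0.9, seed=1)
    print("Petersen, perfect linkage :", round(petersen(n1, n2, m_true)))
    print("Petersen, missed links    :", round(petersen(n1, n2, m_linked)))
    print("Adjusted for missed links :", round(adjusted_petersen(n1, n2, m_linked, 0.9)))
```

Because missed links can only reduce the observed number of matches, the unadjusted estimate in this setting is never smaller than the one obtained under perfect linkage. The simple rescaling above removes that inflation only when the linkage sensitivity is known and false matches are absent.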

Agresti, A. 1994. “Simple Capture-Recapture Models Permitting Unequal Catchability and Variable Sampling Effort”. Biometrics 50: 494-500.

Bartolucci, F. and A. Forcina 2006. “A Class of Latent Marginal Models for Capture-Recapture Data With Continuous Covariates”. Journal of the American Statistical Association 101: 786-794.

Belin, T.R. and D.B. Rubin 1995. “A Method for Calibrating False-Match Rates in Record Linkage”. Journal of the American Statistical Association 90: 694-707.

Bock, R.D. 1975. Multivariate Statistical Methods in Behavioral Research. New York: McGraw-Hill.

Chao, A. 2001. “An Overview of Closed Capture-Recapture Models”. Journal of Agricultural, Biological, and Environmental Statistics 6: 158-175.

Chen, Z. and L. Kuo 2001. “A Note on the Estimation of the Multinomial Logit Model with Random Effects”. The American Statistician 55: 89-95.

Cormack, R.M. 1989. “Log-Linear Models for Capture-Recapture”. Biometrics 45: 395-413.

Coull, B.A. and A. Agresti 1999. “The Use of Mixed Logit Models to Reflect Heterogeneity in Capture-Recapture Studies”. Biometrics 55: 294-301.

Ding, Y. and S.E. Fienberg 1994. “Dual System Estimation of Census Undercount in the Presence of Matching Error”. Survey Methodology 20: 149-158.

Fellegi, I.P. and A.B. Sunter 1969. “A Theory for Record Linkage”. Journal of the American Statistical Association 64: 1183-1210.

Fienberg, S.E. 1972. “The Multiple Recapture Census for Closed Populations and Incomplete 2^k Contingency Tables”. Biometrika 59: 591-603.

Ghosh, S.K. and J.L. Norris 2005. “Bayesian Capture-Recapture Analysis and Model Selection Allowing for Heterogeneity and Behavioral Effects”. NCSU Institute of Statistics, Mimeo Series 2562: 1-27.

Jaro, M. 1989. “Advances in Record Linkage Methodology as Applied to Matching the 1985 Test Census of Tampa, Florida”. Journal of the American Statistical Association 84: 414-420.

Large, A., J. Brown, O. Abbott, and A. Taylor 2011. “Estimating and Correcting for Over-Count in the 2011 Census”. Survey Methodology Bulletin 69: 35-48.

Larsen, M.D. and D.B. Rubin 2001. “Iterative Automated Record Linkage Using Mixture Models”. Journal of the American Statistical Association 96: 32-41.

Lincoln, F.C. 1930. Calculating Waterfowl Abundance on the Basis of Banding Returns. United States Department of Agriculture Circular 118: 1-4.

McFadden, D. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior”. In Frontiers in Econometrics, edited by P. Zarembka, 105-142, New York: Academic Press.

McLeod, P., D. Heasman, and I. Forbes 2011. Simulated Data for the on the Job Training, Essnet DI. Available at: (accessed 20 July, 2015).

Petersen, C.G.J. 1896. “The Yearly Immigration of Young Plaice Into the Limfiord From the German Sea”. Report of the Danish Biological Station 6: 5-84.

RELAIS. 2011. User’s Guide Version 2.2. Available at: (accessed 20 July, 2015).

Wolter, K.M. 1986. “Some Coverage Error Models for Census Data”. Journal of the American Statistical Association 81: 338-346.

Zwane, E. and P. van der Heijden 2005. “Population Estimation Using the Multiple System Estimator in the Presence of Continuous Covariates”. Statistical Modelling 5: 39-52.

Journal of Official Statistics

The Journal of Statistics Sweden

