A well-known problem in register-based population size estimation is that registers do not necessarily cover the whole population. This may be because a register is intended to cover only part of the population (e.g., students), because of administrative delay, or because part of the target population is by default not registered (e.g., illegal persons). One method for estimating the population size in the presence of undercount is the capture-recapture method, which combines the information from two or more samples. In the context of census estimation, registers are used instead of samples. However, the method assumes that perfect linkage between the registers can be achieved, and it is known that this assumption is often violated.
In the setting of evaluating the population coverage of a census using a post-enumeration survey, a correction for linkage error was proposed. That correction was later generalized by relaxing some of the conditions it had introduced. However, the generalized correction still implicitly assumed that the two registers are of equal size. We introduce a further generalization that encompasses both earlier correction methods and also handles registers of different sizes. Specific parameter settings correspond to the different correction methods. We show that the parameters of each method can be chosen such that the resulting estimates all equal the traditional Petersen (1896) estimate that would theoretically be obtained under perfect linkage.
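As a minimal illustration of why linkage error matters, the sketch below computes the uncorrected Petersen estimate, n1 * n2 / m, where n1 and n2 are the register sizes and m is the number of linked records. It does not implement any of the correction methods discussed here; the function name and all counts are hypothetical and chosen only for illustration.

```python
def petersen_estimate(n1: int, n2: int, m: int) -> float:
    """Uncorrected Petersen estimate of population size: n1 * n2 / m."""
    if m <= 0:
        raise ValueError("at least one linked record is required")
    return n1 * n2 / m


# Hypothetical register sizes and number of true matches (assumed values).
n1, n2 = 1000, 800
true_matches = 400

# Estimate that would be obtained under perfect linkage.
perfect = petersen_estimate(n1, n2, true_matches)

# Assume 5% of the true matches are missed by the linkage procedure
# and no false links are made, so only 380 links are observed.
observed_matches = 380
biased = petersen_estimate(n1, n2, observed_matches)

print(f"perfect linkage: {perfect:.0f}")
print(f"with missed links: {biased:.0f}")
```

In this assumed scenario, missed links reduce the observed overlap between the registers, so the uncorrected estimate exceeds the one obtained under perfect linkage.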
Cadwell, B.L., P.J. Smith, and A.L. Baughman. 2005. “Methods for Capture-Recapture Analysis When Cases Lack Personal Identifiers.” Statistics in Medicine 24(13): 2041–2051. DOI: https://doi.org/10.1002/sim.2081.
Di Consiglio, L., and T. Tuoto. 2015. “Coverage Evaluation on Probabilistically Linked Data.” Journal of Official Statistics 31(3): 415–429. DOI: https://doi.org/10.1515/jos-2015-0025.
Ding, Y., and S.E. Fienberg. 1994. “Dual System Estimation of Census Undercount in the Presence of Matching Error.” Survey Methodology 20: 149–158. Available at: https://www150.statcan.gc.ca/n1/en/pub/12-001-x/1994002/article/14422-eng.pdf (accessed June 2019).
Fellegi, I.P., and A.B. Sunter. 1969. “A Theory for Record Linkage.” Journal of the American Statistical Association 64: 1183–1210. DOI: https://doi.org/10.1080/01621459.1969.10501049.
Fienberg, S.E. 1992. “Bibliography on Capture-Recapture Modelling with Application to Census Undercount Adjustment.” Survey Methodology 18: 143–154. Available at: https://www150.statcan.gc.ca/n1/pub/12-001-x/1992001/article/14494-eng.pdf (accessed June 2019).
Gerritse, S.C., B.F.M. Bakker, P.P. de Wolf, and P.G.M. van der Heijden. 2016a. “Under Coverage of the Population Register in the Netherlands 2010.” Discussion paper 2016-02 (Centraal Bureau voor de Statistiek, Den Haag/Heerlen). Available at: https://www.cbs.nl/nl-nl/achtergrond/2016/08/under-coverage-of-the-population-register-in-the-netherlands-2010 (accessed June 2019).
Gerritse, S.C., B.F.M. Bakker, D. Zult, and P.G.M. van der Heijden. 2016b. “The Effects of Imperfect Linkage and Erroneous Captures on the Population Size Estimator.” Chapter 3 of PhD thesis An Application of Population Size Estimation to Official Statistics, S.C. Gerritse, ISBN 978-94-6233-323-9. Available at: https://dspace.library.uu.nl/bit-stream/handle/1874/337476/Gerritse.pdf (accessed June 2019).
Herzog, T.N., F.J. Scheuren, and W.E. Winkler. 2007. Data Quality and Record Linkage Techniques. Springer. DOI: https://doi.org/10.1007/0-387-69505-2.
Lincoln, F.C. 1930. “Calculating Waterfowl Abundance on the Basis of Banding Returns.” United States Department of Agriculture Circular 118: 1–4. Available at: https://archive.org/details/calculatingwater118linc/page/n1.
McLeod, P., D. Heasman, and I. Forbes. 2011. Simulated Data for the on the Job Training ESSnet DI. Available at: https://ec.europa.eu/eurostat/cros/content/job-training_en (accessed 15 April 2019).
Petersen, C.G.J. 1896. “The Yearly Immigration of Young Plaice into the Limfjord from the German Sea.” Report of the Danish Biological Station (1895) 6: 5–84. Available at: https://www.biodiversitylibrary.org/ia/reportofdanishbi06dans#page/8/mode/1up (accessed June 2019).
Sanathanan, L. 1972. “Estimating the Size of a Multinomial Population.” The Annals of Mathematical Statistics 43(1): 142–152. DOI: https://doi.org/10.1214/aoms/1177692709.
Wolter, K.M. 1986. “Some Coverage Error Models for Census Data.” Journal of the American Statistical Association 81: 338–346. DOI: https://doi.org/10.1080/01621459.1986.10478277.