Statistical Matching as a Supplement to Record Linkage: A Valuable Method to Tackle Nonconsent Bias?

Open access

Abstract

Record linkage has become an important tool for increasing research opportunities in the social sciences. Surveys that perform record linkage to administrative records are often required to obtain informed consent from respondents prior to linkage. A major concern is that nonconsent could introduce biases in analyses based on the linked data. One straightforward strategy to overcome the missing data problem created by nonconsent is to match nonconsenters with statistically similar units in the target administrative database. To assess the effectiveness of statistical matching in this context, we use data from two German panel surveys that have been linked to an administrative database of the German Federal Employment Agency. We evaluate the statistical matching procedure under various artificial nonconsent scenarios and show that the method can be effective in reducing nonconsent biases in marginal distributions, but that biases in multivariate estimates can sometimes be worsened. We discuss the implications of these findings for survey practice and elaborate on some of the practical challenges of implementing the statistical matching procedure in the context of linkage nonconsent. The developed simulation design can act as a roadmap for other statistical agencies considering the proposed approach for their data.
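The matching idea described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' procedure: the data are synthetic, the variable names (`age`, `income`), the data-generating process, the age-dependent refusal mechanism, and the single-variable nearest-neighbor rule are all hypothetical choices made for illustration. It compares the bias in a marginal mean when nonconsenters are simply dropped with the bias when each nonconsenter receives the administrative value of the most similar administrative record.

```python
import random


def nearest_neighbor_match(recipients, donors, keys):
    """For each recipient, return the donor with the smallest squared
    Euclidean distance on the matching variables in `keys`."""
    matches = []
    for rec in recipients:
        best = min(donors, key=lambda d: sum((rec[k] - d[k]) ** 2 for k in keys))
        matches.append(best)
    return matches


def draw_person(rng):
    # Hypothetical data-generating process: income rises with age plus noise.
    age = rng.uniform(20, 60)
    return {"age": age, "income": 1000 + 40 * age + rng.gauss(0, 200)}


def simulate_nonconsent_bias(n_survey=1000, n_admin=1500, seed=1):
    rng = random.Random(seed)
    survey = [draw_person(rng) for _ in range(n_survey)]  # survey respondents
    admin = [draw_person(rng) for _ in range(n_admin)]    # administrative donor pool

    # Artificial nonconsent (assumption): refusal probability grows with age,
    # from 0 at age 20 to 2/3 at age 60.
    consented = [rng.random() >= (p["age"] - 20) / 60 for p in survey]
    consenters = [p for p, c in zip(survey, consented) if c]
    nonconsenters = [p for p, c in zip(survey, consented) if not c]

    # Benchmark: mean income if every respondent had been linked.
    true_mean = sum(p["income"] for p in survey) / n_survey

    # Estimate 1: use linked consenters only.
    consenter_mean = sum(p["income"] for p in consenters) / len(consenters)

    # Estimate 2: consenters keep their linked value; each nonconsenter
    # receives the income of the nearest administrative record on age.
    donors = nearest_neighbor_match(nonconsenters, admin, keys=["age"])
    matched_total = sum(p["income"] for p in consenters) + sum(d["income"] for d in donors)
    matched_mean = matched_total / n_survey

    return {
        "consenter_only_bias": consenter_mean - true_mean,
        "matched_bias": matched_mean - true_mean,
    }


if __name__ == "__main__":
    print(simulate_nonconsent_bias())
```

The sketch checks only a marginal mean. Donated values reflect the outcome's relationship with the matching variables alone, so an analogous check on multivariate estimates would be needed before drawing conclusions about them.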

Journal of Official Statistics

The Journal of Statistics Sweden
