Disclosure-Protected Inference with Linked Microdata Using a Remote Analysis Server

James O. Chipperfield

Open Access

Published Online: Feb 14, 2014

Journal of Official Statistics

Volume 30 (2014): Issue 1 (March 2014)

Page range: 123 - 146

DOI: https://doi.org/10.2478/jos-2014-0007

Keywords
Confidentiality, remote analysis, record linkage, statistical disclosure control

This content is open access.

Large amounts of microdata are collected by data custodians in the form of censuses and administrative records. Often, different data custodians collect different information on the same individual. Many important questions can be answered by linking microdata collected by different data custodians. For this reason, there is very strong demand for linked microdata from analysts within government, business, and universities. However, many data custodians are legally obliged to ensure that the risk of disclosing information about a person or organisation is acceptably low. Different authors have considered the problem of how to facilitate reliable statistical inference from analysis of linked microdata while ensuring that the risk of disclosure is acceptably low. This article considers the problem from the perspective of an Integrating Authority that, by definition, is trusted to link the microdata and to facilitate analysts’ access to the linked microdata via a remote server, which allows analysts to fit models and view the statistical output without being able to observe the underlying linked microdata. One disclosure risk that must be managed by an Integrating Authority is that one data custodian may use the microdata it supplied to the Integrating Authority, together with statistical output released from the remote server, to disclose information about a person or organisation that was supplied by the other data custodian. This article considers analysis of binary variables only. The utility and disclosure risk of the proposed method are investigated both in a simulation and using a real example. This article shows that some popular protections against disclosure (dropping records, rounding regression coefficients, or imposing restrictions on model selection) can be ineffective in the above setting.

eISSN: 2001-7367
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Probability and Statistics
