Quality Indicators for Statistical Disclosure Methods: A Case Study on the Structure of Earnings Survey

Scientific- or public-use files are typically produced by applying anonymisation methods to the original data. Anonymised data should have both low disclosure risk and high data utility.

Data utility is often measured by comparing well-known estimates, such as means, covariances or eigenvalues, computed from the original and from the anonymised data.
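As an illustration of this kind of general-purpose utility measure, the following minimal Python sketch compares means, covariances and covariance eigenvalues between an original and an anonymised data set. It is not taken from the article: the function names, the restriction to two numeric variables, and the use of additive noise as a toy stand-in for anonymisation are all assumptions made for the example.

```python
import math
import random


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance with the n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    centre = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    return centre + radius, centre - radius


def utility_summary(orig, anon):
    """Largest absolute differences in means, covariances and eigenvalues.

    `orig` and `anon` are each a pair (x, y) of equally long lists.
    (Hypothetical helper; two variables only, to keep the sketch short.)
    """
    (x0, y0), (x1, y1) = orig, anon
    mean_diff = max(abs(mean(x0) - mean(x1)), abs(mean(y0) - mean(y1)))
    # Distinct entries of each 2x2 covariance matrix: var(x), cov(x, y), var(y).
    cov0 = (covariance(x0, x0), covariance(x0, y0), covariance(y0, y0))
    cov1 = (covariance(x1, x1), covariance(x1, y1), covariance(y1, y1))
    cov_diff = max(abs(p - q) for p, q in zip(cov0, cov1))
    eig_diff = max(
        abs(p - q) for p, q in zip(eigenvalues_2x2(*cov0), eigenvalues_2x2(*cov1))
    )
    return {"mean": mean_diff, "covariance": cov_diff, "eigenvalue": eig_diff}


# Toy data: two correlated variables, then additive noise as a stand-in
# for an anonymisation method (an assumption for this example only).
random.seed(1)
x = [random.gauss(10, 2) for _ in range(200)]
y = [0.5 * xi + random.gauss(0, 1) for xi in x]
x_anon = [xi + random.gauss(0, 0.5) for xi in x]
y_anon = [yi + random.gauss(0, 0.5) for yi in y]

summary = utility_summary((x, y), (x_anon, y_anon))
print(summary)
```

Smaller values indicate that the anonymised data reproduce the corresponding estimate more closely; identical data sets give zero for all three entries.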

However, not every estimate can be preserved. The aim is therefore to preserve the most important ones: instead of calculating generally defined utility measures, we propose evaluating context- and data-dependent indicators.

In this article we define such indicators and utility measures for the Structure of Earnings Survey (SES) microdata, and we give guidelines for selecting indicators and models and for evaluating the resulting estimates. To this end, hundreds of publications in journals and from national statistical agencies were reviewed to gain insight into how the SES data are used for research and which indicators are relevant for policy making.

Besides the mathematical description of the indicators and a brief description of the most common models applied to SES, four different anonymisation procedures are applied and the resulting indicators and models are compared to those obtained from the unmodified data. The disclosure risk is reported and the data utility is evaluated for each of the anonymised data sets based on the most important indicators and a model which is often used in practice.
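The comparison of a context-dependent indicator between original and anonymised data can be sketched as follows. This is only an illustration under stated assumptions: the indicator (mean earnings per group), the field names `sector` and `earnings`, and the toy records are hypothetical and are not the SES variables or the indicators defined in the article.

```python
def group_means(records, group_key, value_key):
    """Mean of `value_key` within each level of `group_key`."""
    totals, counts = {}, {}
    for rec in records:
        g = rec[group_key]
        totals[g] = totals.get(g, 0.0) + rec[value_key]
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def relative_errors(original, anonymised):
    """Relative absolute error |anon - orig| / |orig| for groups present in both."""
    return {
        g: abs(anonymised[g] - original[g]) / abs(original[g])
        for g in original
        if g in anonymised
    }


# Hypothetical toy records; real SES microdata are far larger and weighted.
original_records = [
    {"sector": "A", "earnings": 10.0},
    {"sector": "A", "earnings": 14.0},
    {"sector": "B", "earnings": 20.0},
    {"sector": "B", "earnings": 30.0},
]
anonymised_records = [
    {"sector": "A", "earnings": 11.0},
    {"sector": "A", "earnings": 13.0},
    {"sector": "B", "earnings": 20.0},
    {"sector": "B", "earnings": 35.0},
]

errors = relative_errors(
    group_means(original_records, "sector", "earnings"),
    group_means(anonymised_records, "sector", "earnings"),
)
print(errors)
```

In this toy case the perturbation leaves the group mean for sector A unchanged, while the mean for sector B shifts from 25 to 27.5, a relative error of about 10 percent. A sketch like this ignores sampling weights, which a real survey indicator would need to take into account.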

eISSN: 2001-7367
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Probability and Statistics


Published Online: Dec 16, 2015

Page range: 737 - 761

Received: Oct 01, 2012

Accepted: Jan 01, 2015

DOI: https://doi.org/10.1515/jos-2015-0043

Keywords: Statistical disclosure control, data utility, quality indicators, R

© 2015 Matthias Templ, published by De Gruyter Open

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
