Data Smearing: An Approach to Disclosure Limitation for Tabular Data

Statistical agencies often collect sensitive data and release it to the public in aggregated form as tables. To protect confidential data, some cells are suppressed in the publicly released tables. One problem with this method is that many cells of interest must be suppressed in order to protect a much smaller number of sensitive cells. Another problem is that both the covariates used for aggregation and the level of aggregation must be fixed before the data is released. Both of these restrictions can severely limit the utility of the data. We propose a new disclosure limitation method that replaces the full set of microdata with synthetic data, which is then used to produce the released tables. This synthetic data set is obtained by replacing each unit’s values with a weighted average of sampled values from the surrounding area. The synthetic data is constructed so that estimates for aggregate cells are asymptotically unbiased as the number of units in the cell increases. The method is applied to the U.S. Bureau of Labor Statistics Quarterly Census of Employment and Wages data, which is released to the public quarterly in tabular form and aggregated across varying scales of time, area, and economic sector.
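The smearing step described above can be illustrated with a minimal sketch. The abstract does not specify how neighbors are chosen, how many values are sampled, or how the weights are defined, so everything below is an assumption made for illustration: units sit on a one-dimensional location axis, the "surrounding area" is taken to be the k nearest units, and the weights are inverse-distance weights. The function name `smear` and its parameters are hypothetical and are not taken from the paper.

```python
import random


def smear(units, k=3, n_draws=2, seed=0):
    """Replace each unit's value with a weighted average of sampled neighbor values.

    units   : list of (location, value) pairs; locations are 1-D (an assumption)
    k       : size of the neighborhood, including the unit itself (an assumption)
    n_draws : number of neighbors sampled without replacement (an assumption)
    """
    if n_draws > k or k > len(units):
        raise ValueError("need n_draws <= k <= number of units")
    rng = random.Random(seed)
    smeared = []
    for loc_i, _ in units:
        # Indices of the k units closest to this unit's location.
        neighbours = sorted(
            range(len(units)), key=lambda j: abs(units[j][0] - loc_i)
        )[:k]
        # Sample a subset of the neighborhood without replacement.
        sampled = rng.sample(neighbours, n_draws)
        # Inverse-distance weights (illustrative choice, not the paper's).
        weights = [1.0 / (1.0 + abs(units[j][0] - loc_i)) for j in sampled]
        total = sum(weights)
        smeared.append(
            sum(w * units[j][1] for w, j in zip(weights, sampled)) / total
        )
    return smeared


# Toy microdata: (location, value); the third unit is an outlier.
units = [(0.0, 10.0), (1.0, 12.0), (2.0, 50.0), (3.0, 11.0), (4.0, 9.0)]
synthetic = smear(units, k=3, n_draws=2, seed=1)
print(synthetic)
```

Because each synthetic value is a convex combination of nearby original values, it always lies between the smallest and largest value in the sampled neighborhood. This sketch makes no attempt to reproduce the asymptotic unbiasedness property claimed for the actual method.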

eISSN: 2001-7367
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Probability and Statistics

Published Online: Dec 11, 2014

Page range: 839 - 857

Received: Dec 01, 2012

Accepted: Sep 01, 2014

DOI: https://doi.org/10.2478/jos-2014-0050

Keywords
Cell suppression, contingency tables, synthetic data, confidentiality, multiple imputation, nearest neighbor

© by Daniell Toth

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
