Disclosure Risk from Factor Scores

Open access

Abstract

Remote access can be a powerful tool for providing data access for external researchers. Since the microdata never leave the secure environment of the data-providing agency, alterations of the microdata can be kept to a minimum. Nevertheless, remote access is not free from risk. Many statistical analyses that do not seem to provide disclosive information at first sight can be used by sophisticated intruders to reveal sensitive information. For this reason the list of allowed queries is usually restricted in a remote setting. However, it is not always easy to identify problematic queries. We therefore strongly support the argument that has been made by other authors: that all queries should be monitored carefully and that any microlevel information should always be withheld. As an illustrative example, we use factor score analysis, for which the output of interest (the factor loadings of the variables) seems to be unproblematic. However, as we show in the article, the individual factor scores that are usually returned as part of the output can be used to reveal sensitive information. Our empirical evaluations based on a German establishment survey emphasize that this risk is far from a purely theoretical problem.
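The following sketch illustrates why individual factor scores count as microlevel information. It is not the attack developed in the article; it is a minimal example under assumptions that are ours: synthetic data, a single factor extracted by principal components, scores computed with the regression method, and an intruder who sees the score weights, the variable means and standard deviations, and every variable of the target unit except the sensitive one. All names (`recover_value`, `x_known`, and so on) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical confidential microdata: 200 units, 4 correlated variables.
n, p, k = 200, 4, 1
latent = rng.normal(size=(n, 1))
X = latent @ np.array([[0.9, 0.8, 0.7, 0.6]]) + 0.5 * rng.normal(size=(n, p))
X = 50 + 10 * X

# --- Server side: what a factor analysis routine might return (assumed) ---
mean, std = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / std                          # standardized variables
R = (Z.T @ Z) / n                             # correlation matrix
eigval, eigvec = np.linalg.eigh(R)
top = np.argsort(eigval)[::-1][:k]            # indices of the k largest eigenvalues
loadings = eigvec[:, top] * np.sqrt(eigval[top])
weights = np.linalg.solve(R, loadings)        # regression-method score weights
scores = Z @ weights                          # one factor score per unit


# --- Intruder side ---
def recover_value(x_known, target, score, w, mean, std):
    """Solve score = sum_j w[j] * z[j] for the one unknown z[target]."""
    z = (x_known - mean) / std
    others = np.arange(len(w)) != target
    z_target = (score - w[others] @ z[others]) / w[target]
    return mean[target] + std[target] * z_target


unit, target = 17, 2
x_known = X[unit].copy()
x_known[target] = np.nan                      # the sensitive value is unknown
recovered = recover_value(x_known, target, scores[unit, 0],
                          weights[:, 0], mean, std)

# The recovered value matches the confidential one up to rounding error.
assert np.isclose(recovered, X[unit, target])
```

Because each score is a linear combination of the unit's standardized values, one released score together with the weights leaves a single linear equation in the unknown value. Withholding the unit-level scores removes that equation, which is consistent with the recommendation above that microlevel output be suppressed.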

Journal of Official Statistics

The Journal of Statistics Sweden
