On Proxy Variables and Categorical Data Fusion

Li-Chun Zhang

Open Access

Published Online: Dec 16, 2015

Journal of Official Statistics

Volume 31 (2015): Issue 4 (December 2015)

Page range: 783 - 807

Received: Jul 01, 2013

Accepted: Sep 01, 2015

DOI: https://doi.org/10.1515/jos-2015-0045

Keywords
Identification problem, sampling uncertainty, uncertainty analysis, fusion distribution, fusion data, proxy variable, relative efficiency

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, to be referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several other related fields. This article organizes the use of proxy variables, to be distinguished from other auxiliary variables, both in terms of their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains of efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described and some new ones proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
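The identification uncertainty mentioned above can be illustrated with the classical Fréchet bounds: when only the two marginal distributions are known, each joint cell probability is confined to an interval rather than determined as a point. The sketch below is a minimal illustration under that assumption. It does not reproduce the article's own efficiency measure or its use of proxy variables, and the function name `frechet_bounds` and the example marginals are hypothetical.

```python
def frechet_bounds(p_row, p_col):
    """Fréchet bounds for the joint cell probabilities of two categorical variables.

    p_row, p_col: marginal probabilities (each sequence sums to 1).
    Returns (lower, upper), two nested lists indexed as [row][col], where
    lower[i][j] = max(0, p_row[i] + p_col[j] - 1) and
    upper[i][j] = min(p_row[i], p_col[j]).
    """
    lower = [[max(0.0, a + b - 1.0) for b in p_col] for a in p_row]
    upper = [[min(a, b) for b in p_col] for a in p_row]
    return lower, upper


# Hypothetical marginals, chosen only for illustration.
p_y = [0.7, 0.3]
p_z = [0.6, 0.4]
lower, upper = frechet_bounds(p_y, p_z)

# Width of each interval: a simple view of how much the marginals
# alone leave undetermined in each cell of the joint distribution.
width = [[u - l for l, u in zip(lo_row, up_row)]
         for lo_row, up_row in zip(lower, upper)]

for i, row in enumerate(width):
    for j, w in enumerate(row):
        print(f"cell ({i}, {j}): [{lower[i][j]:.2f}, {upper[i][j]:.2f}] width {w:.2f}")
```

Any joint distribution consistent with the two marginals has every cell inside these intervals; narrower intervals correspond to less identification uncertainty.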

eISSN: 2001-7367
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Probability and Statistics

© 2015 Li-Chun Zhang, published by De Gruyter Open