Statistical matching refers to the integration of two or more data files that share a partially overlapping set of variables. Its aim is to obtain joint information on variables collected in different surveys based on different observation units. This naturally leads to an identification problem, since no observation contains information on all variables of interest.
We develop the first micro approach to statistical matching for categorical data that reflects the natural uncertainty arising from the identification problem. A complete synthetic file is obtained by imprecise imputation, which replaces missing entries by sets of suitable values. Altogether, we discuss three imprecise imputation strategies and propose ideas for potential refinements.
Additionally, we show how the results of imprecise imputation can be embedded into the theory of finite random sets, providing tight lower and upper bounds for probability statements. The results are based on a newly developed simulation design that is customised to the specific requirements for assessing the quality of a statistical matching procedure for categorical data. They corroborate that the narrowness of these bounds is practically relevant and that these bounds almost always cover the true parameters.
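
To illustrate how set-valued entries can give rise to lower and upper probability bounds, the following minimal sketch uses the standard belief and plausibility construction for finite random sets. It assumes that every record carries equal weight and that the imputed set of each record is non-empty. The function name `probability_bounds` and the toy data are hypothetical and serve illustration only; the sketch does not reproduce any of the three imputation strategies discussed here.

```python
from fractions import Fraction


def probability_bounds(imputed_sets, event):
    """Return (lower, upper) bounds for P(value in event).

    imputed_sets: one non-empty set of candidate categories per record.
    event: the set of categories whose probability is to be bounded.
    Assumption: each record has equal weight 1/n.
    """
    event = frozenset(event)
    sets = [frozenset(s) for s in imputed_sets]
    if not sets or any(len(s) == 0 for s in sets):
        raise ValueError("need at least one record and only non-empty sets")
    n = len(sets)
    # Lower bound: records whose candidate set lies entirely inside the event.
    lower = sum(1 for s in sets if s <= event)
    # Upper bound: records whose candidate set overlaps the event at all.
    upper = sum(1 for s in sets if s & event)
    return Fraction(lower, n), Fraction(upper, n)


# Hypothetical toy file: each entry is the set imputed for one record.
records = [{"a"}, {"a", "b"}, {"b"}, {"a", "b", "c"}]
lower, upper = probability_bounds(records, {"a"})
print(lower, upper)  # 1/4 3/4
```

In this toy file only the first record certainly takes the value `a`, while three records possibly do, so the probability of `a` is bounded by 1/4 and 3/4. Narrower imputed sets yield narrower bounds.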