Open access

Li-Chun Zhang

Abstract

Register data that originate from administrative or other secondary sources are increasingly being used to generate statistical outputs directly. The coverage of the input datasets is an important issue in this respect. Traditionally, capture-recapture models have been used to deal with multiple list enumerations subject to undercoverage errors. The aim of this article is to scope possible approaches to modelling capture-recapture data with additional overcoverage error. Attention is primarily given to model interpretations and conditions under which a model may provide a plausible basis for estimation and uncertainty evaluation. The setting with two list enumerations is examined in depth as it is the most common in practice. Models that can be extended to include more than two lists are identified. An additional independent coverage survey with only undercoverage error is always needed for estimation. Potential application to census coverage-error adjustment is discussed.
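
For orientation, the traditional two-list capture-recapture setting mentioned in the abstract can be illustrated with the classical Lincoln-Petersen estimator and Chapman's bias-reduced variant. This is only a minimal sketch of the undercoverage-only baseline, assuming independent lists, perfect record matching and no overcoverage. It is not the article's overcoverage models, and the function names and counts are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Classical two-list estimate of the population size.

    n1, n2: number of records on list 1 and on list 2.
    m: number of records matched on both lists.
    Assumes independent lists, perfect matching and no overcoverage.
    """
    if m <= 0:
        raise ValueError("at least one matched record is needed")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-reduced variant; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts, for illustration only.
n_hat_lp = lincoln_petersen(n1=900, n2=800, m=720)
n_hat_ch = chapman(n1=900, n2=800, m=720)
```

If either list also contains erroneous enumerations, n1 and n2 are inflated and the estimate is biased upwards, which is the motivation for modelling overcoverage explicitly.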

Open access

Li-Chun Zhang

Abstract

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several other related fields. This article organizes the use of proxy variables, to be distinguished from other auxiliary variables, both in terms of their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains in efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described and some new ones are proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.
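
The identification uncertainty described in the abstract can be illustrated with the classical Fréchet bounds, which limit each joint cell probability given only the two marginal distributions. This is a minimal sketch under the assumption that no proxy or auxiliary variables are used; the abstract does not state the article's own bounds or efficiency measure, and the function name and marginals below are hypothetical.

```python
def frechet_bounds(row_marginals, col_marginals):
    """Return (lower, upper) tables of bounds on the joint cell probabilities.

    For marginals p_i and q_j, every joint distribution satisfies
    max(0, p_i + q_j - 1) <= p_ij <= min(p_i, q_j).
    """
    lower = [[max(0.0, p + q - 1.0) for q in col_marginals]
             for p in row_marginals]
    upper = [[min(p, q) for q in col_marginals]
             for p in row_marginals]
    return lower, upper


# Hypothetical marginal distributions of two binary variables.
lower, upper = frechet_bounds([0.7, 0.3], [0.6, 0.4])

# Width of each cell's interval: a simple view of how much the
# marginals alone leave undetermined.
widths = [[u - l for l, u in zip(lo_row, up_row)]
          for lo_row, up_row in zip(lower, upper)]
```

Conditioning on a variable that is informative about both target variables typically narrows these intervals, which is the sense in which proxy variables can make fusion more precise.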

Open access

Arnout van Delden, Boris Lorenc, Peter Struijs and Li-Chun Zhang

Open access

Evangelos Ioannidis, Takis Merkouris, Li-Chun Zhang, Martin Karlberg, Michalis Petrakos, Fernando Reis and Photis Stavropoulos

Abstract

This article considers a modular approach to the design of integrated social surveys. The approach consists of grouping variables into ‘modules’, each of which is then allocated to one or more ‘instruments’. Each instrument is then administered to a random sample of population units, and each sample unit responds to all modules of the instrument. This approach offers a way of designing a system of integrated social surveys that balances the need to limit the cost and the need to obtain sufficient information. The allocation of the modules to instruments draws on the methodology of split questionnaire designs. The composition of the instruments, that is, how the modules are allocated to instruments, and the corresponding sample sizes are obtained as a solution to an optimisation problem. This optimisation involves minimisation of respondent burden and data collection cost, while respecting certain design constraints usually encountered in practice. These constraints may include, for example, the level of precision required and dependencies between the variables. We propose using a random search algorithm to find approximately optimal solutions to this problem. The algorithm is proved to fulfil conditions that ensure convergence to the global optimum and can also produce an efficient design for a split questionnaire.