Synthetic Multiple-Imputation Procedure for Multistage Complex Samples

Open access


Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this means including dummy variables for the strata and for the primary sampling units (PSUs) nested within each stratum in the imputation model. This modeling strategy is not only operationally burdensome but also inferentially inefficient when the sample design has many strata. The complexity increases further when the sampling weights must also be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures show efficient estimation and good coverage properties. We also apply the method to accommodate missing body mass index (BMI) data when estimating BMI percentiles from National Health and Nutrition Examination Survey (NHANES) III data. We argue that the proposed methods offer an easy-to-implement solution to problems that current MI techniques do not handle well. Although the proposed method borrows its inferential machinery from the MI framework, it is not designed as an alternative way to release multiply imputed datasets for complex sample designs; rather, it is an analytic strategy in its own right.
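To make the modeling burden concrete, the sketch below builds the fixed-effect indicators that a design-aware imputation model would carry: dummies for the strata plus dummies for the PSUs nested within each stratum. It is a minimal illustration under assumptions, not the authors' procedure: the function name `design_dummies`, the column-label format, and the toy design are hypothetical, and reference-cell coding (dropping the first stratum and the first PSU within each stratum) is chosen here so the columns are not collinear with an intercept. Under that coding the number of columns equals the total number of PSUs minus one, so a design with H strata and two PSUs per stratum adds 2H - 1 parameters before any substantive predictors or weight-related terms enter the model.

```python
from typing import Dict, List, Sequence, Set, Tuple


def design_dummies(
    strata: Sequence[str], psus: Sequence[str]
) -> Tuple[List[str], List[List[int]]]:
    """Hypothetical helper: 0/1 columns for strata and PSUs nested in strata.

    Reference-cell coding (an assumption of this sketch): the first stratum in
    sorted order gets no stratum column, and within each stratum the first PSU
    in sorted order gets no PSU column.
    """
    if len(strata) != len(psus):
        raise ValueError("strata and psus must have the same length")

    # Collect the distinct PSU labels observed within each stratum.
    levels: Dict[str, Set[str]] = {}
    for h, c in zip(strata, psus):
        levels.setdefault(h, set()).add(c)

    sorted_strata = sorted(levels)
    # Column specs: (kind, stratum, psu); psu is "" for stratum columns.
    specs: List[Tuple[str, str, str]] = [("S", h, "") for h in sorted_strata[1:]]
    for h in sorted_strata:
        specs += [("P", h, c) for c in sorted(levels[h])[1:]]

    names = [f"S[{h}]" if kind == "S" else f"P[{h}:{c}]" for kind, h, c in specs]

    # One row per sampled unit; a PSU label only has meaning inside its stratum.
    rows = [
        [
            int(hi == h) if kind == "S" else int(hi == h and ci == c)
            for kind, h, c in specs
        ]
        for hi, ci in zip(strata, psus)
    ]
    return names, rows


if __name__ == "__main__":
    # Hypothetical toy design: 3 strata, 2 PSUs per stratum, 2 units per PSU.
    strata = [h for h in ("A", "B", "C") for _ in range(4)]
    psus = [c for _ in range(3) for c in ("1", "1", "2", "2")]
    names, rows = design_dummies(strata, psus)
    print(names)       # ['S[B]', 'S[C]', 'P[A:2]', 'P[B:2]', 'P[C:2]']
    print(len(names))  # 5 = 6 PSUs - 1
```

Because the column count grows with the number of PSUs rather than with the number of substantive variables, a many-stratum design quickly produces an imputation model with dozens of nuisance parameters, which is the inefficiency the abstract points to.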
