1 Mathematica Policy Research, Princeton, NJ 08543, USA.
2 Dept. of Biostatistics, University of Michigan School of Public Health, 1415 Washington Heights, Ann Arbor, MI 48109, USA; Survey Methodology Program, Institute for Social Research, University of Michigan, 426 Thompson St., Ann Arbor, MI 48109, USA.
Multiple imputation (MI) is commonly used when item-level missing data are present. However, MI requires that survey design information be built into the imputation models. For multistage stratified clustered designs, this means including dummy variables in the imputation model to represent the strata as well as the primary sampling units (PSUs) nested within each stratum. Such a modeling strategy is not only operationally burdensome but also inferentially inefficient when the sample design has many strata. The complexity increases further when sampling weights need to be modeled. This article develops a general-purpose analytic strategy for population inference from complex sample designs with item-level missingness. In a simulation study, the proposed procedures demonstrate efficient estimation and good coverage properties. We also apply the methods to accommodate missing body mass index (BMI) data in an analysis of BMI percentiles using data from the third National Health and Nutrition Examination Survey (NHANES III). We argue that the proposed methods offer an easy-to-implement solution to problems that are not well handled by current MI techniques. Note that, although the proposed method borrows from the MI framework to develop its inferential methods, it is not designed as an alternative strategy for releasing multiply imputed datasets for complex sample design data, but rather as an analytic strategy in and of itself.