Linear Regression Diagnostics in Cluster Samples

Open access


An extensive set of diagnostics for linear regression models has been developed for nonsurvey data. Models and sampling plans for finite populations, however, often involve stratification, clustering, and survey weights, which render many of the standard diagnostics inappropriate. In this article we adapt influence diagnostics originally formulated for ordinary or weighted least squares for use with stratified, clustered survey data. The statistics considered are DFBETAS, DFFITS, and Cook's D. We compare the performance of the ordinary least squares and survey-weighted diagnostics using complex survey data in which the weights, response variables, and covariates vary substantially.
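
For orientation, the three statistics named in the abstract can be sketched in their conventional ordinary least squares form. This is only an illustrative baseline under stated assumptions: it uses the textbook single-case deletion formulas with no weights, strata, or clusters, and it is not the survey-weighted adaptation developed in the article. The function name `ols_influence` and the returned dictionary keys are hypothetical.

```python
import numpy as np

def ols_influence(X, y):
    """Textbook OLS influence statistics (unweighted, no design information).

    X is an n x p design matrix (include a column of ones for an intercept),
    y is a length-n response vector.
    """
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta

    # Leverages: diagonal of the hat matrix X (X'X)^{-1} X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)

    # Full-sample and leave-one-out residual variance estimates.
    s2 = resid @ resid / (n - p)
    s2_del = ((n - p) * s2 - resid**2 / (1.0 - h)) / (n - p - 1)

    # Externally studentized residuals.
    t = resid / np.sqrt(s2_del * (1.0 - h))

    # DFFITS: scaled change in the i-th fitted value when case i is deleted.
    dffits = t * np.sqrt(h / (1.0 - h))

    # Row i holds beta - beta_(i), the change in coefficients from deleting case i.
    dbeta = (X @ xtx_inv) * (resid / (1.0 - h))[:, None]

    # DFBETAS: coefficient changes scaled by leave-one-out standard errors.
    dfbetas = dbeta / np.sqrt(np.outer(s2_del, np.diag(xtx_inv)))

    # Cook's D: overall shift in the coefficient vector, scaled by p * s^2.
    cooks_d = resid**2 * h / (p * s2 * (1.0 - h) ** 2)

    return {"leverage": h, "dffits": dffits, "dfbetas": dfbetas, "cooks_d": cooks_d}
```

Calling `ols_influence(X, y)` on a design matrix with an intercept column returns one DFFITS and one Cook's D value per observation and one DFBETAS value per observation and coefficient. In a stratified, clustered sample these unweighted quantities ignore the design, which is the limitation the abstract describes.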

Journal of Official Statistics

The Journal of Statistics Sweden


