Small-Area Estimation with Zero-Inflated Data – a Simulation Study

Open access

Abstract

Many target variables in official statistics follow a semicontinuous distribution: a mixture of zeros and continuously distributed positive values. Such variables are called zero-inflated. When reliable estimates are required for subpopulations with small sample sizes, model-based small-area estimators can be used; these improve the accuracy of the estimates by borrowing information from other subpopulations. In this article, three small-area estimators are investigated. The first is the empirical best linear unbiased predictor (EBLUP), which can be considered the most common small-area estimator and is based on a linear mixed model that assumes normal distributions. The EBLUP is therefore misspecified for zero-inflated variables. The other two small-area estimators are based on a model that takes zero inflation explicitly into account, one fitted with a Bayesian approach and the other with a frequentist approach. These small-area estimators are compared with each other and with design-based estimation in a simulation study with zero-inflated target variables. Two simulations are carried out, one with artificial data and one with real data from the Dutch Household Budget Survey. The small-area estimators are found to improve accuracy compared to the design-based estimator. The amount of improvement depends strongly on the properties of the population and of the subpopulations of interest.
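To make the notion of a zero-inflated (semicontinuous) variable concrete, the following minimal sketch simulates such a variable in a few small areas and splits each area's sample mean into the share of positive values times the mean of the positive values. It does not reproduce the article's estimators or simulation design. The lognormal positive part, the logistic link, the area effects and all parameter values are assumptions chosen purely for illustration, and the function names are hypothetical.

```python
import math
import random


def simulate_area(n, p_positive, mu_log, sigma_log, rng):
    """Draw n semicontinuous values: zero with probability 1 - p_positive,
    otherwise a positive lognormal(mu_log, sigma_log) amount."""
    return [
        rng.lognormvariate(mu_log, sigma_log) if rng.random() < p_positive else 0.0
        for _ in range(n)
    ]


def two_part_mean(values):
    """Split the sample mean into (share of positives, mean of positives, product)."""
    positives = [v for v in values if v > 0]
    share_positive = len(positives) / len(values)
    mean_positive = sum(positives) / len(positives) if positives else 0.0
    return share_positive, mean_positive, share_positive * mean_positive


rng = random.Random(42)
areas = {}
for area in range(5):
    # Assumed area-level random effects, one for each part of the model.
    u_logit = rng.gauss(0.0, 0.5)
    u_log = rng.gauss(0.0, 0.3)
    p_positive = 1.0 / (1.0 + math.exp(-(-0.5 + u_logit)))
    areas[area] = simulate_area(20, p_positive, 3.0 + u_log, 0.8, rng)

for area, values in areas.items():
    share, mean_pos, product = two_part_mean(values)
    direct = sum(values) / len(values)  # unweighted sample mean per area
    print(f"area {area}: share>0={share:.2f}  mean|>0={mean_pos:.1f}  mean={direct:.1f}")
```

With only 20 units per area, both the share of positives and the mean of the positives are estimated from very few observations, which illustrates why borrowing information across areas can help.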


Journal of Official Statistics

The Journal of Statistics Sweden
