Small Area Model-Based Estimators Using Big Data Sources

Open access

Abstract

The timely, accurate monitoring of social indicators, such as poverty or inequality, on a fine-grained spatial and temporal scale is a crucial tool for understanding social phenomena and for policymaking, but poses a great challenge to official statistics. This article argues that an interdisciplinary approach, combining the body of statistical research in small area estimation with the body of research in social data mining based on Big Data, can provide novel means to tackle this problem successfully. Big Data derived from the digital crumbs that humans leave behind in their daily activities are in fact providing ever more accurate proxies of social life. Social data mining from these data, coupled with advanced model-based techniques for fine-grained estimates, has the potential to provide a novel microscope through which to view and understand social complexity. This article suggests three ways to use Big Data together with small area estimation techniques, and shows how Big Data has the potential to mirror aspects of well-being and other socioeconomic phenomena.
