Supplementing Small Probability Samples with Nonprobability Samples: A Bayesian Approach

Open access

Abstract

Carefully designed probability-based sample surveys can be prohibitively expensive to conduct. Consequently, many survey organizations have shifted away from expensive probability samples in favor of less expensive, but possibly less accurate, nonprobability web samples. However, the lower costs and abundant availability of nonprobability samples also make them a potentially useful supplement to traditional probability-based samples. We examine this notion by proposing a method of supplementing small probability samples with nonprobability samples using Bayesian inference. We consider two semi-conjugate informative prior distributions for linear regression coefficients based on nonprobability samples. The first accounts for the distance between the maximum likelihood coefficient estimates derived from parallel probability and nonprobability samples, and the second depends on the variability and size of the nonprobability sample. The method is evaluated against a reference prior through simulations and a real-data application involving multiple probability and nonprobability surveys fielded simultaneously with the same questionnaire. We show that the method reduces the variance and mean-squared error (MSE) of coefficient estimates and model-based predictions relative to probability-only samples. Using actual and assumed cost data, we also show that the method can yield substantial cost savings (up to 55%) for a fixed MSE.
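The idea of an informative prior built from a nonprobability sample can be illustrated with a short Python sketch. It is not the authors' implementation: the abstract does not give the exact prior forms, so the choices below are assumptions. Both hypothetical priors are centred on the nonprobability-sample maximum likelihood (OLS) estimate. The `kind="distance"` variant sets the prior variances to the squared gaps between the probability and nonprobability estimates, and `kind="sampling"` uses the OLS covariance of the nonprobability sample, which depends on its residual variance and size. In the semi-conjugate setup the coefficients get an independent normal prior and the error variance an inverse-gamma prior, so a Gibbs sampler alternates between the two full conditionals. A flat prior, p(β, σ²) ∝ 1/σ², stands in for the reference prior. The helper names `ols`, `nonprob_prior` and `gibbs` and the toy data are illustrative only.

```python
import numpy as np

# A minimal sketch, NOT the authors' exact specification: the functional forms
# of the two priors below are illustrative assumptions based only on the abstract.

def ols(X, y):
    """Least-squares (Gaussian maximum likelihood) coefficients and residual variance."""
    n, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return beta, resid @ resid / (n - p)

def nonprob_prior(X_p, y_p, X_np, y_np, kind="sampling"):
    """Hypothetical informative prior N(b0, V0) for beta, centred on the nonprobability estimate.

    kind="distance": prior variances are the squared gaps between the ML estimates
                     from the probability and nonprobability samples.
    kind="sampling": prior covariance is the OLS covariance from the nonprobability
                     sample, so it shrinks as that sample grows or its residual variance falls.
    Returns the prior mean b0 and the prior precision matrix inv(V0).
    """
    b_np, s2_np = ols(X_np, y_np)
    if kind == "distance":
        b_p, _ = ols(X_p, y_p)
        V0 = np.diag(np.maximum((b_np - b_p) ** 2, 1e-8))  # floor avoids a zero variance
    else:
        V0 = s2_np * np.linalg.inv(X_np.T @ X_np)
    return b_np, np.linalg.inv(V0)

def gibbs(X, y, b0, P0, a0=0.0, c0=0.0, n_iter=3000, burn=500, seed=0):
    """Gibbs sampler for y ~ N(X beta, sigma2 I), beta ~ N(b0, inv(P0)), sigma2 ~ InvGamma(a0, c0).

    P0 = 0 and a0 = c0 = 0 give the flat reference prior p(beta, sigma2) ∝ 1/sigma2.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # start at the ML estimate
    draws = np.empty((n_iter - burn, p))
    for t in range(n_iter):
        # sigma2 | beta, y ~ InvGamma(a0 + n/2, c0 + RSS/2), drawn as 1 / Gamma
        rss = np.sum((y - X @ beta) ** 2)
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (c0 + rss / 2))
        # beta | sigma2, y ~ N(m, S): precision-weighted combination of prior and data
        S = np.linalg.inv(P0 + XtX / sigma2)
        S = (S + S.T) / 2  # enforce exact symmetry before sampling
        m = S @ (P0 @ b0 + Xty / sigma2)
        beta = rng.multivariate_normal(m, S)
        if t >= burn:
            draws[t - burn] = beta
    return draws

# Toy data (illustrative only): a small probability sample and a larger,
# slightly biased nonprobability sample from the same linear model.
rng = np.random.default_rng(1)

def sample(n, slope):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    return X, 1.0 + slope * x + rng.normal(size=n)

X_p, y_p = sample(50, 2.0)      # probability sample
X_np, y_np = sample(1000, 2.1)  # nonprobability sample (slope biased by 0.1)

flat = gibbs(X_p, y_p, np.zeros(2), np.zeros((2, 2)))
b0, P0 = nonprob_prior(X_p, y_p, X_np, y_np, kind="sampling")
informed = gibbs(X_p, y_p, b0, P0)
print("flat SD(slope):    ", flat[:, 1].std())
print("informed SD(slope):", informed[:, 1].std())
```

On the toy data the `"sampling"` prior should shrink the posterior spread of the slope well below the flat-prior spread, but it also pulls the posterior mean toward the biased nonprobability slope. The `"distance"` variant is one way to widen the prior when the two samples disagree, trading some of that variance reduction for protection against bias.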
