Carefully designed probability-based sample surveys can be prohibitively expensive to conduct. As a result, many survey organizations have shifted away from probability samples in favor of less expensive, but possibly less accurate, nonprobability web samples. Despite the accuracy concerns, the low cost and abundant availability of nonprobability samples make them a potentially useful supplement to traditional probability-based samples. We examine this notion by proposing a method of supplementing small probability samples with nonprobability samples using Bayesian inference. We consider two semi-conjugate informative prior distributions for linear regression coefficients, both constructed from nonprobability samples: the first accounts for the distance between maximum likelihood coefficient estimates derived from parallel probability and nonprobability samples, and the second depends on the variability and size of the nonprobability sample. We evaluate the method against a reference prior through simulations and a real-data application involving multiple probability and nonprobability surveys fielded simultaneously using the same questionnaire. We show that the method reduces the variance and mean-squared error (MSE) of coefficient estimates and model-based predictions relative to probability-only samples. Using actual and assumed cost data, we also show that the method can yield substantial cost savings (up to 55%) for a fixed MSE.
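As a rough illustration of the idea of a distance-based informative prior, the following Python sketch centers a normal prior for the regression coefficients on the nonprobability-sample estimates and sets each prior variance to the squared distance between the probability and nonprobability estimates. The prior form, the plug-in residual variance (used in place of a full semi-conjugate treatment with a prior on the error variance), the simulated data, and all function and variable names are assumptions made for this example; they are not taken from the paper and should not be read as its exact specification.

```python
import numpy as np


def ols(X, y):
    """Least-squares (maximum likelihood) coefficients and residual variance."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2


def posterior_given_sigma2(X_p, y_p, prior_mean, prior_var, sigma2):
    """Normal posterior of beta, conditional on sigma2, under the prior
    beta ~ N(prior_mean, diag(prior_var))."""
    prior_prec = np.diag(1.0 / prior_var)
    data_prec = X_p.T @ X_p / sigma2
    post_cov = np.linalg.inv(prior_prec + data_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X_p.T @ y_p / sigma2)
    return post_mean, post_cov


rng = np.random.default_rng(0)


def simulate(n, beta):
    """Simulated data: intercept plus one covariate, unit-variance noise."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta + rng.normal(size=n)
    return X, y


true_beta = np.array([1.0, 2.0])
X_p, y_p = simulate(50, true_beta)             # small probability sample
X_np, y_np = simulate(2000, true_beta + 0.1)   # large, slightly biased nonprobability sample

b_p, s2_p = ols(X_p, y_p)
b_np, _ = ols(X_np, y_np)

# Assumed distance-based prior variance: the closer the two sets of
# estimates, the tighter the prior around the nonprobability estimates.
prior_var = np.maximum((b_p - b_np) ** 2, 1e-8)

post_mean, post_cov = posterior_given_sigma2(X_p, y_p, b_np, prior_var, s2_p)

# Probability-only comparison: sampling covariance of the OLS estimates.
ref_cov = s2_p * np.linalg.inv(X_p.T @ X_p)

print("probability-only variances:", np.diag(ref_cov))
print("posterior variances:       ", np.diag(post_cov))
```

Because the prior adds positive precision to the likelihood, the conditional posterior variances in this sketch are never larger than the probability-only variances. Whether MSE also improves depends on how biased the nonprobability estimates are, which is the trade-off the abstract describes.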