Modeling Nonresponse in Establishment Surveys: Using an Ensemble Tree Model to Create Nonresponse Propensity Scores and Detect Potential Bias in an Agricultural Survey

Open access

Abstract

Rising nonresponse rates in federal surveys, and the bias they may introduce into survey estimates, are a growing concern, especially for establishment surveys. Unlike households in a household survey, establishments do not all contribute equally to survey estimates. In agricultural surveys, for example, if an extremely large farm fails to complete a survey, the United States Department of Agriculture (USDA) could underestimate average acres operated, among other quantities. To identify likely nonrespondents before data collection, the USDA’s National Agricultural Statistics Service (NASS) began modeling nonresponse using Census of Agriculture data and prior Agricultural Resource Management Survey (ARMS) response history. Using an ensemble of classification trees, NASS has estimated nonresponse propensities for ARMS that can be used to predict nonresponse and that are correlated with key ARMS estimates.
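As a rough illustration of how an ensemble of classification trees can yield a nonresponse propensity score, the sketch below bags one-split trees on synthetic data and averages their leaf-level nonresponse rates. It is not the NASS model: the data, the two predictors (`refused_before`, `log_acres`), the tree depth, and all function names are assumptions made up for this example.

```python
import random


def gini(pos, n):
    """Gini impurity of a node with `pos` nonrespondents among `n` records."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2.0 * p * (1.0 - p)


def fit_stump(rows, labels):
    """Fit a one-split classification tree.

    Returns (feature index, threshold, nonresponse rate left, nonresponse rate right).
    """
    n = len(rows)
    total_pos = sum(labels)
    overall = total_pos / n
    # Fallback when no split helps: every record goes left and gets the overall rate.
    best = (0, float("inf"), overall, overall)
    best_score = gini(total_pos, n)
    for j in range(len(rows[0])):
        pairs = sorted(zip((r[j] for r in rows), labels))
        left_pos = 0
        for i in range(1, n):
            left_pos += pairs[i - 1][1]
            if pairs[i][0] == pairs[i - 1][0]:
                continue  # cannot split between tied values
            right_pos = total_pos - left_pos
            score = (i * gini(left_pos, i) + (n - i) * gini(right_pos, n - i)) / n
            if score < best_score:
                threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
                best_score = score
                best = (j, threshold, left_pos / i, right_pos / (n - i))
    return best


def fit_ensemble(rows, labels, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap resample of the records."""
    rng = random.Random(seed)
    n = len(rows)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return trees


def propensity(trees, row):
    """Nonresponse propensity: average of the trees' leaf-level nonresponse rates."""
    return sum(pl if row[j] <= thr else pr for j, thr, pl, pr in trees) / len(trees)


def make_synthetic(n=400, seed=1):
    """Synthetic records (assumption): prior refusal raises the nonresponse rate."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        refused_before = float(rng.random() < 0.5)  # hypothetical response-history flag
        log_acres = rng.uniform(2.0, 9.0)           # hypothetical size measure
        p_nonresponse = 0.15 + 0.5 * refused_before
        rows.append((refused_before, log_acres))
        labels.append(int(rng.random() < p_nonresponse))  # 1 = nonrespondent
    return rows, labels


rows, labels = make_synthetic()
trees = fit_ensemble(rows, labels)
p_refuser = propensity(trees, (1.0, 5.0))
p_other = propensity(trees, (0.0, 5.0))
print(f"propensity, prior refusal: {p_refuser:.2f}; no prior refusal: {p_other:.2f}")
```

Averaging leaf rates rather than majority votes keeps the output on a 0 to 1 scale, so records can be ranked by predicted nonresponse before data collection. A production model would use deeper trees and many more predictors.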

Journal of Official Statistics

The Journal of Statistics Sweden
