Search Results

You are looking at 1 - 10 of 28 items for:

  • Probability and Statistics
Open access

Steven Pedlow

Abstract

This article describes a case study on the potential of using smaller geographical units in an area probability design, and reports the challenges of collecting a nationally representative sample for this hard-to-reach population. The Census Integrated Communications Program Evaluation (CICPE) was designed to evaluate the promotional campaign’s effect on Decennial Census participation for six race/ethnicity groups of interest. A nationally representative Core sample was designed to collect interviews for Hispanics, non-Hispanic African-Americans, and non-Hispanic Whites. However, it was impractical to include the rarer Asian, American Indian and Alaska Native (AIAN), and Native Hawaiian and Other Pacific Islander (NHOPI) populations in the Core design. For the Asian sample, we designed a separate area probability sample.

Traditional area probability sampling designs use counties or metropolitan areas as first-stage units, but smaller geographical units can better target hard-to-reach populations. The CICPE Asian sample used cities as the first-stage units.
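
Purely as an illustrative sketch of such a first stage, the snippet below draws primary sampling units with probability proportional to size (PPS) from a frame of cities. The city names, population counts, and the pps_with_replacement helper are hypothetical; they are not taken from the CICPE design.

    # A minimal sketch of PPS first-stage selection with cities, rather
    # than counties, as primary sampling units. All figures are invented.
    import random

    random.seed(42)

    # Hypothetical frame of cities with counts of the target population.
    frame = [
        ("City A", 120_000),
        ("City B", 45_000),
        ("City C", 300_000),
        ("City D", 8_000),
        ("City E", 60_000),
    ]

    def pps_with_replacement(frame, n_draws):
        """Draw first-stage units with probability proportional to size."""
        units, sizes = zip(*frame)
        return random.choices(units, weights=sizes, k=n_draws)

    print(pps_with_replacement(frame, 3))

In a full design, later stages would subsample blocks and households within each selected city.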

Open access

Morgan Earp, Daniell Toth, Polly Phipps and Charlotte Oslund

Abstract

This article introduces and discusses a method for conducting an analysis of nonresponse for a longitudinal establishment survey using regression trees. The methodology consists of three parts: analysis during the frame refinement and enrollment phases, common in longitudinal surveys; analysis of the effect of time on response rates during data collection; and analysis of the potential for nonresponse bias. For all three analyses, regression tree models are used to identify establishment characteristics and subgroups of establishments that represent vulnerabilities during the data collection process. This information could be used to direct additional resources to collecting data from identified establishments in order to improve the response rate.
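
A minimal sketch of the general technique the abstract names, not the authors' code: a regression tree fit to response indicators partitions establishments into subgroups whose leaf means act as estimated response propensities, so low-propensity subgroups stand out. The variable names and simulated data below are invented.

    # Toy regression-tree nonresponse analysis on simulated establishments.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor, export_text

    rng = np.random.default_rng(0)
    n = 500
    X = np.column_stack([
        rng.integers(1, 10, n),   # hypothetical employment-size class
        rng.integers(0, 2, n),    # hypothetical multi-unit indicator
    ])
    # Simulated response indicator: larger establishments respond less often.
    p = 0.9 - 0.05 * X[:, 0] - 0.1 * X[:, 1]
    y = rng.binomial(1, np.clip(p, 0.05, 0.95))

    # Leaf means of the fitted tree are subgroup response rates.
    tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50).fit(X, y)
    print(export_text(tree, feature_names=["size_class", "multi_unit"]))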

Open access

Taylor Lewis, Elizabeth Goldberg, Nathaniel Schenker, Vladislav Beresovsky, Susan Schappert, Sandra Decker, Nancy Sonnenfeld and Iris Shimizu

Abstract

The National Ambulatory Medical Care Survey collects data on office-based physician care from a nationally representative, multistage sampling scheme where the ultimate unit of analysis is a patient-doctor encounter. Patient race, a commonly analyzed demographic, has been subject to a steadily increasing item nonresponse rate. In 1999, race was missing for 17 percent of cases; by 2008, that figure had risen to 33 percent. Over this entire period, single imputation has been the compensation method employed. Recent research at the National Center for Health Statistics evaluated multiply imputing race to better represent the missing-data uncertainty. Given item nonresponse rates of 30 percent or greater, we were surprised to find that, for many estimates, the ratio of the multiple-imputation standard error to the single-imputation standard error was close to 1. A likely explanation is that the design effects attributable to the complex sample design largely outweigh any increase in variance attributable to missing-data uncertainty.
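
The comparison described here rests on Rubin's (1987) combining rules. The sketch below shows, with invented numbers, how the multiple-imputation (MI) standard error is assembled and why its ratio to a single-imputation (SI) standard error can stay near 1 when the between-imputation variance is small relative to the within-imputation variance.

    # Rubin's combining rules on made-up estimates from m = 5 imputations.
    import math

    def rubin_variance(estimates, within_variances):
        """Total MI variance: T = U_bar + (1 + 1/m) * B."""
        m = len(estimates)
        u_bar = sum(within_variances) / m          # avg within-imputation variance
        q_bar = sum(estimates) / m
        b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
        return u_bar + (1 + 1 / m) * b

    estimates = [0.172, 0.168, 0.175, 0.170, 0.169]
    within = [4.0e-4] * 5

    se_mi = math.sqrt(rubin_variance(estimates, within))
    se_si = math.sqrt(4.0e-4)                      # SI standard error ignores B
    print(f"SE ratio (MI/SI): {se_mi / se_si:.3f}")  # near 1 when B is small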

Open access

Nanhua Zhang, Henian Chen and Michael R. Elliott

Abstract

Nonresponse is very common in epidemiologic surveys and clinical trials. Common methods for dealing with missing data (e.g., complete-case analysis, ignorable-likelihood methods, and nonignorable modeling methods) rely on untestable assumptions. Nonresponse two-phase sampling (NTS), which takes a random sample of initial nonrespondents for follow-up data collection, provides a means to reduce nonresponse bias. However, traditional weighting methods for analyzing data from NTS do not make full use of auxiliary variables. This article proposes a method called nonrespondent subsample multiple imputation (NSMI), where multiple imputation (Rubin 1987) is performed within the subsample of Phase I nonrespondents, using additional data collected in Phase II. We illustrate the properties of the proposed method by simulation and apply it to a quality-of-life study. The simulation study shows that the gains from using the NTS scheme can be substantial, even when data are collected from only a small proportion of the initial nonrespondents.
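
A toy sketch of the NSMI idea as the abstract describes it: imputation is carried out within the nonrespondent subsample, with the values observed at Phase II serving as donors for the nonrespondents who were never followed up. This hot-deck version is only a stand-in for the authors' multiple-imputation procedure, and all numbers are simulated.

    # Toy two-phase imputation within the nonrespondent stratum.
    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical data: y observed for respondents and for the Phase II
    # follow-up subsample of initial nonrespondents; missing for the rest.
    y_respondents = rng.normal(10.0, 2.0, size=400)
    y_phase2 = rng.normal(8.0, 2.0, size=50)   # followed-up nonrespondents
    n_still_missing = 150                      # nonrespondents never observed

    m = 20                                     # number of imputations
    estimates = []
    for _ in range(m):
        # Impute remaining nonrespondents by resampling Phase II values,
        # i.e., the imputation model lives inside the nonrespondent subsample.
        # (A proper MI implementation would add between-imputation
        # variability, e.g., via an approximate Bayesian bootstrap.)
        donors = rng.choice(y_phase2, size=n_still_missing, replace=True)
        completed = np.concatenate([y_respondents, y_phase2, donors])
        estimates.append(completed.mean())

    print(f"NSMI-style estimate of the mean: {np.mean(estimates):.2f}")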

Open access

Anja Mohorko, Edith de Leeuw and Joop Hox

Abstract

To estimate the coverage error for web surveys in Europe over time, we analyzed data from the Eurobarometer, which collects data across European Community member and applicant states. Since 2005, the Eurobarometer has contained a straightforward question on Internet access. We compared respondents with and without Internet access and estimated coverage bias for demographic variables (sex, age, length of education) and sociopolitical variables (left-right position on a political scale, life satisfaction). European countries differ in Internet penetration and in the resulting coverage bias. Over time, Internet penetration has increased dramatically and coverage bias has decreased, but the rate of change differs across countries. In addition, a country's level of development significantly affects the pace of these changes.
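
An analysis of this kind typically rests on the standard coverage-bias decomposition: the bias of a web-only (Internet-covered) estimate equals the noncovered share of the population times the gap between covered and noncovered means. The sketch below applies that identity to invented figures; it is not the authors' code.

    # Coverage bias of a covered-only mean, with made-up numbers.

    def coverage_bias(mean_covered, mean_noncovered, internet_penetration):
        """bias = (1 - W_c) * (ybar_covered - ybar_noncovered)."""
        return (1.0 - internet_penetration) * (mean_covered - mean_noncovered)

    # Hypothetical country: 70% Internet penetration; life-satisfaction
    # means of 7.2 (with access) and 6.1 (without) on a 1-10 scale.
    print(coverage_bias(7.2, 6.1, 0.70))  # 0.33 scale points of bias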

Open access

Kristen Himelein, Stephanie Eckman and Siobhan Murray

Abstract

Livestock are an important component of rural livelihoods in developing countries, but data about this source of income and wealth are difficult to collect because many pastoralist populations are nomadic or seminomadic. Most household surveys exclude those without permanent dwellings, leading to undercoverage. In this study, we explore the use of a random geographic cluster sample (RGCS) as an alternative to the household-based sample. In this design, points are randomly selected and all eligible respondents found inside circles drawn around the selected points are interviewed. This approach should eliminate undercoverage of mobile populations. We present results of an RGCS survey with a total sample size of 784 households, conducted to measure livestock ownership in the Afar region of Ethiopia in 2012. We assess the quality of the RGCS data relative to a recent household survey and discuss the implementation challenges.
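
The selection step the abstract describes can be sketched as follows: draw random points in the study region and take every dwelling found within a fixed radius of a selected point. This toy version treats coordinates as planar for simplicity; a real implementation would work with geographic coordinates and proper inclusion probabilities.

    # Toy random geographic cluster sample (RGCS) selection.
    import math
    import random

    random.seed(7)

    # Hypothetical dwelling locations (x, y) in kilometres.
    dwellings = [(random.uniform(0, 100), random.uniform(0, 100))
                 for _ in range(2000)]

    def rgcs_sample(dwellings, n_points, radius_km, xmax=100, ymax=100):
        """Select all dwellings within radius_km of n_points random points."""
        selected = set()
        for _ in range(n_points):
            cx, cy = random.uniform(0, xmax), random.uniform(0, ymax)
            for i, (x, y) in enumerate(dwellings):
                if math.hypot(x - cx, y - cy) <= radius_km:
                    selected.add(i)
        return selected

    print(len(rgcs_sample(dwellings, n_points=10, radius_km=2.0)))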

Open access

Jaki S. McCarthy, Kathleen Ott, Heather Ridolfo, Pam McGovern, Robyn Sirkis and Danna Moore

Abstract

There are many methods that can be used to test questionnaires, each with its own strengths and weaknesses. The best approaches to questionnaire testing combine different methods to both broaden and strengthen the results. The US Census of Agriculture (COA) is conducted every five years and collects detailed information on agricultural production, inventories, practices, and operator demographics from agricultural establishments. Preceding each COA, testing is conducted to evaluate new questionnaire items and improve data quality for the subsequent COA. This article describes how a multi-method approach, which we call Bento Box Testing, was applied to establishment questionnaire testing leading up to the 2017 COA. Testing included solicitation of expert opinion, historical data review, cognitive testing, a large-scale field test, and qualitative follow-up interviews. We discuss the benefits of these testing methods, considerations for establishment survey testing, and how their results in combination provide a stronger evaluation.

Open access

J. Michael Brick


Open access

Wieger Coutinho, Ton de Waal and Natalie Shlomo

Abstract

A major challenge faced by virtually all institutes that collect statistical data on persons, households, or enterprises is that data may be missing in the observed data sets. The most common solution for handling missing data is imputation. Imputation is complicated by the existence of constraints in the form of edit restrictions that must be satisfied by the data. Examples of such edit restrictions are that someone who is less than 16 years old cannot be married in the Netherlands, and that someone whose marital status is unmarried cannot be the spouse of the head of household. Records that do not satisfy these edits are inconsistent, and are hence considered incorrect. A further complication when imputing categorical data is that the frequencies of certain categories are sometimes known from other sources or have previously been estimated. In this article we develop methods for imputing missing values in categorical data that take both the edit restrictions and the known frequencies into account.
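
A minimal sketch of how edit restrictions constrain imputation, using the two edits quoted in the abstract. The rejection-style imputation step is a toy stand-in, not the method developed in the article, and it ignores the known-frequency constraints the article also handles.

    # Toy edit-constrained imputation of a categorical value.
    import random

    random.seed(3)

    def satisfies_edits(record):
        """Return True if the record passes the example edit restrictions."""
        if record["age"] < 16 and record["marital_status"] == "married":
            return False
        if (record["marital_status"] == "unmarried"
                and record["relation"] == "spouse"):
            return False
        return True

    def impute_marital_status(record, categories=("married", "unmarried")):
        """Draw a value for the missing field, rejecting edit violations."""
        candidates = [c for c in categories
                      if satisfies_edits({**record, "marital_status": c})]
        return random.choice(candidates)

    record = {"age": 14, "marital_status": None, "relation": "child"}
    record["marital_status"] = impute_marital_status(record)
    print(record)  # age 14 forces "unmarried"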

Open access

Kirstin Early, Jennifer Mankoff and Stephen E. Fienberg

Abstract

Online surveys have the potential to support adaptive questions, where later questions depend on earlier responses. Past work has taken a rule-based approach, applied uniformly across all respondents. We envision a richer interpretation of adaptive questions, which we call Dynamic Question Ordering (DQO), where question order is personalized. Such an approach could increase engagement, and therefore response rate, as well as imputation quality. We present a DQO framework to improve survey completion and imputation. In the general survey-taking setting, we want to maximize survey completion, so we focus on ordering questions to engage the respondent and, ideally, collect all information, or at least the information that best characterizes the respondent, to support accurate imputation. In another scenario, our goal is to provide a personalized prediction. Since it is possible to give reasonable predictions with only a subset of questions, we are not concerned with motivating users to answer all questions. Instead, we want to order questions to obtain the information that most reduces prediction uncertainty, while not being too burdensome. We illustrate this framework with two case studies, one for the prediction setting and one for the survey-taking setting. We also discuss DQO for national surveys and consider connections between our statistics-based question-ordering approach and cognitive survey methodology.
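
The ordering step can be sketched as a greedy choice: ask next the unanswered question with the greatest expected reduction in prediction uncertainty. The scores below are invented placeholders; a real DQO system would estimate them from its prediction model after each answer arrives.

    # Toy greedy question ordering by expected uncertainty reduction.

    def next_question(unanswered, expected_variance_reduction):
        """Pick the question with the largest expected variance reduction."""
        return max(unanswered, key=lambda q: expected_variance_reduction[q])

    # Hypothetical precomputed scores for each candidate question.
    expected_variance_reduction = {
        "income": 0.40, "household_size": 0.25, "region": 0.10, "age": 0.30,
    }
    unanswered = ["household_size", "region", "age"]  # 'income' already asked

    order = []
    while unanswered:
        q = next_question(unanswered, expected_variance_reduction)
        order.append(q)
        unanswered.remove(q)

    print(order)  # ['age', 'household_size', 'region']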