Search Results


Author: Matthias Schonlau


Respondent-driven sampling (RDS) is a network sampling technique typically employed for hard-to-reach populations when traditional sampling approaches are not feasible (e.g., homeless people) or do not work well (e.g., people with HIV). In RDS, seed respondents recruit additional respondents from their network of friends. The recruiting process repeats iteratively, thereby forming long referral chains.

RDS is typically implemented face to face in individual cities. In contrast, we conducted Internet-based RDS in the American Life Panel (ALP), a web survey panel, targeting the general US population. We found that when friends are selected at random, as RDS methodology requires, recruiting chains die out. When respondents instead select friends themselves, the self-selected friends tend to be older than randomly selected friends but otherwise share the same demographic characteristics.

Using randomized experiments, we also found that respondents list more friends when the respondent’s number of friends is preloaded from an earlier question. The results suggest that, with careful selection of parameters, RDS can be used to select population-wide Internet panels, and we discuss a number of elements that are critical for success.
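To make the notion of recruiting chains dying out concrete, the following is a minimal sketch of a wave-by-wave referral process. It is not the procedure used in the ALP study: the function name `simulate_chain` and the parameters `invites_per_person` and `accept_prob` are hypothetical, and the model assumes that every respondent invites the same number of friends and that each invitation is accepted independently.

```python
import random


def simulate_chain(n_seeds, invites_per_person, accept_prob, max_waves, rng):
    """Return the number of respondents in each wave of a toy referral process.

    Hypothetical model: every respondent invites `invites_per_person` friends,
    and each invitation is accepted independently with probability `accept_prob`.
    """
    wave_sizes = [n_seeds]  # wave 0 consists of the seed respondents
    current = n_seeds
    for _wave in range(max_waves):
        invitations = current * invites_per_person
        # Count how many of this wave's invitations are accepted.
        recruits = sum(1 for _inv in range(invitations) if rng.random() < accept_prob)
        wave_sizes.append(recruits)
        if recruits == 0:
            break  # nobody was recruited, so the chain has died out
        current = recruits
    return wave_sizes


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative values only: 2 invitations at a 30% acceptance rate gives
    # fewer than one expected recruit per respondent.
    print(simulate_chain(n_seeds=20, invites_per_person=2,
                         accept_prob=0.3, max_waves=10, rng=rng))
```

In this toy model the expected number of recruits per respondent is `invites_per_person * accept_prob`. When that product is below one, wave sizes shrink on average and chains tend to end after a few waves, which is one way to read the role of "careful selection of parameters" mentioned above.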


Occupation coding, an important task in official statistics, refers to coding a respondent’s text answer into one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: combining separate models for the detailed occupation codes and for aggregate occupation codes, a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and a modified nearest neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
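The hybrid idea described above can be sketched as follows, under loudly stated assumptions. The abstract does not specify how n-gram duplicates are defined or which learning algorithm is used, so this sketch assumes that two answers are duplicates when they yield the same set of character trigrams, and it takes the learning algorithm as an arbitrary `fallback` callable. All function names and the placeholder codes "A" and "B" are hypothetical.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, whitespace-normalized answer."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        return frozenset([normalized])
    return frozenset(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def build_duplicate_index(training_answers, training_codes, n=3):
    """Map each n-gram set seen in training to a count of the codes assigned to it."""
    index = defaultdict(Counter)
    for answer, code in zip(training_answers, training_codes):
        index[char_ngrams(answer, n)][code] += 1
    return index


def hybrid_code(answer, index, fallback, n=3):
    """Use the most frequent code among n-gram duplicates, else defer to `fallback`."""
    key = char_ngrams(answer, n)
    if key in index:
        return index[key].most_common(1)[0][0]
    # No duplicate in the training data: defer to the (assumed) learning algorithm.
    return fallback(answer)


if __name__ == "__main__":
    # Made-up answers and placeholder codes, for illustration only.
    index = build_duplicate_index(["Nurse", "nurse ", "Teacher"], ["A", "A", "B"])
    print(hybrid_code("  NURSE", index, fallback=lambda a: "UNCODED"))
    print(hybrid_code("plumber", index, fallback=lambda a: "UNCODED"))
```

In this sketch, matching on n-gram sets is a looser criterion than exact string equality because it also ignores case and extra whitespace. Whether this corresponds to the duplicate definition evaluated on the ALLBUS data is an assumption.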