Random Walks on Directed Networks: Inference and Respondent-Driven Sampling

Jens Malmros 1, Naoki Masuda 2, and Tom Britton 3
  • 1 Department of Mathematics, Stockholm University, SE-106 91 Stockholm, Sweden.
  • 2 Department of Mathematical Informatics, The University of Tokyo, 7-3-1 Hongo, Bunkyo, Tokyo 113-8656, Japan.
  • 3 Department of Engineering Mathematics, University of Bristol, Merchant Venturers Building, Woodland Road, Clifton, Bristol BS8 1UB, United Kingdom.


Respondent-driven sampling (RDS) is often used to estimate population properties (e.g., sexual risk behavior) in hard-to-reach populations. In RDS, already sampled individuals recruit population members to the sample from their social contacts in an efficient snowball-like sampling procedure. By assuming a Markov model for the recruitment of individuals, asymptotically unbiased estimates of population characteristics can be obtained. Current RDS estimation methodology assumes that the social network is undirected, that is, all edges are reciprocal. However, empirical social networks in general also include a substantial number of nonreciprocal edges. In this article, we develop an estimation method for RDS in populations connected by social networks that include reciprocal and nonreciprocal edges. We derive estimators of the selection probabilities of individuals as a function of the number of outgoing edges of sampled individuals. The proposed estimators are evaluated on artificial and empirical networks and are shown to generally perform better than existing estimators. This is the case in particular when the fraction of directed edges in the network is large.
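To illustrate why the reciprocity assumption matters, the following minimal sketch compares the stationary distribution of a simple random walk with degree-proportional weights on two toy networks. It is not the estimator developed in the article. The four-node graphs, the function names `stationary` and `normalised`, and the use of a lazy walk (staying put with probability 1/2, which leaves the stationary distribution unchanged but guarantees convergence) are all assumptions made for illustration.

```python
def stationary(out_nbrs, iters=2000):
    """Approximate the stationary distribution of a random walk by power iteration.

    out_nbrs maps each node to the list of nodes it points to. A lazy walk is
    used (hypothetical choice): it has the same stationary distribution as the
    plain walk but cannot oscillate on periodic graphs.
    """
    nodes = sorted(out_nbrs)
    pi = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.5 * pi[v] for v in nodes}  # stay put with probability 1/2
        for v in nodes:
            share = 0.5 * pi[v] / len(out_nbrs[v])  # move along a uniformly chosen out-edge
            for w in out_nbrs[v]:
                nxt[w] += share
        pi = nxt
    return pi


def normalised(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


# Toy undirected network: every edge is reciprocal.
undirected = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}

# Same network with the edge c -> b removed, so b -> c is nonreciprocal.
directed = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a"], "d": ["a"]}

pi_undirected = stationary(undirected)
pi_directed = stationary(directed)
deg_undirected = normalised({v: len(n) for v, n in undirected.items()})
outdeg_directed = normalised({v: len(n) for v, n in directed.items()})

for v in sorted(directed):
    print(v,
          round(pi_undirected[v], 4), round(deg_undirected[v], 4),
          round(pi_directed[v], 4), round(outdeg_directed[v], 4))
```

On the undirected toy network the stationary probabilities coincide with the degree-proportional weights (3/8, 2/8, 2/8, 1/8). On the directed one they are 6/13, 2/13, 3/13, 2/13, which differs from the out-degree-proportional weights 3/7, 2/7, 1/7, 1/7. In this toy setting, weighting sampled individuals by degree alone would therefore misstate their selection probabilities once nonreciprocal edges are present.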



