Search engine queries contain a great deal of private and potentially compromising information about users. One technique to prevent search engines from identifying the source of a query, and Internet service providers (ISPs) from identifying its contents, is to query the search engine over an anonymity network such as Tor.
In this paper, we study the extent to which Website Fingerprinting can be extended to fingerprint individual queries or keywords sent to web applications, a task we call Keyword Fingerprinting (KF). We show that by augmenting traffic analysis with a two-stage approach and new task-specific feature sets, a passive network adversary can in many cases defeat the use of Tor to protect search engine queries.
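The two-stage structure can be illustrated with a minimal sketch. It assumes each Tor trace has already been reduced to a numeric feature vector (the two toy features here, standing in for quantities such as incoming and outgoing cell counts, are hypothetical), and it uses a nearest-centroid classifier as a deliberately simple stand-in for the machine learning techniques the paper evaluates. The class names, labels, and data are illustrative only, not the authors' implementation.

```python
import math
from collections import defaultdict

UNMONITORED = "unmonitored"  # hypothetical label for traces of non-targeted queries


class NearestCentroid:
    """Toy stand-in classifier: predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        # One centroid (per-feature mean) per label
        self.centroids = {
            label: [sum(col) / len(vs) for col in zip(*vs)]
            for label, vs in groups.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda label: math.dist(x, self.centroids[label]))


class TwoStageKF:
    """Hypothetical two-stage keyword fingerprinting pipeline."""

    def fit(self, X, y):
        # Stage 1: binary task, monitored keyword vs. unmonitored query
        binary = [UNMONITORED if label == UNMONITORED else "monitored" for label in y]
        self.stage1 = NearestCentroid().fit(X, binary)
        # Stage 2: multi-class task, trained only on monitored-keyword traces
        monitored = [(x, label) for x, label in zip(X, y) if label != UNMONITORED]
        self.stage2 = NearestCentroid().fit(
            [x for x, _ in monitored], [label for _, label in monitored]
        )
        return self

    def predict(self, x):
        # Only traces that pass stage 1 are assigned a specific keyword
        if self.stage1.predict(x) == UNMONITORED:
            return UNMONITORED
        return self.stage2.predict(x)


# Toy feature vectors (e.g., [incoming cells, outgoing cells]); values are made up
X = [[100, 20], [102, 21], [200, 40], [198, 39], [500, 90], [510, 95]]
y = ["weather", "weather", "flights", "flights", UNMONITORED, UNMONITORED]

model = TwoStageKF().fit(X, y)
print(model.predict([101, 20]))  # weather
print(model.predict([505, 93]))  # unmonitored
```

Splitting the task this way lets the first stage concentrate on the open-world question of whether a trace belongs to any targeted keyword at all, while the second stage only has to discriminate among the monitored keywords.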
We explore three popular search engines, Google, Bing, and DuckDuckGo, using several machine learning techniques under various experimental scenarios. Our experimental results show that KF can identify Google queries containing one of 300 targeted keywords with a recall of 80% and a precision of 91%, while identifying the specific monitored keyword among 300 search keywords with an accuracy of 48%. We further investigate the factors that contribute to keyword fingerprintability to understand how search engines and users might protect against KF.
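The results above mix a detection measure (precision and recall for flagging queries that contain any monitored keyword) with an identification measure (accuracy at naming the exact keyword). A minimal sketch of how such figures could be computed from per-trace labels follows. It assumes a trace counts as a true positive for detection whenever both its true and predicted labels are monitored keywords, which is one plausible convention and not necessarily the one used in the paper; the labels are hypothetical.

```python
UNMONITORED = "unmonitored"  # hypothetical label for non-targeted queries


def detection_metrics(true_labels, predicted_labels):
    """Precision and recall for the binary question 'is this a monitored-keyword query?'."""
    tp = fp = fn = 0
    for t, p in zip(true_labels, predicted_labels):
        actual = t != UNMONITORED
        flagged = p != UNMONITORED
        if actual and flagged:
            tp += 1  # monitored query correctly flagged (keyword may still be wrong)
        elif flagged:
            fp += 1  # unmonitored query wrongly flagged
        elif actual:
            fn += 1  # monitored query missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def keyword_accuracy(true_labels, predicted_labels):
    """Fraction of monitored-keyword traces whose exact keyword is predicted."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t != UNMONITORED]
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0


true = ["weather", "flights", "weather", UNMONITORED, UNMONITORED, "flights"]
pred = ["weather", "weather", UNMONITORED, "flights", UNMONITORED, "flights"]
print(detection_metrics(true, pred))  # (0.75, 0.75)
print(keyword_accuracy(true, pred))   # 0.5
```

Under this convention a trace flagged as the wrong monitored keyword still counts toward detection recall, which is one reason detection figures can be considerably higher than exact-keyword accuracy.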