Website fingerprinting allows a local, passive observer monitoring a web-browsing client’s encrypted channel to determine her web activity. Previous attacks have shown that website fingerprinting could be a threat to anonymity networks such as Tor under laboratory conditions. However, laboratory conditions differ significantly from realistic conditions. First, in laboratory tests the training data set is collected together with the testing data set, so the training data is fresh, whereas an attacker may not be able to maintain a fresh data set. Second, each laboratory packet sequence corresponds to a single page, whereas in realistic packet sequences the boundaries between pages are not obvious. Third, realistic packet sequences may include background noise from other types of web traffic. These differences adversely affect website fingerprinting under realistic conditions. In this paper, we tackle these three problems to bridge the gap between laboratory and realistic conditions for website fingerprinting. We show that a fresh training set can be maintained with minimal resources. We demonstrate several classification-based techniques that effectively split full packet sequences into sequences corresponding to a single page each. We describe several new algorithms for tackling background noise. With these techniques, we build the first website fingerprinting system that can operate directly on packet sequences collected in the wild.