Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Open access

Abstract

Several recent works use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. The current state-of-the-art attack, which uses deep learning, outperforms prior art when medium to large amounts of training data are available, but it attains marginal to no accuracy improvements when both attacks are trained on small amounts of data. In this work, we propose Var-CNN, a website fingerprinting attack that combines deep learning techniques with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over 1% higher true positive rate (TPR) than state-of-the-art attacks while achieving a 4× lower false positive rate (FPR). Var-CNN’s improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by 3.12% while increasing the TPR by 13%. Overall, the insights used to develop Var-CNN can be applied to future deep-learning-based attacks and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of data staleness issues.
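To make the open-world TPR and FPR figures concrete, the following Python sketch computes both rates from true and predicted labels. It assumes one common open-world convention, which is not necessarily the exact definition used in the paper's evaluation: every trace from an unmonitored site carries a single hypothetical label `UNMONITORED = -1`, a true positive is a monitored trace assigned to its correct site, and a false positive is an unmonitored trace assigned to any monitored site. The function name `open_world_rates` is illustrative only.

```python
from typing import Sequence, Tuple

# Hypothetical label shared by all traces from unmonitored (background) sites.
UNMONITORED = -1


def open_world_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (TPR, FPR) under one common open-world convention (an assumption)."""
    monitored = [(t, p) for t, p in zip(y_true, y_pred) if t != UNMONITORED]
    unmonitored_preds = [p for t, p in zip(y_true, y_pred) if t == UNMONITORED]

    # True positive: a monitored trace classified as its own monitored site.
    # Predicting a different monitored site, or UNMONITORED, counts as a miss.
    tp = sum(1 for t, p in monitored if t == p)

    # False positive: an unmonitored trace classified as any monitored site.
    fp = sum(1 for p in unmonitored_preds if p != UNMONITORED)

    tpr = tp / len(monitored) if monitored else 0.0
    fpr = fp / len(unmonitored_preds) if unmonitored_preds else 0.0
    return tpr, fpr
```

For example, with `y_true = [0, 0, 1, 2, -1, -1, -1, -1]` and `y_pred = [0, 1, 1, 2, -1, 0, -1, -1]`, three of the four monitored traces are labeled with the correct site and one of the four unmonitored traces is misclassified as a monitored site, so the function returns `(0.75, 0.25)`.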
