Var-CNN: A Data-Efficient Website Fingerprinting Attack Based on Deep Learning

Open access

Abstract

In recent years, there have been several works that use website fingerprinting techniques to enable a local adversary to determine which website a Tor user visits. While the current state-of-the-art attack, which uses deep learning, outperforms prior art with medium to large amounts of data, it attains marginal to no accuracy improvements when both use small amounts of training data. In this work, we propose Var-CNN, a website fingerprinting attack that leverages deep learning techniques along with novel insights specific to packet sequence classification. In open-world settings with large amounts of data, Var-CNN attains over 1% higher true positive rate (TPR) than state-of-the-art attacks while achieving 4× lower false positive rate (FPR). Var-CNN’s improvements are especially notable in low-data scenarios, where it reduces the FPR of prior art by 3.12% while increasing the TPR by 13%. Overall, insights used to develop Var-CNN can be applied to future deep learning based attacks, and substantially reduce the amount of training data needed to perform a successful website fingerprinting attack. This shortens the time needed for data collection and lowers the likelihood of having data staleness issues.

[1] The Top 500 Sites on the Web. https://www.alexa.com/topsites, 2017.

[2] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. arXiv preprint arXiv:1603.04467, 2015.

[3] Kota Abe and Shigeki Goto. Fingerprinting Attack on Tor Anonymity using Deep Learning. In Proceedings of the Asia-Pacific Advanced Network Research Workshop, volume 42, pages 15–20, 2016.

[4] George D. Bissias, Marc Liberatore, David Jensen, and Brian N. Levine. Privacy Vulnerabilities in Encrypted HTTP Streams. Privacy Enhancing Technologies, pages 1–11, 2006.

[5] Xiang Cai, Rishab Nithyanand, Tao Wang, Rob Johnson, and Ian Goldberg. A Systematic Approach to Developing and Evaluating Website Fingerprinting Defenses. In Proceedings of the ACM Conference on Computer and Communications Security, pages 227–238, 2014.

[6] Xiang Cai, Xin C. Zhang, Brijesh Joshi, and Rob Johnson. Touching from a Distance: Website Fingerprinting Attacks and Defenses. In Proceedings of the ACM Conference on Computer and Communications Security, pages 605–616, 2012.

[7] Heyning Cheng and Ron Avnur. Traffic Analysis of SSL Encrypted Web Browsing. https://pdfs.semanticscholar.org/1a98/7c4fe65fa347a863dece665955ee7e01791b.pdf, 1998.

[8] François Chollet et al. Keras. https://keras.io, 2015.

[9] Tor Developers. Tor metrics portal. https://metrics.torproject.org, 2018.

[10] Thomas G. Dietterich. Ensemble Methods in Machine Learning. In Proceedings of the International Workshop on Multiple Classifier Systems, 2000.

[11] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: The Second-Generation Onion Router. In Proceedings of the 13th USENIX Security Symposium, pages 303–320, 2004.

[12] Kevin P. Dyer, Scott E. Coull, Thomas Ristenpart, and Thomas Shrimpton. Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail. In Proceedings of the IEEE Symposium on Security and Privacy, pages 332–346, 2012.

[13] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations, 2015.

[14] Priya Goyal, Piotr Dollár, Ross Girshick, Pieter Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, and Kaiming He. Accurate, Large Mini-batch SGD: Training ImageNet in 1 Hour. arXiv preprint arXiv:1706.02677, 2017.

[15] Ankit Gupta and Alexander M. Rush. Dilated Convolutions for Modeling Long-Distance Genomic Dependencies. In Proceedings of the 34th International Conference on Machine Learning, Workshop on Computational Biology, 2017.

[16] Jamie Hayes and George Danezis. k-fingerprinting: A Robust Scalable Website Fingerprinting Technique. In Proceedings of the 25th USENIX Security Symposium, pages 1187–1203, 2016.

[17] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. arXiv preprint arXiv:1512.03385, 2015.

[18] Dominik Herrmann, Rolf Wendolsky, and Hannes Federrath. Website Fingerprinting: Attacking Popular Privacy Enhancing Technologies with the Multinomial Naïve-Bayes Classifier. In Proceedings of the ACM Workshop on Cloud Computing Security, pages 31–42, 2009.

[19] Andrew Hintz. Fingerprinting Websites Using Traffic Analysis. Privacy Enhancing Technologies, pages 171–178, 2003.

[20] Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.

[21] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.

[22] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, 2015.

[23] Max Jaderberg, Wojciech M. Czarnecki, Simon Osindero, Oriol Vinyals, Alex Graves, David Silver, and Koray Kavukcuoglu. Decoupled Neural Interfaces using Synthetic Gradients. In Proceedings of the 34th International Conference on Machine Learning, 2017.

[24] Marc Juarez, Sadia Afroz, Gunes Acar, Claudia Diaz, and Rachel Greenstadt. A Critical Evaluation of Website Fingerprinting Attacks. In Proceedings of the ACM Conference on Computer and Communications Security, 2014.

[25] Marc Juarez, Mohsen Imani, Mike Perry, Claudia Diaz, and Matthew Wright. Toward an Efficient Website Fingerprinting Defense. In Proceedings of the European Symposium on Research in Computer Security, pages 27–46, 2016.

[26] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, 2015.

[27] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Conference on Neural Information Processing Systems, pages 1097–1105, 2012.

[28] Yann LeCun, Yoshua Bengio, and Geoffrey E. Hinton. Deep Learning. Nature, 521:436–444, 2015.

[29] Yann LeCun, Leon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[30] Marc Liberatore and Brian N. Levine. Inferring the Source of Encrypted HTTP Connections. In Proceedings of the 13th ACM Conference on Computer and Communications Security, pages 255–263, 2006.

[31] David Lu, Sanjit Bhat, Albert Kwon, and Srinivas Devadas. DynaFlow: An Efficient Website Fingerprinting Defense Based on Dynamically-Adjusting Flows. In Proceedings of the ACM Workshop on Privacy in the Electronic Society, 2018.

[32] Liming Lu, Ee-Chien Chang, and Mun C. Chan. Website Fingerprinting and Identification Using Ordered Feature Sequences. In Proceedings of the European Symposium on Research in Computer Security, pages 199–214, 2010.

[33] Aleksander Mądry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the International Conference on Learning Representations, 2018.

[34] Andriy Panchenko, Fabian Lanze, Aandreas Zinnen, and Martin Henze. Website Fingerprinting at Internet Scale. In Proceedings of the 16th Network and Distributed System Security Symposium, 2016.

[35] Andriy Panchenko, Lukas Niessen, Andreas Zinnen, and Thomas Engel. Website Fingerprinting in Onion Routing Based Anonymization Networks. In Proceedings of the ACM Workshop on Privacy in the Electronic Society, pages 103–114, 2011.

[36] Vera Rimmer, Davy Preuveneers, Marc Juarez, Tom V. Goethem, and Wouter Joosen. Automated Feature Extraction for Website Fingerprinting through Deep Learning. In Proceedings of the Network and Distributed System Security Symposium, 2018.

[37] Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556, 2014.

[38] Payap Sirinam, Mohsen Imani, Marc Juarez, and Matthew Wright. Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning. In Proceedings of the ACM Conference on Computer and Communications Security, 2018.

[39] Nitish Srivastava, Geoffrey H. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.

[40] Qixiang Sun, Daniel R. Simon, Yi-Min Wang, Wilf Russell, Venkata N. Padmanabhan, and Lili Qiu. Statistical Identification of Encrypted Web Browsing Traffic. In Proceedings of the IEEE Symposium on Security and Privacy, pages 19–30, 2002.

[41] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alex Alemi. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. arXiv preprint arXiv:1602.07261, 2016.

[42] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv:1609.03499, 2016.

[43] Tao Wang, Xiang Cai, Rob Johnson, and Ian Goldberg. Effective Attacks and Provable Defenses for Website Fingerprinting. In Proceedings of the 23rd USENIX Security Symposium, pages 143–157, 2014.

[44] Tao Wang and Ian Goldberg. Improved Website Fingerprinting on Tor. In Proceedings of the ACM Workshop on Privacy in the Electronic Society, 2013.

[45] Tao Wang and Ian Goldberg. On Realistically Attacking Tor with Website Fingerprinting. In Proceedings on Privacy Enhancing Technologies, pages 21–36, 2016.

[46] Tao Wang and Ian Goldberg. Walkie-Talkie: An Efficient Defense Against Passive Website Fingerprinting Attacks. In Proceedings of the USENIX Security Symposium, pages 1375–1390, 2017.

[47] Fisher Yu and Vladlen Koltun. Multi-Scale Context Aggregation By Dilated Convolutions. In Proceedings of the International Conference on Learning Representations, 2016.

[48] Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding Deep Learning Requires Rethinking Generalization. In Proceedings of the International Conference on Learning Representations, 2017.

Journal Information

Metrics

All Time Past Year Past 30 Days
Abstract Views 0 0 0
Full Text Views 47 47 47
PDF Downloads 38 38 38