Privacy on the Web has become a major concern, resulting in the widespread use of tools that block tracking services. Most of these tools rely on manually maintained blacklists, which must be kept up to date to protect Web users’ privacy effectively. Keeping pace with today’s quickly evolving advertisement and analytics landscape is challenging. To support blacklist maintainers in this task, we identify a set of Web traffic features for identifying privacy-intrusive services. Based on these features, we develop an automatic approach that learns the properties of advertisement and analytics services listed by existing blacklists and proposes new services for inclusion on blacklists. We evaluate our technique on real traffic traces of a campus network and find on the order of 200 new privacy-intrusive Web services that are not listed by the most popular Firefox plug-in, Adblock Plus. The proposed Web traffic features are easy to derive, allowing a distributed implementation of our approach.