MAPS: Scaling Privacy Compliance Analysis to a Million Apps

Sebastian Zimmeck 1, Peter Story 2, Daniel Smullen 3, Abhilasha Ravichander 3, Ziqi Wang 3, Joel Reidenberg 4, N. Cameron Russell 4, and Norman Sadeh 5
  • 1 Department of Mathematics and Computer Science, Wesleyan University
  • 2 School of Computer Science, Carnegie Mellon University
  • 3 School of Computer Science, Carnegie Mellon University
  • 4 School of Law, Fordham University
  • 5 School of Computer Science, Carnegie Mellon University

Abstract

The app economy is largely reliant on data collection as its primary revenue model. To comply with legal requirements, app developers are often obligated to notify users of their privacy practices in privacy policies. However, prior research has suggested that many developers are not accurately disclosing their apps’ privacy practices. Evaluating discrepancies between apps’ code and privacy policies enables the identification of potential compliance issues. In this study, we introduce the Mobile App Privacy System (MAPS) for conducting an extensive privacy census of Android apps. We designed a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques. In its first application, we conduct a privacy evaluation for a set of 1,035,853 Android apps from the Google Play Store. We find broad evidence of potential non-compliance. Many apps do not have a privacy policy to begin with. Policies that do exist are often silent on the practices performed by apps. For example, 12.1% of apps have at least one location-related potential compliance issue. We hope that our extensive analysis will motivate app stores, government regulators, and app developers to more effectively review apps for potential compliance issues.
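The comparison described in the abstract, between what an app's code does and what its privacy policy discloses, can be illustrated with a minimal sketch. This is not the MAPS implementation: the `AppRecord` type, the practice labels such as `location_gps`, the package names, and the treatment of a missing policy as leaving every performed practice undisclosed are all assumptions made for illustration, and the four records are toy data rather than results from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    """Hypothetical per-app record; all field names are illustrative."""
    app_id: str
    performed: frozenset               # practices found by code analysis
    disclosed: frozenset = frozenset() # practices the policy text discloses
    has_policy: bool = True


def potential_issues(app):
    """Return practices performed in code but not disclosed in the policy.

    Assumption: an app without a policy discloses nothing, so every
    performed practice counts as a potential compliance issue.
    """
    if not app.has_policy:
        return app.performed
    return app.performed - app.disclosed


def share_with_issue(apps, keyword):
    """Fraction of apps with at least one potential issue whose label contains keyword."""
    apps = list(apps)
    if not apps:
        return 0.0
    flagged = sum(
        1 for app in apps
        if any(keyword in practice for practice in potential_issues(app))
    )
    return flagged / len(apps)


# Toy data only; these are not apps or figures from the study.
apps = [
    AppRecord("com.example.a", frozenset({"location_gps"}), frozenset({"location_gps"})),
    AppRecord("com.example.b", frozenset({"location_gps", "identifier_device"}),
              frozenset({"identifier_device"})),
    AppRecord("com.example.c", frozenset({"contact_email"}), has_policy=False),
    AppRecord("com.example.d", frozenset()),
]

location_share = share_with_issue(apps, "location")
print(f"{location_share:.1%} of toy apps have a location-related potential issue")
```

A set difference keeps the check conservative in one direction only: it flags practices the policy is silent on, and says nothing about practices a policy mentions that the code never performs.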

