MAPS: Scaling Privacy Compliance Analysis to a Million Apps

Open access

Abstract

The app economy is largely reliant on data collection as its primary revenue model. To comply with legal requirements, app developers are often obligated to notify users of their privacy practices in privacy policies. However, prior research has suggested that many developers are not accurately disclosing their apps’ privacy practices. Evaluating discrepancies between apps’ code and privacy policies enables the identification of potential compliance issues. In this study, we introduce the Mobile App Privacy System (MAPS) for conducting an extensive privacy census of Android apps. We designed a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques. In its first application, we conduct a privacy evaluation for a set of 1,035,853 Android apps from the Google Play Store. We find broad evidence of potential non-compliance. Many apps do not have a privacy policy to begin with. Policies that do exist are often silent on the practices performed by apps. For example, 12.1% of apps have at least one location-related potential compliance issue. We hope that our extensive analysis will motivate app stores, government regulators, and app developers to more effectively review apps for potential compliance issues.


