Stylometric authorship attribution aims to identify the author of an anonymous or disputed document by examining its writing style. The development of powerful machine-learning-based stylometric authorship attribution methods presents a serious privacy threat to individuals, such as journalists and activists, who wish to publish anonymously. Researchers have proposed several authorship obfuscation approaches that try to make appropriate changes (e.g., word/phrase replacements) to evade attribution while preserving semantics. Unfortunately, existing authorship obfuscation approaches fall short because they require manual effort, require significant training data, or do not work for long documents. To address these limitations, we propose Mutant-X, a genetic-algorithm-based random search framework that automatically obfuscates text to evade attribution while keeping the semantics of the obfuscated text similar to the original. Specifically, Mutant-X sequentially makes changes to the text using mutation and crossover techniques, guided by a fitness function that takes into account both attribution probability and semantic relevance. While Mutant-X requires black-box knowledge of the adversary’s classifier, it does not require any additional training data and works on documents of any length. We evaluate Mutant-X against a variety of authorship attribution methods on two different text corpora. Our results show that Mutant-X can decrease the accuracy of state-of-the-art authorship attribution methods by as much as 64% while preserving semantics much better than existing automated authorship obfuscation approaches. While Mutant-X advances the state of the art in automated authorship obfuscation, we find that it does not generalize to a stronger threat model in which the adversary uses a different attribution classifier than the one Mutant-X assumes. Our findings warrant future research to improve the generalizability (or transferability) of automated authorship obfuscation approaches.
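The search loop described above can be illustrated with a minimal sketch. It is not the Mutant-X implementation: the synonym table, the marker-word "classifier", the position-overlap similarity, and the weight alpha in the fitness function are all hypothetical stand-ins chosen so the example runs on its own. In practice the attribution probability would come from black-box queries to the adversary's classifier, and the semantic score from a proper similarity measure.

```python
import random

# Hypothetical synonym table; a real system would draw replacement
# candidates from a thesaurus or word embeddings.
SYNONYMS = {
    "big": ["large", "huge"],
    "quick": ["fast", "rapid"],
    "said": ["stated", "remarked"],
    "very": ["quite", "really"],
}

# Words the toy "adversary" treats as this author's stylistic markers.
AUTHOR_MARKERS = {"big", "quick", "said", "very"}


def attribution_probability(words):
    """Toy black-box classifier: share of words that are author markers."""
    return sum(w in AUTHOR_MARKERS for w in words) / len(words)


def semantic_similarity(words, original):
    """Toy similarity: fraction of word positions left unchanged."""
    return sum(a == b for a, b in zip(words, original)) / len(original)


def fitness(words, original, alpha=0.75):
    """Assumed weighting: low attribution probability, high similarity."""
    evasion = 1.0 - attribution_probability(words)
    return alpha * evasion + (1.0 - alpha) * semantic_similarity(words, original)


def mutate(words, rng):
    """Replace one randomly chosen replaceable word with a synonym."""
    child = list(words)
    candidates = [i for i, w in enumerate(child) if w in SYNONYMS]
    if candidates:
        i = rng.choice(candidates)
        child[i] = rng.choice(SYNONYMS[child[i]])
    return child


def crossover(a, b, rng):
    """Single-point crossover between two equal-length word lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def obfuscate(text, generations=20, population_size=8, seed=0):
    """Evolve word-level variants of text and return the fittest one."""
    rng = random.Random(seed)
    original = text.split()
    population = [mutate(original, rng) for _ in range(population_size)]
    for _ in range(generations):
        children = [
            mutate(crossover(rng.choice(population), rng.choice(population), rng), rng)
            for _ in range(population_size)
        ]
        # Elitist selection: keep the fittest individuals from both pools.
        ranked = sorted(population + children,
                        key=lambda w: fitness(w, original), reverse=True)
        population = ranked[:population_size]
    return " ".join(population[0])


if __name__ == "__main__":
    print(obfuscate("he said the big dog was very quick"))
```

The weight alpha makes the trade-off explicit: a larger value favors evading the classifier, while a smaller value favors staying close to the original text. Because the loop only queries the classifier for a score, it needs no training data of its own, which mirrors the black-box setting described in the abstract.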