On the (In)effectiveness of Mosaicing and Blurring as Tools for Document Redaction

Open access


In many online communities, it is the norm to redact names and other sensitive text from posted screenshots. Sometimes solid bars are used; sometimes a blur or other image transform is applied instead. We consider the effectiveness of two popular image transforms, mosaicing (also known as pixelization) and blurring, for the redaction of text. Our main finding is that a simple but powerful class of statistical models, hidden Markov models (HMMs), can recover both short and indefinitely long instances of redacted text. Our approach builds on the success of HMMs in automatic speech recognition, where they are used to recover sequences of phonemes from spoken utterances. Here we use HMMs in an analogous way to recover sequences of characters from images of redacted text. We evaluate an implementation of our system against multiple typefaces, font sizes, grid sizes, pixel offsets, and levels of noise. We also decode numerous real-world examples of redacted text. We conclude that mosaicing and blurring, despite their widespread use, are not viable approaches to text redaction.
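The analogy with speech recognition can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a hypothetical two-character alphabet (`a`, `b`), observations that are already quantized to `dark` or `light` mosaic blocks, and made-up start, transition, and emission probabilities. Standard Viterbi decoding then recovers the most likely character sequence from the observed blocks.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s
            prev = max(states, key=lambda p: best[-1][p] + log_trans[p][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


def logs(probabilities):
    """Convert a dict of nonzero probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probabilities.items()}


# Hypothetical toy model: hidden states are characters, observations are
# quantized block intensities. All numbers are illustrative assumptions.
states = ["a", "b"]
log_start = logs({"a": 0.5, "b": 0.5})
log_trans = {"a": logs({"a": 0.7, "b": 0.3}), "b": logs({"a": 0.3, "b": 0.7})}
log_emit = {
    "a": logs({"dark": 0.9, "light": 0.1}),
    "b": logs({"dark": 0.2, "light": 0.8}),
}

decoded = viterbi(["dark", "dark", "light"], states, log_start, log_trans, log_emit)
print(decoded)
```

In a realistic setting the hidden states, observations, and probabilities would have to be learned from redacted renderings of known text; the sketch only shows the decoding step.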

