Evaluation of Fingerprint Selection Algorithms for Local Text Reuse Detection

Gints Jēkabsons

Open Access

Published Online: Jun 05, 2020

Applied Computer Systems

Volume 25 (2020): Issue 1 (May 2020)

Page range: 11 - 18

DOI: https://doi.org/10.2478/acss-2020-0002

Keywords
Document fingerprinting, fingerprint selection, local text reuse detection, plagiarism detection

© 2020 Gints Jēkabsons, published by Sciendo

This work is licensed under the Creative Commons Attribution 4.0 Public License.

Detection of local text reuse is central to a variety of applications, including plagiarism detection, origin detection, and information flow analysis. This paper evaluates and compares the effectiveness of fingerprint selection algorithms for the source retrieval stage of local text reuse detection. In total, six algorithms are compared: Every p-th, 0 mod p, Winnowing, Hailstorm, Frequency-biased Winnowing (FBW), and the proposed modified version of FBW (MFBW).
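As a rough illustration of what "fingerprint selection" means here, the three simplest schemes named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the software developed for the study: the function names, the use of word-level k-grams, the CRC32 hash, and the parameter values are all hypothetical choices made for the example. Winnowing is written in its commonly described form (the minimum hash in each sliding window, taking the rightmost on ties). Hailstorm, FBW, and MFBW are omitted because the abstract does not describe how they work.

```python
import zlib


def kgram_hashes(text, k=5):
    """Hash every k-word shingle; returns a list of (position, hash) pairs.

    Word-level shingles and CRC32 are assumptions made for this sketch.
    """
    words = text.lower().split()
    return [(i, zlib.crc32(" ".join(words[i:i + k]).encode("utf-8")))
            for i in range(len(words) - k + 1)]


def every_pth(hashes, p):
    """Every p-th: keep the hashes at positions 0, p, 2p, ..."""
    return hashes[::p]


def zero_mod_p(hashes, p):
    """0 mod p: keep the hashes whose value is divisible by p."""
    return [(i, h) for i, h in hashes if h % p == 0]


def winnowing(hashes, w):
    """Winnowing: keep the minimum hash of each window of w hashes.

    Ties are broken in favour of the rightmost occurrence. Inputs with
    fewer than w hashes yield no fingerprints in this simplified version.
    """
    selected = []
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        best = min(window, key=lambda ih: (ih[1], -ih[0]))
        # Consecutive windows often share the same minimum; record it once.
        if not selected or selected[-1] != best:
            selected.append(best)
    return selected


# A small hand-made hash sequence (values chosen for illustration only).
demo = list(enumerate(
    [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]))

print(every_pth(demo, 4))
print(zero_mod_p(demo, 7))
print(winnowing(demo, 4))
```

The design difference is visible even on this toy input: Every p-th depends on position, so inserting a single word shifts every later selection. 0 mod p depends only on content but gives no bound on the gap between selected hashes. Winnowing depends on content and selects at least one hash from every window.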

Most previously published studies in local text reuse detection are based on datasets in which the text reuse is artificially generated, long, or unobfuscated. In this study, to evaluate the performance of the algorithms, a new dataset has been built containing real text reuse cases from Bachelor's and Master's theses (written in English in the field of computer science), where about half of the cases involve less than 1 % of the document text and about two-thirds of the cases involve paraphrasing.

In the experiments performed, the best overall detection quality is achieved by Winnowing, 0 mod p, and MFBW. The proposed MFBW algorithm is a considerable improvement over FBW and is one of the best-performing algorithms.

The software developed for this study is freely available at the author’s website http://www.cs.rtu.lv/jekabsons/.

eISSN: 2255-8691
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Computer Sciences, Artificial Intelligence, Information Technology, Project Management, Software Development
