CCR: A combined cleaning and resampling algorithm for imbalanced data classification

Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance can be observed in most real datasets, and they affect the performance of classification algorithms. High levels of imbalance in particular pose serious difficulties, often requiring specially designed methods. In such cases the most important issue is often the proper detection of minority examples, but the performance on the majority class cannot be neglected either. In this paper we describe a novel resampling technique focused on the proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. The results of the experimental study indicate that the proposed algorithm usually outperforms conventional oversampling approaches, especially when the detection of minority examples is considered.
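
The abstract names two steps: cleaning the region around minority objects and synthetic oversampling. The Python sketch below illustrates that general idea only and is not the algorithm specified in the paper. The function name `clean_and_oversample`, the single fixed `radius` shared by all minority points, and the uniform sampling inside each ball are all assumptions made for illustration.

```python
import numpy as np


def clean_and_oversample(X_maj, X_min, radius=1.0, seed=0):
    """Toy 'clean, then oversample' resampler (illustrative, NOT the paper's CCR).

    Assumption: every minority point gets the same fixed-radius ball.
    """
    rng = np.random.default_rng(seed)
    X_maj = np.array(X_maj, dtype=float)  # copy, so the caller's array is untouched
    X_min = np.asarray(X_min, dtype=float)
    n_features = X_min.shape[1]

    # Cleaning: push each majority point that lies inside a minority point's
    # ball out to the surface of that ball. Balls are processed one by one,
    # so with overlapping balls a point may end up inside an earlier ball.
    for center in X_min:
        diff = X_maj - center
        dist = np.linalg.norm(diff, axis=1)
        inside = dist < radius
        # A majority point exactly at the center has no direction: pick a random one.
        zero = inside & (dist == 0)
        diff[zero] = rng.normal(size=(zero.sum(), n_features))
        dist[zero] = np.linalg.norm(diff[zero], axis=1)
        X_maj[inside] = center + diff[inside] / dist[inside, None] * radius

    # Oversampling: draw synthetic minority points uniformly inside the balls
    # around randomly chosen minority points until both classes have equal size.
    n_new = max(len(X_maj) - len(X_min), 0)
    centers = X_min[rng.integers(len(X_min), size=n_new)]
    directions = rng.normal(size=(n_new, n_features))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Taking the d-th root of a uniform variable makes points uniform in volume.
    radii = radius * rng.uniform(size=(n_new, 1)) ** (1.0 / n_features)
    synthetic = centers + directions * radii

    return X_maj, np.vstack([X_min, synthetic])
```

Calling `clean_and_oversample(X_maj, X_min)` returns the shifted majority points and the enlarged minority set, which can then be passed to any classifier. Because the synthetic points fall only inside regions that were just cleared of majority points, the two steps reinforce each other in this simplified setting.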

eISSN: 2083-8492
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Mathematics, Applied Mathematics

Published Online: Jan 13, 2018

Page range: 727 - 736

Received: Jan 12, 2016

Accepted: Aug 25, 2017

DOI: https://doi.org/10.1515/amcs-2017-0050

Keywords
machine learning, classification, imbalanced data, preprocessing, oversampling

© by Michał Koziarski

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
