Evaluation of Resampling Methods in the Class Unbalance Problem

Mariusz Kubus

Open Access

Published Online: May 29, 2020

Econometrics

Volume 24 (2020): Issue 1 (March 2020)

Page range: 39 - 50

DOI: https://doi.org/10.15611/eada.2020.1.04

Keywords
class unbalance, resampling, regularized logistic regression, random forests

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Many real-world applications aim to predict rare events, so their training sets are highly unbalanced. In such cases, classifiers are biased towards correct prediction of the majority class and tend to misclassify the minority class, even though the rare events are of greater interest. Numerous techniques have been proposed to handle this problem, either by balancing the data or by modifying the learning algorithms. The goal of this paper is to compare simple random balancing methods with more sophisticated resampling methods that have appeared in the literature and are available in R. Additionally, the paper asks whether learning on the original dataset and classifying with a shifted threshold is more competitive. The survey is conducted from the perspective of regularized logistic regression and random forests. The results show that combining random under-sampling with random forests has an advantage over the other techniques, while logistic regression can be competitive in the case of highly unbalanced data.
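To make the two simplest ideas in the abstract concrete, the following is a minimal Python sketch of random under-sampling and of classification with a shifted threshold. It is an illustration only: the paper works in R and this page does not give its implementation, so the function names `random_undersample` and `classify_with_threshold`, the toy data, and the choice of the minority-class share (0.05) as the shifted threshold are all assumptions made for this example.

```python
import random
from collections import Counter


def random_undersample(X, y, seed=0):
    """Hypothetical helper: randomly drop rows so that every class
    keeps as many rows as the smallest class has."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    n_min = min(len(indices) for indices in by_class.values())
    # Sample n_min indices without replacement from each class.
    keep = []
    for indices in by_class.values():
        keep.extend(rng.sample(indices, n_min))
    keep.sort()
    return [X[i] for i in keep], [y[i] for i in keep]


def classify_with_threshold(probabilities, threshold=0.5):
    """Hypothetical helper: label an observation as the rare class (1)
    when its predicted probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Toy data (assumed): 5 rare events among 100 observations.
X = [[float(i)] for i in range(100)]
y = [1] * 5 + [0] * 95

X_bal, y_bal = random_undersample(X, y, seed=42)
class_counts = Counter(y_bal)  # both classes now have 5 rows

# Made-up predicted probabilities of the rare class from some model
# trained on the original, unbalanced data.
scores = [0.03, 0.08, 0.20, 0.60]
default_labels = classify_with_threshold(scores)                  # cut-off 0.5
shifted_labels = classify_with_threshold(scores, threshold=0.05)  # assumed cut-off
```

With the default cut-off only the last observation is flagged as a rare event, whereas the lowered cut-off flags three of the four. This is the trade-off the paper weighs against resampling the training set.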

eISSN: 2449-9994
Language: English

Publication timeframe: 4 times per year
Journal Subjects: Business and Economics, Political Economics, other, Business Management, Mathematics and Statistics for Economists, Mathematics, Social Sciences, Sociology


© 2020 Mariusz Kubus, published by Sciendo
