The simplest classification task is to divide a set of objects into two classes, but most problems encountered in real-life applications are multi-class. There are many methods of decomposing such a task into a set of smaller classification problems involving only two classes. Among these methods, pairwise coupling, proposed by Hastie and Tibshirani (1998), is one of the best known. Its principle is to separate each pair of classes while ignoring the remaining ones. All objects are then tested against these classifiers, and a voting scheme is applied that combines the pairwise class probability estimates into a joint probability estimate for all classes. A closer look at the pairwise strategy reveals a problem that affects the final result: each binary classifier votes for every object, even if the object does not belong to either of the two classes on which the classifier was trained. Our strategy addresses this problem. We propose to use additional classifiers to select the objects that will be considered by the pairwise classifiers. A similar solution was proposed by Moreira and Mayoraz (1998), but they use classifiers that are biased by the imbalance in the number of samples representing the classes.
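The effect of non-competent pairwise classifiers can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the method of this paper: the four 1-D class centroids, the nearest-centroid rule used as each binary classifier, the distance-based `is_candidate` selector standing in for the additional classifiers, and plain vote counting in place of the coupling of probability estimates.

```python
from collections import Counter
from itertools import combinations

# Hypothetical 1-D class centroids, chosen only for illustration.
CENTROIDS = {"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}


def pairwise_vote(x, first, second):
    """Toy binary classifier for one class pair: vote for the nearer centroid."""
    if abs(x - CENTROIDS[first]) <= abs(x - CENTROIDS[second]):
        return first
    return second


def is_candidate(x, first, second, radius=0.5):
    """Hypothetical selector: accept x only if it lies near one of the two classes."""
    return any(abs(x - CENTROIDS[c]) <= radius for c in (first, second))


def collect_votes(x, use_selector=False):
    """Count votes over all class pairs, optionally skipping rejected pairs."""
    votes = Counter()
    for first, second in combinations(sorted(CENTROIDS), 2):
        if use_selector and not is_candidate(x, first, second):
            continue  # this pairwise classifier does not consider x
        votes[pairwise_vote(x, first, second)] += 1
    return votes


x = 0.2  # an object that belongs to class "A"
plain_votes = collect_votes(x)
filtered_votes = collect_votes(x, use_selector=True)
print(dict(plain_votes))     # votes from all six pairwise classifiers
print(dict(filtered_votes))  # votes from the classifiers that accepted x
```

With four classes there are six pairwise classifiers, but only three of them were trained on class A. In the unfiltered count the other three still vote, necessarily for classes the object does not belong to (here two votes for B and one for C). Once the toy selector is applied, only the three competent classifiers vote.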
Chawla, N., Bowyer, K., Hall, L. and Kegelmeyer, W.P. (2002). SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research 16: 321–357.
Chmielnicki, W., Roterman-Konieczna, I. and Stąpor, K. (2012). An improved protein fold recognition with support vector machines, Expert Systems 20(2): 200–211.
Chmielnicki, W. and Stąpor, K. (2010). Protein fold recognition with combined SVM-RDA classifier, in M.G. Romay and E. Corchado (Eds.), Hybrid Artificial Intelligence Systems, Lecture Notes in Artificial Intelligence, Vol. 6076, Springer, Berlin, pp. 162–169.
Chmielnicki, W. and Stąpor, K. (2012). A hybrid discriminative/generative approach to protein fold recognition, Neurocomputing 75(1): 194–198.
Demsar, J. (2006). Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7: 1–30.
Dietterich, T. (1998). Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation 10: 1895–1924.
Dietterich, T.G. and Bakiri, G. (1995). Solving multiclass problems via error-correcting output codes, Journal of Artificial Intelligence Research 2: 263–286.
Ding, C. and Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks, Bioinformatics 17(4): 349–358.
Fei, B. and Liu, J. (2006). Binary tree of SVM: A new fast multiclass training and classification algorithm, IEEE Transactions on Neural Networks 17(3): 696–704.
Friedman, J. (1996). Another approach to polychotomous classification, Technical report, Stanford University, Stanford, CA.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. and Herrera, F. (2011). An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes, Pattern Recognition 44(8): 1761–1776.
Galar, M., Fernandez, A., Barrenechea, E., Bustince, H. and Herrera, F. (2013). Dynamic classifier selection for one-vs-one strategy: Avoiding non-competent classifiers, Pattern Recognition 46(12): 3412–3424.
Glomb, P., Romaszewski, M., Opozda, S. and Sochan, A. (2011). Choosing and modeling hand gesture database for natural user interface, Proceedings of the 9th International Conference on Gesture and Sign Language in Human-Computer Interaction and Embodied Communication, Athens, Greece, pp. 24–35.
Hastie, T. and Tibshirani, R. (1998). Classification by pairwise coupling, The Annals of Statistics 26(1): 451–471.
He, H. and Garcia, E. (2009). Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering 21(9): 1263–1284.
Hollander, M. and Wolfe, D. (1973). Nonparametric Statistical Methods, John Wiley and Sons, New York, NY.
Iman, R. and Davenport, J. (1980). Approximations of the critical region of the Friedman statistics, Communications in Statistics - Theory and Methods 9(6): 571–595.
Kahsay, L., Schwenker, F. and Palm, G. (2005). Comparison of multiclass SVM decomposition schemes for visual object recognition, in W. Kropatsch et al. (Eds.), Pattern Recognition, Lecture Notes in Computer Science, Vol. 3663, Springer, Berlin, pp. 334–341.
Kijsirikul, B. and Ussivakul, N. (2002). Multiclass support vector machines using adaptive directed acyclic graph, Proceedings of the International Joint Conference on Neural Networks, Honolulu, HI, USA, pp. 980–985.
Krawczyk, B., Wozniak, M. and Cyganek, B. (2014). Clustering-based ensembles for one-class classification, Information Sciences 264: 182–195.
Krzysko, M. and Wolynski, W. (2009). New variants of pairwise classification, European Journal of Operational Research 199(2): 512–519.
Liu, C. and Fujisawa, H. (2005). Classification and learning for character recognition: Comparison of methods and remaining problems, International Workshop on Neural Networks and Learning in Document Analysis and Recognition, Seoul, Korea, pp. 1–7.
Liu, X., Wu, J. and Zhou, Z.H. (2008). Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man and Cybernetics B 39(2): 539–550.
Lorena, A. and Carvalho, A. (2010). Building binary-tree-based multiclass classifiers using separability measures, Neurocomputing 73(16–18): 2837–2845.
Lorena, A., Carvalho, A. and Gama, J. (2008). A review on the combination of binary classifiers in multiclass problems, Artificial Intelligence Review 30(1–4): 19–37.
Moreira, M. and Mayoraz, E. (1998). Improved pairwise coupling classification with correcting classifiers, Proceedings of the 10th European Conference on Machine Learning, ECML 1998, Chemnitz, Germany, pp. 160–171.
Nadeau, C. and Bengio, Y. (2003). Inference for the generalization error, Advances in Neural Information Processing Systems 52(3): 239–281.
Ou, G. and Murphey, Y. (2006). Multi-class pattern classification using neural networks, Pattern Recognition 40(1): 4–18.
Platt, J., Cristianini, N. and Shawe-Taylor, J. (2000). Large margin DAGs for multiclass classification, Neural Information Processing Systems, NIPS’99, Breckenridge, CO, USA, pp. 547–553.
Saez, J.A., Galar, M., Luengo, J. and Herrera, F. (2012). A first study on decomposition strategies with data with class noise using decision trees, Proceedings of the 7th International Conference on Hybrid Artificial Intelligent Systems, Salamanca, Spain, Part II, pp. 25–35.