Gene selection ensembles and classifier ensembles for medical diagnosis

Małgorzata Ćwiklińska-Jurkowska

Open Access

Published Online: Dec 16, 2019

Biometrical Letters

Volume 56 (2019): Issue 2 (December 2019)

Page range: 117 - 138

DOI: https://doi.org/10.2478/bile-2019-0007

Keywords
combined methods, discriminant analysis, gene selection

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

The usefulness of combining methods is examined on microarray cancer data sets, in which expression levels of very large numbers of genes are reported. Two-group discrimination problems are studied on three such data sets. Classifier performance was measured by the cross-validation error evaluated on the half of each data set that had not been used earlier for gene selection. Two common single procedures for gene selection, Prediction Analysis of Microarrays (PAM) and Significance Analysis of Microarrays (SAM), were compared with the fusion of eight selection procedures, or of a smaller subset of five of them, excluding SAM or PAM. Merging five or eight selection methods gave similar results. Judged by the misclassification rates, combining gene selection methods was not superior to single PAM or SAM selection for two of the three data sets, whichever ensemble of classifiers was examined. Additionally, heterogeneous combining of five base classifiers (k-nearest neighbors, linear SVM and radial SVM with parameter c=1, the shrunken centroids regularized classifier (SCRDA), and the nearest mean classifier) significantly outperformed resampling classifiers such as bagged decision trees. Heterogeneously combined classifiers also outperformed double bagging for some ranges of gene numbers and some data sets, but merging is generally not superior to random forests. The preliminary step of combining gene rankings was generally not essential for the performance of either heterogeneously or homogeneously combined classifiers.
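The two kinds of combining described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the gene rankings were merged or how the base classifiers' outputs were combined, so the mean-rank (Borda-style) fusion and the simple majority vote below are assumptions chosen for illustration, and the function names `fuse_rankings` and `majority_vote` are hypothetical.

```python
from collections import Counter


def fuse_rankings(rankings, top_k):
    """Merge several gene rankings by mean rank position (an assumed rule).

    rankings: list of lists; each list orders the SAME gene ids, best first.
    Returns the top_k gene ids with the lowest mean position.
    """
    genes = rankings[0]
    position_sum = {gene: 0 for gene in genes}
    for ranking in rankings:
        for position, gene in enumerate(ranking):
            position_sum[gene] += position
    n = len(rankings)
    # Sort by mean position; ties are broken by gene id so the result is deterministic.
    ordered = sorted(genes, key=lambda g: (position_sum[g] / n, g))
    return ordered[:top_k]


def majority_vote(predictions):
    """Combine class labels predicted by several base classifiers (assumed rule).

    predictions: one list of labels per classifier, all of equal length.
    With an odd number of classifiers and two classes no ties can occur.
    """
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


# Toy example: three selection methods ranking four genes.
toy_rankings = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g2", "g3", "g1", "g4"],
]
selected = fuse_rankings(toy_rankings, top_k=2)

# Toy example: five base classifiers labelling three samples.
toy_predictions = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
]
fused_labels = majority_vote(toy_predictions)
```

In a setup like the one in the abstract, the fused ranking would be computed on the half of the data reserved for gene selection, and the voting ensemble would then be cross-validated on the other half using only the selected genes.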

eISSN: 1896-3811
Language: English

Publication timeframe: 2 times per year
Journal Subjects: Life Sciences, Bioinformatics, other, Mathematics, Probability and Statistics, Applied Mathematics

© 2019 Małgorzata Ćwiklińska-Jurkowska, published by Sciendo
