Madara Gasparovica-Asite, Inese Polaka and Ludmila Alekseyeva
The present research examines a wide range of attribute selection methods: 86 methods covering both ranking and subset evaluation approaches. Their efficacy is evaluated on bioinformatics data sets provided by the Latvian Biomedical Research and Study Centre. The data sets are intended for diagnostic tasks and contain values of more than 1000 proteomics features together with a diagnosis (a specific cancer or healthy) determined by a gold standard method (biopsy and histological analysis). The diagnostic task is solved with the classification algorithms FURIA, RIPPER, C4.5, CART, KNN, SVM, FB+ and GARF, applied both to the initial data sets and to various versions with reduced dimensionality. The paper concludes with findings on the most effective attribute subset selection methods for the classification task on diagnostic proteomics data.
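To make the ranking approach concrete, the following is a minimal Python sketch, assuming two-class labels (0 = healthy, 1 = cancer) and numeric proteomics features. The Fisher score used here is a standard univariate ranking criterion chosen purely for illustration; it is not claimed to be one of the 86 methods studied, and the names `fisher_score`, `rank_attributes` and `select_top_k` are hypothetical. Subset evaluation approaches, which score combinations of attributes rather than each attribute on its own, are not shown.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Two-class Fisher score of one attribute: squared difference of the
    class means divided by the sum of within-class (population) variances.
    Assumes both classes (0 and 1) are present in `labels`."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    denom = pvariance(pos) + pvariance(neg)
    if denom == 0:
        # Zero within-class spread: perfectly separating if the means differ.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / denom


def rank_attributes(X, labels):
    """Return attribute indices ordered from most to least discriminative."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_top_k(X, labels, k):
    """Reduce dimensionality by keeping only the k highest-ranked attributes.
    Returns the kept indices and the reduced data matrix."""
    keep = rank_attributes(X, labels)[:k]
    return keep, [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    # Toy data: 4 samples x 3 hypothetical protein features.
    X = [[1.0, 5.0, 0.2], [1.2, 4.8, 0.9], [3.0, 5.1, 0.5], [3.1, 4.9, 0.4]]
    labels = [0, 0, 1, 1]
    keep, reduced = select_top_k(X, labels, 2)
    print(keep, reduced)
```

In this sketch the reduced matrix returned by `select_top_k` would then be passed to a classifier, and the number of retained attributes `k` is left to the caller.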