This article describes a fuzzy classification system developed by the authors that is particularly applicable to the classification of bioinformatics data. The description focuses on four steps of the system: 1) data preprocessing; 2) classifier training and construction of the rule base; 3) classification of new records; and 4) evaluation of the results. It also explains the processes within each step, including missing data replacement, reduction of the number of alternatives and functions, construction of membership functions and stretching of the induced rules. The article concludes with a justification of the methods and algorithms chosen for each process of the system.
The paper describes an algorithm for approximating the classification boundary of a trained radial basis function neural network (RBFNN) with elliptic rules. These rules can later be translated into IF-THEN form if required. We provide experimental results of the algorithm for the two-dimensional case. Neural networks are currently not widely adopted because the classification decisions they make are difficult to interpret. A formalized representation of the decision process is required in many mission-critical areas, such as medicine, nuclear energy and finance.
Madara Gasparovica, Ludmila Aleksejeva and Valdis Gersons
This article studies the potential of the BEXA family of classification algorithms (BEXA, FuzzyBexa and FuzzyBexa II) for data classification, especially of bioinformatics data. Three types of data sets were used in the study: data sets often used in the literature, real-life data sets from the UCI data repository, and real bioinformatics data sets, which are characterized by a large number of attributes and a small number of records. To compare classification results, experiments were carried out on all data sets with other classification algorithms as well. Conclusions are drawn and recommendations given on the use of each BEXA family algorithm for classifying various real data, and the question of whether these algorithms are recommended for bioinformatics data is answered.
Artificial neural networks (ANNs) are well known for their classification abilities, although choosing hyperparameters such as the number and size of neuron layers can be quite a tedious task. Pruning approaches assume that a sufficiently large ANN has already been trained and can be simplified with an acceptable loss of classification accuracy. The paper presents a node pruning algorithm and compares the accuracy of pruned networks with that of their non-pruned counterparts.
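The abstract does not state which pruning criterion the paper uses, so the following is only a minimal sketch of the general idea of node pruning, under an assumed criterion: in a network with one hidden layer, the hidden node whose outgoing weights have the smallest squared norm is removed. The function name `prune_weakest_hidden_node` and the weight layout are hypothetical and chosen for illustration only.

```python
def prune_weakest_hidden_node(w_in, b_hidden, w_out):
    """Remove one hidden node from a single-hidden-layer network.

    Assumed layout (illustrative, not taken from the paper):
      w_in[h]     - list of input weights of hidden node h
      b_hidden[h] - bias of hidden node h
      w_out[o][h] - weight from hidden node h to output node o
    Assumed criterion: the node with the smallest sum of squared
    outgoing weights contributes least to the outputs.
    """
    n_hidden = len(w_in)
    # Saliency of each hidden node = squared norm of its outgoing weights.
    saliency = [sum(row[h] ** 2 for row in w_out) for h in range(n_hidden)]
    weakest = min(range(n_hidden), key=saliency.__getitem__)
    keep = [h for h in range(n_hidden) if h != weakest]
    pruned_w_in = [w_in[h] for h in keep]
    pruned_b = [b_hidden[h] for h in keep]
    pruned_w_out = [[row[h] for h in keep] for row in w_out]
    return pruned_w_in, pruned_b, pruned_w_out, weakest
```

In practice such a step would be repeated, re-measuring classification accuracy after each removal and stopping once the accuracy loss is no longer acceptable.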
Pavels Osipovs, Andrejs Rinkevics, Galina Kuleshova and Arkady Borisov
This paper examines the possibility of using Markov chains to construct a profile of an author's writing style. The constructed profile can then be used to analyze other texts and calculate their level of similarity. Extracting a unique writing-style profile that is characteristic of a specific person is a topical task in many spheres of human activity; detecting the authorship of scientific and fiction texts is one example. The paper describes the basic theoretical apparatus used for profile construction and the software implementation of the experimental system, and presents the experiments performed together with an analysis of their results.
This article focuses on cluster stability evaluation to assess the characteristics of the dataset and of the subclasses found in class decomposition. The evaluation is an iterative process that makes a small change to the dataset in every step and reapplies the cluster analysis. These small changes (here, removing one object from the dataset, repeated for 20 iterations) should not affect the clusters if they are stable, meaning that the objects that were not removed stay in the same clusters as in the full clustering.
This paper examines certain data mining techniques to assess their potential for use in automated ontology building. The end goal is to reduce both the time required to construct a given ontology and the need for expert consultation. This can be achieved by combining data mining and ontology engineering. The aim of this paper is to take a deeper look at potentially useful data mining techniques for an automated ontology building process, to survey related publications in this field and to propose ideas on how to use data mining techniques in ontology building.
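As a rough illustration of how a Markov chain can serve as a style profile, the sketch below builds first-order transition probabilities and compares two profiles. The abstract does not specify the chain's states or the similarity measure, so character-level states and cosine similarity are assumptions made here, and the names `build_profile` and `similarity` are hypothetical.

```python
import math
from collections import Counter

def build_profile(text):
    """First-order Markov profile: P(next character | current character).

    Assumption: states are lower-cased letters and spaces.
    """
    chars = [c for c in text.lower() if c.isalpha() or c == " "]
    pair_counts = Counter(zip(chars, chars[1:]))
    # Count each character only where it has a successor.
    state_counts = Counter(chars[:-1])
    return {pair: n / state_counts[pair[0]] for pair, n in pair_counts.items()}

def similarity(p, q):
    """Cosine similarity of two transition-probability profiles (assumed measure)."""
    keys = set(p) | set(q)
    dot = sum(p.get(k, 0.0) * q.get(k, 0.0) for k in keys)
    norm = (math.sqrt(sum(v * v for v in p.values()))
            * math.sqrt(sum(v * v for v in q.values())))
    return dot / norm if norm else 0.0
```

A profile built from texts of known authorship could then be compared against the profile of an unattributed text, with a higher score suggesting a closer match in style.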
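The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration under several assumptions not stated in the abstract: a small deterministic k-means stands in for the clustering method, each iteration removes a different single object from the full dataset, and agreement is measured as the fraction of object pairs that are grouped the same way in both clusterings (the Rand index). All function names are hypothetical.

```python
import math
from itertools import combinations

def kmeans(points, k, iters=50):
    """Tiny deterministic k-means with farthest-point seeding (stand-in clusterer)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster becomes empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels

def pair_agreement(a, b):
    """Fraction of object pairs grouped identically in labelings a and b."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def leave_one_out_stability(points, k, iterations=20):
    """Mean agreement between the full clustering and each reduced clustering."""
    full = kmeans(points, k)
    scores = []
    for idx in range(min(iterations, len(points))):
        reduced = points[:idx] + points[idx + 1:]
        full_rest = full[:idx] + full[idx + 1:]
        scores.append(pair_agreement(full_rest, kmeans(reduced, k)))
    return sum(scores) / len(scores)
```

A score of 1.0 means that no removal changed how the remaining objects were grouped; lower values indicate clusters that depend on individual objects.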
Use of Linear Genetic Programming and Artificial Neural Network Methods to Solve Classification Task
This paper presents a comparative analysis of linear genetic programming and artificial neural network methods for solving classification tasks. Classification tasks usually involve data sets with a large number of attributes and records and more than two classes, which are processed using, for example, generated classification rules. As a result, classifying a large number of records with a classical method yields a high classification error. Artificial neural networks are often used to solve classification tasks, mostly with good results. Linear genetic programming is a new direction in evolutionary algorithms that has not been widely researched, and its application areas are not well defined. However, some of its advantages stem from genetic operators whose structure does not require complicated calculations.
In this work, approximately 400 experiments were conducted with linear genetic programming and artificial neural network methods, using various data sets with different numbers of records, attributes and classes.
Based on the results obtained, conclusions were drawn about the applicability of linear genetic programming and artificial neural networks to classification problems, and suggestions for improving their performance were made.
Mining Online Store Client Assessment Classification Rules with Genetic Algorithms
The paper presents the results of research into algorithms, such as the genetic algorithm (GA), that were not designed to mine classification rules yet contain all the functions needed to do so. The main task of the research is the application of GA to classification rule mining. A classic GA was modified to suit the chosen classification task and compared with other popular classification algorithms: JRip, J48 and the Naive Bayes classifier. The paper describes the proposed algorithm and the application task, and provides a comparative analysis of its results against those of the other algorithms.
Impact of Antibody Panel Size on Classification Accuracy
This paper experimentally studies the influence of antibody panel size reduction on classification results. The study applies four classification methods and five feature evaluators to five different biomedical data sets of high dimensionality (1200 features). The behaviour of the classifiers on these data sets is examined to reveal overall trends in the impact of dimensionality reduction on classification accuracy.