Mining Online Store Client Assessment Classification Rules with Genetic Algorithms
The paper presents research into algorithms that were not designed for mining classification rules but contain all the functions needed to do so, such as the genetic algorithm (GA). The main task of the research is the application of a GA to classification rule mining. A classic GA was modified to match the chosen classification task and was compared with other popular classification algorithms: JRip, J48 and the Naive Bayes classifier. The paper describes the proposed algorithm and the application task, and provides a comparative analysis of the results obtained against those of the other algorithms.
A Study on the Behaviour of the Algorithm for Finding Relevant Attributes and Membership Functions
One of the most recent approaches in machine learning is the use of fuzzy rules for solving classification problems. This paper describes an algorithm for finding relevant attributes and searching for membership functions. Experimental results are used to clarify which data sets allow primary membership functions to be obtained automatically from primary data. This ability to derive membership functions is one of the advantages of the algorithm, because it eases the resolution of the classification task. The ability to use it with fuzzy data is another merit. As a result, reliable fuzzy classification rules for separating classes are obtained. By reconstructing the primary membership functions, the number of IF-THEN rules obtained from decision tables is also reduced by up to a factor of three. Four experiments are conducted with different training and testing data set sizes. Conclusions are drawn about the optimal size of the training and testing data sets needed to achieve better results, as well as about the kind of data for which the algorithm is appropriate. Finally, possible directions for further research are outlined.
Małgorzata Zdrodowska, Agnieszka Dardzińska, Monika Chorąży and Alina Kułakowska
Building an ontology is a difficult and time-consuming task. To make this task easier and faster, some automatic methods can be employed. This paper examines the feasibility of using the rules and concepts discovered while the C4.5 algorithm builds a classification tree, in a completely automated way, to build an ontology from data. By building the ontology directly from continuous data, concepts and relations can be discovered without specific knowledge about the domain. This paper also examines how this method reproduces the classification capabilities of the classification tree within an ontology using concepts and class expression axioms.
Technological advancement across human activities has accelerated the generation of huge amounts of data. Consequently, researchers face the problem of how to turn the available mass of data into useful knowledge. Data analysis adapted to these changes with the development of data mining, an approach that analyses data from different perspectives and reveals significant hidden regularities. This paper presents the conceptual characteristics of the decision tree, an important data mining method which, due to its explorative nature, is exceptionally suitable for detecting data structure when analysing various problem situations. The empirical section of the paper demonstrates the applied characteristics of this method using the CHAID algorithm in leadership studies: the interdependence of selected personal characteristics and the manager's leadership style is investigated. The aim of the paper is to develop a classification model for identifying the dominant leadership style. The study was conducted on a sample of 417 managers of privately owned small-sized enterprises in Serbia, using a specially designed questionnaire. The classification model identified a set of six statistically significant personal characteristics as predictors of the dominant leadership style.
On the Discriminant Analysis in the 2-Populations Case
The empirical Bayes Gaussian rule, which in the normal case yields good values of the probability of total error, may yield high values of the maximum probability of error. From this point of view, the presented modified version of the classification rule of Broffitt, Randles and Hogg appears to be superior. The modification introduced in this paper is termed the WR method, and the choice of its weights is discussed. These methods are also compared with the K nearest neighbours classification rule.
Yun-Hyuk Choi, Hye-Yeon Choi, Chi-Seung Lee, Myung-Hyun Kim and Jae-Myung Lee
In this paper, a method to estimate ice loads as a function of the buttock angle of an icebreaker is presented with respect to polycrystalline freshwater ice. Ice model tests for different buttock angles and impact velocities are carried out to investigate ice pressure loads and tendencies of ice pressure loads in terms of failure modes. Experimental devices were fabricated with an idealized icebreaker bow shape, and medium-scale ice specimens were used. A dry-drop machine with a freefall system was used, and four pressure sensors were installed at the bottom to estimate ice pressure loads. An estimation equation was suggested on the basis of the test results. We analyzed the estimation equation for design ice loads of the International Association of Classification Societies (IACS) classification rules. We suggest an estimation equation considering the relation between ice load, buttock angle, and velocity by modifying the equations given in the IACS classification rules.
The aim of the study was to test the ability to model the diversity of soil capability units on the basis of limited information about particle size and terrain morphology. The data were obtained by digitizing agricultural soil maps and topographic maps of the Upper Silesian Industrial District. Algorithms from the field of computational intelligence served as the tools for rule extraction and model building: different versions of decision trees, neural networks and deep learning algorithms. The best algorithms correctly classified up to 90% of the elements of the validation set. An ensemble of specialized classifiers increased the classification accuracy on the validation set to about 94%. Proper selection of the decision algorithm allows estimation of the vector of likelihoods of an object belonging to a complex. Computational intelligence algorithms can be considered a tool for extracting classification rules from collections of soil data at the local or regional level.
Irina Provorova, Serge Parshutin and Sergejs Provorovs
Using Genetic Algorithm to Optimize Weights in Data Mining Task
This paper considers the application of a genetic algorithm (GA) to optimize weights in a data mining task. Data mining tasks usually involve datasets containing a large number of records and features that are processed using, for example, generated classification rules. When a classical method is used to classify a large number of records and features, a high classification error is obtained. To solve this problem, the genetic algorithm was applied to find, for each feature, the weight that reduces the classification error.
The k-nearest neighbour (KNN) classifier was chosen as the classical method, and the modified genetic algorithm was applied to optimize the weights. Based on the joint application of the genetic and k-nearest neighbour algorithms, the GA/KNN hybrid algorithm was developed. The hybrid algorithm provides a stable reduction in classification error regardless of the number of records and features, and of the chosen number of neighbours. In the GA block, the modified crossover and mutation operators work with identical intensity in each generation and cannot degrade an individual.
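The idea of a GA searching for per-feature weights that lower the KNN error can be illustrated with a minimal Python sketch. It is not the authors' GA/KNN algorithm: the abstract does not specify the modified crossover and mutation operators, so the uniform crossover, Gaussian mutation, elitist selection, leave-one-out error measure, toy data and all function names below are assumptions made only for illustration.

```python
import random


def weighted_knn_error(weights, data, k=3):
    """Leave-one-out error of KNN using a feature-weighted squared distance."""
    errors = 0
    for i, (xi, yi) in enumerate(data):
        dists = []
        for j, (xj, yj) in enumerate(data):
            if i == j:
                continue
            d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xj))
            dists.append((d, yj))
        dists.sort(key=lambda t: t[0])
        votes = [label for _, label in dists[:k]]
        predicted = max(sorted(set(votes)), key=votes.count)  # majority vote
        if predicted != yi:
            errors += 1
    return errors / len(data)


def optimise_weights(data, n_features, generations=30, pop_size=20, k=3, seed=0):
    """Elitist GA over weight vectors in [0, 1] (generic operators, assumed)."""
    rng = random.Random(seed)
    # The all-ones vector is plain KNN, so the GA can never do worse than it.
    population = [[1.0] * n_features]
    population += [[rng.random() for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda w: weighted_knn_error(w, data, k))
        elite = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            idx = rng.randrange(n_features)
            mutated = child[idx] + rng.gauss(0, 0.2)  # Gaussian mutation
            child[idx] = min(1.0, max(0.0, mutated))
            children.append(child)
        population = elite + children
    return min(population, key=lambda w: weighted_knn_error(w, data, k))


def make_toy_data(n=40, seed=1):
    """Hypothetical data: feature 0 separates the classes, feature 1 is noise."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        data.append(((label + rng.gauss(0, 0.3), rng.gauss(0, 3.0)), label))
    return data


data = make_toy_data()
baseline_error = weighted_knn_error([1.0, 1.0], data)
best_weights = optimise_weights(data, n_features=2)
tuned_error = weighted_knn_error(best_weights, data)
print("plain KNN error:", baseline_error)
print("GA-weighted KNN error:", tuned_error, "weights:", best_weights)
```

Because the unweighted vector is part of the initial population and the better half of each generation is carried over unchanged, the tuned error in this sketch cannot exceed the plain KNN error. That is a property of the elitist design chosen here, not a claim about the published method.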
Classification in the Gabor time-frequency domain of non-stationary signals embedded in heavy noise with unknown statistical distribution
The paper proposes a new supervised classification algorithm for heavily distorted patterns (shapes) obtained from noisy observations of non-stationary signals. Based on the Gabor transform of 1-D non-stationary signals, 2-D shapes of signals are formulated, and the classification formula is developed using the pattern matching idea, which is the simplest case of a pattern recognition task. In the pattern matching problem, where a set of known patterns creates predefined classes, classification consists of assigning the examined pattern to one of the classes. The classical formulation of a Bayes decision rule requires a priori knowledge about the statistical features characterising each class, which are rarely known in practice. The proposed algorithm avoids the need for a statistical approach, which matters in particular because the probability distribution of the noise is unknown. The algorithm uses the concept of discriminant functions, represented by Frobenius inner products. The classification rule selects the class corresponding to the maximum discriminant function. Computer simulation results are given to demonstrate the effectiveness of the new classification algorithm. It is shown that the proposed approach is able to correctly classify signals embedded in noise with a very low signal-to-noise ratio (SNR). One of the goals here is to develop a pattern recognition algorithm as the best possible way to make decisions automatically. All simulations were performed in Matlab. The proposed algorithm can be applied to non-stationary frequency-modulated signal classification and non-stationary signal recognition.
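The decision rule described above (choose the class whose discriminant function, a Frobenius inner product, is largest) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's implementation: the patterns are taken to be small real-valued matrices standing in for magnitudes of Gabor coefficients, each matrix is normalised to unit Frobenius norm before matching, and the template shapes, the observed values and the names frobenius_inner, normalise and classify are all hypothetical.

```python
import math


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized real matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def normalise(m):
    """Scale a matrix to unit Frobenius norm (an all-zero matrix is returned as a copy)."""
    norm = math.sqrt(frobenius_inner(m, m))
    if norm == 0:
        return [list(row) for row in m]
    return [[x / norm for x in row] for row in m]


def classify(observed, templates):
    """Return the label with the largest discriminant value, plus all scores."""
    obs = normalise(observed)
    scores = {label: frobenius_inner(obs, normalise(t))
              for label, t in templates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 3x3 time-frequency shapes, one reference pattern per class.
templates = {
    "rising": [[0, 0, 1], [0, 1, 0], [1, 0, 0]],
    "flat": [[0, 0, 0], [1, 1, 1], [0, 0, 0]],
}

# A distorted observation of the "rising" shape (made-up values).
observed = [[0.2, 0.1, 0.9], [0.3, 1.1, 0.2], [0.8, 0.1, 0.3]]

label, scores = classify(observed, templates)
print(label, scores)
```

No probability model of the noise appears anywhere in the rule; the decision depends only on how strongly the observed shape correlates with each reference shape, which is the property the abstract emphasises.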