Processing Short Time Series with Data Mining Methods
This article examines several data mining approaches to short time series analysis. The methods are built on clustering algorithms, used either unchanged or with modifications. They are designed for cases where the series contain unequal numbers of observations and the available history is short. The approaches target complex tasks in which statistical analysis methods either cannot be applied or do not provide the necessary efficiency. Specifically, the proposed methods build on grid-based clustering and on modifications of the k-means algorithm.
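To illustrate how grid-based clustering can group series of unequal length, the sketch below first maps each series to a fixed-size feature vector and then clusters those vectors on a grid. This is a minimal sketch, not the authors' method: the feature choice in the hypothetical helper `series_features` (mean level and average change per step) and the parameters `cell_size` and `min_pts` are assumptions made for illustration. The clustering itself follows the basic grid-based idea: dense cells are kept, and face-adjacent dense cells are merged into clusters.

```python
import math
from collections import defaultdict, deque
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def series_features(series: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical features: map a series of any length (>= 2) to
    (mean level, average change per step) so unequal lengths become comparable."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    mean = sum(series) / len(series)
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return (mean, slope)


def grid_cluster(points: Sequence[Sequence[float]], cell_size: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id, or -1 if its grid cell is sparse."""
    # 1. Assign every point to the grid cell that contains it.
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(x / cell_size) for x in p)
        cells[cell].append(i)

    # 2. A cell is dense if it holds at least min_pts points.
    dense = {c for c, members in cells.items() if len(members) >= min_pts}

    # 3. Flood-fill face-adjacent dense cells; each connected group is one cluster.
    labels = [-1] * len(points)
    visited = set()
    cluster_id = 0
    for start in sorted(dense):  # sorted so cluster ids are deterministic
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for i in cells[cell]:
                labels[i] = cluster_id
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in visited:
                        visited.add(nb)
                        queue.append(nb)
        cluster_id += 1
    return labels


series = [[10, 11, 12], [10.5, 11, 11.5, 12], [40, 38, 36, 34, 32], [41, 39, 37]]
print(grid_cluster([series_features(s) for s in series], cell_size=5.0, min_pts=2))
```

Here the two rising series of lengths 3 and 4 fall into one cluster and the two falling series of lengths 5 and 3 into another. A grid approach is attractive for small samples because its cost depends on the number of occupied cells rather than on pairwise distances, but the result is sensitive to the cell size.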
Analysis of Short Time Series in Gene Expression Tasks
The article analyzes various clustering approaches used in gene expression tasks. The selected approaches are described and examined from the perspective of data mining clustering algorithms. The article briefly describes the working principles and characteristics of the examined methods and algorithms, as well as the data sets used in the experiments. It presents the results of experiments on using clustering algorithms to process short time series in bioinformatics, specifically in gene expression problems, and gives conclusions and an evaluation of each approach. Finally, it analyzes the prospects for building a new method that rests on data mining approaches and principles, solves bioinformatics tasks involving short time series, and presents its results in a form that bioinformatics experts can easily interpret.
Time-Series Data Mining for E-Service Application Analysis
This paper analyzes the use of e-services available on the joint state and municipal e-service portal www.latvija.lv. The research combines time series analysis with data mining techniques. Time series analysis made it possible to determine the number of clusters into which services are classified by application frequency. Meta-information is processed using data pre-processing methods, and the resulting values are then discretised. The combinations of methods examined in the paper are tested experimentally on the limited amount of data available, which records monthly request counts for existing e-services. The resulting clusters are then added to the meta-information that is available when services are planned and developed. The cluster membership of each e-service in this data set is determined using inductive classification trees. These algorithms analyse feature values and recursively split the training instances into classes, representing the acquired knowledge as classification trees. Based on this analysis, recommendations for e-service developers and implementers are elaborated, and the basic parameters for successful introduction and application of e-services are identified.
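The two steps that follow clustering, discretising meta-information and inducing a classification tree that predicts a service's cluster, can be sketched as below. This is a minimal illustration under stated assumptions: the abstract does not name the discretisation method or the tree algorithm, so equal-width binning and ID3-style information-gain splitting are used here as stand-ins, and the attribute names `fields` and `audience` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple, Union

Row = Dict[str, str]
# A tree is either a leaf label or (attribute, {value: subtree}, fallback label).
Tree = Union[str, Tuple[str, Dict[str, "Tree"], str]]


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[str]:
    """Discretise numeric values into n_bins equal-width intervals labelled 'b0', 'b1', ..."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return ["b0"] * len(values)
    width = (hi - lo) / n_bins
    return [f"b{min(int((v - lo) / width), n_bins - 1)}" for v in values]


def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows: List[Row], labels: List[str], attrs: List[str]) -> Tree:
    """ID3-style induction: split on the attribute with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(attr: str) -> float:
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    rest = [a for a in attrs if a != best]
    branches: Dict[str, Tree] = {}
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def classify(tree: Tree, row: Row) -> str:
    """Follow the branches that match the row's values down to a leaf."""
    while isinstance(tree, tuple):
        attr, branches, fallback = tree
        if row.get(attr) not in branches:
            return fallback  # unseen value: fall back to the node's majority class
        tree = branches[row[attr]]
    return tree
```

In use, a numeric meta-attribute would first be passed through `equal_width_bins`, the bin labels stored in each row alongside categorical attributes, and the cluster obtained from the request time series used as the class label for `build_tree`.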
Arnis Kirshners, Galina Kuleshova and Arkady Borisov
Demand Forecasting Based on the Set of Short Time Series
This paper addresses the processing of short historical time series together with discrete descriptive parameters, with the aim of forecasting demand for a new product solely from the parameters that describe it. Several data mining methods are used for data processing, including data extraction, pre-processing, cluster analysis and classification. Data are prepared for the data mining processes using user-defined parameters entered in the forecasting system. Within the selected short historical time series, clustering determines the class membership of each object, and these classes form the basis for creating prototypes. The k-means clustering algorithm is run to find the optimal number of clusters in the sample, with the number of clusters chosen on the basis of the mean absolute error. Classification with inductive decision trees then establishes the relationship between the prototypes produced by clustering and the product-describing parameters. To assign a new product to a demand cluster, the decision tree obtained from classification is used: the new product's describing parameters are projected onto the tree, and the tree leaf indicating the number of the prototype produced by clustering is found. The shape of the prototype curve represents the possible demand for the new product in the next period.
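The clustering step can be illustrated with a plain k-means over equal-length demand series, where each centroid serves as a demand prototype and the mean absolute error (MAE) between the series and their prototypes guides the choice of cluster count. This is a minimal sketch, not the paper's procedure: the abstract does not state the exact MAE rule, so the hypothetical `choose_k` simply returns the smallest k whose MAE does not exceed a user-chosen `threshold`. The series are assumed to be aligned to the same periods.

```python
import random
from typing import List, Optional, Sequence, Tuple

Series = List[float]


def kmeans(data: Sequence[Series], k: int, iters: int = 100, seed: int = 0) -> Tuple[List[Series], List[int]]:
    """Plain k-means on equal-length series; the centroids act as demand prototypes."""
    rng = random.Random(seed)
    centroids = [list(s) for s in rng.sample(list(data), k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assign = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(s, centroids[j])))
            for s in data
        ]
        if new_assign == assign:
            break  # assignments stable, so the algorithm has converged
        assign = new_assign
        # Update step: each centroid becomes the mean of its member series.
        for j in range(k):
            members = [s for s, c in zip(data, assign) if c == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def mean_absolute_error(data: Sequence[Series], centroids: List[Series], assign: List[int]) -> float:
    """Average |observation - prototype value| over all series and periods."""
    errors = [abs(a - b) for s, c in zip(data, assign) for a, b in zip(s, centroids[c])]
    return sum(errors) / len(errors)


def choose_k(data: Sequence[Series], max_k: int, threshold: float) -> Optional[Tuple[int, List[Series], List[int], float]]:
    """Hypothetical rule: smallest k (up to max_k) whose MAE is at most threshold."""
    result = None
    for k in range(1, max_k + 1):
        centroids, assign = kmeans(data, k)
        result = (k, centroids, assign, mean_absolute_error(data, centroids, assign))
        if result[3] <= threshold:
            break
    return result


history = [[10, 12, 14], [11, 12, 15], [50, 52, 55], [51, 53, 54]]
k, prototypes, labels, mae = choose_k(history, max_k=3, threshold=1.0)
print(k, prototypes, labels, round(mae, 3))
```

For this toy history, one cluster leaves an MAE of about 20 units, while two clusters bring it down to about 0.417, so k = 2 is chosen. In the workflow the abstract describes, a decision tree leaf would then supply the prototype index, and the matching centroid would give the forecast curve.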
This article analyzes traditional time series processing methods applied to short time series analysis in demand forecasting. Its main aim is to assess whether these methods are suitable for analyzing short time series. The analyzed methods are exponential smoothing, exponential smoothing with a trend component, and the moving average method. The paper describes the structure and main operating principles of each method. Experiments are conducted on real demand data. The results are analyzed, and recommendations are given on using these methods for short time series analysis.
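The three methods named above can be sketched as one-step-ahead forecasters. This is a minimal illustration, not the paper's implementation. The trend variant is assumed to be Holt's linear method, which is one standard form of exponential smoothing with a trend. The initialisation choices (first observation as the starting level, first difference as the starting trend) and the smoothing constants `alpha` and `beta` are assumptions.

```python
from typing import Sequence


def moving_average_forecast(series: Sequence[float], window: int) -> float:
    """Next-period forecast: mean of the last `window` observations."""
    if not 1 <= window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    return sum(series[-window:]) / window


def exp_smoothing_forecast(series: Sequence[float], alpha: float) -> float:
    """Simple exponential smoothing; the final level is the next-period forecast."""
    level = series[0]  # initialise the level with the first observation
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def holt_forecast(series: Sequence[float], alpha: float, beta: float, horizon: int = 1) -> float:
    """Exponential smoothing with a linear trend (Holt's method)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # initial level and trend
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + horizon * trend


demand = [1, 2, 3, 4, 5]
print(moving_average_forecast(demand, 2), exp_smoothing_forecast(demand, 0.5), holt_forecast(demand, 0.5, 0.5))
```

On a steadily rising series such as this one, the moving average and simple exponential smoothing lag behind the trend, while the trend variant extrapolates it. With only a handful of observations, the initial level and trend carry considerable weight in the result, which is one reason these methods can be fragile on short histories.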
Arnis Kirshners, Inese Polaka and Ludmila Aleksejeva
Data mining methods are applied to a medical task that seeks information about the influence of Helicobacter pylori on the increase in gastric cancer risk by analysing adverse factors of individual lifestyle. During data preprocessing, the data are cleaned of noise and other distorting factors, reduced in dimensionality, transformed for the task, and cleared of non-informative attributes. Classification with the C4.5, CN2 and k-nearest neighbour algorithms is carried out to find relationships between the analysed attributes and the class attribute, Helicobacter pylori presence, which may influence the risk of cancer development. The experimental analysis uses data from the database of the Latvian project "Interdisciplinary Research Group for Early Cancer Detection and Cancer Prevention".
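Of the three classifiers mentioned, k-nearest neighbour is the simplest to illustrate. The sketch below assumes that preprocessing has already encoded the lifestyle attributes as comparable numeric values. The Euclidean distance, the default `k = 3` and the class labels `"present"`/`"absent"` are hypothetical choices, since the abstract does not state the settings used.

```python
import math
from collections import Counter
from typing import Sequence


def knn_predict(train_x: Sequence[Sequence[float]], train_y: Sequence[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training records nearest to `query` (Euclidean distance)."""
    neighbours = sorted(zip(train_x, train_y), key=lambda xy: math.dist(xy[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["absent", "absent", "absent", "present", "present", "present"]
print(knn_predict(train_x, train_y, (0.2, 0.2)), knn_predict(train_x, train_y, (5.5, 5.5)))
```

Because k-NN compares raw distances, attributes measured on larger scales dominate unless they are normalised first, so the dimensionality reduction and attribute cleaning described above directly affect its results.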