Search Results

1 - 10 of 31 items for "ensemble methods":

Ensembles of instance selection methods: A comparative study

Schapire, R.E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences 55 (1): 119–139. Galar, M., Fernández, A., Barrenechea, E., Bustince, H. and Herrera, F. (2011). An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes, Pattern Recognition 44 (8): 1761–1776. García-Osorio, C., de Haro-García, A. and García-Pedraja, N. (2010). Democratic instance selection: A linear complexity instance

Open access
Multi-label classification using error correcting output codes

., Tsoumakas, G., Kalliris, G. and Vlahavas, I. (2008). Multilabel classification of music into emotions, 9th International Conference on Music Information Retrieval (ISMIR 2008), Philadelphia, PA, USA , pp. 325-330. Tsoumakas, G., Katakis, I. and Vlahavas, I. (2011). Random k-labelsets for multilabel classification, IEEE Transactions on Knowledge and Data Engineering 23 (7): 1079-1089. Tsoumakas, G. and Vlahavas, I. (2007). Random k-labelsets: An Ensemble Method for Multilabel Classification , Lecture Notes in Artificial Intelligence, Vol

Open access
Applying a Neural Network Ensemble to Intrusion Detection

Abstract

An intrusion detection system (IDS) is an important means of protecting a system against network attacks. An IDS monitors the activity within a network of connected computers in order to detect intrusive patterns. In the event of an ‘attack’, the system has to respond appropriately. Different machine learning techniques have been applied to this task in the past. These techniques fall into either the clustering or the classification category. In this paper, the classification approach is used: a neural network ensemble method is employed to classify the different types of attacks. The neural network ensemble consists of an autoencoder, a deep belief neural network, a deep neural network, and an extreme learning machine. The data used for the investigation are from the NSL-KDD data set. In particular, the detection rate and false alarm rate, among other measures (confusion matrix, classification accuracy, and AUC), of the implemented neural network ensemble are evaluated.
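The abstract names detection rate and false alarm rate without defining them. As a minimal sketch, assuming the common definitions (detection rate = TP / (TP + FN), false alarm rate = FP / (FP + TN)) and a binary labelling of 1 for attack and 0 for normal traffic, the two measures could be computed as follows. The function name and the label encoding are illustrative assumptions, not taken from the paper.

```python
def detection_and_false_alarm_rates(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels.

    Assumed encoding (not from the paper): 1 = attack, 0 = normal.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, passed

    # Guard against empty classes to avoid division by zero.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


if __name__ == "__main__":
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    preds = [1, 0, 1, 0, 1, 0, 0, 1]
    print(detection_and_false_alarm_rates(truth, preds))
```

For a multi-class attack taxonomy, the same counts would typically be taken per class from the confusion matrix.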

Open access
Ensemble Neural Network Approach for Accurate Load Forecasting in a Power System


The paper presents an improved method for 1-24 hour load forecasting in a power system, which combines the results of different neural forecasts in an ensemble system. We integrate the partial predictions made by three solutions, one of which relies on a multilayer perceptron and the other two on self-organizing networks of the competitive type. As the expert system we apply different integration methods: simple averaging, SVD-based weighted averaging, principal component analysis and blind source separation. The results of numerical experiments on forecasting the hourly load of the Polish power system for the next 24 hours are presented and discussed. We compare the performance of the different ensemble methods on the basis of the mean absolute percentage error, mean squared error and maximum percentage error. The results show a significant improvement of the proposed ensemble method over the individual predictions. A comparison of our work with the results of other papers for the same data indicates the superiority of our approach.
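Of the integration methods listed, simple averaging is the easiest to illustrate. The sketch below averages several hourly forecasts element-wise and scores the result with the three error measures named in the abstract. It assumes the usual textbook definitions of these measures, and all function names and numbers are hypothetical; it does not reproduce the paper's predictors or data.

```python
def simple_average(forecasts):
    """Element-wise mean of several equally long forecast series."""
    n = len(forecasts)
    return [sum(values) / n for values in zip(*forecasts)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def max_percentage_error(actual, predicted):
    """Largest absolute percentage error over the horizon, in percent."""
    return 100.0 * max(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Made-up loads for two hours and three hypothetical predictors.
    actual = [100.0, 200.0]
    forecasts = [[90.0, 210.0], [110.0, 190.0], [100.0, 230.0]]
    combined = simple_average(forecasts)
    print(combined, mape(actual, combined), mse(actual, combined),
          max_percentage_error(actual, combined))
```

The weighted variants mentioned in the abstract would replace the uniform weights in `simple_average` with weights estimated from past errors.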

Open access
Interpretable decision-tree induction in a big data parallel framework

Abstract

When running data-mining algorithms on big data platforms, a parallel, distributed framework, such as MapReduce, may be used. However, in a parallel framework, each individual model fits the data allocated to its own computing node without necessarily fitting the entire dataset. In order to induce a single consistent model, ensemble algorithms, such as majority voting, aggregate the local models rather than analyzing the entire dataset directly. Our goal is to develop an efficient algorithm for choosing one representative model from multiple, locally induced decision-tree models. The proposed SySM (syntactic similarity method) algorithm computes the similarity between the models produced by parallel nodes and chooses the model which is most similar to the others as the best representative of the entire dataset. In 18.75% of 48 experiments on four big datasets, SySM accuracy is significantly higher than that of the ensemble; in about 43.75% of the experiments, SySM accuracy is significantly lower; in one case, the results are identical; and in the remaining 35.41% of cases the difference is not statistically significant. Compared with ensemble methods, the representative tree models selected by the proposed methodology are more compact and interpretable, their induction consumes less memory, and, as confirmed by the empirical results, they allow faster classification of new records.
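The selection step described above, choosing the model that is most similar to the others, can be sketched generically. The abstract does not specify how syntactic similarity between trees is measured, so the example below substitutes Jaccard similarity over sets of rule strings as a stand-in; this is an assumption for illustration, not the SySM measure, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity (NOT the SySM measure): |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def most_representative(models, similarity=jaccard):
    """Index of the model with the highest mean similarity to the others."""
    best_index, best_score = 0, float("-inf")
    for i, model in enumerate(models):
        others = [similarity(model, other)
                  for j, other in enumerate(models) if j != i]
        score = sum(others) / len(others) if others else 0.0
        if score > best_score:  # ties keep the earliest model
            best_index, best_score = i, score
    return best_index


if __name__ == "__main__":
    # Each local model is represented here as a set of rule strings.
    local_models = [
        {"a", "b"},
        {"a", "b", "c"},
        {"a", "c", "d"},
    ]
    print(most_representative(local_models))
```

Unlike majority voting, this returns a single existing model, which is what makes the result compact and interpretable; the cost is one similarity evaluation per pair of local models.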

Open access
Modelling Match Outcome in Australian Football: Improved accuracy with large databases

Abstract

Mathematical models that explain match outcome, based on the value of technical performance indicators (PIs), can be used to identify the most important aspects of technical performance in team field-sports. The purpose of this study was to evaluate several methodological opportunities to enhance the accuracy of this type of modelling. Specifically, we evaluated the potential benefits of 1) modelling match outcome using an increased number of seasons and PIs compared with previous reports, 2) identifying eras in which technical performance characteristics were stable and 3) applying a novel feature selection method. Ninety-one PIs across sixteen Australian Football (AF) League seasons were analysed. Change-point and Segmented Regression analyses were used to identify eras, and they produced similar but non-identical outcomes. A feature selection ensemble method identified the most valuable 45 PIs for modelling. The use of a larger number of seasons for model development led to an improvement in the classification accuracy of the models compared with previous studies (88.8% vs 78.9%). This study demonstrates the potential benefits of large databases when creating models of match outcome and the pitfalls of determining whether there are eras in a longitudinal database.

Open access
The Curious Case of the PDF Converter that Likes Mozart: Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps

Abstract

Third-party apps that work on top of personal cloud services, such as Google Drive and Dropbox, require access to the user’s data in order to provide some functionality. Through detailed analysis of a hundred popular Google Drive apps from Google’s Chrome store, we discover that the existing permission model is quite often misused: around two-thirds of the analyzed apps are over-privileged, i.e., they access more data than is needed for them to function. In this work, we analyze three different permission models that aim to discourage users from installing over-privileged apps. In experiments with 210 real users, we discover that the most successful permission model is our novel ensemble method that we call Far-reaching Insights. Far-reaching Insights inform the users about the data-driven insights that apps can make about them (e.g., their topics of interest, collaboration and activity patterns, etc.). Thus, they seek to bridge the gap between what third parties can actually know about users and users’ perception of their privacy leakage. The efficacy of Far-reaching Insights in bridging this gap is demonstrated by our results, as Far-reaching Insights prove to be, on average, twice as effective as the current model in discouraging users from installing over-privileged apps. In an effort to promote general privacy awareness, we deployed PrivySeal, a publicly available privacy-focused app store that uses Far-reaching Insights. Based on the knowledge extracted from data of the store’s users (over 115 gigabytes of Google Drive data from 1440 users with 662 installed apps), we also delineate the ecosystem for third-party cloud apps from the standpoint of developers and cloud providers. Finally, we present several general recommendations that can guide future work in the area of privacy for the cloud. To the best of our knowledge, ours is the first work that tackles the privacy risk posed by third-party apps on cloud platforms in such depth.

Open access
Machine learning model development for predicting road transport GHG emissions in Canada

References Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140. Carletta, J. (1996). Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics, 22(2), 249-254. Dawson, C. W., & Wilby, R. (1998). An artificial neural network approach to rainfall-runoff modelling. Hydrological Sciences Journal, 43(1), 47-66. Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer Berlin Heidelberg

Open access
Visualization and Comparison of Single and Combined Parametric and Nonparametric Discriminant Methods for Leukemia Type Recognition Based on Gene Expression

. Rokach, L. (2009). Taxonomy for characterizing ensemble methods in classification tasks: A review and annotated bibliography. Computational Statistics and Data Analysis , 53(12), 4046–4072. Rokach, L. (2010a). Pattern Classification Using Ensemble Methods. In H. Bunke, & P. S. P. Wang (Eds.), Series in Machine Perception and Artificial Intelligence (Vol. 75). World Scientific Publishing. Rokach, L. (2010b). Ensemble-based classifiers. Artificial Intelligence Review , 33(1–2), 1–39. Rokach, L., & Maimon, O. (2005). Top-down induction of decision

Open access
Classifier Ensembles Using Structural Features For Spammer Detection In Online Social Networks

References [1] Bhat S. Y., Abulaish M., Community-based features for identifying spammers in online social networks, in: Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) , ACM, 2013, 100-107. [2] Bhat S. Y., Abulaish M., Analysis and mining of online social networks: emerging trends and challenges, WIREs: Data Mining and Knowledge Discovery , 3, 6, 2013, 408-444. [3] Bhat S. Y., Abulaish M., Mirza A. A., Spammer classification using ensemble methods over structural social network
