Jaroslaw Hryszko and Lech Madeyski
Software defect prediction is a promising approach aiming to increase software quality and, as a result, development pace. Unfortunately, the cost effectiveness of software defect prediction in industrial settings is not eagerly shared by the pioneering companies. In particular, this is the first attempt to investigate the cost effectiveness of using the DePress open source software measurement framework (jointly developed by Wroclaw University of Science and Technology, and Capgemini software development company) for defect prediction in commercial software projects. We explore whether defect prediction can positively impact an industrial software development project by generating profits. To meet this goal, we conducted a defect prediction and simulated potential quality assurance costs based on the best possible prediction results when using a default, non-tweaked DePress configuration, as well as the proposed Quality Assurance (QA) strategy. Results of our investigation are optimistic: we estimated that quality assurance costs can be reduced by almost 30% when the proposed approach will be used, while estimated DePress usage Return on Investment (ROI) is fully 73 (7300%), and Benefits Cost Ratio (BCR) is 74. Such promising results, being the outcome of the presented research, have caused the acceptance of continued usage of the DePress-based software defect prediction for actual industrial projects run by Volvo Group.
Robert Susmaga and Izabela Szczęch
The paper considers particular interestingness measures, called confirmation measures (also known as Bayesian confirmation measures), used for the evaluation of “if evidence, then hypothesis” rules. The agreement of such measures with a statistically sound (significant) dependency between the evidence and the hypothesis in data is thoroughly investigated. The popular confirmation measures were not defined to possess such form of agreement. However, in error-prone environments, potential lack of agreement may lead to undesired effects, e.g. when a measure indicates either strong confirmation or strong disconfirmation, while in fact there is only weak dependency between the evidence and the hypothesis. In order to detect and prevent such situations, the paper employs a coefficient allowing to assess the level of dependency between the evidence and the hypothesis in data, and introduces a method of quantifying the level of agreement (referred to as a concordance) between this coefficient and the measure being analysed. The concordance is characterized and visualised using specialized histograms, scatter-plots, etc. Moreover, risk-related interpretations of the concordance are introduced. Using a set of 12 confirmation measures, the paper presents experiments designed to establish the actual concordance as well as other useful characteristics of the measures.
Mikhail Y. Kovalyov
A recently introduced lot scheduling problem is considered. It is to find a partition of jobs of n orders into lots and to sequence these lots on a single machine so that the total average completion time of the orders is minimized. A simple O(n log n) time algorithm is presented for this problem in the literature, with a relatively sophisticated proof of its optimality. We show that modeling this problem as a classic batching machine problem makes its optimal solution obvious.
This paper attempts to replicate the results of evaluating several artificial agents using the Algorithmic Intelligence Quotient test originally reported by Legg and Veness. Three experiments were conducted: One using default settings, one in which the action space was varied and one in which the observation space was varied. While the performance of freq, Q0, Qλ, and HLQλ corresponded well with the original results, the resulting values differed, when using MC-AIXI. Varying the observation space seems to have no qualitative impact on the results as reported, while (contrary to the original results) varying the action space seems to have some impact. An analysis of the impact of modifying parameters of MC-AIXI on its performance in the default settings was carried out with the help of data mining techniques used to identifying highly performing configurations. Overall, the Algorithmic Intelligence Quotient test seems to be reliable, however as a general artificial intelligence evaluation method it has several limits. The test is dependent on the chosen reference machine and also sensitive to changes to its settings. It brings out some differences among agents, however, since they are limited in size, the test setting may not yet be sufficiently complex. A demanding parameter sweep is needed to thoroughly evaluate configurable agents that, together with the test format, further highlights computational requirements of an agent. These and other issues are discussed in the paper along with proposals suggesting how to alleviate them. An implementation of some of the proposals is also demonstrated.
Aleksander Jarzębowicz and Piotr Marciniak
Despite the growing body of knowledge on requirements engineering and business analysis, these areas of software project are still considered problematic. The paper focuses on problems reported by business analysts and on applicability of available business analysis techniques as solutions to such problems. A unified set of techniques was developed on the basis of 3 industrial standards associated with IIBA, REQB and IREB certification schemes. A group of 8 business analysts was surveyed to list problems they encounter in their work and to assess their frequency. Selected problems were further analyzed and most suitable techniques were proposed to address them. These proposals were validated through follow-up discussions with business analysts. The main results of research reported in this paper are: the comparative analysis of techniques included in IIBA, REQB and IREB standards and the list of problems reported by practitioners associated with techniques suggested as effective solutions.
Marcin Adamski, Krzysztof Kurowski, Marek Mika, Wojciech Piątek and Jan Węglarz
In many distributed computing systems, aspects related to security are getting more and more relevant. Security is ubiquitous and could not be treated as a separated problem or a challenge. In our opinion it should be considered in the context of resource management in distributed computing environments like Grids and Clouds, e.g. scheduled computations can be much delayed because of cyber-attacks, inefficient infrastructure or users valuable and sensitive data can be stolen even in the process of correct computation. To prevent such cases there is a need to introduce new evaluation metrics for resource management that will represent the level of security of computing resources and more broadly distributed computing infrastructures. In our approach, we have introduced a new metric called reputation, which simply determines the level of reliability of computing resources from the security perspective and could be taken into account during scheduling procedures. The new reputation metric is based on various relevant parameters regarding cyber-attacks (also energy attacks), administrative activities such as security updates, bug fixes and security patches. Moreover, we have conducted various computational experiments within the Grid Scheduling Simulator environment (GSSIM) inspired by real application scenarios. Finally, our experimental studies of new resource management approaches taking into account critical security aspects are also discussed in this paper.
One of the essential aspect in biological agents is dynamic stability. This aspect, called homeostasis, is widely discussed in ethology, neuroscience and during the early stages of artificial intelligence. Ashby’s homeostats are general-purpose learning machines for stabilizing essential variables of the agent in the face of general environments. However, despite their generality, the original homeostats couldn’t be scaled because they searched their parameters randomly. In this paper, first we re-define the objective of homeostats as the maximization of a multi-step survival probability from the view point of sequential decision theory and probabilistic theory. Then we show that this optimization problem can be treated by using reinforcement learning algorithms with special agent architectures and theoretically-derived intrinsic reward functions. Finally we empirically demonstrate that agents with our architecture automatically learn to survive in a given environment, including environments with visual stimuli. Our survival agents can learn to eat food, avoid poison and stabilize essential variables through theoretically-derived single intrinsic reward formulations.
Krzysztof Krawiec and Paweł Liskowski
Genetic programming (GP) is a variant of evolutionary algorithm where the entities undergoing simulated evolution are computer programs. A fitness function in GP is usually based on a set of tests, each of which defines the desired output a correct program should return for an exemplary input. The outcomes of interactions between programs and tests in GP can be represented as an interaction matrix, with rows corresponding to programs in the current population and columns corresponding to tests. In previous work, we proposed SFIMX, a method that performs only a fraction of interactions and employs non-negative matrix factorization to estimate the outcomes of remaining ones, shortening GP’s runtime. In this paper, we build upon that work and propose three extensions of SFIMX, in which the subset of tests drawn to perform interactions is selected with respect to test difficulty. The conducted experiment indicates that the proposed extensions surpass the original SFIMX on a suite of discrete GP benchmarks.