Software Measurement and Defect Prediction with Depress Extensible Framework

Open access

Abstract

Context. Software data collection precedes analysis, which in turn requires data science skills. Software defect prediction is rarely used in industrial projects as a means of quality assurance and cost reduction. Objectives. Many studies and several tools support various data analysis tasks, but there is still neither an open source tool nor a standardized approach. Results. We developed Defect Prediction for software systems (DePress), an extensible software measurement and data integration framework that can be used for prediction purposes (e.g. defect prediction, effort prediction) and for the analysis of software changes (e.g. release notes, bug statistics, commit quality). DePress is based on the KNIME project and allows workflows to be built in a graphical, end-user-friendly manner. Conclusions. We present the main concepts and the development state of the DePress framework. The results show that DePress can be used to analyze both open source and industrial projects.

Foundations of Computing and Decision Sciences

The Journal of Poznan University of Technology
