Context. Software data collection precedes analysis, which in turn requires data science skills. Software defect prediction is rarely used in industrial projects as a means of quality assurance and cost reduction. Objectives. Many studies and several tools support various data analysis tasks, but there is still neither an open source tool nor a standardized approach. Results. We developed Defect Prediction for software systems (DePress), an extensible software measurement and data integration framework that can be used for prediction purposes (e.g., defect prediction, effort prediction) and for software change analysis (e.g., release notes, bug statistics, commit quality). DePress is based on the KNIME project and allows workflows to be built in a graphical, end-user-friendly manner. Conclusions. We present the main concepts and the development state of the DePress framework. The results show that DePress can be used to analyze both open source and industrial projects.