Tools for Distributed Systems Monitoring


Abstract

The management of distributed systems infrastructure requires a dedicated set of tools. Among them, a monitoring solution is the tool that visualizes the current operational state of all systems and sends a notification when a failure occurs. This paper provides an overview of monitoring approaches for gathering data from distributed systems and of the major factors to consider when choosing a monitoring solution. Finally, we discuss the tools currently available on the market.
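The abstract names two core functions of a monitoring solution: showing the current operational state of all systems and notifying when a failure occurs. The Python sketch below shows both in their simplest form. It stores the last reported state of each host and calls a notification hook only when a host changes from healthy to failed, so a persistent outage does not produce a new alert on every check. The `Monitor` class, its `report` and `overview` methods, and the `notify` callback are hypothetical names invented for this illustration. They are not the API of any tool discussed in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Monitor:
    """Hypothetical minimal monitor: tracks host state, notifies on failure.

    `notify` is an assumed hook (e.g. e-mail or pager integration)."""
    notify: Callable[[str], None]
    states: Dict[str, bool] = field(default_factory=dict)  # host -> is_up

    def report(self, host: str, is_up: bool) -> None:
        # Hosts never seen before are assumed to have been healthy.
        previous = self.states.get(host, True)
        self.states[host] = is_up
        # Notify only on a healthy -> failed transition, so a host that
        # stays down does not trigger the same alert on every check.
        if previous and not is_up:
            self.notify(f"FAILURE: {host} is down")

    def overview(self) -> List[str]:
        """Current operational state of all known hosts, sorted by name."""
        return [f"{host}: {'UP' if up else 'DOWN'}"
                for host, up in sorted(self.states.items())]


if __name__ == "__main__":
    alerts: List[str] = []
    monitor = Monitor(notify=alerts.append)
    monitor.report("web-1", True)
    monitor.report("db-1", False)
    monitor.report("db-1", False)  # still down: no second alert
    print(monitor.overview())  # ['db-1: DOWN', 'web-1: UP']
    print(alerts)              # ['FAILURE: db-1 is down']
```

A real monitoring solution also has to decide how state reaches it, for example by polling each host or by having agents push results. This sketch leaves that part out.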

Foundations of Computing and Decision Sciences

The Journal of Poznan University of Technology
