Tools for Distributed Systems Monitoring

Open access

Abstract

Managing distributed systems infrastructure requires a dedicated set of tools. One such tool, which visualizes the current operational state of all systems and sends notifications when a failure occurs, is provided by a monitoring solution. This paper gives an overview of monitoring approaches for gathering data from distributed systems and of the major factors to consider when choosing a monitoring solution. Finally, we discuss the tools currently available on the market.


