Every organization that relies on technology wants to know how its computer systems and the applications running on them are faring. Knowing when something is not behaving as expected can boost performance and cut the time spent troubleshooting anomalies. A number of tools can help you gather and process what is taking place inside your networking equipment and servers, whether physical or virtual.
We are going to explore the best open source monitoring tools that you can employ to stay fully updated on the status of your infrastructure.
1. LibreNMS
LibreNMS is an auto-discovering PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems, including Cisco, Linux, FreeBSD, Juniper, Brocade, Foundry, HP and many more. It is best suited for network devices and servers.
Features of LibreNMS
What is cool about LibreNMS is that it is auto-discovering. You do not have to tell it whether your device is Cisco, Juniper, Windows or Linux based. It gathers this information automatically using protocols such as CDP, FDP, LLDP, OSPF, BGP, SNMP, and ARP.
It goes the extra mile and discovers the interfaces on your router or switch, which is pretty impressive. It also attempts to map the connections in your network, though this requires some assistance from you.
Like most monitoring tools, LibreNMS offers monitoring functionality that can be highly customized.
It can scale
As your network grows, its distributed polling feature allows horizontal scaling of your system.
LibreNMS has a billing system. It generates bandwidth bills for ports on your network based on usage or transfer.
LibreNMS has Android and Apple apps which can be used to view and manage your network. This is such a breath of fresh air.
Support for various authentication mechanisms such as RADIUS, LDAP, Active Directory and more.
You can integrate it into any other system via its API access.
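As a rough illustration of that API access, the sketch below builds (without sending) a request for the device list. It assumes the LibreNMS REST API is served under /api/v0 and authenticates through an X-Auth-Token header; the host name and the token are placeholders, not values from this article.

```python
from urllib.request import Request


def build_devices_request(base_url: str, token: str) -> Request:
    """Build (but do not send) a GET request for the LibreNMS device list."""
    # Assumption: the API lives under /api/v0 on the LibreNMS host.
    url = base_url.rstrip("/") + "/api/v0/devices"
    # Assumption: the API token is passed in the X-Auth-Token header.
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


# Placeholder host and token, for illustration only.
req = build_devices_request("https://librenms.example.org/", "YOUR_API_TOKEN")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would actually send it; the sketch stops short of that so it can run without a LibreNMS server.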
This tool is a beast and hence we encourage you to take a look at what is happening inside its engine. There is much more than the article can reveal including security through
2. Nagios
From nagios.org, “Nagios monitors your entire IT infrastructure to ensure systems, applications, services, and business processes are functioning properly. In the event of a failure, Nagios can alert technical staff of the problem, allowing them to begin remediation processes before outages affect business processes, end-users, or customers.”
The tool dates back to 1999 and has since grown into a family of products, all focused on monitoring. Let us have a look at the features it offers.
Monitoring of a large number of devices
Nagios can monitor applications, services, operating systems, network protocols, system metrics and infrastructure components with a single tool. This makes it a jack of all trades, which is useful if you want one tool to cover a wide range of services and devices.
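Much of this breadth comes from small check plugins that report a status through their exit code. The sketch below is a minimal disk-usage check, assuming the conventional plugin exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL); the thresholds and the message format are illustrative choices, not taken from any official plugin.

```python
import shutil

# Conventional plugin exit codes (assumed here): 0 = OK, 1 = WARNING, 2 = CRITICAL.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_usage(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> int:
    """Map a usage percentage to a status code using illustrative thresholds."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "."):
    """Return (exit_code, one-line message) for the filesystem holding `path`."""
    usage = shutil.disk_usage(path)
    percent = usage.used / usage.total * 100
    code = classify_usage(percent)
    return code, f"DISK {LABELS[code]} - {percent:.1f}% used on {path}"


code, message = check_disk(".")
print(message)
# A real plugin would finish with sys.exit(code) so the scheduler sees the status.
```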
Having many users logged into the interface simultaneously boosts efficiency, since interested stakeholders can get a real-time look at the status of the infrastructure. Views can also be limited per user, so each user sees only the part of the network that belongs to them and more teams can share one platform.
Nagios ensures that Service Level Agreements are met by producing reports which can be enhanced by plugins from third party vendors. This makes it highly flexible and customizable.
With a centralized web interface where you can see everything, it can be easy to detect outages.
Nagios has alerting functionality. Alerts can be sent via SMS and email, which simplifies the management of your infrastructure.
One interesting feature of Nagios is that event handlers allow the automatic restart of failed applications and services.
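To make the event-handler idea concrete, here is a hedged sketch of the decision such a handler might take. It assumes the handler is called with the service state, the state type and the check attempt number (the values Nagios exposes as the $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ macros); the attempt threshold and the systemctl command are illustrative assumptions.

```python
def should_restart(state: str, state_type: str, attempt: int,
                   soft_attempt_threshold: int = 3) -> bool:
    """Decide whether a failed service should be restarted.

    Restart once the failure is confirmed (HARD state), or slightly earlier,
    when a SOFT failure has persisted for `soft_attempt_threshold` checks.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt >= soft_attempt_threshold


def build_restart_command(service_name: str) -> list:
    """Return the command a handler could run; it is not executed in this sketch."""
    # Assumption: the monitored service is managed by systemd.
    return ["systemctl", "restart", service_name]


if should_restart("CRITICAL", "SOFT", 3):
    print("would run:", " ".join(build_restart_command("httpd")))
```

Waiting for a few soft failures before acting avoids restarting a service over a single missed check.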
3. Zabbix
From its site, “Zabbix is the ultimate enterprise-level software designed for real-time monitoring of millions of metrics collected from tens of thousands of servers, virtual machines, and network devices.” It can monitor not only Linux but also Windows, Solaris and IBM AIX, as well as applications, services, databases and much more.
Zabbix contains many features and we shall go over them in a nutshell.
Solutions for any kind of IT infrastructure, services, applications and resources
Next generation Zabbix Agent
Zabbix 4.4 introduces a new type of agent, zabbix_agent2, which offers a wide range of new capabilities and advanced monitoring functions.
Collection of Metrics
It can collect the desired metrics through various methods, including:
- Multi-platform Zabbix agent (it runs on various supported platforms, including Linux, UNIX, and Windows, and collects data such as CPU, memory, disk and network interface usage from a device)
- SNMP and IPMI agents
- Agentless monitoring of user services
- Custom methods
- Calculation and aggregation and end-user web monitoring
Detection of anomalies in your set-up
Zabbix can automatically detect problem states within the incoming metric flow using defined smart thresholds.
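One common way to make a threshold "smart" is hysteresis: a problem is raised at one level and cleared only at a lower one, so a value hovering around the limit does not flap. The sketch below illustrates that idea in plain Python; it is not Zabbix trigger syntax, and the function names and threshold values are assumptions made for the example.

```python
def next_state(in_problem: bool, value: float,
               problem_above: float, recover_below: float) -> bool:
    """Return the new problem state for one incoming value."""
    if in_problem:
        # Stay in the problem state until the value drops below the recovery level.
        return value >= recover_below
    # Enter the problem state only when the higher threshold is exceeded.
    return value > problem_above


def evaluate(values, problem_above=90.0, recover_below=85.0):
    """Evaluate a stream of values and return the problem state after each one."""
    states, in_problem = [], False
    for value in values:
        in_problem = next_state(in_problem, value, problem_above, recover_below)
        states.append(in_problem)
    return states


print(evaluate([70, 92, 88, 86, 84, 95]))
# → [False, True, True, True, False, True]
```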
Better visualization
According to the Zabbix developers, the interface gives users multiple ways of presenting a visual overview of their infrastructure and environment: widget-based dashboards, graphs, network maps, and slideshows.
The server can send messages or mail. A lot more can be done as far as alerts are concerned. For example, the messages can be customized based on the recipient’s role or with runtime and inventory information. Moreover, the messages can be configured to focus on the root causes of the arising problem using the Zabbix Event correlation mechanism.
The use of templates: this feature allows you to use out-of-the-box templates for most of the popular platforms and to monitor thousands of similar devices by using configuration templates.
Zabbix proxies collect data in the environment they sit in and send it to a central Zabbix server. Using proxies may greatly simplify the maintenance of an environment monitored by Zabbix and increase the performance of the central Zabbix server, which shows how the monitoring system can scale in a distributed fashion. Zabbix also has an API, so it can be integrated with other systems in the infrastructure.
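For a feel of what talking to the Zabbix API involves, the sketch below builds the body of a JSON-RPC 2.0 call. It assumes the API is reached by POSTing such a body to the api_jsonrpc.php endpoint of the Zabbix frontend and uses the host.get method as an example; authentication is left out because the mechanism differs between Zabbix versions.

```python
import json


def build_rpc_request(method: str, params: dict, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request body for the given API method."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    })


# Ask for the ID and name of every host (authentication deliberately omitted).
body = build_rpc_request("host.get", {"output": ["hostid", "host"]})
print(body)
```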
Official support of TimescaleDB
4. Prometheus
According to the Prometheus GitHub page, it is a Cloud Native Computing Foundation project that monitors systems and services. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. For visualization, Prometheus works with tools such as Grafana for data visualization and export.
Top Features of Prometheus
- A multi-dimensional data model (time series defined by metric name and a set of key/value dimensions)
- A flexible query language to leverage this dimensionality
- Has no dependency on distributed storage; single server nodes are autonomous
- Time-series collection happens via a pull model over HTTP
- Pushing time-series is supported via an intermediary gateway
- Targets are discovered via service discovery or static configuration
- Multiple modes of graphing and dashboarding support
- Support for hierarchical and horizontal federation
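The data model and the pull model meet in the text format a target exposes over HTTP: one line per sample, made of a metric name, a set of key/value labels and a value. The sketch below renders such a line by hand to show the shape of the data; the metric and label names are made up for the example, and real services would normally rely on an official client library instead.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample as `name{key="value",...} value`."""
    if not labels:
        return f"{name} {value}"
    # Sort the labels so the output is stable from run to run.
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


print(render_sample("http_requests_total", {"method": "post", "code": "200"}, 1027))
# → http_requests_total{code="200",method="post"} 1027
```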
5. Netdata
From its GitHub page, Netdata is distributed, real-time performance and health monitoring for systems and applications. It is a highly optimized monitoring agent you install on all your systems and containers. It provides unparalleled real-time insight into everything happening on the systems it runs on (including web servers, databases, and applications), using highly interactive web dashboards. Another cool feature of Netdata is that it can run autonomously, without any third-party components, or it can be integrated into existing monitoring toolchains such as Prometheus, Graphite, OpenTSDB, Kafka, Grafana, and others.
Netdata is a monitoring agent you install on all your systems. It is:
- A metrics collector – for system and application metrics (including web servers, databases, containers, etc)
- A time-series database – all stored in memory (does not touch the disks while it runs)
- A metrics visualizer – super fast, interactive, modern, optimized for anomaly detection
- An alarms notification engine – an advanced watchdog for detecting performance and availability issues
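To illustrate what an in-memory time-series store can look like, here is a small ring-buffer sketch: it keeps only the most recent points and silently drops the oldest ones. This is a conceptual example, not Netdata's actual implementation, and the class and method names are hypothetical.

```python
from collections import deque


class RingBufferSeries:
    """Fixed-capacity, in-memory series of (timestamp, value) points."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) discards the oldest point once the buffer is full.
        self._points = deque(maxlen=capacity)

    def append(self, timestamp: int, value: float) -> None:
        self._points.append((timestamp, value))

    def window(self, since: int) -> list:
        """Return every stored point whose timestamp is >= `since`."""
        return [(ts, val) for ts, val in self._points if ts >= since]

    def __len__(self) -> int:
        return len(self._points)


cpu = RingBufferSeries(capacity=3)
for second, usage in enumerate([12.0, 15.5, 11.2, 40.1, 38.7]):
    cpu.append(second, usage)

print(len(cpu), cpu.window(since=3))
# → 3 [(3, 40.1), (4, 38.7)]
```

A fixed capacity keeps memory usage predictable, at the cost of losing history older than the buffer.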
Features of Netdata
- 1s granularity – the highest possible resolution for all metrics.
- Unlimited metrics – collects all the available metrics, the more the better.
- 1% CPU utilization of a single core – it is super fast, unbelievably optimized.
- A few MB of RAM – by default it uses 25MB RAM. You size it.
- Zero disk I/O – while it runs, it does not load or save anything (except error and access logs).
- Zero configuration – auto-detects everything, it can collect up to 10000 metrics per server out of the box.
- Zero maintenance – You just run it, it does the rest.
- Zero dependencies – it is even its own web server, for its static web files and its web API.
- Scales to infinity – you can install it on all your servers, containers, VMs and IoTs.
- Several operating modes – Autonomous host monitoring (the default), headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring
Health Monitoring & Alarms
- Sophisticated alerting – comes with hundreds of alarms, out of the box!
- Notifications – whether you use Telegram, Twilio, email, kavenegar, messagebird, or others, you are covered.
- Stunning interactive dashboards – mouse, touchpad and touch-screen friendly in dark and white themes
- Amazingly fast visualization – responds to all queries in less than 1 ms per metric, even on low-end hardware.
- Embeddable – its charts can be embedded on your web pages, wikis and blogs.
What it monitors
Netdata data collection is extensible – you can monitor anything you can get a metric for. APM (Application Performance Monitoring), System Resources, Disks, File systems, Networking, DNS Servers, Virtual Private Networks, Proxies, Balancers, Accelerators.
6. Icinga 2
Icinga is a monitoring system which checks the availability of your network resources, notifies users of outages, and generates performance data for reporting. It is scalable and extensible and can monitor large, complex environments across multiple locations.
Features of Icinga 2
The Icinga Reporting Module is the framework and foundation Icinga created to handle data collected by Icinga 2 and other data providers. It can display the data directly within the Icinga web interface or export it to PDF, JSON or CSV format. With scheduled reports, you can receive the prepared data periodically via email.
Graphs and Metrics
Icinga uses Graphite for graphs and metrics. Graphite is a time-series database that stores collected metrics and makes them available through RESTful APIs and web interfaces.
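Graphite accepts metrics in a simple plaintext form: a metric path, a value and a Unix timestamp on one line. The sketch below formats such a line; the metric path is a made-up example, and in practice Icinga ships the data itself, so this only shows what travels over the wire.

```python
import time


def format_graphite_line(path: str, value: float, timestamp=None) -> str:
    """Format one metric as `<path> <value> <timestamp>` plus a newline."""
    if timestamp is None:
        timestamp = time.time()
    return f"{path} {value} {int(timestamp)}\n"


# Hypothetical metric path, for illustration only.
line = format_graphite_line("icinga2.web01.ping.rta", 0.042, 1700000000)
print(line, end="")
# → icinga2.web01.ping.rta 0.042 1700000000
```

Sending the line would mean writing it to a TCP socket on the Carbon plaintext port (2003 by default, assuming a standard Graphite setup).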
You will get Maps, Business Process, Certificate Monitoring and a Dashing Dashboard.
You can use Logstash or Graylog in your infrastructure.
Notification Scripts and Interfaces.
There’s a variety of resources available, for example, different notification scripts such as:
- Pager (XMPP, etc.)
- Ticket systems
7. Cacti
From Cacti’s site, this tool “is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with thousands of devices.” (Cacti.net, 2021).
Cacti harnesses the power of RRDtool, an open source, industry-standard data logging and graphing system for time-series data. RRDtool is a high-performance tool that can be easily integrated into shell scripts and Perl, Python, Ruby, Lua or Tcl applications.
Top features of Cacti include the following
Graph templates enable common graphs to be grouped together by templating. Every field for a normal graph can be templated or specified on a per-graph basis.
Cacti supports custom data input, which gives users the freedom to develop their own scripts for gathering data from target devices. It also comes bundled with support for SNMP, an industry-standard data-gathering technology. On top of that, Cacti ships with a PHP-based poller that executes scripts, retrieves SNMP data, and updates the RRD files.
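A typical job for a custom data-gathering script is turning two readings of an ever-increasing SNMP counter into a rate. The sketch below shows that arithmetic, including the case where a 32-bit counter wraps around between polls; the function name and sample values are illustrative, and it assumes at most one wrap per polling interval.

```python
COUNTER32_MODULUS = 2 ** 32  # a 32-bit SNMP counter wraps back to 0 at this value


def counter_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Return the per-second rate between two counter readings.

    The modulo handles a single wrap-around between the two polls.
    """
    delta = (current - previous) % COUNTER32_MODULUS
    return delta / interval_seconds


# Normal case: 3000 units in 300 seconds.
print(counter_rate(1_000, 4_000, 300))
# Wrapped case: the counter passed 2**32 between polls, 1000 units in 100 seconds.
print(counter_rate(4_294_967_000, 704, 100))
```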
Cacti supports multiple user accounts. The administrator can grant each user a specific set of privileges.
Display of graphs
There are three ways to view your graphs: tree view, list view, and preview view. Each has its benefits. The tree view lets users create hierarchies and place graphs on the tree, so a large number of graphs can be managed this way. The list view, as the name suggests, is simply a list of the available graphs, each linking to the actual graph when clicked. The preview view shows all of the graphs in one large list that you can quickly peruse.
There are three types of templates: Data Templates, Graph Templates, and Host Templates. They spare you the pain of defining every data source and graph by hand. A Data Template provides a skeleton for an actual data source, and a Host Template groups all Graph Templates and Data Queries for a given device type. Better still, you do not need to create every template yourself: some come out of the box, and others can be imported into your Cacti platform through a simple import feature.
Cacti can be configured to send mail alerts in case pre-defined variables or thresholds have been exceeded or not achieved. This makes your nights awesome since you do not have to start looking for problems when those calls come in. It will pinpoint that a certain service is down or facing particular anomalies.
Cacti can generate reports in accordance with your configuration.
8. Grafana
Grafana is a tool that gives you the power to query, visualize, alert on and understand your metrics no matter where they are stored. You get the chance to create, explore, and share dashboards with your team in an effort to foster a data-driven culture. In brief, Grafana is the open-source analytics and monitoring solution for every database.
Fast and flexible client-side graphs with a multitude of options. Panel plugins for many different ways to visualize metrics and logs.
Create dynamic & reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
Explore your data through ad-hoc queries and dynamic drill-down. Split view and compare different time ranges, queries and data sources side by side.
Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.
Visually define alert rules for your most important metrics. Grafana will continuously evaluate and send notifications to systems like Slack, PagerDuty, VictorOps, OpsGenie.
Mixed Data Sources
Mix different data sources in the same graph! You can specify a data source on a per-query basis. This works for even custom data sources.
Annotate graphs with rich events from different data sources. Hovering over an event shows you the full event metadata and tags.
Ad-hoc filters allow you to create new key/value filters on the fly, which are automatically applied to all queries that use that data source.
9. Glances – An eye on your system
From its GitHub page (https://github.com/nicolargo/glances), Glances is a cross-platform monitoring tool which aims to present a large amount of monitoring information through a curses or Web-based interface. The information dynamically adapts depending on the size of the user interface.
Features of Glances
Written in Python, Glances will run on almost any platform: GNU/Linux, FreeBSD, OS X, and Windows.
Export all system statistics to CSV, InfluxDB, Cassandra, OpenTSDB, StatsD, ElasticSearch or even RabbitMQ. Glances also provides a dedicated Grafana dashboard.
Present a maximum of information in a minimum of space through a curses or Web-based interface.
It dynamically adapts the displayed information to the terminal size.
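The idea of adapting the display to the terminal can be sketched in a few lines: measure the width, then keep only the columns that fit, in order of priority. This is a conceptual illustration rather than Glances' own code; the column names and widths are hypothetical.

```python
import shutil

# Hypothetical columns as (name, width in characters), most important first.
COLUMNS = [("CPU%", 6), ("MEM%", 6), ("PID", 8), ("USER", 10), ("COMMAND", 20)]


def columns_for_width(width: int) -> list:
    """Return the names of the columns that fit in `width` characters."""
    chosen, used = [], 0
    for name, column_width in COLUMNS:
        if used + column_width > width:
            break  # lower-priority columns are dropped once space runs out
        chosen.append(name)
        used += column_width
    return chosen


# Fall back to 80x24 when no terminal is attached (for example in a pipe).
terminal_width = shutil.get_terminal_size(fallback=(80, 24)).columns
print(columns_for_width(terminal_width))
```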
10. Sensu
From its GitHub page, Sensu is an open source monitoring tool for ephemeral infrastructure and distributed applications. It is an agent-based monitoring system with built-in auto-discovery, making it very well-suited for cloud environments. It uses service checks to monitor service health and collect telemetry data.
- Server monitoring
- Container monitoring
- Real-time inventory
- Health checks & custom metrics
- Alerts & incident management
- Automated remediation & custom workflows
- 200+ community plugins
- Namespaces and RBAC
- Basic authentication
- Real-time incident dashboard
- Real-time inventory dashboard
- Grafana Datasource
- Multi-tenant dashboard (single-site)
- Discovery, Inventory, Config Management APIs
- Token-based API authentication (JWTs)
Services & support
- Bonsai (hosted Sensu Asset Index & CDN)
- Community support (Discourse, Slack)
Note that there is also an enterprise version of Sensu with many more features; see the Sensu Enterprise page for details.
Now the choice of tool belongs to you. Check them out and enjoy your monitoring.