One thing every organization that relies on technology strives for is the ability to tell how its computer systems, and the applications running on them, are faring. Knowing when something is not behaving as expected can boost performance and reduce the time spent troubleshooting anomalies. To that end, there are a number of tools you can use to gather and process what is taking place inside your networking equipment and servers (whether physical or virtual).

Original content from computingforgeeks.com - post 44628

We are going to explore the best open source monitoring tools that you can deploy to stay fully updated on the status of your infrastructure.

1. Checkmk

Checkmk is an open source monitoring solution enabling users to keep an eye on all assets within hybrid IT environments. The software can be set up within minutes, and is easy to maintain regardless of where in your IT infrastructure you decide to deploy it. Checkmk can additionally be customized to suit your requirements, and is extremely scalable.

Features of Checkmk

Since Checkmk’s introduction in 2007, it has remained an active open source project, providing an integrated monitoring solution. Thanks to its many great features, it comes as no surprise that the user community is growing by the day. Tens of thousands of users in more than 50 countries rely on Checkmk to ensure high availability and the best performance of their systems.

Ready within minutes

You can deploy Checkmk under all major Linux distributions, as a virtual or physical appliance, or run it in Docker. It comes in an integrated package, so that you can implement your monitoring in a matter of minutes – there is no need to configure databases and web servers to get everything working. Adding hosts is also easy, because Checkmk supports the auto-discovery of monitoring services and auto-configuration of plug-ins.

Monitoring out-of-the-box

Over 2,000 pre-configured official plug-ins are included with Checkmk for automated monitoring. These cover your servers, networks, applications, storage, data center hardware, cloud assets, as well as Docker and Kubernetes.
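
For anything the bundled plug-ins do not cover, Checkmk agents can also run small custom scripts known as local checks, which print one line per service in the form "status, service name, metrics, summary text". The sketch below only shows how such a line could be assembled; the service name, metric name and thresholds are made up for illustration, and the value is hard-coded rather than measured.

```python
def local_check_line(service: str, used_percent: float,
                     warn: float = 80.0, crit: float = 90.0) -> str:
    """Render one result line: <status> <service_name> <metric>=<value> <summary>."""
    if used_percent >= crit:
        status = 2  # CRIT
    elif used_percent >= warn:
        status = 1  # WARN
    else:
        status = 0  # OK
    return (f"{status} {service} used_percent={used_percent:.1f} "
            f"{used_percent:.1f}% of the filesystem is in use")


# Hypothetical reading; a real script would measure it, e.g. with shutil.disk_usage().
print(local_check_line("Root_fs_usage", 73.4))
```

A real local check would be an executable script placed in the agent's local checks directory, so treat this as a starting point rather than a finished plug-in.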

Receive notifications when you need them

Many organizations suffer from false alerts, but with Checkmk you can put an end to alert fatigue and only get a notification when action is actually required. You can use email, SMS, Slack, Telegram and other messenger apps to make sure notifications reach the right person at the right time.

Future-proof

Based on the needs of the users, the Checkmk development team ensures continuity and constant innovation on the product side. The team maintains all plug-ins, and also ensures that Checkmk is being continuously expanded with valuable new features for meeting the IT challenges of tomorrow.

Integrate your stack

Checkmk combines enterprise-grade scalability and security with the extensibility of open source software. It integrates seamlessly with other enterprise applications such as InfluxDB, DataDog, VictorOps, Grafana or Jira and provides powerful features to automate monitoring workflows.

Visualization

Checkmk comes with interactive HTML5 graphs that allow you to analyze time-series metrics over long time spans. You can also leverage graphic maps and diagrams with live monitoring data to get a dynamic view of the health of your infrastructure and applications. All dashboards and views are customizable to suit your specific needs.

Choosing the right monitoring machine for your IT

You can use Checkmk for free with the completely open-source Checkmk Raw Edition, or monitor up to 25 hosts with the free version of the Checkmk Enterprise Edition.

Installation guides

Follow the guide below to install Checkmk:

Install Checkmk as Docker container

2. LibreNMS

LibreNMS is an auto-discovering, PHP/MySQL/SNMP-based network monitoring system that supports a wide range of network hardware and operating systems including Cisco, Linux, FreeBSD, Juniper, Brocade, Foundry, HP and many more. It is best suited for network devices and servers.

Features of LibreNMS

What is cool about LibreNMS is that it is auto-discovering. You do not have to tell it whether your device is Cisco, Juniper, Windows or Linux based. It gathers this information automatically using protocols such as CDP, FDP, LLDP, OSPF, BGP, SNMP, and ARP.

It goes the extra mile and discovers the interfaces on your router or switch, which is pretty impressive. It also attempts to map the connections in your network, but requires some assistance from you.

Alerts

Like most monitoring tools, LibreNMS has alerting functionality, and it can be highly customized.

It can scale

As your network grows, its distributed polling feature allows you to scale the system horizontally.

Mobile Apps

LibreNMS has Android and iOS apps that can be used to view and manage your network. This is such a breath of fresh air.
It also supports various authentication mechanisms such as RADIUS, LDAP, Active Directory and more.

Billing system

Generate bandwidth bills for ports on your network based on usage or transfer.
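
To make the difference between billing on usage and billing on transfer concrete, here is a minimal sketch. It assumes the port rate is sampled every five minutes in bits per second, that usage is billed on the 95th percentile of those samples, and that transfer is billed on the total bytes moved. These are common industry conventions; the function names and the sampling interval are illustrative assumptions, not LibreNMS internals.

```python
def percentile_95(samples_bps):
    """Return the 95th-percentile rate using the nearest-rank method."""
    if not samples_bps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_bps)
    # Integer ceiling of 0.95 * n, giving a 1-based rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def total_transfer_bytes(samples_bps, interval_seconds=300):
    """Approximate bytes moved: each sample is an average rate held for one interval."""
    return sum(rate * interval_seconds / 8 for rate in samples_bps)


# Made-up samples: mostly 10 Mbit/s with a few short bursts to 90 Mbit/s.
samples = [10_000_000] * 97 + [90_000_000] * 3
print(percentile_95(samples), total_transfer_bytes(samples))
```

With the percentile approach the few short bursts fall into the discarded top 5% and do not raise the bill, while the transfer approach counts every byte.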

API Integration

You can integrate it into any other system via its API.
This tool is a beast, and we encourage you to take a look at what is happening inside its engine. There is much more to it than this article can reveal.

3. Nagios

From nagios.org, “Nagios monitors your entire IT infrastructure to ensure systems, applications, services, and business processes are functioning properly. In the event of a failure, Nagios can alert technical staff of the problem, allowing them to begin remediation processes before outages affect business processes, end-users, or customers.”

It is a tool that began way back in 1999 and has since grown into a family of products, all focused on monitoring. Let us have a look at the features it offers.

Monitoring of a large number of devices

Nagios is capable of monitoring applications, services, operating systems, network protocols, system metrics and infrastructure components with a single tool. This makes it a jack of all trades, which can be quite beneficial if you want one tool to cover a wide range of services and devices.

Multi-tenancy

Multiple users can be logged into the interface simultaneously, so interested stakeholders can have a real-time look at the status of the infrastructure. Views can also be limited per user, so that each user only sees the hosts and services that belong to them, which lets one platform accommodate many teams.

Reporting

Nagios helps you verify that Service Level Agreements are being met by producing reports, which can be extended with plugins from third-party vendors. This makes it highly flexible and customizable.

Visibility

With a centralized web interface where you can see everything, it can be easy to detect outages.

Notifications

Nagios has alerting functionality. Alerts can be sent via SMS and email, which simplifies the management of your infrastructure.

One interesting feature of Nagios is its event handlers, which allow failed applications and services to be restarted automatically.
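
An event handler is just a command that Nagios runs when a service changes state, typically passing macros such as $SERVICESTATE$, $SERVICESTATETYPE$ and $SERVICEATTEMPT$ as arguments. The sketch below shows only the decision logic a handler might apply. The policy (restart on the third soft failure, or once the state is hard) is one possible choice, and the function name is hypothetical.

```python
def should_restart(state: str, state_type: str, attempt: int) -> bool:
    """Decide whether the handler should try to restart the service.

    state      -- value of $SERVICESTATE$ (OK, WARNING, CRITICAL, UNKNOWN)
    state_type -- value of $SERVICESTATETYPE$ (SOFT or HARD)
    attempt    -- value of $SERVICEATTEMPT$ (current check attempt number)
    """
    if state != "CRITICAL":
        return False  # only act on real failures
    if state_type == "HARD":
        return True   # retries are exhausted: restart as a last resort
    # Still retrying: wait until the third attempt before interfering.
    return state_type == "SOFT" and attempt == 3


print(should_restart("CRITICAL", "SOFT", 1))  # first failure: wait
print(should_restart("CRITICAL", "SOFT", 3))  # third failure: restart
```

In a real handler script, a True result would be followed by whatever restart command fits the service. That part is deliberately left out here.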

Installation guides:

Install and Configure Nagios 4 on RHEL 8 / CentOS

4. Zabbix

From its site, “Zabbix is the ultimate enterprise-level software designed for real-time monitoring of millions of metrics collected from tens of thousands of servers, virtual machines, and network devices.” It is capable of monitoring not only Linux but also Windows, Solaris and IBM AIX, and it can monitor applications, services, databases and much more.

Zabbix contains many features and we shall go over them in a nutshell.

Monitors anything

Solutions for any kind of IT infrastructure, services, applications, and resources.

Next generation Zabbix Agent

Zabbix 4.4 introduces a new type of agent, zabbix_agent2, which offers a wide range of new capabilities and advanced monitoring functions.

Collection of Metrics

It can collect the desired metrics in several ways, including:

Multi-platform Zabbix agent, which runs on supported platforms including Linux, UNIX, and Windows, and collects data such as CPU, memory, disk and network interface usage from a device
SNMP and IPMI agents
Agentless monitoring of user services
Custom methods
Calculation and aggregation
End-user web monitoring

Detection of anomalies in your set-up

Zabbix is able to automatically detect problem states within the incoming metric flow using defined smart thresholds.
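
One way to picture such a threshold is a trigger with separate problem and recovery levels, which stops the state from flapping when a metric hovers around a single limit. The sketch below is a plain Python illustration of that idea, not Zabbix trigger syntax; the function name and the 90/80 levels are assumptions chosen for the example.

```python
def evaluate_trigger(values, problem_above=90.0, recover_below=80.0):
    """Return the trigger state after each incoming value.

    The trigger fires when a value exceeds problem_above and only
    recovers once a value drops below recover_below (hysteresis).
    """
    in_problem = False
    states = []
    for value in values:
        if not in_problem and value > problem_above:
            in_problem = True
        elif in_problem and value < recover_below:
            in_problem = False
        states.append("PROBLEM" if in_problem else "OK")
    return states


# CPU load samples in percent (made-up data).
print(evaluate_trigger([50, 95, 85, 79, 91]))
```

Note how the value 85 keeps the trigger in the problem state even though it is below 90: recovery requires dropping under the lower level.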

Better visualization presentation

According to the Zabbix developers, the interface gives users multiple ways of presenting a visual overview of their infrastructure and environment. These include widget-based dashboards, graphs, network maps, and slideshows.

Notifications

The server can send messages or email. A lot more can be done as far as alerts are concerned. For example, messages can be customized based on the recipient’s role or enriched with runtime and inventory information. Moreover, messages can be configured to focus on the root cause of a problem using the Zabbix event correlation mechanism.
The use of templates: You can use out-of-the-box templates for most of the popular platforms and monitor thousands of similar devices by using configuration templates.

Scalability

Zabbix proxies collect data in the environment they sit in and send it to a central Zabbix server. Using proxies may greatly simplify the maintenance of an environment monitored by Zabbix and increase the performance of the central server. This is how the monitoring system scales in a distributed fashion. Zabbix also has an API, which can be used to integrate it with other systems in your infrastructure.
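
The Zabbix API speaks JSON-RPC 2.0 over HTTP through the frontend's api_jsonrpc.php endpoint. As a rough sketch, the snippet below builds (but does not send) a request for the apiinfo.version method, which needs no authentication. The host name is a placeholder and the helper function is hypothetical; depending on your installation the frontend may live under a sub-path.

```python
import json
import urllib.request


def build_api_request(base_url: str, method: str, params, request_id: int = 1):
    """Build a JSON-RPC 2.0 POST request for the Zabbix frontend."""
    body = json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api_jsonrpc.php",
        data=body,
        headers={"Content-Type": "application/json-rpc"},
        method="POST",
    )


# zabbix.example.com is a placeholder, not a real server.
request = build_api_request("https://zabbix.example.com", "apiinfo.version", [])
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen() would actually send it. Most other methods also require an API token or session, which is left out of this sketch.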

Official support of TimescaleDB

5. Prometheus

According to its GitHub page, Prometheus is a Cloud Native Computing Foundation project that monitors systems and services. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.
It fits both machine-centric monitoring and monitoring of highly dynamic service-oriented architectures. For visualization, Prometheus works with tools such as Grafana.

Top Features of Prometheus

A multi-dimensional data model (time-series defined by metric name and set of key/value dimensions)
A flexible query language to leverage this dimensionality
Has no dependency on distributed storage; single server nodes are autonomous
Timeseries collection happens via a pull model over HTTP
Pushing time-series is supported via an intermediary gateway
Targets are discovered via service discovery or static configuration
Multiple modes of graphing and dashboarding support
Support for hierarchical and horizontal federation
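
To make the data model and the pull model a little more tangible: a target exposes its current values as plain text over HTTP, one sample per line, with the metric name followed by its key/value labels and the value. The sketch below renders such lines by hand using only the standard library. In practice you would normally use an official client library, and the metric and label names here are just examples.

```python
def _escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes and newlines in a label value."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{key="value",...} value."""
    if not labels:
        return f"{name} {value}"
    label_text = ",".join(
        f'{key}="{_escape_label_value(str(val))}"'
        for key, val in sorted(labels.items())
    )
    return f"{name}{{{label_text}}} {value}"


print(render_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
print(render_sample("process_open_fds", {}, 12))
```

Each distinct combination of label values is its own time series, which is what the query language then slices and aggregates.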

6. Netdata

From their GitHub page, Netdata is distributed, real-time, performance and health monitoring for systems and applications. It is a highly optimized monitoring agent you install on all your systems and containers. It provides unparalleled insights, in real-time, of everything happening on the systems it runs (including web servers, databases, applications), using highly interactive web dashboards. Another cool feature about Netdata is that it can run autonomously, without any third-party components, or it can be integrated into existing monitoring toolchains such as Prometheus, Graphite, OpenTSDB, Kafka, Grafana, and others.

Netdata is a monitoring agent you install on all your systems. It is:

A metrics collector – for system and application metrics (including web servers, databases, containers, etc)
A time-series database – all stored in memory (does not touch the disks while it runs)
A metrics visualizer – super fast, interactive, modern, optimized for anomaly detection
An alarms notification engine – an advanced watchdog for detecting performance and availability issues
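
The idea of a time-series store that lives entirely in memory can be sketched with a fixed-size ring buffer: once it is full, every new per-second sample pushes out the oldest one, so memory use stays constant. This is only a conceptual illustration with a hypothetical class name; it does not reflect how Netdata actually lays out its database.

```python
from collections import deque


class RingBufferSeries:
    """Keep only the most recent `capacity` (timestamp, value) points in memory."""

    def __init__(self, capacity: int):
        self._points = deque(maxlen=capacity)

    def append(self, timestamp: int, value: float) -> None:
        # When the deque is full, the oldest point is dropped automatically.
        self._points.append((timestamp, value))

    def window(self, since: int):
        """Return all stored points with timestamp >= since."""
        return [(t, v) for t, v in self._points if t >= since]

    def __len__(self) -> int:
        return len(self._points)


# One hour of history at one-second granularity.
cpu = RingBufferSeries(capacity=3600)
for second in range(5000):
    cpu.append(second, 1.0)
print(len(cpu))
```

The trade-off is visible in the sizing: history length is bounded by how much RAM you are willing to give the buffer.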

Features of Netdata

General Features

1s granularity – the highest possible resolution for all metrics.
Unlimited metrics – collects all the available metrics, the more the better.
1% CPU utilization of a single core – it is super fast, unbelievably optimized.
A few MB of RAM – by default it uses 25MB RAM. You size it.
Zero disk I/O – while it runs, it does not load or save anything (except error and access logs).
Zero configuration – auto-detects everything, it can collect up to 10000 metrics per server out of the box.
Zero maintenance – You just run it, it does the rest.
Zero dependencies – it is even its own web server, for its static web files and its web API.
Scales to infinity – you can install it on all your servers, containers, VMs and IoTs.
Several operating modes – Autonomous host monitoring (the default), headless data collector, forwarding proxy, store and forward proxy, central multi-host monitoring

Health Monitoring & Alarms

Sophisticated alerting – comes with hundreds of alarms, out of the box!
Notifications: Telegram, Twilio, email, Kavenegar, MessageBird, and other channels are supported.

Visualization

Stunning interactive dashboards – mouse, touchpad and touch-screen friendly in dark and white themes
Amazingly fast visualization – responds to all queries in less than 1 ms per metric, even on low-end hardware.
Customizable – custom dashboards can be built using simple HTML (no javascript necessary).
Embeddable – its charts can be embedded on your web pages, wikis and blogs.

What it monitors

Netdata data collection is extensible – you can monitor anything you can get a metric for. APM (Application Performance Monitoring), System Resources, Disks, File systems, Networking, DNS Servers, Virtual Private Networks, Proxies, Balancers, Accelerators.

Installation guides:

Find more details on the Netdata website and in its step-by-step guide.

7. Icinga 2

Icinga is a monitoring system which checks the availability of your network resources, notifies users of outages, and generates performance data for reporting. It is scalable and extensible and can monitor large, complex environments across multiple locations.

Features of Icinga 2

Reporting

The Icinga Reporting Module is the framework and foundation Icinga created to handle data collected by Icinga 2 and other data providers. It can display the data directly within the Icinga web interface or export it to PDF, JSON or CSV format. With scheduled reports, you can receive the prepared data periodically via email.

Graphs and Metrics

Icinga uses Graphite for graphs and metrics. Graphite is a time-series database that stores the collected metrics and makes them available through RESTful APIs and web interfaces.
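
Graphite accepts metrics over a very simple plaintext protocol: one line per data point containing a dotted metric path, a value and a Unix timestamp, usually sent to the Carbon daemon on TCP port 2003. The sketch below formats such a line and shows how it could be sent. The metric path is made up for illustration and is not the naming scheme Icinga itself uses.

```python
import socket


def graphite_line(path: str, value, timestamp) -> str:
    """Format one data point for Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(line: str, host: str, port: int = 2003) -> None:
    """Send a formatted line to a Carbon plaintext listener (not called here)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(line.encode("ascii"))


# Hypothetical metric path and a fixed timestamp for illustration.
print(graphite_line("servers.web01.load.load1", 0.42, 1700000000), end="")
```

With the Graphite integration enabled, Icinga 2 writes the performance data of its checks for you, so a script like this is mostly useful for understanding or debugging what arrives on the Graphite side.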

Visualization

You will get Maps, Business Process, Certificate Monitoring and a Dashing Dashboard.

Log Monitoring

For log monitoring, you can use Logstash or Graylog alongside Icinga in your infrastructure.

Notification Scripts and Interfaces

A variety of notification scripts are available, for example:

E-Mail
SMS
Pager (XMPP, etc.)
Twitter
IRC
Ticket systems

8. Cacti

From Cacti’s site, this tool “is a complete network graphing solution designed to harness the power of RRDTool’s data storage and graphing functionality. Cacti provide a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. All of this is wrapped in an intuitive, easy to use interface that makes sense for LAN-sized installations up to complex networks with thousands of devices.” (Cacti.net, 2023)

Cacti harnesses the power of RRDtool, an open source, industry-standard data logging and graphing system for time-series data. RRDtool is a high-performance tool that can be easily integrated into shell scripts as well as Perl, Python, Ruby, Lua or Tcl applications.

Top features of Cacti include the following

Graph Templates

Graph templates enable common graphs to be grouped together by templating. Every field for a normal graph can be templated or specified on a per-graph basis.

Data Gathering

Cacti has functionality for data input. This gives users the freedom to develop custom scripts for gathering data from the target devices. It also comes with built-in support for SNMP, an industry-standard data-gathering technology. What is more, Cacti comes with a PHP-based poller that executes scripts, retrieves SNMP data, and updates the RRD files.
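
A custom data input script only has to print its results in a form the poller understands. For a script that returns several values, the usual convention is a single line of space-separated name:value pairs, where each name matches an output field defined for the data input method. A minimal sketch, with hypothetical field names and hard-coded sample values:

```python
def cacti_output(fields: dict) -> str:
    """Format several values as 'name1:value1 name2:value2' on one line."""
    return " ".join(f"{name}:{value}" for name, value in fields.items())


# Hard-coded sample readings; a real script would measure these,
# for example with os.getloadavg() on a Unix-like system.
print(cacti_output({"load1": 0.5, "load5": 0.25, "load15": 0.1}))
```

The field names load1, load5 and load15 are assumptions for this example. In Cacti they would need to match the output fields you configure for the script.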

User Management

Cacti lets you set up multiple users, each with their own account. The administrator has the flexibility of granting each user only a specific set of privileges.

Display of graphs

There are three different ways to view your graphs: tree view, list view, and preview view. Each has its benefits. The tree view lets users create hierarchies and place graphs on the tree, so a large number of graphs can be managed this way. The list view, as the name suggests, is simply a list of the available graphs, each linking to the actual graph when clicked. Finally, the preview view shows all of the graphs in one large list that you can quickly scan.

Templates

There are three different types of templates: Data Templates, Graph Templates, and Host Templates. They spare you the burden of defining every data source and graph by hand, which can be quite painful. A Data Template provides a skeleton for an actual data source. A Host Template groups all Graph Templates and Data Queries for a given device type. What is more exciting is that you do not need to create all templates on your own. Templates are available out of the box, and there is a very simple feature for importing them into your Cacti platform.

Alerting mechanisms

Cacti can be configured to send email alerts when pre-defined thresholds have been exceeded or not reached. This makes your nights much easier, since you do not have to start hunting for the problem when those calls come in: the alert pinpoints which service is down or facing a particular anomaly.

Reporting

Cacti can generate reports in accordance with your configuration.

9. Grafana

Grafana is a tool that gives you the power to query, visualize, alert on and understand your metrics no matter where they are stored. You get the chance to create, explore, and share dashboards with your team in an effort to foster a data-driven culture. In brief, Grafana is the open-source analytics and monitoring solution for every database.

Features

Visualize

Fast and flexible client-side graphs with a multitude of options. Panel plugins for many different ways to visualize metrics and logs.

Dynamic Dashboards

Create dynamic & reusable dashboards with template variables that appear as dropdowns at the top of the dashboard.
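
Conceptually, a template variable is a placeholder such as $host or ${host} inside a panel query that gets replaced with whatever is selected in the dropdown. The sketch below imitates that substitution in a few lines of Python. It is a simplification for illustration only: real Grafana also handles multi-value variables and formatting options, and the query and variable names are invented.

```python
import re

# Matches ${name} or $name.
_VARIABLE = re.compile(r"\$\{(\w+)\}|\$(\w+)")


def interpolate(query: str, variables: dict) -> str:
    """Replace $name / ${name} placeholders with the selected values."""
    def replace(match):
        name = match.group(1) or match.group(2)
        # Leave unknown placeholders untouched.
        return str(variables.get(name, match.group(0)))
    return _VARIABLE.sub(replace, query)


selected = {"host": "web01:9100"}
print(interpolate('node_load1{instance="$host"}', selected))
```

Because the query text is the same for every selection, one dashboard can serve any number of hosts, which is what makes it reusable.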

Explore Metrics

Explore your data through ad-hoc queries and dynamic drill-down. Split view and compare different time ranges, queries and data sources side by side.

Explore Logs

Experience the magic of switching from metrics to logs with preserved label filters. Quickly search through all your logs or stream them live.

Alerting

Visually define alert rules for your most important metrics. Grafana will continuously evaluate and send notifications to systems like Slack, PagerDuty, VictorOps, OpsGenie.

Mixed Data Sources

Mix different data sources in the same graph! You can specify a data source on a per-query basis. This works for even custom data sources.

Annotations

Annotate graphs with rich events from different data sources. Hovering over an event shows you the full event metadata and tags.

Ad-hoc Filters

Ad-hoc filters allow you to create new key/value filters on the fly, which are automatically applied to all queries that use that data source.

10. Glances – An eye on your system

From its GitHub page (https://github.com/nicolargo/glances), Glances is a cross-platform monitoring tool which aims to present a large amount of monitoring information through a curses or web-based interface. The information dynamically adapts depending on the size of the user interface.

Features of Glances

Cross-platform

Written in Python, Glances will run on almost any platform: GNU/Linux, FreeBSD, OS X, and Windows.

Export

Export all system statistics to CSV, InfluxDB, Cassandra, OpenTSDB, StatsD, ElasticSearch or even RabbitMQ. Glances also provides a dedicated Grafana dashboard.

Present a maximum of information in a minimum of space through a curses or Web based interface.

It can dynamically adapt the displayed information depending on the terminal size.
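
As a rough illustration of what adapting to the terminal size means, the sketch below picks as many columns as fit into the available width, most important first. The column names and widths are invented for the example and have nothing to do with how Glances is actually implemented.

```python
import shutil

# (column name, width in characters), ordered by importance. Hypothetical layout.
COLUMNS = [("CPU%", 6), ("MEM%", 6), ("PID", 8), ("USER", 10), ("COMMAND", 20)]


def visible_columns(width: int):
    """Return the names of the columns that fit into `width` characters."""
    shown, used = [], 0
    for name, column_width in COLUMNS:
        if used + column_width > width:
            break
        shown.append(name)
        used += column_width
    return shown


terminal_width = shutil.get_terminal_size().columns
print(visible_columns(terminal_width))
```

On a narrow terminal only the first few columns survive, while a wide one shows everything.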

11. Sensu

From its GitHub page, Sensu is an open source monitoring tool for ephemeral infrastructure and distributed applications. It is an agent-based monitoring system with built-in auto-discovery, making it very well-suited for cloud environments. It uses service checks to monitor service health and collect telemetry data.

Server monitoring
Container monitoring
Real-time inventory
Health checks & custom metrics
Alerts & incident management
Automated remediation & custom workflows
200+ community plugins
Namespaces and RBAC
Basic authentication
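
A Sensu service check is, at its core, a command whose exit status tells the agent how the service is doing: 0 for OK, 1 for warning and 2 for critical. The sketch below shows what the inside of such a check could look like for a TCP service. The thresholds, function names and messages are assumptions for illustration, not part of Sensu.

```python
import socket
import time

OK, WARNING, CRITICAL = 0, 1, 2


def measure_connect(host: str, port: int, timeout: float = 3.0):
    """Return the TCP connect time in seconds, or None if the connection fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def classify(connect_seconds, warn: float = 0.5, crit: float = 2.0):
    """Map a connect time to an (exit status, message) pair."""
    if connect_seconds is None:
        return CRITICAL, "CRITICAL: connection failed"
    if connect_seconds >= crit:
        return CRITICAL, f"CRITICAL: connect took {connect_seconds:.2f}s"
    if connect_seconds >= warn:
        return WARNING, f"WARNING: connect took {connect_seconds:.2f}s"
    return OK, f"OK: connect took {connect_seconds:.2f}s"


print(classify(0.12))
```

A complete check script would call measure_connect() for the target, print the message and exit with the status code so that the agent can pick up the result.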

Dashboard features

Real-time incident dashboard
Real-time inventory dashboard
Grafana Datasource
Multi-tenant dashboard (single-site)

Extensibility features

Custom plugins/scripts language support (e.g. C, C++, Golang, Ruby, Javascript/NodeJS, Rust, C#, Perl, Bash, etc)
Discovery, Inventory, Config Management APIs
Token-based API authentication (JWTs)

Services & support

Bonsai (hosted Sensu Asset Index & CDN)
Community support (Discourse, Slack)

It should be noted that there is an enterprise version of Sensu that contains many more features. You can find out more about it on the Sensu website.

Conclusion

Now the choice of tool belongs to you. Check them out and enjoy monitoring your infrastructure.
