Monitor VMware ESXi with Grafana and Telegraf

Josphat Mutai Updated Oct 4, 2022 · 6 min read

Greetings! This guide shows how to monitor your VMware ESXi infrastructure with Grafana, Telegraf, and InfluxDB. The setup is straightforward, and you should have your VMware metrics visualized in Grafana in less than 30 minutes. Our previous VMware monitoring guide was How To Monitor VMware ESXi Host Using LibreNMS.

This setup uses the official vSphere input plugin for Telegraf to pull metrics from vCenter. It covers host compute (CPU and RAM), networking, datastores, and the virtual machines running on vSphere hypervisors. So let’s get started.

Step 1: Install InfluxDB and Grafana

All collected metrics are stored in an InfluxDB database, and Grafana connects to InfluxDB to query and display them on its dashboards. Install both before continuing. The configuration and shell commands in this guide are written for InfluxDB 1.x (the examples show version 1.6.4).

How to install InfluxDB on Ubuntu, Debian and CentOS
How to Install Grafana on Ubuntu and CentOS

Once both InfluxDB and Grafana are installed, proceed to install and configure Telegraf, a metrics collector written in Go.
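Before moving on, it can help to confirm that both services actually answer over HTTP. The sketch below assumes InfluxDB 1.x listening on port 8086 (its /ping endpoint returns 204 when the server is up) and Grafana on its default port 3000 (its /api/health endpoint returns 200). The variable and function names are illustrative only and are not part of either product.

```shell
#!/usr/bin/env bash
# Sketch: check that InfluxDB 1.x and Grafana answer on their HTTP ports.
# Both URLs are assumptions; adjust them to your own servers.
INFLUX_URL="${INFLUX_URL:-http://localhost:8086}"
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"

# Prints the HTTP status code for a URL ("000" if nothing answered).
http_status() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# Compares a status code with the expected one and prints a one-line verdict.
report_status() {
    local name="$1" expected="$2" actual="$3"
    if [ "$actual" = "$expected" ]; then
        echo "$name: OK"
    else
        echo "$name: expected HTTP $expected, got $actual"
    fi
}

# Usage (performs real HTTP requests):
#   report_status InfluxDB 204 "$(http_status "$INFLUX_URL/ping")"
#   report_status Grafana  200 "$(http_status "$GRAFANA_URL/api/health")"
```

If either check reports an unexpected status, fix that service first; Telegraf and the dashboards depend on both.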

Step 2: Install and Configure Telegraf

If you used the links in Step 1 to install InfluxDB, the repository required for Telegraf was already added. Use the following commands to install Telegraf.

# CentOS
sudo yum -y install telegraf

# Ubuntu
sudo apt-get -y install telegraf

After installation, configure Telegraf to pull monitoring metrics from vCenter. Edit the main Telegraf configuration file:

sudo vim /etc/telegraf/telegraf.conf

1. Add the InfluxDB output, the storage backend where metrics will be stored.

# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
    urls = ["http://10.10.1.20:8086"]
    database = "vmware"
    timeout = "0s"
    username = "monitoring"
    password = "DBPassword"

Replace 10.10.1.20 with your InfluxDB server IP address. If you don’t have authentication enabled on InfluxDB, you can safely remove the username and password lines from the configuration.
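If authentication is enabled, the database and user named in the output section need to exist in InfluxDB with write access. As a rough sketch, the helper below only prints the InfluxDB 1.x InfluxQL statements for that. The function name is hypothetical, and the database, user and password are the placeholder values from the configuration above.

```shell
#!/usr/bin/env bash
# Sketch: print the InfluxQL statements for the database and user that the
# [[outputs.influxdb]] section expects. Nothing is sent to InfluxDB here.
influx_setup_statements() {
    local db="$1" user="$2" pass="$3"
    printf 'CREATE DATABASE "%s"\n' "$db"
    printf "CREATE USER \"%s\" WITH PASSWORD '%s'\n" "$user" "$pass"
    printf 'GRANT ALL ON "%s" TO "%s"\n' "$db" "$user"
}

# Usage: review the output, then run each statement in the influx shell
# as an admin user.
#   influx_setup_statements vmware monitoring DBPassword
```

Printing the statements instead of executing them keeps the step reviewable; adapt the privileges if your environment calls for something narrower than GRANT ALL.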

2. Configure the vsphere input plugin for Telegraf. The complete configuration should look similar to this:


# Read metrics from VMware vCenter
 [[inputs.vsphere]]
 ## List of vCenter URLs to be monitored. These three lines must be uncommented
 ## and edited for the plugin to work.
 vcenters = [ "https://10.10.1.2/sdk" ]
    username = "[email protected]"
    password = "AdminPassword"
 #
 ## VMs
 ## Typical VM metrics (if omitted or empty, all metrics are collected)
 vm_metric_include = [
      "cpu.demand.average",
      "cpu.idle.summation",
      "cpu.latency.average",
      "cpu.readiness.average",
      "cpu.ready.summation",
      "cpu.run.summation",
      "cpu.usagemhz.average",
      "cpu.used.summation",
      "cpu.wait.summation",
      "mem.active.average",
      "mem.granted.average",
      "mem.latency.average",
      "mem.swapin.average",
      "mem.swapinRate.average",
      "mem.swapout.average",
      "mem.swapoutRate.average",
      "mem.usage.average",
      "mem.vmmemctl.average",
      "net.bytesRx.average",
      "net.bytesTx.average",
      "net.droppedRx.summation",
      "net.droppedTx.summation",
      "net.usage.average",
      "power.power.average",
      "virtualDisk.numberReadAveraged.average",
      "virtualDisk.numberWriteAveraged.average",
      "virtualDisk.read.average",
      "virtualDisk.readOIO.latest",
      "virtualDisk.throughput.usage.average",
      "virtualDisk.totalReadLatency.average",
      "virtualDisk.totalWriteLatency.average",
      "virtualDisk.write.average",
      "virtualDisk.writeOIO.latest",
      "sys.uptime.latest",
    ]
 # vm_metric_exclude = [] ## Nothing is excluded by default
 # vm_instances = true ## true by default
 #
 ## Hosts
 ## Typical host metrics (if omitted or empty, all metrics are collected)
 host_metric_include = [
      "cpu.coreUtilization.average",
      "cpu.costop.summation",
      "cpu.demand.average",
      "cpu.idle.summation",
      "cpu.latency.average",
      "cpu.readiness.average",
      "cpu.ready.summation",
      "cpu.swapwait.summation",
      "cpu.usage.average",
      "cpu.usagemhz.average",
      "cpu.used.summation",
      "cpu.utilization.average",
      "cpu.wait.summation",
      "disk.deviceReadLatency.average",
      "disk.deviceWriteLatency.average",
      "disk.kernelReadLatency.average",
      "disk.kernelWriteLatency.average",
      "disk.numberReadAveraged.average",
      "disk.numberWriteAveraged.average",
      "disk.read.average",
      "disk.totalReadLatency.average",
      "disk.totalWriteLatency.average",
      "disk.write.average",
      "mem.active.average",
      "mem.latency.average",
      "mem.state.latest",
      "mem.swapin.average",
      "mem.swapinRate.average",
      "mem.swapout.average",
      "mem.swapoutRate.average",
      "mem.totalCapacity.average",
      "mem.usage.average",
      "mem.vmmemctl.average",
      "net.bytesRx.average",
      "net.bytesTx.average",
      "net.droppedRx.summation",
      "net.droppedTx.summation",
      "net.errorsRx.summation",
      "net.errorsTx.summation",
      "net.usage.average",
      "power.power.average",
      "storageAdapter.numberReadAveraged.average",
      "storageAdapter.numberWriteAveraged.average",
      "storageAdapter.read.average",
      "storageAdapter.write.average",
      "sys.uptime.latest",
    ]
 # host_metric_exclude = [] ## Nothing excluded by default
 # host_instances = true ## true by default
 #
 ## Clusters
 cluster_metric_include = [] ## if omitted or empty, all metrics are collected
 # cluster_metric_exclude = [] ## Nothing excluded by default
 # cluster_instances = false ## false by default
 #
 ## Datastores
 datastore_metric_include = [] ## if omitted or empty, all metrics are collected
 # datastore_metric_exclude = [] ## Nothing excluded by default
 # datastore_instances = false ## false by default for Datastores only
 #
 ## Datacenters
 datacenter_metric_include = [] ## if omitted or empty, all metrics are collected
# datacenter_metric_exclude = [ "*" ] ## Datacenters are not collected by default.
 # datacenter_instances = false ## false by default for Datastores only
 #
 ## Plugin Settings
 ## separator character to use for measurement and field names (default: "_")
 # separator = "_"
 #
 ## number of objects to retrieve per query for realtime resources (vms and hosts)
 ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
 # max_query_objects = 256
 #
 ## number of metrics to retrieve per query for non-realtime resources (clusters and datastores)
 ## set to 64 for vCenter 5.5 and 6.0 (default: 256)
 # max_query_metrics = 256
 #
 ## number of go routines to use for collection and discovery of objects and metrics
 # collect_concurrency = 1
 # discover_concurrency = 1
 #
 ## whether or not to force discovery of new objects on initial gather call before collecting metrics
 ## when true for large environments this may cause errors for time elapsed while collecting metrics
 ## when false (default) the first collection cycle may result in no or limited metrics while objects are discovered
 # force_discover_on_init = false
 #
 ## the interval before (re)discovering objects subject to metrics collection (default: 300s)
 # object_discovery_interval = "300s"
 #
 ## timeout applies to any of the api request made to vcenter
 # timeout = "60s"
 #
 ## Optional SSL Config
 # ssl_ca = "/path/to/cafile"
 # ssl_cert = "/path/to/certfile"
 # ssl_key = "/path/to/keyfile"
 ## Use SSL but skip chain & host verification
 insecure_skip_verify = true

The only variables to change on your end are:

10.10.1.2 should be replaced with your vCenter IP address
[email protected] should be replaced with your vCenter user account
AdminPassword should be replaced with that account's password

If your vCenter server uses a self-signed certificate, make sure insecure_skip_verify is set to true. This keeps SSL in use but skips chain and host verification:

insecure_skip_verify = true

Restart and enable the telegraf service after making the changes.

sudo systemctl restart telegraf
sudo systemctl enable telegraf
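To check the vsphere section on its own, Telegraf can run a single collection and print the results to stdout instead of writing them to InfluxDB. The sketch below wraps that call using Telegraf's --config, --input-filter and --test options. The function name and the TELEGRAF_CONF variable are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: run only the vsphere input once and print the gathered metrics
# to stdout; nothing is written to InfluxDB in --test mode.
TELEGRAF_CONF="${TELEGRAF_CONF:-/etc/telegraf/telegraf.conf}"

check_vsphere_input() {
    telegraf --config "$1" --input-filter vsphere --test
}

# Usage (contacts vCenter with the credentials from the config file):
#   check_vsphere_input "$TELEGRAF_CONF"
```

As the force_discover_on_init comment in the configuration notes, the first collection cycle may return few or no metrics while objects are still being discovered, so an empty first run is not necessarily an error. Authentication or certificate problems, on the other hand, usually show up here as error lines.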

Step 3: Check InfluxDB Metrics

We need to confirm that our metrics are being pushed to InfluxDB and that we can see them.

Open InfluxDB shell:

With Authentication:

$ influx -username 'username' -password 'DBPassword'
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4

'username' – InfluxDB authentication username
'DBPassword' – InfluxDB password

Without Authentication:

$ influx
Connected to http://localhost:8086 version 1.6.4
InfluxDB shell version: 1.6.4

Switch to the vmware database we configured in Telegraf.

> USE vmware
Using database vmware

Check that time series metrics are flowing in.

> SHOW MEASUREMENTS
name: measurements
name
----
cpu
disk
diskio
kernel
mem
processes
swap
system
vsphere_cluster_clusterServices
vsphere_cluster_mem
vsphere_cluster_vmop
vsphere_datacenter_vmop
vsphere_datastore_datastore
vsphere_datastore_disk
vsphere_host_cpu
vsphere_host_disk
vsphere_host_mem
vsphere_host_net
vsphere_host_power
vsphere_host_storageAdapter
vsphere_host_sys
vsphere_vm_cpu
vsphere_vm_mem
vsphere_vm_net
vsphere_vm_power
vsphere_vm_sys
vsphere_vm_virtualDisk
>
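The same check can be scripted. As a minimal sketch, the helper below counts how many measurement names start with vsphere_, reading the names from stdin. The function name is hypothetical, and the usage line assumes the InfluxDB 1.x influx client with its -database and -execute options.

```shell
#!/usr/bin/env bash
# Sketch: read measurement names on stdin and print how many start
# with "vsphere_". Prints 0 when none are found.
count_vsphere_measurements() {
    grep -c '^vsphere_' || true
}

# Usage (queries the local InfluxDB 1.x server):
#   influx -database vmware -execute 'SHOW MEASUREMENTS' | count_vsphere_measurements
```

A count of 0 while measurements such as cpu and mem are present suggests Telegraf is running but the vsphere input is not collecting yet.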

Step 4: Add InfluxDB Data Source to Grafana

Log in to Grafana and add an InfluxDB data source:

1. Give it a name, choose InfluxDB as the type, and specify the InfluxDB server address (for example http://10.10.1.20:8086).

2. Provide the database name (vmware) and authentication credentials if applicable.

3. Save and test the settings.

Step 5: Import Grafana Dashboards

We have configured all dependencies and tested that they work. The last step is to create or import Grafana dashboards that will display the vSphere metrics.

In this post, we will use the Grafana dashboards created by Jorge de la Cruz.

After a successful import, you should start seeing data appear on the dashboards.

The visualizations may need a little extra tuning to fit your environment and the specific metrics you want to show.

Check out other Grafana related articles available on our blog.

Monitor Zimbra Server with Grafana, Influxdb and Telegraf

Monitor Redis Server with Prometheus and Grafana in 5 minutes

Monitoring Ceph Cluster with Prometheus and Grafana

How to Monitor BIND DNS server with Prometheus and Grafana

Monitoring MySQL / MariaDB with Prometheus in five minutes

Monitor Apache Web Server with Prometheus and Grafana in 5 minutes

21 thoughts on “Monitor VMware ESXi with Grafana and Telegraf”

Dennis Faucher

March 25, 2021 at 10:59 pm

Nice telegraf.conf. I changed from the default “all metrics” to yours and my network usage for Influx dropped from 25 Mb/s to 12 Mb/s. Thank you.
chad

January 13, 2022 at 10:32 pm

Scratching my head to figure out what I am doing wrong. I installed everything on an Ubuntu VM using no authentication. Everything seems to have installed and setup properly. However in Grafana I am getting errors for each dashboard. It seems like Influx is collecting the data. I intentionally did it as simple a setup as possible just to test it out. I literally can’t figure out what else to check. Screenshot of error I get from each dashboard https://www.screencast.com/t/u9ory6DVC

I have the self signed SSL cert set to true, double checked my IP’s and vcenter password. New to Influx/Grafana. Any suggestions would be amazing. Thank you for such a killer post BTW
Van Ngoc Tang

April 17, 2022 at 1:12 pm

Hi Josphat Mutai
I’m following this tut.
But when i’m import Dashboard. some thing when wrong bellow
—
Templating [vcenter]
Error updating options: InfluxDB Error: error parsing query: found FROM, expected SELECT, DELETE, SHOW, CREATE, DROP, EXPLAIN, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1
—–
Does you have any suggestion for me ?
- HUng Thinh
  
  September 7, 2022 at 9:58 am
  
  me too
- Nico
  
  November 1, 2022 at 1:50 pm
  
  Same error here, did you find the way 😉 ?
  - John Williams
    
    November 20, 2022 at 9:09 am
    
    I am having a similar issue. Have you gotten it resolved?
Timur

May 3, 2022 at 10:25 am

Templating [vcenter]
Error updating options: InfluxDB Error: error parsing query: found FROM, expected SELECT, DELETE, SHOW, CREATE, DROP, EXPLAIN, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1.
I Have the same problem.
zSprawl

August 28, 2022 at 6:52 am

If anyone is having problems with maxQuerySize, here is the VMware article needed to raise the value. I didn’t have luck with -1 for unlimited, so I just set the maxQuerySize to something large (aka 2100 in my case).
- zSprawl
  
  August 28, 2022 at 6:53 am
  
  Hmm, I missed the link: https://kb.vmware.com/s/article/2107096
HUng Thinh

September 7, 2022 at 9:56 am

It not working, when I run: SHOW MEASUREMENTS, nothing display
Dayton Jones

September 27, 2022 at 3:53 am

I’m getting:
”
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x50 pc=0x330a572]

goroutine 66 [running]:
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*Endpoint).Collect(0x0, {0x66e3850?, 0xc0000780a0}, {0x66ff760?, 0xc0005ca540})
/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/endpoint.go:891 +0x72
github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather.func1(0xc0000cd810?)
/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:138 +0x78
created by github.com/influxdata/telegraf/plugins/inputs/vsphere.(*VSphere).Gather
/go/src/github.com/influxdata/telegraf/plugins/inputs/vsphere/vsphere.go:136 +0x65
”

OS: Ubuntu 20.04
Influx: 1.8
Telegraf: 1.24.1
Nabeel

March 30, 2023 at 2:41 pm

I have configured my Vcenter monitoring with Influxdb and telegraf and visualized over the Grafana dashboard.
Now I just want to create an alert rule if any of my VM or host goes down So the email gets triggered.
Contact point and email is already configured.
Ivaan

July 24, 2023 at 11:54 am

InfluxDB Error: error parsing query: found FROM, expected SELECT, DELETE, SHOW, CREATE, DROP, EXPLAIN, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1

Any fix for above error ?
John

September 25, 2023 at 10:06 am

When Open InfluxDB shell and give the first command:
influx -username ‘username’ -password ‘DBPassword’
Return:
ERR: 400 Bad Request: failed to parse query: found influx, expected SELECT, DELETE, SHOW, CREATE, DROP, EXPLAIN, GRANT, REVOKE, ALTER, SET, KILL at line 1, char 1
Any Solution?
- Josphat Mutai
  
  September 25, 2023 at 4:46 pm
  
  You’re using InfluxDB 1.x or version 2.x?
  - John
    
    September 25, 2023 at 5:11 pm
    
    Version 2
john

September 25, 2023 at 5:08 pm

Finally
When i type command
$ influx
nothing happen cannot connect
Is there any changes with versions?
john

September 25, 2023 at 5:09 pm

Version 2
John

September 25, 2023 at 5:13 pm

InfluxDB v2.7.1 (git: 407fa622e9)
John

September 27, 2023 at 6:38 pm

Hallo again
I try all these days without success.
Is there any guide with new updates? Influxv2?
I have an esxi server with 7 vms.
Thank you
Flitox

December 13, 2023 at 2:14 am

I really don’t understand why I don’t see anything on the dashboards in Grafana. I have “N/A” everywhere!
In the influx shell, I have the metrics coming back, but nothing in Grafana, which has an idea, it’s going crazy!!!