Monitoring production Kubernetes clusters is an important, ongoing task for any cluster administrator. A myriad of solutions fall into the category of Kubernetes monitoring stacks, and Prometheus and Grafana are two of them. This guide walks Kubernetes users through setting up Prometheus and Grafana on Kubernetes using prometheus-operator.
Prometheus is a full-fledged solution that gives developers and sysadmins access to advanced metrics capabilities in Kubernetes. By default, metrics are collected at an interval of 30 seconds. The information collected covers resources such as memory, CPU, disk performance, and network I/O, as well as read/write rates. By default the metrics are kept on your cluster for up to 14 days, but the settings can be adjusted to suit your environment.
Grafana is used for analytics and interactive visualization of the metrics collected and stored in the Prometheus database. You can create custom charts, graphs, and alerts for your Kubernetes cluster, with Prometheus as the data source. In this guide we will install both Prometheus and Grafana on a Kubernetes cluster. This setup requires a working kubectl configuration with a cluster-admin role binding.
We will use the Prometheus Operator in this installation to deploy the Prometheus monitoring stack on Kubernetes. The Prometheus Operator is written to ease the deployment and overall management of Prometheus and its related monitoring components. By using the Operator we simplify and automate Prometheus configuration on any Kubernetes cluster using Kubernetes custom resources.
The diagram below shows the components of the Kubernetes monitoring that we’ll deploy:
The Operator uses the following custom resource definitions (CRDs) to deploy and configure Prometheus monitoring stack:
- Prometheus – This defines a desired Prometheus deployment on Kubernetes
- Alertmanager – This defines a desired Alertmanager deployment on Kubernetes cluster
- ThanosRuler – This defines a desired Thanos Ruler deployment
- ServiceMonitor – Specifies how groups of Kubernetes services should be monitored
- PodMonitor – Declaratively specifies how groups of pods should be monitored
- Probe – Specifies how groups of ingresses or static targets should be monitored
- PrometheusRule – Defines a desired set of Prometheus alerting rules. The Operator generates a rule file, which can be used by Prometheus instances.
- AlertmanagerConfig – Declaratively specifies subsections of the Alertmanager configuration, allowing routing of alerts to custom receivers, and setting inhibit rules.
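To make the custom resources above more concrete, here is a minimal sketch of what a ServiceMonitor could look like, wrapped in a shell function so it can be piped into kubectl. The application name `example-app`, the `default` namespace, and the Service port name `web` are all hypothetical and not part of this guide. Whether a given Prometheus instance actually picks the object up depends on its selectors and RBAC permissions.

```shell
# Print a minimal ServiceMonitor manifest for a hypothetical application.
# Assumptions (not from this guide): an app called "example-app" in the
# "default" namespace whose Service has a port named "web" serving /metrics.
example_servicemonitor() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app   # must match the labels on the target Service
  endpoints:
    - port: web          # name of the Service port that serves metrics
      path: /metrics
      interval: 30s
EOF
}
```

Once the operator and its CRDs are installed, the manifest could be applied with `example_servicemonitor | kubectl apply -f -`.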
Deploy Prometheus / Grafana Monitoring Stack on Kubernetes
To get a complete monitoring stack we will use the kube-prometheus project, which includes the Prometheus Operator among its components. The kube-prometheus stack is meant for cluster monitoring and is pre-configured to collect metrics from all Kubernetes components, with a default set of dashboards and alerting rules.
You should have kubectl configured and confirmed to be working:
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.10.12:6443
CoreDNS is running at https://192.168.10.12:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
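Since the setup needs both a reachable cluster and cluster-admin rights, a small pre-flight check can catch problems before anything is created. The sketch below is one possible way to do it. The function name `preflight_check` is our own, and it relies on `kubectl auth can-i` printing `yes` or `no`.

```shell
# Verify that the cluster is reachable and the current user has
# cluster-wide admin rights. The function name is hypothetical.
preflight_check() {
  # Fails if the API server is unreachable or kubectl is misconfigured.
  kubectl cluster-info >/dev/null || { echo "cluster unreachable" >&2; return 1; }

  # Ask whether the current user may perform any verb on any resource.
  if [ "$(kubectl auth can-i '*' '*' --all-namespaces 2>/dev/null)" = "yes" ]; then
    echo "ok: cluster-admin level access"
  else
    echo "current user lacks cluster-wide admin rights" >&2
    return 1
  fi
}
```

Run `preflight_check` before Step 1. A non-zero exit status means the kubectl context or role binding needs attention first.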
Step 1: Clone kube-prometheus project
Use the git command to clone the kube-prometheus project to your local system:
git clone https://github.com/prometheus-operator/kube-prometheus.git
Navigate to the kube-prometheus directory:
cd kube-prometheus
Step 2: Create monitoring namespace, CustomResourceDefinitions & operator pod
Create a namespace and required CustomResourceDefinitions:
kubectl create -f manifests/setup
The command output as seen in the terminal:
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagerconfigs.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
The namespace created with CustomResourceDefinitions is named monitoring:
$ kubectl get ns monitoring
NAME         STATUS   AGE
monitoring   Active   2m41s
Confirm that Prometheus operator pods are running:
$ kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-84dc795dc8-jbgjm   2/2     Running   0          91s
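Instead of checking by eye, the same confirmation can be scripted. The following sketch waits until the CRDs are established and the operator Deployment has rolled out. The function name and the 120-second timeouts are assumptions chosen for illustration.

```shell
# Block until the CRDs and the operator are ready.
# Function name and timeouts are illustrative assumptions.
wait_for_operator() {
  # Wait for every CRD in the cluster to report the Established condition.
  kubectl wait --for condition=Established --all CustomResourceDefinition --timeout=120s || return 1

  # Wait for the operator Deployment to finish rolling out.
  kubectl -n monitoring rollout status deployment/prometheus-operator --timeout=120s
}
```

Calling `wait_for_operator` between Step 2 and Step 3 avoids applying the remaining manifests before the custom resource types exist.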
Step 3: Deploy Prometheus Monitoring Stack on Kubernetes
Once you confirm the Prometheus operator is running, you can go ahead and deploy the Prometheus monitoring stack.
kubectl create -f manifests/
Here is my deployment progress output:
poddisruptionbudget.policy/alertmanager-main created
prometheusrule.monitoring.coreos.com/alertmanager-main-rules created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
clusterrole.rbac.authorization.k8s.io/blackbox-exporter created
clusterrolebinding.rbac.authorization.k8s.io/blackbox-exporter created
configmap/blackbox-exporter-configuration created
deployment.apps/blackbox-exporter created
service/blackbox-exporter created
serviceaccount/blackbox-exporter created
servicemonitor.monitoring.coreos.com/blackbox-exporter created
secret/grafana-datasources created
configmap/grafana-dashboard-alertmanager-overview created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
prometheusrule.monitoring.coreos.com/kube-prometheus-rules created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kube-state-metrics-rules created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
prometheusrule.monitoring.coreos.com/kubernetes-monitoring-rules created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
prometheusrule.monitoring.coreos.com/node-exporter-rules created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
poddisruptionbudget.policy/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-operator-rules created
servicemonitor.monitoring.coreos.com/prometheus-operator created
poddisruptionbudget.policy/prometheus-k8s created
prometheus.monitoring.coreos.com/k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-prometheus-rules created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-k8s created
Give it a few seconds and the pods should start coming online. This can be checked with the command below:
$ kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          113s
alertmanager-main-1                    2/2     Running   0          113s
alertmanager-main-2                    2/2     Running   0          113s
blackbox-exporter-6c95587d7-2vf28      3/3     Running   0          113s
grafana-9b54884bf-9s82l                1/1     Running   0          112s
kube-state-metrics-b545789dd-27xg4     3/3     Running   0          111s
node-exporter-cbjx5                    2/2     Running   0          111s
node-exporter-fs2vj                    2/2     Running   0          111s
node-exporter-gswkl                    2/2     Running   0          111s
node-exporter-hxv7l                    2/2     Running   0          111s
node-exporter-ktnd8                    2/2     Running   0          111s
prometheus-adapter-5c977869c-7mhz2     1/1     Running   0          111s
prometheus-adapter-5c977869c-8fndf     1/1     Running   0          111s
prometheus-k8s-0                       2/2     Running   1          109s
prometheus-k8s-1                       2/2     Running   1          109s
prometheus-operator-84dc795dc8-jbgjm   2/2     Running   0          7m37s
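If you would rather block until everything is up than re-run the listing, something along these lines may help. It is a sketch only. The function name and the default 300-second timeout are assumptions, and `--all` covers only pods that already exist when the command starts, so pods created later by the operator may need a second run.

```shell
# Wait until all existing pods in the monitoring namespace are Ready.
# Function name and default timeout are illustrative assumptions.
wait_for_stack() {
  local timeout="${1:-300s}"
  # Only pods that exist at invocation time are considered.
  kubectl -n monitoring wait --for=condition=Ready pods --all --timeout="$timeout"
}
```

For example, `wait_for_stack 600s` allows a slower cluster up to ten minutes.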
To list all the services created, run the command:
$ kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.254.220.101   <none>        9093/TCP                     3m20s
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   3m20s
blackbox-exporter       ClusterIP   10.254.41.39     <none>        9115/TCP,19115/TCP           3m20s
grafana                 ClusterIP   10.254.226.247   <none>        3000/TCP                     3m19s
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            3m19s
node-exporter           ClusterIP   None             <none>        9100/TCP                     3m18s
prometheus-adapter      ClusterIP   10.254.193.17    <none>        443/TCP                      3m18s
prometheus-k8s          ClusterIP   10.254.92.43     <none>        9090/TCP                     3m17s
prometheus-operated     ClusterIP   None             <none>        9090/TCP                     3m17s
prometheus-operator     ClusterIP   None             <none>        8443/TCP                     9m4s
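With the stack running, the retention setting mentioned in the introduction can be adjusted on the Prometheus custom resource, which the deployment output shows is named k8s. The sketch below patches its `spec.retention` field with a merge patch. The function name is our own, and the change will be overwritten if the manifests are re-applied, so treat it as a quick experiment and not as the project's recommended customization path.

```shell
# Set the retention period on the "k8s" Prometheus custom resource.
# Function name is hypothetical; the duration uses Prometheus syntax (e.g. 14d).
set_prometheus_retention() {
  local retention="$1"
  [ -n "$retention" ] || { echo "usage: set_prometheus_retention <duration>" >&2; return 2; }
  # Custom resources need a merge patch; strategic merge is not supported for them.
  kubectl -n monitoring patch prometheus k8s --type merge \
    -p "{\"spec\":{\"retention\":\"${retention}\"}}"
}
```

For example, `set_prometheus_retention 14d` asks the operator to reconfigure the Prometheus pods with a 14-day retention.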
Step 4: Access Prometheus, Grafana, and Alertmanager dashboards
We now have the monitoring stack deployed, but how can we access the Grafana, Prometheus, and Alertmanager dashboards? There are two ways to achieve this.
First method: Accessing Prometheus UI and Grafana dashboards using kubectl port-forward
An easy way to access the Prometheus, Grafana, and Alertmanager dashboards is by using kubectl port-forward once all the services are running:
kubectl --namespace monitoring port-forward svc/grafana 3000
Then access the Grafana dashboard in your local browser at http://localhost:3000
The default login credentials are:
Username: admin
Password: admin
You’re required to change the password on first login.
For Prometheus port forwarding, run the command below:
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
The web console is then accessible at http://localhost:9090
Alertmanager Dashboard
For the Alertmanager dashboard, run:
kubectl --namespace monitoring port-forward svc/alertmanager-main 9093
The access URL is http://localhost:9093
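The three port-forward commands each block a terminal. As a convenience, they can be started together in the background. The following is a rough sketch under the assumption that local ports 3000, 9090, and 9093 are free. The function name is made up for this example.

```shell
# Forward Grafana, Prometheus, and Alertmanager at once.
# Function name is hypothetical; local ports must be free.
forward_dashboards() {
  local pids=()
  kubectl -n monitoring port-forward svc/grafana 3000 & pids+=($!)
  kubectl -n monitoring port-forward svc/prometheus-k8s 9090 & pids+=($!)
  kubectl -n monitoring port-forward svc/alertmanager-main 9093 & pids+=($!)

  # Stop all forwards on Ctrl-C or termination, then restore default handling.
  trap "kill ${pids[*]} 2>/dev/null" INT TERM
  wait "${pids[@]}"
  trap - INT TERM
}
```

Run `forward_dashboards`, open the three localhost URLs, and press Ctrl-C when finished.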
Second method: Accessing Prometheus UI and Grafana dashboard using NodePort (Only for private clusters)
To access the Prometheus, Grafana, and Alertmanager dashboards using the IP address of one of the worker nodes and a port, you have to edit the services and set the type to NodePort.
The NodePort method is only recommended for local clusters not exposed to the internet, because the Prometheus and Alertmanager services are not secured.
$ kubectl --namespace monitoring edit svc/prometheus-k8s
#Update inside spec section
spec:
  type: NodePort

$ kubectl --namespace monitoring edit svc/alertmanager-main
#Update inside spec section
spec:
  type: NodePort

$ kubectl --namespace monitoring edit svc/grafana
#Update inside spec section
spec:
  type: NodePort
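If you prefer not to open an editor three times, the same change can be made non-interactively with kubectl patch. This is a sketch of an alternative to the edit commands above. The function name is our own.

```shell
# Switch the three dashboard services to type NodePort without an editor.
# Function name is hypothetical.
expose_as_nodeport() {
  local svc
  for svc in prometheus-k8s alertmanager-main grafana; do
    kubectl -n monitoring patch svc "$svc" -p '{"spec":{"type":"NodePort"}}' || return 1
  done
}
```

The result of `expose_as_nodeport` should be the same as the manual edits: Kubernetes assigns each service a node port from its configured range.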
Confirm that each of the services has a NodePort assigned:
$ kubectl -n monitoring get svc | grep NodePort
alertmanager-main   NodePort   10.254.220.101   <none>   9093:31237/TCP   45m
grafana             NodePort   10.254.226.247   <none>   3000:31123/TCP   45m
prometheus-k8s      NodePort   10.254.92.43     <none>   9090:32627/TCP   45m
In this example we can access the services as below:
# Grafana
http://node_ip:31123

# Prometheus
http://node_ip:32627

# Alert Manager
http://node_ip:31237
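Because node ports are assigned per cluster, the numbers will differ from this example. The sketch below reads them back with a jsonpath query and prints the resulting URLs. It assumes the first port listed on each service is the web UI port, and the function name and the node IP argument are placeholders of our own.

```shell
# Print the NodePort URL of each dashboard for a given node IP.
# Assumes the first port of each Service is its web UI port.
print_dashboard_urls() {
  local node_ip="$1" svc port
  [ -n "$node_ip" ] || { echo "usage: print_dashboard_urls <node_ip>" >&2; return 2; }
  for svc in grafana prometheus-k8s alertmanager-main; do
    port=$(kubectl -n monitoring get svc "$svc" -o jsonpath='{.spec.ports[0].nodePort}')
    echo "$svc http://${node_ip}:${port}"
  done
}
```

Call it with the address of any worker node, for instance `print_dashboard_urls 192.168.10.12`.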
An example of a default Grafana dashboard showing cluster-wide compute resource usage.
Tearing down the Prometheus monitoring stack
If at some point you want to tear down the Prometheus monitoring stack in your Kubernetes cluster, you can run the kubectl delete command and pass the paths to the manifest files we used during deployment.
kubectl delete --ignore-not-found=true -f manifests/ -f manifests/setup
Within a few minutes the stack is deleted, and you can re-deploy if that was the intention.