One of the most important tasks of a system administrator is to ensure that the system and all the available packages are updated to their latest versions. Even after adding the nodes to a Kubernetes cluster, we still need to manage the node updates. In most situations, once the updates (e.g., kernel updates, system maintenance, or hardware changes) have been applied, you need to reboot the node for the changes to take effect. This can be even harder on Kubernetes because you need to ensure that the running applications are gracefully terminated or migrated to other nodes before the reboot.
Kured (KUbernetes REboot Daemon) is a tool designed to simplify the process of rebooting nodes in a Kubernetes cluster. It helps manage node reboots in a controlled and automated manner, minimizing disruptions to applications running on the cluster.
It automates the reboot process by following the steps below:
- Monitor Node: Kured runs on every node in the Kubernetes cluster and periodically checks whether a reboot is needed. It watches for the presence of a reboot sentinel file, e.g. /var/run/reboot-required, or the successful run of a sentinel command.
- Drain Node: When a node needs to be rebooted, Kured first acquires a lock in the API server so that only one node reboots at a time, then triggers the “drain” process for that node. The “drain” process gracefully terminates the running pods on the node so that the workload is rescheduled onto other available nodes in the cluster. This minimizes disruption to the applications running on the cluster.
- Reboot Node: After the node has been drained and all the workload has been migrated, Kured initiates the reboot of the node.
- Node Recovery: Once the node is back online after the reboot, Kured uncordons it so that it is schedulable again and ready to accept new pods, then releases the lock.
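For comparison, the sequence above roughly corresponds to what an operator would do by hand for a single node. The sketch below only prints those commands and does not execute anything. The node name demo-compute-1 and the SSH-based reboot are assumptions for illustration; Kured itself additionally takes a cluster-wide lock and runs the reboot command on the node directly.

```shell
# Print the commands an operator would run by hand for one node.
# Nothing is executed; this only illustrates the sequence Kured automates.
print_manual_reboot_steps() {
  node="$1"
  # 1. Cordon the node and evict its pods so they are rescheduled elsewhere
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  # 2. Reboot the machine (assumes SSH access and sudo rights on the node)
  echo "ssh $node sudo systemctl reboot"
  # 3. Once the node reports Ready again, allow pods to be scheduled on it
  echo "kubectl uncordon $node"
}

print_manual_reboot_steps demo-compute-1
```

Doing this manually for every node after every update is exactly the toil that Kured removes.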
Kured provides a configurable and automated solution for managing node reboots in a Kubernetes cluster. It helps simplify the process and reduces the operational complexity associated with performing node reboots, ensuring minimal impact on the availability and stability of applications running in the cluster.
Today, we will learn how to perform safe & automatic Node Reboots on Kubernetes with Kured.
Getting Started
For this guide, I assume you already have a Kubernetes cluster running. If not, you can use any of the dedicated guides below to set up a Kubernetes cluster:
- Deploy HA Kubernetes Cluster on Rocky Linux 8 using RKE2
- Run Kubernetes on Debian with Minikube
- Deploy Kubernetes Cluster on Linux With k0s
- Install Kubernetes Cluster on Ubuntu using K3s
- Install Kubernetes Cluster on Rocky Linux 8 with Kubeadm & CRI-O
- Deploy k0s Kubernetes on Rocky Linux 9 using k0sctl
- Install Minikube Rocky Linux 9 and Create Kubernetes Cluster
Once set up, you also need kubectl. To install it, use the commands:
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin
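The later steps in this guide also call curl, jq and wget, so it can help to confirm that everything is on the PATH before continuing. A minimal sketch follows; the helper name missing_tools is made up for this example.

```shell
# Print the name of every given tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
  mt_status=0
  for mt_tool in "$@"; do
    if ! command -v "$mt_tool" >/dev/null 2>&1; then
      echo "$mt_tool"
      mt_status=1
    fi
  done
  return "$mt_status"
}

# Tools used in this guide
missing_tools kubectl curl jq wget || echo "Install the tools listed above before continuing"
```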
You need to export the admin config to access the cluster:
## For RKE2
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
## For K0s
export KUBECONFIG=/var/lib/k0s/pki/admin.conf
Verify if you can access the cluster:
$ kubectl get nodes
NAME                    STATUS   ROLES           AGE   VERSION
demo-compute-1          Ready    <none>          17h   v1.27.2
demo-compute-2          Ready    <none>          17h   v1.27.2
demo-k8s-controlplane   Ready    control-plane   17h   v1.27.2
#1. Install kured (Kubernetes Reboot Daemon) on Kubernetes
Kured is a Kubernetes daemonset that automates safe and controlled node reboots based on indications from the package management system of the underlying operating system. It ensures that nodes in the cluster are rebooted in a coordinated manner without disrupting the availability of running applications.
To install it, obtain a default manifest without Prometheus alerting interlock or Slack notifications with the commands:
latest=$(curl -s https://api.github.com/repos/kubereboot/kured/releases | jq -r '.[0].tag_name')
wget https://github.com/kubereboot/kured/releases/download/$latest/kured-$latest-dockerhub.yaml
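If the GitHub API call fails or is rate-limited, jq prints null (or nothing) and the wget command above silently requests a broken URL. As a small safeguard, the sketch below builds the same URL but refuses an empty or null tag. The function name kured_manifest_url is hypothetical; the URL pattern is the one used by the wget command above.

```shell
# Build the manifest URL for a release tag, using the same pattern as the
# wget command above. Fails if the tag is empty or the literal string "null".
kured_manifest_url() {
  kmu_tag="$1"
  if [ -z "$kmu_tag" ] || [ "$kmu_tag" = "null" ]; then
    echo "no release tag given (the GitHub API call may have failed)" >&2
    return 1
  fi
  echo "https://github.com/kubereboot/kured/releases/download/${kmu_tag}/kured-${kmu_tag}-dockerhub.yaml"
}

# Usage with the $latest variable set earlier:
#   url=$(kured_manifest_url "$latest") && wget "$url"
```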
Once downloaded, open the file for editing:
vim kured-$latest-dockerhub.yaml
Now there are quite a number of configurations you can make here. The available flags are:
Kubernetes Reboot Daemon
Usage:
kured [flags]
Flags:
--alert-filter-regexp regexp.Regexp alert names to ignore when checking for active alerts
--alert-firing-only only consider firing alerts when checking for active alerts
--annotate-nodes if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots
--blocking-pod-selector stringArray label selector identifying pods whose presence should prevent reboots
--drain-grace-period int time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default -1)
--drain-timeout duration timeout after which the drain is aborted (default: 0, infinite time)
--ds-name string name of daemonset on which to place lock (default "kured")
--ds-namespace string namespace containing daemonset on which to place lock (default "kube-system")
--end-time string schedule reboot only before this time of day (default "23:59:59")
--force-reboot force a reboot even if the drain fails or times out
-h, --help help for kured
--lock-annotation string annotation in which to record locking node (default "weave.works/kured-node-lock")
--lock-release-delay duration delay lock release for this duration (default: 0, disabled)
--lock-ttl duration expire lock annotation after this duration (default: 0, disabled)
--log-format string use text or json log format (default "text")
--message-template-drain string message template used to notify about a node being drained (default "Draining node %s")
--message-template-reboot string message template used to notify about a node being rebooted (default "Rebooting node %s")
--message-template-uncordon string message template used to notify about a node being successfully uncordoned (default "Node %s rebooted & uncordoned successfully!")
--node-id string node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable
--notify-url string notify URL for reboot notifications (cannot use with --slack-hook-url flags)
--period duration sentinel check period (default 1h0m0s)
--post-reboot-node-labels strings labels to add to nodes after uncordoning
--pre-reboot-node-labels strings labels to add to nodes before cordoning
--prefer-no-schedule-taint string Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
--prometheus-url string Prometheus instance to probe for active alerts
--reboot-command string command to run when a reboot is required (default "/bin/systemctl reboot")
--reboot-days strings schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
--reboot-delay duration delay reboot for this duration (default: 0, disabled)
--reboot-sentinel string path to file whose existence triggers the reboot command (default "/var/run/reboot-required")
--reboot-sentinel-command string command for which a zero return code will trigger a reboot command
--skip-wait-for-delete-timeout int when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node
--slack-channel string slack channel for reboot notifications
--slack-hook-url string slack hook URL for reboot notifications [deprecated in favor of --notify-url]
--slack-username string slack username for reboot notifications (default "kured")
--start-time string schedule reboot only after this time of day (default "0:00")
--time-zone string use this timezone for schedule inputs (default "UTC")
By default, Kured checks for the existence of /var/run/reboot-required every 60 minutes. You can override these values with --reboot-sentinel and --period. For this demo, we will uncomment and modify the lines below under the Kured DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  .....
          command:
            - /usr/bin/kured
            - --period=10m
            - --reboot-sentinel=/var/run/reboot-required
  ....
We have set Kured to check for the existence of /var/run/reboot-required every 10 minutes. Remember that this short period is for testing purposes only. You can set the desired interval for your environment.
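Conceptually, the file-based sentinel check is just a test for a file's existence, repeated once per period. The sketch below illustrates the idea in shell; it is not Kured's actual implementation, and the function name reboot_required is made up for this example.

```shell
# Sketch of a file-based sentinel check (not Kured's own code).
# Succeeds (exit code 0) when the sentinel file exists.
reboot_required() {
  [ -f "${1:-/var/run/reboot-required}" ]
}

if reboot_required; then
  echo "reboot required"
else
  echo "no reboot required"
fi
```

You can run this on a node to see what Kured will find at its next check.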
Once the desired configurations have been made, you can then apply the manifest:
kubectl apply -f kured-$latest-dockerhub.yaml
View the deployment status:
$ kubectl get po -n kube-system |grep kured
kured-52lj5 1/1 Running 0 69s
kured-8pczr 1/1 Running 0 69s
kured-rvh7t 1/1 Running 0 69s
There are 3 pods, one running on each node. To view the hosts, use the command:
$ kubectl get pods -o wide -n kube-system |grep kured
kured-mhwsg 1/1 Running 0 3m27s 10.244.1.3 master.computingforgeeks.com <none> <none>
kured-p984r 1/1 Running 0 3m27s 10.244.2.5 worker1.computingforgeeks.com <none> <none>
kured-t2tl5 1/1 Running 0 3m26s 10.244.0.9 worker2.computingforgeeks.com <none> <none>
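Because Kured is a DaemonSet, the number of running kured pods should match the number of nodes. As a quick check, the sketch below counts running kured pods from the output of kubectl get pods. The helper name count_running_kured is an assumption for this example, and it relies on the default column layout (NAME, READY, STATUS).

```shell
# Count pods whose name starts with "kured-" and whose STATUS is Running.
# Reads `kubectl get pods` output on stdin (default columns: NAME READY STATUS ...).
count_running_kured() {
  awk '$1 ~ /^kured-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Usage:
#   kubectl get pods -n kube-system | count_running_kured
#   kubectl get nodes --no-headers | wc -l    # should print the same number
```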
#2. Testing Kured Operation on Kubernetes
To test if Kured is working as desired, we will create the /var/run/reboot-required file on one of the nodes manually:
sudo touch /var/run/reboot-required
Now check the Kured logs:
kubectl logs -f kured-8pczr -c kured -n kube-system
After the set time, a pending system reboot will be identified. The node will then be drained and restarted.
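While a node is drained, kubectl get nodes shows it with SchedulingDisabled in the STATUS column. The sketch below filters that output to list only the cordoned nodes, which is a simple way to follow the reboot from another terminal. The helper name cordoned_nodes is made up for this example.

```shell
# Print the names of nodes that are currently cordoned.
# Reads `kubectl get nodes` output (with header line) on stdin.
cordoned_nodes() {
  awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Usage:
#   kubectl get nodes | cordoned_nodes
```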

From the above output, the reboot has been performed. Once the node restarts successfully, you can view the system uptime on the node:
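The usual way to check this is to run the uptime command on the node. As a scriptable alternative, the sketch below reads the seconds since boot from /proc/uptime (Linux only). The function names are hypothetical and the 900-second threshold in the usage line is an arbitrary example.

```shell
# Print whole seconds since boot, read from a /proc/uptime-style file.
seconds_since_boot() {
  awk '{ print int($1) }' "${1:-/proc/uptime}"
}

# Succeed if the machine booted less than N seconds ago.
# Arguments: N, then an optional path to the uptime file.
booted_within() {
  [ "$(seconds_since_boot "${2:-/proc/uptime}")" -lt "$1" ]
}

# Usage on the node:
#   booted_within 900 && echo "node rebooted within the last 15 minutes"
```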

You can schedule the reboots to happen at a preferred time by making more modifications to the YAML. For example, if you want the reboots to happen only between 00:00 and 03:00, you can use the settings below:
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured            # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
  ....
          command:
            - /usr/bin/kured
            .....
            - --reboot-days=sun,mon,tue,wed,thu,fri,sat
            - --reboot-delay=90s
            - --start-time=0:00
            - --end-time=02:59:59
            - --time-zone=Africa/Nairobi
Once the changes have been made, apply the manifest:
kubectl apply -f kured-$latest-dockerhub.yaml
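To reason about whether a given moment falls inside the configured window, the check can be sketched in shell as below. This is only an illustration of the start/end comparison, not Kured's own scheduling code. It assumes a window that does not wrap past midnight, and the function names are made up for this example.

```shell
# Convert H:MM, HH:MM or HH:MM:SS to seconds since midnight.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Succeed if time $1 lies between start $2 and end $3 (inclusive).
# Assumes start <= end, i.e. the window does not cross midnight.
in_reboot_window() (
  now=$(to_seconds "$1")
  start=$(to_seconds "$2")
  end=$(to_seconds "$3")
  [ "$now" -ge "$start" ] && [ "$now" -le "$end" ]
)

# Usage with the window and time zone configured above:
#   in_reboot_window "$(TZ=Africa/Nairobi date +%H:%M:%S)" 0:00 02:59:59 \
#     && echo "inside the reboot window"
```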
Verdict
In this guide, we have learned how to perform safe & automatic Node Reboots on Kubernetes with Kured. You can now configure your Kubernetes nodes to reboot automatically whenever the package management system of the underlying operating system indicates that a reboot is required. I hope this was informative.