One of the most important tasks of a system administrator is keeping the system and its installed packages up to date. This does not change once the nodes join a Kubernetes cluster: node updates still have to be managed. In most situations, once the updates (e.g., kernel updates, system maintenance, or hardware changes) have been applied, you need to reboot the node for the changes to take effect. This is harder on Kubernetes because you need to ensure that the running applications are gracefully terminated or migrated to other nodes before the reboot.

Kured (KUbernetes REboot Daemon) is a tool designed to simplify the process of rebooting nodes in a Kubernetes cluster. It helps manage node reboots in a controlled and automated manner, minimizing disruptions to applications running on the cluster.

It automates the reboot process through the following steps:

  • Monitor Nodes: Kured runs on every node and periodically checks for a reboot sentinel file, e.g. /var/run/reboot-required, or for the successful run of a sentinel command. It can also hold off a reboot while Prometheus alerts are active or while selected pods are running on the node.
  • Drain Node: When a node needs to be rebooted, Kured first acquires a lock in the API server so that only one node reboots at a time, then triggers the “drain” process for that node. The “drain” process gracefully terminates the running pods on the node so that the workload is rescheduled onto other available nodes in the cluster. This keeps disruption to the applications running on the cluster to a minimum.
  • Reboot Node: After the node has been drained and all the workload has been migrated, Kured initiates the reboot of the node.
  • Node Recovery: Once the node is back online after the reboot, Kured uncordons it so that it is available and ready to accept new pods.
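
To make the steps above concrete, here is a rough sketch of the same cycle done by hand for a single node. It is not how Kured is implemented, only the manual equivalent. The `run` helper, the `DRY_RUN` switch, and the use of SSH to reach the node are assumptions made for this illustration. A real run would also need to wait for the node to report Ready before the uncordon step.

```shell
# Manual equivalent of the cycle Kured automates, for one node.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reboot_node_manually() {
  # $1: node name as shown by `kubectl get nodes`
  # Cordon the node and evict its pods; DaemonSet pods stay in place.
  run kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
  # Reboot over SSH (assumes the node name resolves and sudo is allowed).
  run ssh "$1" sudo systemctl reboot
  # Once the node reports Ready again, allow scheduling on it.
  run kubectl uncordon "$1"
}

reboot_node_manually demo-compute-1
```

With the default dry-run setting, the script only prints the three commands, which is a safe way to see what would happen before doing it for real.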

Kured provides a configurable and automated solution for managing node reboots in a Kubernetes cluster. It helps simplify the process and reduces the operational complexity associated with performing node reboots, ensuring minimal impact on the availability and stability of applications running in the cluster.

Today, we will learn how to perform safe & automatic Node Reboots on Kubernetes with Kured.

Getting Started

For this guide, I assume you already have a Kubernetes cluster running. If not, set one up first using a guide for your preferred Kubernetes distribution.

Once set up, you also need kubectl. To install it, use the commands:

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
chmod +x kubectl
sudo mv kubectl /usr/local/bin
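
Before moving on, it can help to confirm that the tools used in this guide (kubectl, curl, wget and jq) are actually on your PATH. The small check below is only a sketch; `require_cmd` is a helper name made up for this example.

```shell
# Report any tool this guide relies on that is not on PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

for tool in kubectl curl wget jq; do
  require_cmd "$tool" || true
done

# Print the client version if kubectl is installed.
kubectl version --client 2>/dev/null || true
```

Any tool reported as missing can be installed with your distribution's package manager.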

You need to export the admin config to access the cluster:

## For RKE2
export PATH=$PATH:/var/lib/rancher/rke2/bin
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml

## For K0s
export KUBECONFIG=/var/lib/k0s/pki/admin.conf

Verify if you can access the cluster:

$ kubectl get nodes
NAME                    STATUS   ROLES           AGE   VERSION
demo-compute-1          Ready    <none>          17h   v1.27.2
demo-compute-2          Ready    <none>          17h   v1.27.2
demo-k8s-controlplane   Ready    control-plane   17h   v1.27.2

#1. Install kured (Kubernetes Reboot Daemon) on Kubernetes

Kured is a Kubernetes daemonset that automates safe and controlled node reboots based on indications from the package management system of the underlying operating system. It ensures that nodes in the cluster are rebooted in a coordinated manner without disrupting the availability of running applications.

To install it, obtain a default manifest without Prometheus alerting interlock or Slack notifications with the commands:

latest=$(curl -s https://api.github.com/repos/kubereboot/kured/releases | jq -r '.[0].tag_name')
wget https://github.com/kubereboot/kured/releases/download/$latest/kured-$latest-dockerhub.yaml 

Once downloaded, open the file for editing:

vim kured-$latest-dockerhub.yaml 

Now there are quite a number of configurations you can make here. The available flags are:

Kubernetes Reboot Daemon

Usage:
  kured [flags]

Flags:
      --alert-filter-regexp regexp.Regexp   alert names to ignore when checking for active alerts
      --alert-firing-only                   only consider firing alerts when checking for active alerts
      --annotate-nodes                      if set, the annotations 'weave.works/kured-reboot-in-progress' and 'weave.works/kured-most-recent-reboot-needed' will be given to nodes undergoing kured reboots
      --blocking-pod-selector stringArray   label selector identifying pods whose presence should prevent reboots
      --drain-grace-period int              time in seconds given to each pod to terminate gracefully, if negative, the default value specified in the pod will be used (default -1)
      --drain-timeout duration              timeout after which the drain is aborted (default: 0, infinite time)
      --ds-name string                      name of daemonset on which to place lock (default "kured")
      --ds-namespace string                 namespace containing daemonset on which to place lock (default "kube-system")
      --end-time string                     schedule reboot only before this time of day (default "23:59:59")
      --force-reboot                        force a reboot even if the drain fails or times out
  -h, --help                                help for kured
      --lock-annotation string              annotation in which to record locking node (default "weave.works/kured-node-lock")
      --lock-release-delay duration         delay lock release for this duration (default: 0, disabled)
      --lock-ttl duration                   expire lock annotation after this duration (default: 0, disabled)
      --log-format string                   use text or json log format (default "text")
      --message-template-drain string       message template used to notify about a node being drained (default "Draining node %s")
      --message-template-reboot string      message template used to notify about a node being rebooted (default "Rebooting node %s")
      --message-template-uncordon string    message template used to notify about a node being successfully uncordoned (default "Node %s rebooted & uncordoned successfully!")
      --node-id string                      node name kured runs on, should be passed down from spec.nodeName via KURED_NODE_ID environment variable
      --notify-url string                   notify URL for reboot notifications (cannot use with --slack-hook-url flags)
      --period duration                     sentinel check period (default 1h0m0s)
      --post-reboot-node-labels strings     labels to add to nodes after uncordoning
      --pre-reboot-node-labels strings      labels to add to nodes before cordoning
      --prefer-no-schedule-taint string     Taint name applied during pending node reboot (to prevent receiving additional pods from other rebooting nodes). Disabled by default. Set e.g. to "weave.works/kured-node-reboot" to enable tainting.
      --prometheus-url string               Prometheus instance to probe for active alerts
      --reboot-command string               command to run when a reboot is required (default "/bin/systemctl reboot")
      --reboot-days strings                 schedule reboot on these days (default [su,mo,tu,we,th,fr,sa])
      --reboot-delay duration               delay reboot for this duration (default: 0, disabled)
      --reboot-sentinel string              path to file whose existence triggers the reboot command (default "/var/run/reboot-required")
      --reboot-sentinel-command string      command for which a zero return code will trigger a reboot command
      --skip-wait-for-delete-timeout int    when seconds is greater than zero, skip waiting for the pods whose deletion timestamp is older than N seconds while draining a node
      --slack-channel string                slack channel for reboot notifications
      --slack-hook-url string               slack hook URL for reboot notifications [deprecated in favor of --notify-url]
      --slack-username string               slack username for reboot notifications (default "kured")
      --start-time string                   schedule reboot only after this time of day (default "0:00")
      --time-zone string                    use this timezone for schedule inputs (default "UTC")

By default, Kured checks for the existence of /var/run/reboot-required every 60 minutes. You can override these values with --reboot-sentinel and --period. For this demo, we will uncomment and modify the lines below under the Kured DaemonSet:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
.....
          command:
            - /usr/bin/kured
            - --period=10m
            - --reboot-sentinel=/var/run/reboot-required
....

We have set Kured to check for the existence of /var/run/reboot-required every 10 minutes. Remember that this short period is for testing purposes only. You can set a value that suits your environment.
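
If you would rather not edit the file by hand, the same change can be scripted with sed. This is only a sketch: it assumes the flags appear in the manifest as commented-out lines of the form `#            - --period=1h`, and `set_kured_flag` is a helper name invented for this example. Check the result before applying it.

```shell
# Uncomment a "#   - --flag=value" line in the manifest and set a new value.
# Assumes the flag is present once, commented out with a leading '#'.
set_kured_flag() {
  # $1: manifest path, $2: flag name without the dashes, $3: new value
  sed -E -i.bak "s|^#([[:space:]]+- --$2)=.*\$|\\1=$3|" "$1"
}

# Example usage (run in the shell where $latest is set):
#   set_kured_flag "kured-$latest-dockerhub.yaml" period 10m
#   set_kured_flag "kured-$latest-dockerhub.yaml" reboot-sentinel /var/run/reboot-required
```

The original file is kept next to the edited one with a .bak suffix, so you can diff the two to see exactly what changed.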

Once the desired configurations have been made, you can then apply the manifest:

kubectl apply -f kured-$latest-dockerhub.yaml 

View the deployment status:

$ kubectl get po -n kube-system |grep kured
kured-52lj5                                     1/1     Running            0                69s
kured-8pczr                                     1/1     Running            0                69s
kured-rvh7t                                     1/1     Running            0                69s

There are 3 pods, each running on its own host. To see which host each pod runs on, use the command:

$ kubectl get pods -o wide  -n kube-system |grep kured
kured-mhwsg                                            1/1     Running   0             3m27s   10.244.1.3        master.computingforgeeks.com    <none>           <none>
kured-p984r                                            1/1     Running   0             3m27s   10.244.2.5        worker1.computingforgeeks.com   <none>           <none>
kured-t2tl5                                            1/1     Running   0             3m26s   10.244.0.9        worker2.computingforgeeks.com   <none>           <none>
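
Instead of filtering with grep, you can also wait for the DaemonSet rollout to finish and select the pods by label. The sketch below assumes the pods carry the label name=kured, as the selector in the manifest suggests. `kured_status` is a name chosen for this example.

```shell
# Wait for the Kured DaemonSet rollout, then list the pods with their nodes.
kured_status() {
  kubectl -n kube-system rollout status daemonset/kured --timeout=120s &&
    kubectl -n kube-system get pods -l name=kured -o wide
}

# Example usage:
#   kured_status
```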

#2. Testing Kured Operation on Kubernetes

To test if Kured is working as desired, we will create the /var/run/reboot-required file on one of the nodes manually:

sudo touch /var/run/reboot-required

Now check the Kured logs:

kubectl logs -f kured-8pczr -c kured -n kube-system
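
The pod name above is specific to my cluster, and the pod you want is the one running on the node where you created the sentinel file. As a sketch, the helpers below look that pod up by node name. The function names are made up for this example, and the label name=kured is assumed from the manifest's selector.

```shell
# Print the Kured pod (as pod/<name>) scheduled on the given node.
kured_pod_on_node() {
  kubectl -n kube-system get pods -l name=kured \
    --field-selector "spec.nodeName=$1" -o name
}

# Follow the logs of the Kured pod running on the given node.
kured_logs_on_node() {
  pod="$(kured_pod_on_node "$1")" || return 1
  [ -n "$pod" ] || { echo "no kured pod found on $1" >&2; return 1; }
  kubectl -n kube-system logs -f "$pod" -c kured
}

# Example usage:
#   kured_logs_on_node demo-compute-1
```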

After the set period elapses, Kured identifies the pending system reboot, then drains and restarts the node.

Once the node restarts successfully, you can confirm the reboot by checking the system uptime on the node.
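
To verify the result yourself, run uptime on the rebooted node, and from your workstation check that the node is schedulable again. The helper below is a sketch under the assumption that a cordoned node has spec.unschedulable set to true. `node_is_schedulable` is a name chosen for this example.

```shell
# On the rebooted node itself, show how long it has been up:
#   uptime

# From a machine with kubectl access: succeed if the node is not cordoned.
node_is_schedulable() {
  flag="$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" || return 2
  [ "$flag" != "true" ]
}

# Example usage:
#   node_is_schedulable demo-compute-1 && echo "node accepts new pods"
```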

You can schedule the reboots to happen at a preferred time by making further modifications to the YAML. For example, if you want the reboots to happen only between 00:00hrs and 03:00hrs, you can use the settings below:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kured # Must match `--ds-name`
  namespace: kube-system # Must match `--ds-namespace`
spec:
  selector:
    matchLabels:
      name: kured
  updateStrategy:
    type: RollingUpdate
....
          command:
            - /usr/bin/kured
.....
            - --reboot-days=sun,mon,tue,wed,thu,fri,sat
            - --reboot-delay=90s
            - --start-time=0:00
            - --end-time=02:59:59
            - --time-zone=Africa/Nairobi

Once the changes have been made, apply the manifest again:

kubectl apply -f kured-$latest-dockerhub.yaml 
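
To get a feel for what the time window means, the sketch below checks whether a given time of day falls between a start and an end time. It is only an illustration of the idea, not Kured's own logic. `in_window` is a name made up here, and it expects zero-padded HH:MM:SS values.

```shell
# Succeed if NOW lies between START and END (inclusive).
# Usage: in_window NOW START END, all as zero-padded HH:MM:SS.
in_window() {
  # Zero-padded times sort correctly as plain text, so the window check
  # is simply "are START, NOW, END already in sorted order?".
  printf '%s\n' "$2" "$1" "$3" | LC_ALL=C sort -c 2>/dev/null
}

# Check the current time in the configured time zone against the window.
now="$(TZ=Africa/Nairobi date +%H:%M:%S)"
if in_window "$now" 00:00:00 02:59:59; then
  echo "$now is inside the reboot window"
else
  echo "$now is outside the reboot window"
fi
```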

Verdict

In this guide, we have learned how to perform safe & automatic Node Reboots on Kubernetes with Kured. You can now have Kubernetes check whether a reboot is required and carry it out automatically, which makes managing the underlying operating system much easier. I hope this was informative.
