Rocky Linux 9 and AlmaLinux 9 are still the dominant RHEL 9 rebuilds in the field. Many shops aren’t ready to move their production estate to the 10 series yet, and RHEL 9 stays supported until May 2032. If that describes your environment, this guide walks through a full Kubernetes 1.35 cluster deployment on Rocky 9 or Alma 9 with kubeadm and containerd as the container runtime.
The setup uses the stable pkgs.k8s.io repositories, containerd from the official Docker CE repo, and Calico v3.30 for pod networking. We also ship an Ansible role, k8s-pre-bootstrap, that handles the repetitive per-node prep. If you prefer automation, point the role at your nodes and skip straight to kubeadm init. Otherwise, every manual step is documented below.
Verified working: April 2026 on Rocky Linux 9.6 and AlmaLinux 9.6, kubeadm v1.35.3, containerd v2.2.2, Calico v3.30.3, SELinux enforcing
Prerequisites
Three nodes, bare metal or virtual. One control plane, two workers. Minimum specs per node:
- OS: Rocky Linux 9 or AlmaLinux 9 (minimal install, kernel 5.14.x)
- RAM: 2 GB workers, 4 GB on the control plane
- CPU: 2 vCPUs minimum
- Disk: 20 GB free on /var
- Full connectivity between all nodes on the required ports
- Root or sudo access, and a non-root user for kubectl
The lab nodes used here:
| Role | Hostname | IP |
|---|---|---|
| Control plane | k8s-cp01 | 10.0.1.10 |
| Worker 1 | k8s-wk01 | 10.0.1.11 |
| Worker 2 | k8s-wk02 | 10.0.1.12 |
Substitute your own IPs and hostnames throughout.
Why Rocky/Alma 9 for Kubernetes in 2026
A fair question, given Rocky 10 and Alma 10 are both GA. The reasons we still see 9.x on freshly built clusters: a longer maintenance track (RHEL 9 full support into 2027, maintenance into 2032), a kernel family (5.14.x) that is well-understood by every third-party CNI and CSI driver in the wild, and compatibility with existing Ansible inventories that were written for RHEL 9. If you already have a fleet on 9, there is no technical reason to rush a cluster onto 10 just to run current Kubernetes. Kubernetes 1.35 runs identically on both.
Automate the prep with the k8s-pre-bootstrap Ansible role
Kernel tuning, swap disable, containerd, kubelet/kubeadm/kubectl: these are identical on every node. We maintain an Ansible role, k8s-pre-bootstrap, that handles them on Rocky 9, Alma 9, Rocky 10, Ubuntu, and Debian. Clone, configure, run, and skip to kubeadm init.
git clone https://github.com/jmutai/k8s-pre-bootstrap.git
cd k8s-pre-bootstrap
ansible-galaxy collection install -r requirements.yml
Edit the hosts inventory:
[k8snodes]
k8s-cp01 ansible_host=10.0.1.10
k8s-wk01 ansible_host=10.0.1.11
k8s-wk02 ansible_host=10.0.1.12
[k8snodes:vars]
ansible_user=rocky
ansible_become=true
Open k8s-prep.yml and confirm k8s_version: "1.35" and container_runtime: containerd. Then:
ansible-playbook -i hosts k8s-prep.yml
A clean run ends with a PLAY RECAP showing failed=0 for every host. The role installs containerd, loads br_netfilter and overlay, applies the required sysctl, disables swap (including zram-generator where present), adds the pkgs.k8s.io repo, installs kubelet, kubeadm, kubectl, and reboots any node whose kernel was updated. After it finishes, jump to Initialize the cluster.
The rest of the guide covers the same ground by hand. Useful when you want to audit the steps, when you have a single node, or when the playbook is failing and you need to narrow down which task is broken.
1. Prepare every node
Run the following on all three nodes. Hostnames first:
sudo hostnamectl set-hostname k8s-cp01 # on the control plane
sudo hostnamectl set-hostname k8s-wk01 # on worker 1
sudo hostnamectl set-hostname k8s-wk02 # on worker 2
Populate /etc/hosts on every node so peer resolution works even if DNS is flaky:
sudo tee -a /etc/hosts <<'EOF'
10.0.1.10 k8s-cp01
10.0.1.11 k8s-wk01
10.0.1.12 k8s-wk02
EOF
Disable swap. The kubelet refuses to start when swap is active:
sudo swapoff -a
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab
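To see exactly what that sed pattern does before running it against the real /etc/fstab, here is a self-contained replay on a throwaway copy (the fstab content below is a made-up example):

```shell
# Demo on a temp file; the real command edits /etc/fstab in place.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
UUID=abcd-1234 /    xfs  defaults 0 0
UUID=ef56-7890 none swap defaults 0 0
EOF
sed -i '/\sswap\s/s/^/#/' "$fstab"   # comment out any line with a "swap" field
swap_line=$(grep swap "$fstab")
echo "$swap_line"                    # the swap entry, now commented out
```

The root filesystem line is untouched; only the swap entry gains the leading #, so the node never re-enables swap on reboot.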
Load the kernel modules containerd and the CNI need, and set them to load at boot:
sudo tee /etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
Apply the sysctl settings for bridge filtering and IP forwarding:
sudo tee /etc/sysctl.d/k8s.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Verify:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
lsmod | grep -E 'overlay|br_netfilter'
All three sysctl keys must return 1 and both kernel modules should appear in lsmod.
2. Install containerd
We pull containerd from the official Docker CE repository. The RHEL 9 repo definition works for Rocky 9 and Alma 9 without modification:
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y containerd.io
Now the step people miss and then spend the evening debugging. Kubernetes on systemd-managed hosts expects the runtime to use the systemd cgroup driver. The default containerd config ships with cgroupfs, which causes kubelet flapping and random OOM-kills under load. Flip it:
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
Confirm the change stuck:
grep SystemdCgroup /etc/containerd/config.toml
The output should show SystemdCgroup = true. Enable and start the service:
sudo systemctl enable --now containerd
systemctl is-active containerd
If you want to dig deeper into how containerd integrates with the CRI, see our guide on configuring containerd as a Kubernetes runtime.
3. Install kubeadm, kubelet, kubectl
The Google-hosted legacy repo is dead. All current Kubernetes packages live at pkgs.k8s.io, versioned per minor release. If you want the full story on the kubeadm install flow across distros, our Ubuntu kubeadm guide follows the same playbook with apt instead of dnf. Pin to 1.35 explicitly:
sudo tee /etc/yum.repos.d/kubernetes.repo <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
The exclude line keeps a routine dnf update from pulling in a minor-version jump unexpectedly. When you are ready to upgrade, pass --disableexcludes=kubernetes:
sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable kubelet
The kubelet will refuse to stay running until the node is initialized or joined. That is expected. Verify versions:
kubeadm version -o short
kubectl version --client --output=yaml | head -5
kubelet --version
All three binaries should report v1.35.x. If you see older versions, run sudo dnf clean all and reinstall.
4. Open the firewall
Rocky/Alma 9 ship with firewalld active. Skipping this section is the single most common reason join commands hang. On the control plane:
sudo firewall-cmd --permanent --add-port={6443,2379-2380,10250,10257,10259}/tcp
sudo firewall-cmd --reload
On each worker:
sudo firewall-cmd --permanent --add-port={10250,30000-32767}/tcp
sudo firewall-cmd --reload
What each port does:
| Port | Component | Where |
|---|---|---|
| 6443 | kube-apiserver | control plane |
| 2379-2380 | etcd client / peer | control plane |
| 10250 | kubelet API | all nodes |
| 10257 | kube-controller-manager | control plane |
| 10259 | kube-scheduler | control plane |
| 30000-32767 | NodePort services | workers |
Calico also needs its own set of ports between nodes: 179/tcp for BGP, 4789/udp for VXLAN, and 5473/tcp for Typha if you enable it. If your nodes are on the same L2 segment with permissive east-west rules, you are fine. In tighter environments, add them explicitly.
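If you do need to open Calico's ports explicitly, the commands follow the same firewall-cmd pattern as above. A small dry-run sketch that builds the invocations so you can review them before piping to sudo sh on every node:

```shell
# Dry run: assemble the firewall-cmd calls for Calico's inter-node ports
# (179/tcp BGP, 4789/udp VXLAN, 5473/tcp Typha). Review the output, then
# pipe it to "sudo sh" on each node to apply.
calico_fw=$(
  for port in 179/tcp 4789/udp 5473/tcp; do
    echo "firewall-cmd --permanent --add-port=$port"
  done
  echo "firewall-cmd --reload"
)
printf '%s\n' "$calico_fw"
```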
SELinux note. Rocky/Alma 9 run SELinux in enforcing mode out of the box. Keep it that way. The Kubernetes RPMs ship with the policy bits needed for kubelet, containerd, and the CNI. Do not run setenforce 0. If a specific pod workload throws AVC denials, use ausearch -m avc -ts recent to identify the denial and write a targeted policy with audit2allow, rather than disabling SELinux across the cluster.
5. Initialize the cluster
This runs on the control plane only. Calico’s default pod CIDR is 192.168.0.0/16. Using it means zero edits to the Calico manifest later, which is one less thing to get wrong:
sudo kubeadm init \
--pod-network-cidr=192.168.0.0/16 \
--apiserver-advertise-address=10.0.1.10 \
--kubernetes-version=stable-1.35
On a freshly prepped Rocky 9 node, kubeadm finishes in under a minute. The tail of the output prints a kubeadm join command. Save it. The final lines look like this:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.1.10:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:aaaa0000bbbb1111cccc2222dddd3333eeee4444ffff5555aaaa0000bbbb1111
Set up kubectl access for your regular user on the control plane:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl cluster-info
The node will show NotReady because no CNI is installed yet. That changes in the next step.
6. Install the Calico CNI
Apply the Tigera operator, then the custom resources manifest:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/custom-resources.yaml
Watch the calico-system namespace until every pod is Running:
kubectl get pods -n calico-system -w
Ctrl-C out once the operator, typha, and node pods are all Ready. The control plane node should now show Ready:
kubectl get nodes
If the node is stuck in NotReady beyond 3-4 minutes, tail the calico-node logs: kubectl logs -n calico-system -l k8s-app=calico-node --tail=50. The usual culprits are a missing br_netfilter module or a sysctl that did not actually apply.
7. Join the workers
Run the join command from kubeadm init on each worker. Use the actual token and hash from your output:
sudo kubeadm join 10.0.1.10:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:aaaa0000bbbb1111cccc2222dddd3333eeee4444ffff5555aaaa0000bbbb1111
Lost the join command, or the token expired (they last 24 hours)? Regenerate from the control plane:
kubeadm token create --print-join-command
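The value after sha256: is nothing magic: it is the SHA-256 digest of the cluster CA's public key in DER form, so you can recompute it yourself from /etc/kubernetes/pki/ca.crt on the control plane. The sketch below generates a throwaway self-signed RSA cert so the pipeline runs anywhere; point the x509 step at your real ca.crt instead (kubeadm's default CA is RSA; if yours is not, swap openssl rsa for the matching key type):

```shell
# Throwaway CA cert so this demo is self-contained; substitute
# /etc/kubernetes/pki/ca.crt on a real control plane.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null
# Public key -> DER -> SHA-256: the same value kubeadm prints.
ca_hash=$(openssl x509 -pubkey -in "$tmp/ca.crt" \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 | awk '{print $NF}')
echo "sha256:$ca_hash"
```

Comparing this against the hash in a join command you were handed is a quick sanity check that you are joining the cluster you think you are.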
Back on the control plane, watch the workers join. Calico takes a minute or two to push the CNI config to each fresh node:
kubectl get nodes -o wide
Expected output after everything settles:
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE
k8s-cp01 Ready control-plane 12m v1.35.3 10.0.1.10 Rocky Linux 9.6 (Blue Onyx)
k8s-wk01 Ready <none> 3m v1.35.3 10.0.1.11 Rocky Linux 9.6 (Blue Onyx)
k8s-wk02 Ready <none> 3m v1.35.3 10.0.1.12 Rocky Linux 9.6 (Blue Onyx)
Label the workers so the role shows up in kubectl get nodes:
kubectl label node k8s-wk01 node-role.kubernetes.io/worker=worker
kubectl label node k8s-wk02 node-role.kubernetes.io/worker=worker
8. Smoke test the cluster
Before handing the cluster off, confirm scheduling, networking, and service routing all work. A two-replica nginx deployment is enough:
kubectl create deployment smoke-nginx --image=nginx:latest --replicas=2
kubectl expose deployment smoke-nginx --type=NodePort --port=80
kubectl get pods -o wide
kubectl get svc smoke-nginx
Take the NodePort from the PORT(S) column (something like 80:31458/TCP) and hit any worker’s IP on that port:
curl -sI http://10.0.1.11:31458 | head -3
A 200 OK from the nginx default means the cluster is routing traffic end to end. Clean up:
kubectl delete deployment smoke-nginx
kubectl delete svc smoke-nginx
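If you'd rather script the smoke test than read the port by eye, kubectl can emit the NodePort directly with JSONPath. An offline sketch of the same extraction from a sample PORT(S) value:

```shell
# With a live cluster you would use kubectl's JSONPath support:
#   nodeport=$(kubectl get svc smoke-nginx -o jsonpath='{.spec.ports[0].nodePort}')
# Offline demo of the equivalent text parse on a sample PORT(S) value:
ports_col="80:31458/TCP"
nodeport=${ports_col#*:}   # strip through the colon -> 31458/TCP
nodeport=${nodeport%/*}    # strip the protocol      -> 31458
echo "$nodeport"
```

With the port in a variable, the curl check becomes curl -sI http://10.0.1.11:$nodeport, which makes the whole smoke test easy to drop into CI.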
Production hardening checklist
A fresh kubeadm cluster is functional, not production-ready. Before running workloads that matter, tighten these:
- Multi-control-plane. A single control plane is a single failure domain. Add two more with a stacked etcd topology, or an external etcd cluster if you want independent failure domains. Put Nginx or HAProxy in front of the API servers on port 6443. Our HA kubeadm guide covers the full topology.
- Back up etcd. The cluster state is in etcd. Nothing you do in kubectl matters if that data is gone. Run etcdctl snapshot save on a schedule and ship the snapshot off-node.
- Pin package versions. Keep the exclude=kubelet kubeadm kubectl line in kubernetes.repo so routine patching cannot cause a minor-version jump. Upgrade deliberately with kubeadm upgrade plan.
- Enable audit logging. The default kube-apiserver audit policy is empty. Point --audit-policy-file at a real policy and write events to disk or an external sink.
- Enforce Pod Security Standards. Label namespaces with pod-security.kubernetes.io/enforce=restricted where possible. The built-in admission plugin is enabled by default since 1.25, but you still need to opt in per namespace.
- Rotate kubelet client certs. Kubelet certificates expire after a year. Set rotateCertificates: true and serverTLSBootstrap: true in the kubelet config, and approve the server CSRs (or run an approver like kubelet-rubber-stamp).
- Monitoring and alerting. A cluster without metrics is flying blind. Our guide to installing Prometheus and Grafana on Kubernetes covers the kube-prometheus-stack Helm chart. Dashboards for node exporter, kube-state-metrics, and etcd are the minimum viable baseline.
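For the etcd backup item, a minimal sketch of a date-stamped snapshot command, assuming the kubeadm default certificate paths (the leading echo makes it a dry run; remove it to execute on a control-plane node with etcdctl installed):

```shell
# Dry run of a nightly etcd backup; certificate paths are kubeadm defaults.
# The /var/backups/etcd target directory is an assumption; pick your own
# and ship the resulting .db file off-node.
snap="/var/backups/etcd/etcd-$(date +%Y%m%d-%H%M).db"
echo ETCDCTL_API=3 etcdctl snapshot save "$snap" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```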
Common issues
Error: “[ERROR CRI]: container runtime is not running”
kubeadm's preflight could not talk to containerd over the CRI socket. The usual cause is the stock /etc/containerd/config.toml shipped by the containerd.io package, which disables the CRI plugin. Regenerate the default config (which re-enables it) and restart:
sudo containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
Error: “kubelet is not running” after init
Check the kubelet journal:
sudo journalctl -u kubelet --since "5 min ago" --no-pager | tail -50
Cgroup driver mismatch is by far the most frequent cause on Rocky/Alma 9. The fix is the same as the CRI error above: SystemdCgroup = true in containerd’s config, then restart containerd and the kubelet.
Workers stuck at “NotReady” indefinitely
Calico has not pushed the CNI config to the worker. Confirm the calico-node pod on that worker is running: kubectl -n calico-system get pods -o wide | grep WORKER_NAME. If it is pending or crashlooping, inspect its logs and events. A frequent cause is the worker missing the br_netfilter module, which means the sysctl prep step did not fully apply. Re-run:
sudo modprobe br_netfilter overlay
sudo sysctl --system
Pods in ImagePullBackOff
Test the node’s reach to the registry directly:
sudo crictl pull registry.k8s.io/pause:3.10
Failure here usually means egress firewall or proxy. Configure HTTPS_PROXY for containerd in /etc/systemd/system/containerd.service.d/http-proxy.conf and reload systemd.
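A minimal sketch of that drop-in, with a hypothetical proxy address; the NO_PROXY entries keep node-local, pod, and service traffic off the proxy (adjust the CIDRs to your node and pod networks):

```ini
# /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.1.0/24,192.168.0.0/16,.svc,.cluster.local"
```

After writing the file, run sudo systemctl daemon-reload followed by sudo systemctl restart containerd, then retry the crictl pull.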
Starting over
If something went sideways during bootstrap, a clean reset is faster than debugging. On the node you want to wipe:
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d ~/.kube
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
Then re-run kubeadm init (control plane) or kubeadm join (worker).
That’s a working three-node Kubernetes 1.35 cluster on Rocky Linux 9 / AlmaLinux 9 with containerd and Calico. From here, the immediate next steps are usually HA-ifying the control plane, adding Ingress, wiring up persistent storage, and deploying a Prometheus-based monitoring stack. For multi-cluster management, our guide on managing multiple Kubernetes clusters with kubectl and kubectx is a good follow-up.