Rocky Linux 9 and AlmaLinux 9 are still the dominant RHEL 9 rebuilds in the field. Many shops aren’t ready to move their production estate to the 10 series yet, and RHEL 9 stays supported until May 2032. If that describes your environment, this guide walks through a full Kubernetes 1.35 cluster deployment on Rocky 9 or Alma 9 with kubeadm and containerd as the container runtime.
The setup uses the stable pkgs.k8s.io repositories, containerd from the official Docker CE repo, and Calico v3.30 for pod networking. We also ship an Ansible role, k8s-pre-bootstrap, that handles the repetitive per-node prep. If you prefer automation, point the role at your nodes and skip straight to kubeadm init. Otherwise, every manual step is documented below.
Verified working: April 2026 on Rocky Linux 9.6 and AlmaLinux 9.6, kubeadm v1.35.3, containerd v2.2.2, Calico v3.30.3, SELinux enforcing
Prerequisites
Three nodes, bare metal or virtual. One control plane, two workers. Minimum specs per node:
- OS: Rocky Linux 9 or AlmaLinux 9 (minimal install, kernel 5.14.x)
- RAM: 2 GB workers, 4 GB on the control plane
- CPU: 2 vCPUs minimum
- Disk: 20 GB free on /var
- Full connectivity between all nodes on the required ports
- Root or sudo access, and a non-root user for kubectl
The lab nodes used here:
| Role | Hostname | IP |
|---|---|---|
| Control plane | k8s-cp01 | 10.0.1.10 |
| Worker 1 | k8s-wk01 | 10.0.1.11 |
| Worker 2 | k8s-wk02 | 10.0.1.12 |
Substitute your own IPs and hostnames throughout.
Why Rocky/Alma 9 for Kubernetes in 2026
A fair question, given Rocky 10 and Alma 10 are both GA. The reasons we still see 9.x on freshly built clusters: a longer maintenance track (RHEL 9 full support into 2027, maintenance into 2032), a kernel family (5.14.x) that is well-understood by every third-party CNI and CSI driver in the wild, and compatibility with existing Ansible inventories that were written for RHEL 9. If you already have a fleet on 9, there is no technical reason to rush a cluster onto 10 just to run current Kubernetes. Kubernetes 1.35 runs identically on both.
Automate the prep with the k8s-pre-bootstrap Ansible role
Kernel tuning, swap disable, containerd, kubelet/kubeadm/kubectl: these are identical on every node. We maintain an Ansible role, k8s-pre-bootstrap, that handles them on Rocky 9, Alma 9, Rocky 10, Ubuntu, and Debian. Clone, configure, run, and skip to kubeadm init.
git clone https://github.com/jmutai/k8s-pre-bootstrap.git
cd k8s-pre-bootstrap
ansible-galaxy collection install -r requirements.yml
Edit the hosts inventory:
[k8snodes]
k8s-cp01 ansible_host=10.0.1.10
k8s-wk01 ansible_host=10.0.1.11
k8s-wk02 ansible_host=10.0.1.12
[k8snodes:vars]
ansible_user=rocky
ansible_become=true
Open k8s-prep.yml and confirm k8s_version: "1.35" and container_runtime: containerd. Then:
ansible-playbook -i hosts k8s-prep.yml
A clean run ends with a PLAY RECAP showing failed=0 for every host. The role installs containerd, loads br_netfilter and overlay, applies the required sysctl, disables swap (including zram-generator where present), adds the pkgs.k8s.io repo, installs kubelet, kubeadm, kubectl, and reboots any node whose kernel was updated. After it finishes, jump to Initialize the cluster.
The rest of the guide covers the same ground by hand. Useful when you want to audit the steps, when you have a single node, or when the playbook is failing and you need to narrow down which task is broken.
1. Prepare every node
Run the following on all three nodes. Hostnames first:
sudo hostnamectl set-hostname k8s-cp01 # on the control plane
sudo hostnamectl set-hostname k8s-wk01 # on worker 1
sudo hostnamectl set-hostname k8s-wk02 # on worker 2
Populate /etc/hosts on every node so peer resolution works even if DNS is flaky:
sudo tee -a /etc/hosts <<'EOF'
10.0.1.10 k8s-cp01
10.0.1.11 k8s-wk01
10.0.1.12 k8s-wk02
EOF
Disable swap. The kubelet refuses to start when swap is active:
sudo swapoff -a
sudo sed -i '/\sswap\s/s/^/#/' /etc/fstab
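To see exactly what that sed pattern does before running it against the real /etc/fstab, here is a self-contained replay on a throwaway copy (the fstab content below is a made-up example):

```shell
# Demo on a temp file; the real command edits /etc/fstab in place.
fstab=$(mktemp)
cat > "$fstab" <<'EOF'
UUID=abcd-1234 /    xfs  defaults 0 0
UUID=ef56-7890 none swap defaults 0 0
EOF
sed -i '/\sswap\s/s/^/#/' "$fstab"   # comment out any line with a "swap" field
swap_line=$(grep swap "$fstab")
echo "$swap_line"                    # the swap entry, now commented out
```

The root filesystem line is untouched; only the swap entry gains the leading #, so the node never re-enables swap on reboot.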
Load the kernel modules containerd and the CNI need, and set them to load at boot:
sudo tee /etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
Apply the sysctl settings for bridge filtering and IP forwarding:
sudo tee /etc/sysctl.d/k8s.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Verify:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
lsmod | grep -E 'overlay|br_netfilter'
All three sysctl keys must return 1 and both kernel modules should appear in lsmod.
2. Install containerd
We pull containerd from the official Docker CE repository. The RHEL 9 repo definition works for Rocky 9 and Alma 9 without modification:
sudo dnf install -y dnf-plugins-core
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y containerd.io
Now the step people miss and then spend the evening debugging. Kubernetes on systemd-managed hosts expects the runtime to use the systemd cgroup driver. The default containerd config ships with cgroupfs, which causes kubelet flapping and random OOM-kills under load. Flip it:
sudo mkdir -p /etc/containerd
sudo containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
Confirm the change stuck:
grep SystemdCgroup /etc/containerd/config.toml
The output should show SystemdCgroup = true. Enable and start the service:
sudo systemctl enable --now containerd
systemctl is-active containerd
If you want to dig deeper into how containerd integrates with the CRI, see our guide on configuring containerd as a Kubernetes runtime.
3. Install kubeadm, kubelet, kubectl
The Google-hosted legacy repo is dead. All current Kubernetes packages live at pkgs.k8s.io, versioned per minor release. If you want the full story on the kubeadm install flow across distros, our Ubuntu kubeadm guide follows the same playbook with apt instead of dnf. Pin to 1.35 explicitly:
sudo tee /etc/yum.repos.d/kubernetes.repo <<'EOF'
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.35/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
The exclude line keeps a routine dnf update from pulling in a minor-version jump unexpectedly. When you are ready to upgrade, pass --disableexcludes=kubernetes:
sudo dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
sudo systemctl enable kubelet
The kubelet will refuse to stay running until the node is initialized or joined. That is expected. Verify versions:
kubeadm version -o short
kubectl version --client --output=yaml | head -5
kubelet --version
All three binaries should report v1.35.x. If you see older versions, run sudo dnf clean all and reinstall.
4. Open the firewall
Rocky/Alma 9 ship with firewalld active. Skipping this section is the single most common reason join commands hang. On the control plane:
sudo firewall-cmd --permanent --add-port={6443,2379-2380,10250,10257,10259}/tcp
sudo firewall-cmd --reload
On each worker:
sudo firewall-cmd --permanent --add-port={10250,30000-32767}/tcp
sudo firewall-cmd --reload
What each port does:
| Port | Component | Where |
|---|---|---|
| 6443 | kube-apiserver | control plane |
| 2379-2380 | etcd client / peer | control plane |
| 10250 | kubelet API | all nodes |
| 10257 | kube-controller-manager | control plane |
| 10259 | kube-scheduler | control plane |
| 30000-32767 | NodePort services | workers |
Calico also needs its own set of ports between nodes: 179/tcp for BGP, 4789/udp for VXLAN, and 5473/tcp for Typha if you enable it. If your nodes are on the same L2 segment with permissive east-west rules, you are fine. In tighter environments, add them explicitly.
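If you do need to open Calico's ports explicitly, the commands follow the same firewall-cmd pattern as above. A small dry-run sketch that builds the invocations so you can review them before piping to sudo sh on every node:

```shell
# Dry run: assemble the firewall-cmd calls for Calico's inter-node ports
# (179/tcp BGP, 4789/udp VXLAN, 5473/tcp Typha). Review the output, then
# pipe it to "sudo sh" on each node to apply.
calico_fw=$(
  for port in 179/tcp 4789/udp 5473/tcp; do
    echo "firewall-cmd --permanent --add-port=$port"
  done
  echo "firewall-cmd --reload"
)
printf '%s\n' "$calico_fw"
```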
SELinux note. Rocky/Alma 9 run SELinux in enforcing mode out of the box. Keep it that way. The Kubernetes RPMs ship with the policy bits needed for kubelet, containerd, and the CNI. Do not run setenforce 0. If a specific pod workload throws AVC denials, use ausearch -m avc -ts recent to identify the denial and write a targeted policy with audit2allow, rather than disabling SELinux across the cluster.
5. Initialize the cluster
This runs on the control plane only. Calico’s default pod CIDR is 192.168.0.0/16. Using it means zero edits to the Calico manifest later, which is one less thing to get wrong:
sudo kubeadm init \
--pod-network-cidr=192.168.0.0/16 \
--apiserver-advertise-address=10.0.1.10 \
--kubernetes-version=stable-1.35
On a freshly prepped Rocky 9 node, kubeadm finishes in under a minute. The tail of the output prints a kubeadm join command. Save it. The final lines look like this:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.1.10:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:aaaa0000bbbb1111cccc2222dddd3333eeee4444ffff5555aaaa0000bbbb1111
Set up kubectl access for your regular user on the control plane:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl cluster-info
The node will show NotReady because no CNI is installed yet. That changes in the next step.
6. Install the Calico CNI
Apply the Tigera operator, then the custom resources manifest:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.30.3/manifests/custom-resources.yaml
Watch the calico-system namespace until every pod is Running:
kubectl get pods -n calico-system -w
Ctrl-C out once the operator, typha, and node pods are all Ready. The control plane node should now show Ready:
kubectl get nodes
If the node is stuck in NotReady beyond 3-4 minutes, tail the calico-node logs: kubectl logs -n calico-system -l k8s-app=calico-node --tail=50. The usual culprits are a missing br_netfilter module or a sysctl that did not actually apply.
7. Join the workers
Run the join command from kubeadm init on each worker. Use the actual token and hash from your output:
sudo kubeadm join 10.0.1.10:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:aaaa0000bbbb1111cccc2222dddd3333eeee4444ffff5555aaaa0000bbbb1111
Lost the join command, or the token expired (they last 24 hours)? Regenerate from the control plane:
kubeadm token create --print-join-command
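The value after sha256: is nothing magic: it is the SHA-256 digest of the cluster CA's public key in DER form, so you can recompute it yourself from /etc/kubernetes/pki/ca.crt on the control plane. The sketch below generates a throwaway self-signed RSA cert so the pipeline runs anywhere; point the x509 step at your real ca.crt instead (kubeadm's default CA is RSA; if yours is not, swap openssl rsa for the matching key type):

```shell
# Throwaway CA cert so this demo is self-contained; substitute
# /etc/kubernetes/pki/ca.crt on a real control plane.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
  -keyout "$tmp/ca.key" -out "$tmp/ca.crt" 2>/dev/null
# Public key -> DER -> SHA-256: the same value kubeadm prints.
ca_hash=$(openssl x509 -pubkey -in "$tmp/ca.crt" \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 | awk '{print $NF}')
echo "sha256:$ca_hash"
```

Comparing this against the hash in a join command you were handed is a quick sanity check that you are joining the cluster you think you are.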
Back on the control plane, watch the workers join. Calico takes a minute or two to push the CNI config to each fresh node:
kubectl get nodes -o wide
Expected output after everything settles:
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE
k8s-cp01 Ready control-plane 12m v1.35.3 10.0.1.10 Rocky Linux 9.6 (Blue Onyx)
k8s-wk01 Ready <none> 3m v1.35.3 10.0.1.11 Rocky Linux 9.6 (Blue Onyx)
k8s-wk02 Ready <none> 3m v1.35.3 10.0.1.12 Rocky Linux 9.6 (Blue Onyx)
Label the workers so the role shows up in kubectl get nodes:
kubectl label node k8s-wk01 node-role.kubernetes.io/worker=worker
kubectl label node k8s-wk02 node-role.kubernetes.io/worker=worker
8. Smoke test the cluster
Before handing the cluster off, confirm scheduling, networking, and service routing all work. A two-replica nginx deployment is enough:
kubectl create deployment smoke-nginx --image=nginx:latest --replicas=2
kubectl expose deployment smoke-nginx --type=NodePort --port=80
kubectl get pods -o wide
kubectl get svc smoke-nginx
Take the NodePort from the PORT(S) column (something like 80:31458/TCP) and hit any worker’s IP on that port:
curl -sI http://10.0.1.11:31458 | head -3
A 200 OK from the nginx default means the cluster is routing traffic end to end. Clean up:
kubectl delete deployment smoke-nginx
kubectl delete svc smoke-nginx
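If you'd rather script the smoke test than read the port by eye, kubectl can emit the NodePort directly with JSONPath. An offline sketch of the same extraction from a sample PORT(S) value:

```shell
# With a live cluster you would use kubectl's JSONPath support:
#   nodeport=$(kubectl get svc smoke-nginx -o jsonpath='{.spec.ports[0].nodePort}')
# Offline demo of the equivalent text parse on a sample PORT(S) value:
ports_col="80:31458/TCP"
nodeport=${ports_col#*:}   # strip through the colon -> 31458/TCP
nodeport=${nodeport%/*}    # strip the protocol      -> 31458
echo "$nodeport"
```

With the port in a variable, the curl check becomes curl -sI http://10.0.1.11:$nodeport, which makes the whole smoke test easy to drop into CI.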
Production hardening checklist
A fresh kubeadm cluster is functional, not production-ready. Before running workloads that matter, tighten these:
- Multi-control-plane. A single control plane is a single failure domain. Add two more with a stacked etcd topology, or an external etcd cluster if you want independent failure domains. Put Nginx or HAProxy in front of the API servers on port 6443. Our HA kubeadm guide covers the full topology.
- Back up etcd. The cluster state is in etcd. Nothing you do in kubectl matters if that data is gone. Run etcdctl snapshot save on a schedule and ship the snapshot off-node.
- Pin package versions. Keep the exclude=kubelet kubeadm kubectl line in kubernetes.repo so routine patching cannot cause a minor-version jump. Upgrade deliberately with kubeadm upgrade plan.
- Enable audit logging. The default kube-apiserver audit policy is empty. Point --audit-policy-file at a real policy and write events to disk or an external sink.
- Enforce Pod Security Standards. Label namespaces with pod-security.kubernetes.io/enforce=restricted where possible. The built-in admission plugin is enabled by default since 1.25, but you still need to opt in per namespace.
- Rotate kubelet client certs. Kubelet certificates expire after a year. Set rotateCertificates: true and serverTLSBootstrap: true in the kubelet config, and approve the server CSRs (or run an approver like kubelet-rubber-stamp).
- Monitoring and alerting. A cluster without metrics is flying blind. Our guide to installing Prometheus and Grafana on Kubernetes covers the kube-prometheus-stack Helm chart. Dashboards for node exporter, kube-state-metrics, and etcd are the minimum viable baseline.
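For the etcd backup item, a minimal sketch of a date-stamped snapshot command, assuming the kubeadm default certificate paths (the leading echo makes it a dry run; remove it to execute on a control-plane node with etcdctl installed):

```shell
# Dry run of a nightly etcd backup; certificate paths are kubeadm defaults.
# The /var/backups/etcd target directory is an assumption; pick your own
# and ship the resulting .db file off-node.
snap="/var/backups/etcd/etcd-$(date +%Y%m%d-%H%M).db"
echo ETCDCTL_API=3 etcdctl snapshot save "$snap" \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```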
Common issues
Error: “[ERROR CRI]: container runtime is not running”
kubeadm's preflight could not talk to containerd over the CRI socket. The usual cause is the stock /etc/containerd/config.toml shipped by the containerd.io package, which disables the CRI plugin. Regenerate the default config (which re-enables it) and restart:
sudo containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
Error: “kubelet is not running” after init
Check the kubelet journal:
sudo journalctl -u kubelet --since "5 min ago" --no-pager | tail -50
Cgroup driver mismatch is by far the most frequent cause on Rocky/Alma 9. The fix is the same as the CRI error above: SystemdCgroup = true in containerd’s config, then restart containerd and the kubelet.
Workers stuck at “NotReady” indefinitely
Calico has not pushed the CNI config to the worker. Confirm the calico-node pod on that worker is running: kubectl -n calico-system get pods -o wide | grep WORKER_NAME. If it is pending or crashlooping, inspect its logs and events. A frequent cause is the worker missing the br_netfilter module, which means the sysctl prep step did not fully apply. Re-run:
sudo modprobe br_netfilter overlay
sudo sysctl --system
Pods in ImagePullBackOff
Test the node’s reach to the registry directly:
sudo crictl pull registry.k8s.io/pause:3.10
Failure here usually means egress firewall or proxy. Configure HTTPS_PROXY for containerd in /etc/systemd/system/containerd.service.d/http-proxy.conf and reload systemd.
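A minimal sketch of that drop-in, with a hypothetical proxy address; the NO_PROXY entries keep node-local, pod, and service traffic off the proxy (adjust the CIDRs to your node and pod networks):

```ini
# /etc/systemd/system/containerd.service.d/http-proxy.conf
[Service]
Environment="HTTPS_PROXY=http://proxy.example.com:3128"
Environment="NO_PROXY=localhost,127.0.0.1,10.0.1.0/24,192.168.0.0/16,.svc,.cluster.local"
```

After writing the file, run sudo systemctl daemon-reload followed by sudo systemctl restart containerd, then retry the crictl pull.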
Starting over
If something went sideways during bootstrap, a clean reset is faster than debugging. On the node you want to wipe:
sudo kubeadm reset -f
sudo rm -rf /etc/cni/net.d ~/.kube
sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
Then re-run kubeadm init (control plane) or kubeadm join (worker).
That’s a working three-node Kubernetes 1.35 cluster on Rocky Linux 9 / AlmaLinux 9 with containerd and Calico. From here, the immediate next steps are usually HA-ifying the control plane, adding Ingress, wiring up persistent storage, and deploying a Prometheus-based monitoring stack. For multi-cluster management, our guide on managing multiple Kubernetes clusters with kubectl and kubectx is a good follow-up.