Deploy HA Kubernetes Cluster on Rocky Linux 10 Using Kubespray

Kubespray is an Ansible-based tool that automates the deployment of production-ready Kubernetes clusters. It handles the full lifecycle – installing container runtimes, etcd, control plane components, and networking plugins across multiple nodes with a single playbook run.

This guide walks through deploying a highly available Kubernetes 1.34 cluster on Rocky Linux 10 / AlmaLinux 10 using Kubespray v2.30.0. The setup uses 3 control plane nodes with HAProxy and Keepalived for API server load balancing, plus 2 worker nodes. These steps also apply to RHEL 10.

Prerequisites

  • 5 servers running Rocky Linux 10 or AlmaLinux 10 with at least 4GB RAM and 2 vCPUs each
  • 1 deployment workstation (can be your laptop or a separate server) with Python 3.10+ and SSH access to all nodes
  • All nodes must have internet access to pull container images
  • SSH key-based authentication configured from the deployment workstation to all 5 nodes
  • A user with sudo privileges on all nodes (we use deploy in this guide)

Our cluster layout:

Hostname   IP Address   Role
cp1        10.0.1.10    Control Plane + etcd + HAProxy + Keepalived
cp2        10.0.1.11    Control Plane + etcd + HAProxy + Keepalived
cp3        10.0.1.12    Control Plane + etcd + HAProxy + Keepalived
w1         10.0.1.20    Worker
w2         10.0.1.21    Worker

Virtual IP (VIP) for Keepalived: 10.0.1.100

Step 1: Prepare All Nodes

Run these commands on all 5 nodes. Start by updating the system packages.

sudo dnf -y update

Set proper hostnames on each node.

sudo hostnamectl set-hostname cp1    # Run on 10.0.1.10
sudo hostnamectl set-hostname cp2    # Run on 10.0.1.11
sudo hostnamectl set-hostname cp3    # Run on 10.0.1.12
sudo hostnamectl set-hostname w1     # Run on 10.0.1.20
sudo hostnamectl set-hostname w2     # Run on 10.0.1.21

Add all nodes to /etc/hosts on every server.

sudo vi /etc/hosts

Add these entries:

10.0.1.10 cp1
10.0.1.11 cp2
10.0.1.12 cp3
10.0.1.20 w1
10.0.1.21 w2

Disable swap on all nodes. Kubernetes requires swap to be off.

sudo swapoff -a
sudo sed -i '/swap/d' /etc/fstab
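To confirm swap is fully disabled on a node before moving on, check that swapon reports no active devices and free shows zero swap:

```shell
# No output from swapon means no active swap devices remain.
swapon --show

# The Swap: line in free should read 0B across the board.
free -h
```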

Load required kernel modules and set sysctl parameters on all nodes.

sudo modprobe br_netfilter
sudo modprobe overlay

Make the modules persistent across reboots.

sudo tee /etc/modules-load.d/k8s.conf <<'EOF'
br_netfilter
overlay
EOF

Set the required sysctl parameters.

sudo tee /etc/sysctl.d/k8s.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
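You can verify the parameters took effect by reading the live kernel values; all three should print 1:

```shell
# These read the running kernel state. The bridge entries only
# exist once br_netfilter is loaded, which we did above.
sysctl net.bridge.bridge-nf-call-iptables
sysctl net.bridge.bridge-nf-call-ip6tables
sysctl net.ipv4.ip_forward
```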

Step 2: Configure Firewall Rules

Open the required ports on each node. The commands below use firewalld, the default firewall manager on RHEL-based systems.

On the control plane nodes (cp1, cp2, cp3), open these ports:

sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=2379-2380/tcp
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=10257/tcp
sudo firewall-cmd --permanent --add-port=10259/tcp
sudo firewall-cmd --permanent --add-port=8443/tcp
sudo firewall-cmd --permanent --add-port=179/tcp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --permanent --add-rich-rule='rule protocol value="vrrp" accept'
sudo firewall-cmd --reload

On the worker nodes (w1, w2), open these ports:

sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=30000-32767/tcp
sudo firewall-cmd --permanent --add-port=179/tcp
sudo firewall-cmd --permanent --add-port=4789/udp
sudo firewall-cmd --reload
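After reloading, you can confirm what each node now permits. On control plane nodes this should list the API, etcd, and component ports plus the VRRP rich rule; on workers, the kubelet, NodePort, and Calico ports:

```shell
# Show everything permitted in the active zone: ports,
# port ranges, and rich rules.
sudo firewall-cmd --list-all
```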

Port reference for Kubernetes components:

Port          Protocol   Purpose
6443          TCP        Kubernetes API server
2379-2380     TCP        etcd client and peer communication
10250         TCP        Kubelet API
10259         TCP        kube-scheduler
10257         TCP        kube-controller-manager
8443          TCP        HAProxy frontend for API server
179           TCP        Calico BGP peering
4789          UDP        VXLAN overlay network
30000-32767   TCP        NodePort services

Step 3: Install and Configure Keepalived on Control Plane Nodes

Keepalived provides a floating virtual IP (VIP) that always points to a healthy control plane node. Install it on all three control plane nodes.

sudo dnf install -y keepalived

On cp1 (10.0.1.10) – this is the MASTER node.

sudo vi /etc/keepalived/keepalived.conf

Add the following configuration:

vrrp_script chk_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VI_1 {
  interface eth0
  state MASTER
  virtual_router_id 51
  priority 101
  advert_int 1
  unicast_src_ip 10.0.1.10
  unicast_peer {
    10.0.1.11
    10.0.1.12
  }
  virtual_ipaddress {
    10.0.1.100
  }
  track_script {
    chk_haproxy
  }
}

On cp2 (10.0.1.11) – BACKUP with priority 100.

sudo vi /etc/keepalived/keepalived.conf

Add the following configuration:

vrrp_script chk_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VI_1 {
  interface eth0
  state BACKUP
  virtual_router_id 51
  priority 100
  advert_int 1
  unicast_src_ip 10.0.1.11
  unicast_peer {
    10.0.1.10
    10.0.1.12
  }
  virtual_ipaddress {
    10.0.1.100
  }
  track_script {
    chk_haproxy
  }
}

On cp3 (10.0.1.12) – BACKUP with priority 99.

sudo vi /etc/keepalived/keepalived.conf

Add the following configuration:

vrrp_script chk_haproxy {
  script "killall -0 haproxy"
  interval 2
  weight 2
}

vrrp_instance VI_1 {
  interface eth0
  state BACKUP
  virtual_router_id 51
  priority 99
  advert_int 1
  unicast_src_ip 10.0.1.12
  unicast_peer {
    10.0.1.10
    10.0.1.11
  }
  virtual_ipaddress {
    10.0.1.100
  }
  track_script {
    chk_haproxy
  }
}

Replace eth0 with your actual network interface name if different. Check with ip link show. Note that the chk_haproxy health check relies on killall from the psmisc package – install it with sudo dnf install -y psmisc if it is not already present.

Start and enable Keepalived on all three control plane nodes.

sudo systemctl enable --now keepalived

Verify the VIP is assigned on the MASTER node.

$ ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:f2:92:fd brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.10/24 brd 10.0.1.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet 10.0.1.100/32 scope global eth0
       valid_lft forever preferred_lft forever
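It is worth testing failover before moving on. A quick sketch – run the stop command on cp1 and the check on cp2:

```shell
# On cp1: stop keepalived to simulate a failure.
sudo systemctl stop keepalived

# On cp2 (highest remaining priority): the VIP should appear
# within a couple of VRRP advertisement intervals.
ip addr show eth0 | grep 10.0.1.100

# Back on cp1: restart keepalived. With the higher priority
# it reclaims the VIP.
sudo systemctl start keepalived
```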

Step 4: Install and Configure HAProxy on Control Plane Nodes

HAProxy load balances API server requests across all three control plane nodes. Install it on all three control plane nodes.

sudo dnf install -y haproxy

The HAProxy configuration is identical on all three control plane nodes. For more details on HAProxy setup, see our guide on installing HAProxy on Rocky Linux.

sudo vi /etc/haproxy/haproxy.cfg

Replace the entire file with this configuration:

global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

# Kubernetes API server frontend
frontend apiserver
    bind *:8443
    mode tcp
    option tcplog
    default_backend apiserver

# Round-robin balancing for API server
backend apiserver
    mode tcp
    balance roundrobin
    option httpchk GET /healthz
    http-check expect status 200
    default-server check check-ssl verify none
    server cp1 10.0.1.10:6443
    server cp2 10.0.1.11:6443
    server cp3 10.0.1.12:6443

On SELinux-enforcing systems, allow HAProxy to connect to any TCP port – the API server backend port 6443 is not covered by the default HAProxy policy.

sudo setsebool -P haproxy_connect_any 1

Start and enable HAProxy on all three control plane nodes.

sudo systemctl enable --now haproxy

Verify HAProxy is listening on port 8443.

$ sudo ss -tlnp | grep 8443
LISTEN 0      3000         *:8443       *:*    users:(("haproxy",pid=12345,fd=7))
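At this point every backend shows as DOWN, since no kube-apiserver is running yet – that is expected. You can still confirm the HAProxy/Keepalived pair is wired together with a TCP-level check against the VIP (HAProxy accepts the connection on the frontend even when no backend is available):

```shell
# Succeeds if something is accepting connections on the VIP
# frontend port; backends stay DOWN until Step 9 completes.
timeout 3 bash -c '</dev/tcp/10.0.1.100/8443' && echo "frontend reachable"
```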

Step 5: Set Up the Deployment Workstation

All remaining steps run on your deployment workstation – the machine that will drive the Kubespray Ansible playbook. This can be your laptop or a separate server. Install Python 3, pip, and git.

sudo dnf install -y python3 python3-pip git

Generate an SSH key pair if you have not already, and copy it to all cluster nodes.

ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519

Copy the public key to each node. Enter the password when prompted.

for host in 10.0.1.10 10.0.1.11 10.0.1.12 10.0.1.20 10.0.1.21; do
  ssh-copy-id deploy@${host}
done

Verify you can SSH into each node without a password.

for host in 10.0.1.10 10.0.1.11 10.0.1.12 10.0.1.20 10.0.1.21; do
  ssh deploy@${host} hostname
done

Step 6: Clone Kubespray and Install Dependencies

Clone the Kubespray repository and check out the v2.30.0 release tag. You do not need Ansible preinstalled – the Kubespray requirements file pulls in a pinned, tested version in the next step.

cd ~
git clone https://github.com/kubernetes-sigs/kubespray.git
cd kubespray
git checkout v2.30.0

Create a Python virtual environment and install the required dependencies.

python3 -m venv venv
source venv/bin/activate
pip install -U pip
pip install -r requirements.txt

This installs Ansible 10.7.0 along with cryptography, jmespath, and netaddr. Verify the installation.

$ ansible --version
ansible [core 2.17.x]
  config file = None
  configured module search path = ['/home/deploy/.ansible/plugins/modules']
  python version = 3.12.x

Step 7: Configure Kubespray Inventory

Copy the sample inventory to create your cluster configuration.

cp -rfp inventory/sample inventory/mycluster

Edit the inventory file to define your nodes.

vi inventory/mycluster/inventory.ini

Set the contents to:

[all]
cp1 ansible_host=10.0.1.10 ip=10.0.1.10
cp2 ansible_host=10.0.1.11 ip=10.0.1.11
cp3 ansible_host=10.0.1.12 ip=10.0.1.12
w1  ansible_host=10.0.1.20 ip=10.0.1.20
w2  ansible_host=10.0.1.21 ip=10.0.1.21

[kube_control_plane]
cp1
cp2
cp3

[etcd]
cp1
cp2
cp3

[kube_node]
w1
w2

[calico_rr]

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr
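Before running the full playbook, confirm that Ansible can parse the inventory and reach every node over SSH:

```shell
# Show the resolved group structure of the inventory.
ansible-inventory -i inventory/mycluster/inventory.ini --graph

# Ping all five nodes as the deploy user; each should return "pong".
ansible -i inventory/mycluster/inventory.ini all -m ping -u deploy
```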

Step 8: Customize Kubespray Group Variables

Kubespray stores its configuration in inventory/mycluster/group_vars/. Edit the cluster configuration file first.

vi inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

Set the container runtime to containerd (default in v2.30.0) and confirm these key settings:

# Container runtime - containerd is the default and recommended option
container_manager: containerd

# Network plugin - Calico provides network policy support
kube_network_plugin: calico

# Cluster name
cluster_name: cluster.local

# Pod and service CIDR ranges
kube_pods_subnet: 10.233.64.0/18
kube_service_addresses: 10.233.0.0/18

Now configure the external load balancer settings. Edit the all.yml file.

vi inventory/mycluster/group_vars/all/all.yml

Add or update these settings to point Kubespray at your HAProxy VIP:

# External load balancer for the API server
apiserver_loadbalancer_domain_name: "k8s-api.example.com"
loadbalancer_apiserver:
  address: 10.0.1.100
  port: 8443

# Disable internal load balancer since we use an external one
loadbalancer_apiserver_localhost: false

Make sure k8s-api.example.com resolves to 10.0.1.100 on all nodes. Add it to /etc/hosts on each node if you do not have DNS.

echo "10.0.1.100 k8s-api.example.com" | sudo tee -a /etc/hosts
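To push the entry to all five nodes in one pass from the workstation, a loop like the following works – the grep guard keeps it idempotent, so re-running it does not duplicate the entry:

```shell
# Append the VIP hostname mapping on every node, skipping
# any node that already has it.
for host in 10.0.1.10 10.0.1.11 10.0.1.12 10.0.1.20 10.0.1.21; do
  ssh deploy@${host} \
    "grep -q 'k8s-api.example.com' /etc/hosts || echo '10.0.1.100 k8s-api.example.com' | sudo tee -a /etc/hosts"
done
```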

Step 9: Deploy Kubernetes with Kubespray

Run the deployment playbook from your workstation. Replace deploy with the SSH user that has sudo access on all nodes.

ansible-playbook -i inventory/mycluster/inventory.ini \
  --become --user=deploy --become-user=root \
  cluster.yml

The deployment takes 15-30 minutes depending on network speed and hardware. Kubespray installs containerd, kubeadm, kubelet, kubectl, etcd, and Calico networking across all nodes. You should see zero failed tasks at the end.

PLAY RECAP *********************************************************************
cp1                        : ok=750  changed=148  unreachable=0    failed=0
cp2                        : ok=650  changed=130  unreachable=0    failed=0
cp3                        : ok=650  changed=130  unreachable=0    failed=0
w1                         : ok=480  changed=90   unreachable=0    failed=0
w2                         : ok=480  changed=90   unreachable=0    failed=0

Step 10: Verify the Kubernetes Cluster

SSH into the first control plane node and check the cluster status. The kubectl cheat sheet is a useful reference for common commands.

ssh deploy@10.0.1.10

Copy the kubeconfig to your user account so you can run kubectl without sudo.

mkdir -p ~/.kube
sudo cp /etc/kubernetes/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config

Check cluster information.

$ kubectl cluster-info
Kubernetes control plane is running at https://k8s-api.example.com:8443
CoreDNS is running at https://k8s-api.example.com:8443/api/v1/namespaces/kube-system/services/coredns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

Verify all nodes are in Ready state.

$ kubectl get nodes -o wide
NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                        KERNEL-VERSION
cp1    Ready    control-plane   10m   v1.34.3   10.0.1.10     <none>        Rocky Linux 10 (Red Quartz)     6.12.x-0.el10.x86_64
cp2    Ready    control-plane   9m    v1.34.3   10.0.1.11     <none>        Rocky Linux 10 (Red Quartz)     6.12.x-0.el10.x86_64
cp3    Ready    control-plane   9m    v1.34.3   10.0.1.12     <none>        Rocky Linux 10 (Red Quartz)     6.12.x-0.el10.x86_64
w1     Ready    <none>          8m    v1.34.3   10.0.1.20     <none>        Rocky Linux 10 (Red Quartz)     6.12.x-0.el10.x86_64
w2     Ready    <none>          8m    v1.34.3   10.0.1.21     <none>        Rocky Linux 10 (Red Quartz)     6.12.x-0.el10.x86_64

Check that all system pods are running.

$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-xxxxxxxxxx-xxxxx   1/1     Running   0          8m
calico-node-xxxxx                          1/1     Running   0          8m
calico-node-yyyyy                          1/1     Running   0          8m
calico-node-zzzzz                          1/1     Running   0          8m
coredns-xxxxxxxxxx-xxxxx                   1/1     Running   0          9m
coredns-xxxxxxxxxx-yyyyy                   1/1     Running   0          9m
etcd-cp1                                   1/1     Running   0          10m
etcd-cp2                                   1/1     Running   0          9m
etcd-cp3                                   1/1     Running   0          9m
kube-apiserver-cp1                         1/1     Running   0          10m
kube-apiserver-cp2                         1/1     Running   0          9m
kube-apiserver-cp3                         1/1     Running   0          9m
kube-controller-manager-cp1                1/1     Running   0          10m
kube-controller-manager-cp2                1/1     Running   0          9m
kube-controller-manager-cp3                1/1     Running   0          9m
kube-proxy-xxxxx                           1/1     Running   0          8m
kube-scheduler-cp1                         1/1     Running   0          10m
kube-scheduler-cp2                         1/1     Running   0          9m
kube-scheduler-cp3                         1/1     Running   0          9m
nodelocaldns-xxxxx                         1/1     Running   0          8m

Verify the etcd cluster health.

$ sudo etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/ssl/etcd/ssl/ca.pem \
  --cert=/etc/ssl/etcd/ssl/member-cp1.pem \
  --key=/etc/ssl/etcd/ssl/member-cp1-key.pem \
  endpoint health --cluster
https://10.0.1.10:2379 is healthy: successfully committed proposal: took = 10ms
https://10.0.1.11:2379 is healthy: successfully committed proposal: took = 12ms
https://10.0.1.12:2379 is healthy: successfully committed proposal: took = 11ms

Test a deployment to confirm workloads schedule correctly on worker nodes.

kubectl create deployment nginx-test --image=nginx:latest --replicas=2
kubectl get pods -o wide

The pods should land on w1 and w2. Clean up the test deployment after confirming.

kubectl delete deployment nginx-test

Step 11: Access the Cluster from Your Workstation

Copy the kubeconfig from a control plane node to your deployment workstation so you can manage the cluster remotely.

mkdir -p ~/.kube
scp deploy@10.0.1.10:/etc/kubernetes/admin.conf ~/.kube/config

Make sure k8s-api.example.com resolves to 10.0.1.100 on your workstation as well, then test access.

$ kubectl get nodes
NAME   STATUS   ROLES           AGE   VERSION
cp1    Ready    control-plane   15m   v1.34.3
cp2    Ready    control-plane   14m   v1.34.3
cp3    Ready    control-plane   14m   v1.34.3
w1     Ready    <none>          13m   v1.34.3
w2     Ready    <none>          13m   v1.34.3
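With kubectl now working through the VIP, you can sanity-check the HA layer end to end. Briefly stopping the load balancer stack on cp1 should not interrupt API access, since Keepalived moves the VIP to cp2:

```shell
# Simulate losing cp1's load balancer; the VIP fails over
# and kubectl keeps working through cp2/cp3.
ssh deploy@10.0.1.10 "sudo systemctl stop haproxy keepalived"
kubectl get nodes
ssh deploy@10.0.1.10 "sudo systemctl start haproxy keepalived"
```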

Adding or Removing Nodes

Kubespray makes scaling straightforward. To add a new worker (w3 in this example), add it to the [all] and [kube_node] sections of the inventory file, then run the scale playbook limited to the new node.

ansible-playbook -i inventory/mycluster/inventory.ini \
  --become --user=deploy --become-user=root \
  scale.yml --limit=w3

To remove a node, use the remove-node playbook.

ansible-playbook -i inventory/mycluster/inventory.ini \
  --become --user=deploy --become-user=root \
  remove-node.yml -e "node=w3"

To upgrade the cluster to a newer Kubernetes version, update to a newer Kubespray release, then run the upgrade playbook. See the Kubernetes deployment with Rancher guide if you prefer a UI-based cluster manager instead.

ansible-playbook -i inventory/mycluster/inventory.ini \
  --become --user=deploy --become-user=root \
  upgrade-cluster.yml

Conclusion

You now have a highly available Kubernetes 1.34 cluster running on Rocky Linux 10 with 3 control plane nodes, 3 etcd members, and 2 worker nodes – all deployed through Kubespray. The HAProxy and Keepalived layer ensures API server access survives any single control plane node failure.

For production use, add TLS certificates for ingress, set up cluster monitoring with Prometheus and Grafana, configure persistent storage with a CSI driver, and implement regular etcd backup schedules. Consider enabling Kubernetes RBAC policies and network policies through Calico to enforce workload isolation.
