Running Kubernetes on Proxmox VE is one of the most practical ways to build a production-grade cluster without spending a fortune on cloud resources. Whether you are building a home lab, standing up a dev/test environment, or running a small production workload, Proxmox gives you a solid Type-1 hypervisor with an API you can automate against. This guide walks through every step: building a reusable Ubuntu 24.04 VM template, deploying a three-node cluster with kubeadm, and bolting on networking and storage so the cluster is actually useful.
Why Run Kubernetes on Proxmox VE
Proxmox VE is a Debian-based hypervisor that ships with KVM, LXC, and a full REST API out of the box. For Kubernetes, that combination solves several problems at once.
Home labs and learning. A single Proxmox host with 64 GB of RAM can run a three-node Kubernetes cluster and still have room for ancillary services like a registry, a database, or a monitoring stack. You get real VMs with real kernel isolation, which is closer to what you will see in production than running minikube on a laptop.
Dev/test environments. Proxmox templates and cloud-init let you spin up a fresh cluster in minutes. Tear it down, rebuild, repeat. If you add Terraform with the Proxmox provider, you can version-control the entire environment and hand it off to CI pipelines.
Small production workloads. For teams that do not need the scale of a managed Kubernetes service, a three-node Proxmox cluster with Ceph or ZFS storage can host Kubernetes with high availability at a fraction of the cost of EKS or GKE. You control the hardware, the networking, and the upgrade schedule.
The rest of this guide assumes you already have a working Proxmox VE 8.x installation. We will build everything from there.
Prerequisites
Before you start, confirm the following:
- Proxmox VE 8.x installed and updated: Version 8.1 or later is recommended. Log in to the web UI or run `pveversion` on the host to confirm.
- Hardware resources: At least 8 CPU cores, 24 GB RAM, and 100 GB of free storage for the three VMs we will create. More is better, but this is the minimum for a functional cluster.
- Network bridge: A Linux bridge (typically `vmbr0`) connected to your LAN. All VMs will attach to this bridge. If you use VLANs, make sure the bridge is VLAN-aware.
- Ubuntu 24.04 cloud image: We will download the official cloud image in the template step below.
- DNS or /etc/hosts entries: Each node needs a resolvable hostname. We will configure this with cloud-init, but make sure your local DNS or hosts file can resolve all three names.
- SSH key pair: Generate one if you do not have one. Cloud-init will inject it into each VM so you can log in without a password.
- Internet access on all VMs: Required to pull packages and container images during setup.
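If you do not have a key pair yet, creating one is quick. The template step later in this guide reads the public key from ~/.ssh/id_rsa.pub, so this sketch generates an RSA key at that path; adjust the type or location to taste:

```shell
# Create an RSA key pair at the default path only if one is not already there;
# -N "" sets an empty passphrase, a lab convenience rather than a best practice
mkdir -p ~/.ssh && chmod 700 ~/.ssh
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N "" -C "proxmox-k8s"
```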
Here is the IP plan we will use throughout this guide. Adjust the addresses to match your network:
| Hostname | Role | IP Address | vCPUs | RAM | Disk |
|---|---|---|---|---|---|
| k8s-cp01 | Control plane | 192.168.1.50 | 4 | 8 GB | 50 GB |
| k8s-worker01 | Worker | 192.168.1.51 | 4 | 8 GB | 50 GB |
| k8s-worker02 | Worker | 192.168.1.52 | 4 | 8 GB | 50 GB |
Create an Ubuntu 24.04 VM Template with Cloud-Init
A VM template lets you clone identical machines in seconds. We will build one from the official Ubuntu 24.04 (Noble Numbat) cloud image, configure cloud-init support, and convert it to a template.
SSH into your Proxmox host and download the cloud image:
wget -O /var/lib/vz/template/iso/ubuntu-24.04-cloudimg-amd64.img \
https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img
Create a new VM that will become the template. We use VM ID 9000 here, but pick any unused ID:
qm create 9000 --name ubuntu-2404-template --memory 4096 --cores 2 \
--net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --agent enabled=1
Import the cloud image as the primary disk and attach it to the VM:
qm set 9000 --scsi0 local-lvm:0,import-from=/var/lib/vz/template/iso/ubuntu-24.04-cloudimg-amd64.img
Add a cloud-init drive. This is where Proxmox injects user data, network config, and SSH keys at boot:
qm set 9000 --ide2 local-lvm:cloudinit
Set the boot order so the VM boots from the SCSI disk, and configure the serial console for cloud-init compatibility:
qm set 9000 --boot order=scsi0 --serial0 socket --vga serial0
Resize the disk so there is enough room for Kubernetes components, container images, and workloads:
qm disk resize 9000 scsi0 50G
Set default cloud-init values. These will be overridden per-clone, but it helps to have sane defaults in the template:
qm set 9000 --ciuser ubuntu --sshkeys ~/.ssh/id_rsa.pub --ipconfig0 ip=dhcp
Convert the VM to a template. After this, you cannot start it directly; you can only clone it:
qm template 9000
The template is ready. Every clone will inherit the 50 GB disk, cloud-init drive, virtio NIC, and QEMU guest agent configuration.
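For repeat builds, the whole sequence above can be collected into one script. This is a sketch under the same assumptions as the individual commands (VM ID 9000, local-lvm storage, the downloaded image path); the guard keeps it from doing anything off the Proxmox host:

```shell
# Consolidated template build; run on the Proxmox host
VMID=9000
STORAGE=local-lvm
IMG=/var/lib/vz/template/iso/ubuntu-24.04-cloudimg-amd64.img
if command -v qm >/dev/null 2>&1; then
  qm create "$VMID" --name ubuntu-2404-template --memory 4096 --cores 2 \
    --net0 virtio,bridge=vmbr0 --scsihw virtio-scsi-single --agent enabled=1
  qm set "$VMID" --scsi0 "${STORAGE}:0,import-from=${IMG}"
  qm set "$VMID" --ide2 "${STORAGE}:cloudinit"
  qm set "$VMID" --boot order=scsi0 --serial0 socket --vga serial0
  qm disk resize "$VMID" scsi0 50G
  qm set "$VMID" --ciuser ubuntu --sshkeys ~/.ssh/id_rsa.pub --ipconfig0 ip=dhcp
  qm template "$VMID"
else
  echo "qm not found: run this on the Proxmox host (would build VMID=$VMID)"
fi
```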
Clone Three VMs from the Template
Clone the template three times to create the control plane node and two workers. Full clones are independent of the template, so you can modify or delete the template later without affecting running VMs.
qm clone 9000 110 --name k8s-cp01 --full
qm clone 9000 111 --name k8s-worker01 --full
qm clone 9000 112 --name k8s-worker02 --full
Configure Static IPs and Hostnames via Cloud-Init
Each VM needs a static IP, a gateway, a DNS server, and a hostname. Set these through Proxmox cloud-init parameters so they take effect on first boot.
Configure the control plane node:
qm set 110 --ipconfig0 ip=192.168.1.50/24,gw=192.168.1.1 \
--nameserver 192.168.1.1 --searchdomain lab.local
Configure the first worker:
qm set 111 --ipconfig0 ip=192.168.1.51/24,gw=192.168.1.1 \
--nameserver 192.168.1.1 --searchdomain lab.local
Configure the second worker:
qm set 112 --ipconfig0 ip=192.168.1.52/24,gw=192.168.1.1 \
--nameserver 192.168.1.1 --searchdomain lab.local
Now start all three VMs:
qm start 110
qm start 111
qm start 112
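If you would rather not repeat the qm set and qm start commands per node, the same settings can be driven from a small table. A sketch using this guide's VMID-to-IP plan, guarded so it only acts on the Proxmox host:

```shell
# Per-node cloud-init settings and start, expressed as a loop.
# Format is VMID:IP; gateway, DNS, and search domain are shared.
NODES="110:192.168.1.50 111:192.168.1.51 112:192.168.1.52"
GATEWAY=192.168.1.1
for spec in $NODES; do
  vmid="${spec%%:*}"
  ip="${spec##*:}"
  if command -v qm >/dev/null 2>&1; then
    qm set "$vmid" --ipconfig0 "ip=${ip}/24,gw=${GATEWAY}" \
      --nameserver "$GATEWAY" --searchdomain lab.local
    qm start "$vmid"
  else
    echo "would configure VM $vmid with $ip/24"
  fi
done
```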
Give them 30 to 60 seconds to boot and apply cloud-init. Then verify SSH access from your workstation:
ssh [email protected] hostname
ssh [email protected] hostname
ssh [email protected] hostname
Each command should return the hostname you assigned. If SSH hangs or is refused, check that the VM has the correct IP and that your SSH public key was injected through cloud-init.
Add entries to /etc/hosts on all three nodes so they can resolve each other by name. SSH into each node and append:
cat <<EOF | sudo tee -a /etc/hosts
192.168.1.50 k8s-cp01
192.168.1.51 k8s-worker01
192.168.1.52 k8s-worker02
EOF
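The heredoc above appends blindly, so running it twice leaves duplicate entries. A guarded variant is safe to re-run; it defaults to a scratch file here for illustration, so set HOSTS_FILE=/etc/hosts (and run as root) on the real nodes:

```shell
# Idempotent variant of the append above: each mapping is added only if the
# hostname is not already present in the file
HOSTS_FILE="${HOSTS_FILE:-/tmp/hosts.example}"
add_host() {
  grep -qw "$2" "$HOSTS_FILE" 2>/dev/null || printf '%s %s\n' "$1" "$2" >> "$HOSTS_FILE"
}
add_host 192.168.1.50 k8s-cp01
add_host 192.168.1.51 k8s-worker01
add_host 192.168.1.52 k8s-worker02
add_host 192.168.1.50 k8s-cp01   # re-running is a no-op
```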
Prepare All Nodes
The following steps must be performed on all three nodes. SSH into each one, or use a tool like tmux with synchronized panes to run commands in parallel.
Disable Swap
Kubernetes requires swap to be turned off. The kubelet will refuse to start if swap is active.
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
Verify swap is off:
free -h
The swap line should show all zeros.
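free -h is easy to eyeball; for scripts, /proc/meminfo gives the same answer in machine-readable form:

```shell
# SwapTotal is reported in kB; zero means the kubelet's swap check will pass
swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
if [ "$swap_kb" -eq 0 ]; then
  echo "swap is off"
else
  echo "swap still active: ${swap_kb} kB"
fi
```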
Load Kernel Modules
Kubernetes networking and containerd require the overlay and br_netfilter kernel modules. Load them now and make them persistent across reboots:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
Set Sysctl Parameters
Enable IP forwarding and bridge netfilter so that iptables can see bridged traffic. Without these, pod-to-pod communication will not work:
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Confirm the values are applied:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
All three should return 1.
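The same check can be scripted so a misconfigured node fails loudly; `sysctl -n` prints just the value:

```shell
# Warn about any of the three required kernel parameters that is not 1
for key in net.bridge.bridge-nf-call-iptables \
           net.bridge.bridge-nf-call-ip6tables \
           net.ipv4.ip_forward; do
  val=$(sysctl -n "$key" 2>/dev/null || echo "missing")
  [ "$val" = "1" ] || echo "WARN: $key is $val (expected 1)"
done
```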
Install containerd on All Nodes
containerd is the standard container runtime for Kubernetes. We will install it from the official Docker repository, which ships the latest stable release.
Install prerequisite packages and add the Docker GPG key:
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
Add the Docker repository. This gives us access to the containerd.io package:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install -y containerd.io
Generate the default containerd configuration and enable the systemd cgroup driver. Kubernetes 1.32 expects systemd as the cgroup driver, and mismatches here cause the kubelet to crash-loop:
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
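If you want to see exactly what that sed command changes, here it is run against a stand-in file containing a minimal excerpt of the generated config (the section path can vary slightly between containerd versions; the real file lives at /etc/containerd/config.toml):

```shell
# Reproduce the cgroup-driver flip on a two-line excerpt of the default config
cat > /tmp/config.toml.example <<'EOF'
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = false
EOF
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /tmp/config.toml.example
grep 'SystemdCgroup' /tmp/config.toml.example   # now reports true
```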
Restart containerd so it picks up the new configuration:
sudo systemctl restart containerd
sudo systemctl enable containerd
sudo systemctl status containerd --no-pager
You should see active (running) in the output. If it is not running, check journalctl -u containerd for errors.
Install kubeadm, kubelet, and kubectl
These three binaries are all you need to bootstrap and manage a Kubernetes cluster. We will pin them to version 1.32 so that future apt upgrades do not accidentally bump your cluster to an untested release.
Add the Kubernetes apt repository. Starting with Kubernetes 1.28, the project moved to community-managed package repositories hosted at pkgs.k8s.io:
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | \
sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /" | \
sudo tee /etc/apt/sources.list.d/kubernetes.list
Install the packages and hold them at the current version:
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
Verify the installed versions:
kubeadm version
kubectl version --client
kubelet --version
All three should report a 1.32.x version. If they do not, double-check that the repository URL contains v1.32 and re-run the install.
Initialize the Kubernetes Cluster on the Control Plane
The remaining steps in this section run only on the control plane node (k8s-cp01). SSH into 192.168.1.50 to proceed.
Run kubeadm init. Calico's manifests default to a pod CIDR of 192.168.0.0/16, which overlaps the 192.168.1.0/24 LAN this guide uses for the nodes, so we pass a non-overlapping range and will point Calico at the same CIDR later:
sudo kubeadm init \
--control-plane-endpoint=192.168.1.50 \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=192.168.1.50 \
--kubernetes-version=v1.32.0
This takes a few minutes. kubeadm pulls the control plane images, generates certificates, and writes static pod manifests. When it finishes, you will see a message with a kubeadm join command. Copy that command and save it somewhere safe. You will need it to join the worker nodes.
Set up kubectl access for your non-root user on the control plane:
mkdir -p $HOME/.kube
sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Verify the cluster is responding:
kubectl get nodes
You should see k8s-cp01 in NotReady status. That is expected. The node will not become Ready until a CNI plugin is installed.
Install Calico CNI
Calico provides both networking and network policy enforcement, and it is one of the most widely deployed CNI plugins in production Kubernetes clusters. We will install it using the Tigera operator and custom resource method, which is Calico's recommended installation approach.
Install the Tigera operator:
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/tigera-operator.yaml
Download the custom resources manifest, verify the CIDR matches what you used in kubeadm init, and apply it:
curl -O https://raw.githubusercontent.com/projectcalico/calico/v3.28.0/manifests/custom-resources.yaml
Open custom-resources.yaml and make sure the cidr field matches the --pod-network-cidr you passed to kubeadm init. The manifest defaults to 192.168.0.0/16, so update it if you used a different range. Then apply:
kubectl apply -f custom-resources.yaml
Watch the Calico pods come up:
watch kubectl get pods -n calico-system
Wait until all pods show Running status. This typically takes two to three minutes. Once Calico is healthy, check the node status again:
kubectl get nodes
The control plane node should now show Ready.
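Instead of polling kubectl get nodes by hand, kubectl wait blocks until the condition is met, which is handy in scripts. A sketch, guarded so it is a no-op on machines without kubectl:

```shell
# Block until every node currently registered reports Ready (five-minute cap)
if command -v kubectl >/dev/null 2>&1; then
  kubectl wait --for=condition=Ready node --all --timeout=300s \
    && status="all nodes Ready" \
    || status="timed out waiting for Ready"
else
  status="skipped: kubectl not found"
fi
echo "$status"
```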
Join Worker Nodes to the Cluster
SSH into each worker node and run the kubeadm join command that was printed at the end of kubeadm init. The command looks like this:
sudo kubeadm join 192.168.1.50:6443 --token <your-token> \
--discovery-token-ca-cert-hash sha256:<your-hash>
If you lost the join command or the token has expired (tokens expire after 24 hours), generate a new one from the control plane:
kubeadm token create --print-join-command
Run the join command on both worker nodes. When each one finishes, go back to the control plane and check:
kubectl get nodes
After a minute or two, all three nodes should show Ready:
NAME STATUS ROLES AGE VERSION
k8s-cp01 Ready control-plane 10m v1.32.0
k8s-worker01 Ready <none> 2m v1.32.0
k8s-worker02 Ready <none> 90s v1.32.0
Optionally, label the worker nodes so you can target them with node selectors later:
kubectl label node k8s-worker01 node-role.kubernetes.io/worker=worker
kubectl label node k8s-worker02 node-role.kubernetes.io/worker=worker
Run a quick smoke test to verify the cluster is functional:
kubectl run nginx-test --image=nginx --restart=Never
kubectl get pod nginx-test -o wide
The pod should land on one of the worker nodes and reach Running status within 30 seconds. Clean it up when you are done:
kubectl delete pod nginx-test
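The smoke test is also easy to script end to end; kubectl wait replaces the manual 30-second watch. This sketch first checks that a cluster is actually reachable so it degrades gracefully elsewhere:

```shell
# Scripted smoke test: schedule a pod, wait for Ready, show placement, clean up
if kubectl get nodes >/dev/null 2>&1; then
  kubectl run nginx-test --image=nginx --restart=Never
  kubectl wait --for=condition=Ready pod/nginx-test --timeout=120s
  kubectl get pod nginx-test -o wide
  kubectl delete pod nginx-test
  result="smoke test finished"
else
  result="skipped: no reachable cluster from this machine"
fi
echo "$result"
```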
Optional: Automate VM Creation with Terraform
Clicking through the Proxmox UI or running qm commands by hand works fine for a one-time setup, but if you plan to rebuild clusters regularly, Terraform is a better approach. The bpg/proxmox Terraform provider (a more actively maintained alternative to the older Telmate/proxmox provider) supports cloud-init, full clones, and all the VM settings we configured above.
Here is a minimal Terraform configuration that creates the three VMs from our template. Create a file named main.tf:
terraform {
required_providers {
proxmox = {
source = "bpg/proxmox"
version = ">= 0.66.0"
}
}
}
provider "proxmox" {
endpoint = "https://proxmox.lab.local:8006/"
username = "root@pam"
password = var.proxmox_password
insecure = true
}
variable "proxmox_password" {
type = string
sensitive = true
}
variable "nodes" {
default = {
"k8s-cp01" = { vmid = 110, ip = "192.168.1.50/24" }
"k8s-worker01" = { vmid = 111, ip = "192.168.1.51/24" }
"k8s-worker02" = { vmid = 112, ip = "192.168.1.52/24" }
}
}
resource "proxmox_virtual_environment_vm" "k8s" {
for_each = var.nodes
name = each.key
node_name = "pve"
vm_id = each.value.vmid
clone {
vm_id = 9000
full = true
}
cpu {
cores = 4
}
memory {
dedicated = 8192
}
initialization {
user_account {
username = "ubuntu"
keys = [file("~/.ssh/id_rsa.pub")]
}
ip_config {
ipv4 {
address = each.value.ip
gateway = "192.168.1.1"
}
}
dns {
servers = ["192.168.1.1"]
domain = "lab.local"
}
}
}
Initialize and apply:
terraform init
terraform plan
terraform apply
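terraform plan will prompt for var.proxmox_password unless you supply a value. Terraform reads variables from TF_VAR_-prefixed environment variables, which keeps the secret out of version-controlled .tfvars files; the value below is a placeholder:

```shell
# Terraform picks up TF_VAR_<name> from the environment for variable values
export TF_VAR_proxmox_password='changeme'   # placeholder; use your real password
if command -v terraform >/dev/null 2>&1; then
  terraform plan
else
  echo "terraform not installed yet"
fi
```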
Terraform will create all three VMs in parallel, which is significantly faster than sequential qm clone commands. When you want to tear down the cluster and start fresh, run terraform destroy and the VMs are gone. Combine this with an Ansible playbook for the Kubernetes installation steps, and you have a fully automated pipeline from bare Proxmox to a running cluster.
Storage: Longhorn or Local-Path for Persistent Volumes
A Kubernetes cluster without persistent storage is useful only for stateless workloads. For anything that needs to survive a pod restart (databases, message queues, monitoring data), you need a StorageClass and a CSI driver. Here are two solid options for Proxmox-based clusters.
Option 1: Rancher Local Path Provisioner
This is the simplest option. It creates PersistentVolumes backed by directories on the node’s local filesystem. There is no replication, so if a node dies, the data on that node is gone. That said, for dev/test and single-node workloads, it is perfectly adequate.
Install it with a single command:
kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/v0.0.30/deploy/local-path-storage.yaml
Set it as the default StorageClass if you want PVCs to use it automatically:
kubectl patch storageclass local-path -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
Option 2: Longhorn
Longhorn is a distributed block storage system built for Kubernetes. It replicates data across nodes, supports snapshots and backups, and has a web UI for management. This is the better choice if you need data resilience across your cluster.
Install the prerequisites on all nodes:
sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable iscsid --now
Install Longhorn using Helm:
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--version 1.7.2
Monitor the installation:
kubectl -n longhorn-system get pods -w
Once all pods are running, Longhorn creates a default StorageClass called longhorn. You can access the Longhorn UI by port-forwarding the frontend service:
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
Then open http://localhost:8080 in your browser to manage volumes, replicas, and backups.
Troubleshooting
Here are the most common issues you will run into and how to fix them.
Node stays in NotReady status
This almost always means the CNI plugin is not installed or not healthy. Check Calico pod status:
kubectl get pods -n calico-system
kubectl describe pods -n calico-system
If Calico pods are in CrashLoopBackOff, the most likely cause is a pod CIDR mismatch between kubeadm init and the Calico custom resources. Verify both use the same CIDR.
kubelet fails to start
Check the kubelet logs:
sudo journalctl -u kubelet -f
Common causes:
- Swap is still on. Run `free -h` and verify the swap line shows zeros. Run `sudo swapoff -a` if needed.
- Cgroup driver mismatch. The kubelet expects systemd, but containerd is set to cgroupfs. Verify `SystemdCgroup = true` in /etc/containerd/config.toml and restart containerd.
- containerd is not running. Check with `sudo systemctl status containerd`.
kubeadm join fails with token errors
Tokens expire after 24 hours. Generate a new one from the control plane:
kubeadm token create --print-join-command
Also verify that the worker node can reach the control plane on port 6443. Test with:
curl -k https://192.168.1.50:6443/healthz
You should get back ok. If the connection times out, check firewall rules on both the Proxmox host and the VM.
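If curl is not installed on the worker, bash can do a raw TCP probe by itself via its /dev/tcp pseudo-device. This only proves the port is open, not that the API server is healthy:

```shell
# Raw TCP probe of the API server port with a 3-second timeout (no curl needed)
host=192.168.1.50; port=6443
if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
  reach="open"
else
  reach="closed, filtered, or host unreachable"
fi
echo "port ${port} on ${host}: ${reach}"
```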
Pods stuck in Pending or ContainerCreating
Describe the pod to find out why:
kubectl describe pod <pod-name>
Common reasons:
- Insufficient resources. The scheduler cannot find a node with enough CPU or memory. Check `kubectl describe node` and look at the Allocatable vs. Allocated sections.
- Image pull errors. The node cannot reach the container registry. Verify DNS resolution and internet connectivity on the node.
- PVC not bound. If the pod requests a PersistentVolumeClaim and no StorageClass is available, the PVC stays in Pending. Install one of the storage solutions described above.
Cloud-init did not apply on the VM
If a cloned VM boots with DHCP instead of the static IP you configured, cloud-init may not have run. Check inside the VM:
sudo cloud-init status
cat /var/log/cloud-init-output.log
Make sure the cloud-init drive (ide2) is present in the VM configuration. From the Proxmox host:
qm config 110 | grep ide2
If the cloud-init drive is missing, add it with qm set 110 --ide2 local-lvm:cloudinit, regenerate the cloud-init image with qm cloudinit update 110, and reboot the VM.
QEMU guest agent not responding
The QEMU guest agent lets Proxmox report the VM’s IP address and perform clean shutdowns. If it is not working, install it inside the VM:
sudo apt-get install -y qemu-guest-agent
sudo systemctl enable qemu-guest-agent --now
Verify in the Proxmox web UI that the VM shows an IP address on the Summary tab.
Summary
You now have a three-node Kubernetes 1.32 cluster running on Proxmox VE 8.x with Calico networking and your choice of persistent storage. The cloud-init template approach means you can destroy and recreate the cluster quickly, and the optional Terraform workflow takes that a step further by making the entire setup declarative and repeatable.
From here, consider adding an Ingress controller (Nginx or Traefik), a monitoring stack (Prometheus and Grafana), and a GitOps tool (Flux or ArgoCD) to round out the environment. If you plan to run this in production, look into setting up multiple control plane nodes for high availability and backing your Proxmox storage with Ceph for redundancy at the hypervisor level.