Calico works fine until you need to debug why pod X can’t reach pod Y across namespaces. Then you’re staring at iptables dumps with thousands of rules, trying to trace a packet through KUBE-SVC chains, KUBE-SEP chains, DNAT rules, and masquerade rules. On a cluster with 500+ services, that’s not debugging. That’s archaeology.
Cilium takes a fundamentally different approach. Instead of iptables, it uses eBPF programs attached directly to Linux network interfaces to handle packet routing, load balancing, network policy enforcement, and observability. This guide walks through deploying Cilium as the CNI on a production Kubernetes HA cluster, replacing kube-proxy entirely, enforcing network policies, and using Hubble for real-time flow visibility. Every command here was tested on a 4-node cluster (3 control plane, 1 worker) running Kubernetes 1.35.3.
Current as of April 2026. Verified on Ubuntu 24.04.4 LTS (kernel 6.8.0-101-generic) with Kubernetes 1.35.3, Cilium 1.19.2, Hubble 1.18.6, containerd 2.2.2
What eBPF Changes for Kubernetes Networking
Traditional Kubernetes networking relies on kube-proxy to program iptables rules for ClusterIP, NodePort, and LoadBalancer services. Every service creates multiple iptables chains. At 100 services, you have hundreds of rules. At 1,000 services, you have thousands, and every packet traverses them linearly. Rule updates become expensive, and debugging a connectivity issue means reading rules that were never meant for human consumption.
eBPF (extended Berkeley Packet Filter) changes this by running small, verified programs directly inside the Linux kernel. These programs attach to network hooks (TC ingress/egress, XDP, cgroup sockets) and make forwarding decisions in a single hash-table lookup instead of walking iptables chains. Cilium compiles eBPF programs that handle service load balancing, network policy enforcement, and traffic accounting, all without touching iptables.
The practical difference: adding a new Kubernetes service with Cilium means inserting an entry into a BPF hash map (O(1) operation). With iptables, it means appending rules and re-evaluating the entire chain on every packet. At scale, this is the difference between milliseconds and seconds of latency during rule updates.
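The contrast can be sketched in plain shell. This is a toy model, not Cilium internals: the rule file, names, and addresses are invented. It builds 1,000 fake service "rules", then resolves the worst-case service both ways: a linear scan (what iptables does per packet) and a keyed lookup against a hash built once up front (analogous to an eBPF map programmed at service creation time).

```shell
# Toy illustration (not Cilium code): 1,000 fake service rules.
seq 1 1000 | awk '{ printf "svc-%d 10.96.0.%d:80\n", $1, $1 % 256 }' > /tmp/rules.txt

# iptables-style: scan every rule until the match (worst case: the last one).
grep -n '^svc-1000 ' /tmp/rules.txt
# prints: 1000:svc-1000 10.96.0.232:80

# eBPF-map-style: build the hash once, then answer in a single keyed lookup,
# with cost independent of how many rules exist.
awk '{ svc[$1] = $2 } END { print "svc-1000 ->", svc["svc-1000"] }' /tmp/rules.txt
# prints: svc-1000 -> 10.96.0.232:80
```

The asymmetry matters per packet: the scan cost grows with every service you add, while the lookup cost does not.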
Prerequisites
Before installing Cilium, you need a working Kubernetes cluster initialized without kube-proxy. The steps below assume you have the cluster infrastructure ready.
- Kubernetes cluster: 3 control plane + 1 worker node (see Deploy HA Kubernetes Cluster with kubeadm)
- OS: Ubuntu 24.04.4 LTS with kernel 6.8.0-101-generic (kernel 5.10+ required, 5.15+ recommended)
- Container runtime: containerd 2.2.2
- Helm 3 installed on the machine where you run cilium install
- kubectl configured with cluster admin access (kubectl and kubectx guide)
- No existing CNI: the cluster must be initialized without a CNI plugin. Nodes will show NotReady until Cilium is deployed
Verify your kernel version on each node:
uname -r
The output should show kernel 5.10 or newer:
6.8.0-101-generic
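If you script node bootstrap, the kernel check can be a hard gate instead of a manual read. A minimal sketch using version sort (the 5.10 floor comes from Cilium's stated requirement above; the variable names are ours):

```shell
# Fail fast on nodes whose kernel is older than Cilium's 5.10 minimum.
required="5.10"
current="$(uname -r | cut -d- -f1)"
oldest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n1)"
if [ "$oldest" = "$required" ]; then
  echo "OK: kernel $current >= $required"
else
  echo "FAIL: kernel $current is older than $required" >&2
  exit 1
fi
```

`sort -V` handles multi-digit components correctly (5.9 sorts before 5.10), which naive string comparison gets wrong.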
Install the Cilium CLI
The Cilium CLI handles installation, upgrades, and connectivity testing. Install it on the machine where you manage the cluster (typically the first control plane node or your workstation).
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
sha256sum --check cilium-linux-${CLI_ARCH}.tar.gz.sha256sum
sudo tar xzvfC cilium-linux-${CLI_ARCH}.tar.gz /usr/local/bin
rm cilium-linux-${CLI_ARCH}.tar.gz{,.sha256sum}
Confirm the CLI version:
cilium version --client
You should see the installed version:
cilium-cli: v0.19.2 compiled with go1.24.2 on linux/amd64
Bootstrap Kubernetes Without kube-proxy
Cilium replaces kube-proxy entirely when kubeProxyReplacement is enabled. To avoid conflicts, skip the kube-proxy addon during cluster initialization. If you already have a cluster with kube-proxy running, you can remove it later, but starting clean is simpler.
Initialize the first control plane node with --skip-phases=addon/kube-proxy:
sudo kubeadm init \
--control-plane-endpoint "10.0.1.10:6443" \
--upload-certs \
--pod-network-cidr=10.244.0.0/16 \
--skip-phases=addon/kube-proxy
The --skip-phases=addon/kube-proxy flag tells kubeadm not to deploy the kube-proxy DaemonSet. Without it, kube-proxy would install iptables rules that conflict with Cilium’s eBPF service routing. The --pod-network-cidr defines the CIDR range Cilium will use for pod IP allocation.
Set up kubectl access after initialization:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
At this point, the node shows NotReady because no CNI is installed yet. That’s expected.
kubectl get nodes
The node status confirms no CNI is active:
NAME STATUS ROLES AGE VERSION
cp01 NotReady control-plane 45s v1.35.3
Install Cilium with kube-proxy Replacement
Deploy Cilium with kube-proxy replacement enabled and Hubble for observability. The cilium install command uses Helm under the hood.
cilium install \
--set kubeProxyReplacement=true \
--set k8sServiceHost=10.0.1.10 \
--set k8sServicePort=6443 \
--set hubble.relay.enabled=true \
--set hubble.ui.enabled=true
The k8sServiceHost and k8sServicePort point Cilium to the API server directly since kube-proxy isn’t available to route ClusterIP traffic. On a multi-control-plane setup, use the load balancer IP.
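If you manage the chart through GitOps rather than the Cilium CLI, the same deployment can be expressed directly with Helm. A hedged equivalent, assuming the official cilium chart repository and the version pinned at the top of this guide; the values mirror the CLI flags above:

```shell
# Install Cilium via Helm directly instead of the cilium CLI.
helm repo add cilium https://helm.cilium.io/
helm repo update
helm install cilium cilium/cilium --version 1.19.2 \
  --namespace kube-system \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=10.0.1.10 \
  --set k8sServicePort=6443 \
  --set hubble.relay.enabled=true \
  --set hubble.ui.enabled=true
```

Pick one mechanism and stick with it: mixing `cilium install` and direct Helm management of the same release leads to drift between what the CLI thinks it owns and what Helm recorded.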
Wait for Cilium to become ready (usually 60 to 90 seconds):
cilium status --wait
All components should show OK:
    /¯¯\
 /¯¯\__/¯¯\    Cilium:             OK
 \__/¯¯\__/    Operator:           OK
 /¯¯\__/¯¯\    Envoy DaemonSet:    OK
 \__/¯¯\__/    Hubble Relay:       OK
    \__/       ClusterMesh:        disabled
DaemonSet cilium Desired: 1, Ready: 1/1
DaemonSet cilium-envoy Desired: 1, Ready: 1/1
Deployment cilium-operator Desired: 1, Ready: 1/1
Deployment hubble-relay Desired: 1, Ready: 1/1
Deployment hubble-ui Desired: 1, Ready: 1/1
Containers: cilium Running: 1
cilium-envoy Running: 1
cilium-operator Running: 1
hubble-relay Running: 1
hubble-ui Running: 1
Verify that no kube-proxy pods exist in the cluster:
kubectl get pods -n kube-system -l k8s-app=kube-proxy
This should return nothing:
No resources found in kube-system namespace.
Confirm kube-proxy replacement is active by checking the Cilium configuration:
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
The output confirms Cilium is handling all service routing:
KubeProxyReplacement: True [eth0 10.0.1.10 (Direct Routing)]
Join Additional Nodes
Join the remaining control plane and worker nodes using the standard kubeadm join command. No special Cilium flags are needed for joining nodes because Cilium runs as a DaemonSet and automatically deploys to new nodes.
For additional control plane nodes:
sudo kubeadm join 10.0.1.10:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane --certificate-key <cert-key>
For the worker node:
sudo kubeadm join 10.0.1.10:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
Once all nodes join, verify the cluster is fully ready:
kubectl get nodes -o wide
All four nodes should show Ready status:
NAME STATUS ROLES AGE VERSION INTERNAL-IP OS-IMAGE
cp01 Ready control-plane 12m v1.35.3 10.0.1.10 Ubuntu 24.04.4 LTS
cp02 Ready control-plane 8m v1.35.3 10.0.1.11 Ubuntu 24.04.4 LTS
cp03 Ready control-plane 6m v1.35.3 10.0.1.12 Ubuntu 24.04.4 LTS
worker01 Ready <none> 4m v1.35.3 10.0.1.20 Ubuntu 24.04.4 LTS
Cilium automatically deploys its agent and Envoy proxy to each new node. Check that all Cilium pods are running:
kubectl get pods -n kube-system -l k8s-app=cilium -o wide
You should see one Cilium pod per node, all in Running state:
NAME READY STATUS RESTARTS AGE NODE
cilium-4xk7p 1/1 Running 0 12m cp01
cilium-8bnmq 1/1 Running 0 8m cp02
cilium-j2nvr 1/1 Running 0 6m cp03
cilium-wmz5t 1/1 Running 0 4m worker01
Verify the eBPF Datapath
With all nodes running, inspect the eBPF datapath configuration. This confirms Cilium is using the correct routing mode, masquerading method, and BPF map allocation.
kubectl -n kube-system exec ds/cilium -- cilium-dbg status --verbose
Key sections from the output (trimmed for readability):
KubeProxyReplacement: True [eth0 10.0.1.10 (Direct Routing)]
Routing: Tunnel [vxlan]
Masquerading: IPTables [IPv4: Enabled, IPv6: Disabled]
Controller Status: 48/48 healthy
IP MASQ: IPTables
IPv4 BIG TCP: Disabled
IPv6 BIG TCP: Disabled
BandwidthManager: Disabled
Connectivity: OK
BPF Maps: dynamic sizing: on
Name Size
Auth: 524288
Non-TCP connection tracking: 262144
TCP connection tracking: 524288
Endpoint policy: 65536
IP cache: 512000
NAT: 524288
Neighbor table: 524288
Session affinity: 65536
Signal: 524288
Sock reverse NAT: 65536
Tunnel: 65536
Cluster health: 4/4 reachable (2026-04-02T14:23:15Z)
The output confirms tunnel mode with VXLAN encapsulation, IPTables masquerading for outbound traffic, and dynamically sized BPF maps. All 4 nodes are reachable in the cluster health check.
Check the BPF conntrack table to see active connections tracked by Cilium’s eBPF datapath:
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf ct list global | wc -l
On our test cluster with basic workloads running:
1017
That’s 1,017 connection tracking entries managed entirely in eBPF maps, with zero iptables conntrack involvement.
Now run the built-in connectivity test suite. This deploys test pods and validates pod-to-pod, pod-to-service, and pod-to-external connectivity:
cilium connectivity test
The test takes a few minutes. On our 4-node cluster, all tests passed:
✅ All 4 tests (42 actions) successful, 0 tests skipped, 1 scenarios skipped.
If any test fails, check the Cilium agent logs with kubectl -n kube-system logs ds/cilium and look for BPF program compilation errors or connectivity issues between nodes.
Network Policies with Cilium
Kubernetes NetworkPolicy is where Cilium’s eBPF advantage becomes most visible. Traditional CNIs translate policies into iptables rules, which are hard to debug and slow to update. Cilium compiles policies into BPF programs that evaluate at the kernel level with per-packet granularity and full Hubble visibility into what was allowed or denied.
To demonstrate, create a demo namespace with three deployments: frontend, backend, and database.
kubectl create namespace demo
kubectl -n demo run frontend --image=nginx:alpine --labels="app=frontend" --port=80
kubectl -n demo run backend --image=nginx:alpine --labels="app=backend" --port=80
kubectl -n demo run database --image=nginx:alpine --labels="app=database" --port=80
kubectl -n demo expose pod frontend --port=80
kubectl -n demo expose pod backend --port=80
kubectl -n demo expose pod database --port=80
Wait for all pods to be running, then confirm every pod can reach every other pod (no policies yet):
kubectl -n demo exec frontend -- curl -s --max-time 3 backend
kubectl -n demo exec frontend -- curl -s --max-time 3 database
kubectl -n demo exec backend -- curl -s --max-time 3 database
All three commands return the Nginx welcome HTML. With no network policy in place, all pod-to-pod traffic flows freely.
Default Deny All Ingress
The first step in any production network policy strategy is default-deny. This blocks all ingress traffic to pods in the namespace unless explicitly allowed by a policy.
cat <<'POLICY' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo
spec:
  podSelector: {}
  policyTypes:
  - Ingress
POLICY
Now test connectivity again:
kubectl -n demo exec frontend -- curl -s --max-time 3 backend
The request times out with exit code 28, confirming the default-deny policy is enforced:
curl: (28) Connection timed out after 3001 milliseconds
command terminated with exit code 28
All three pods are now isolated. No pod can reach any other pod on ingress. This is the foundation of zero-trust networking in Kubernetes.
Allow Frontend to Backend
With default-deny in place, selectively allow the frontend pod to reach the backend on port 80:
cat <<'POLICY' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 80
POLICY
Test from frontend to backend:
kubectl -n demo exec frontend -- curl -s --max-time 3 backend
The frontend now gets the Nginx response from backend:
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
...
But frontend to database is still blocked:
kubectl -n demo exec frontend -- curl -s --max-time 3 database
Connection times out as expected:
curl: (28) Connection timed out after 3001 milliseconds
command terminated with exit code 28
Allow Backend to Database
Now open the path from backend to database, completing the three-tier architecture (frontend -> backend -> database, with no direct frontend -> database access):
cat <<'POLICY' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-backend-to-database
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 80
POLICY
Verify backend reaches database:
kubectl -n demo exec backend -- curl -s --max-time 3 database
Backend gets the HTML response from database. Now confirm frontend still cannot bypass backend and reach database directly:
kubectl -n demo exec frontend -- curl -s --max-time 3 database
Still denied:
curl: (28) Connection timed out after 3001 milliseconds
command terminated with exit code 28
This is exactly the traffic flow you want in a production three-tier application: frontend talks to backend, backend talks to database, and the frontend has no direct database access. Network policies enforced by eBPF, not iptables chains.
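Standard NetworkPolicy stops at L4: it can say "frontend may reach backend on TCP 80", but nothing about which HTTP requests are allowed. Cilium's own CRD can go further. A hedged sketch, assuming the demo pods above (the policy name is ours), that additionally restricts frontend to GET requests for the root path on backend:

```shell
cat <<'POLICY' | kubectl apply -f -
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-l7-get-only
  namespace: demo
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      rules:
        http:
        - method: "GET"
          path: "/"
POLICY
```

With this applied, a `curl -X POST backend` from the frontend pod should be rejected at L7 (Cilium proxies the flow through Envoy and answers denied requests with an HTTP 403) while plain GETs continue to work.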
Hubble: Network Observability
Hubble is Cilium’s built-in observability layer. It captures every network flow processed by the eBPF datapath, including which policies allowed or denied each connection. This is what makes debugging network policies practical instead of guesswork.
Install the Hubble CLI
Download and install the Hubble CLI binary:
HUBBLE_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/hubble/master/stable.txt)
HUBBLE_ARCH=amd64
curl -L --fail --remote-name-all https://github.com/cilium/hubble/releases/download/$HUBBLE_VERSION/hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
sha256sum --check hubble-linux-${HUBBLE_ARCH}.tar.gz.sha256sum
sudo tar xzvfC hubble-linux-${HUBBLE_ARCH}.tar.gz /usr/local/bin
rm hubble-linux-${HUBBLE_ARCH}.tar.gz{,.sha256sum}
Enable port forwarding to the Hubble relay service:
cilium hubble port-forward &
Verify the Hubble connection and flow capacity:
hubble status
The output shows the flow buffer and current throughput:
Healthcheck (via localhost:4245): Ok
Current/Max Flows: 4095/4095 (100.00%)
Flows/s: 21.38
Connected Nodes: 4/4
4,095 flows buffered across all 4 nodes, with about 21 flows per second. In production clusters with real traffic, you’ll want to increase the buffer size (covered in the performance section below).
Observing Policy Decisions in Real Time
Hubble’s real power is seeing exactly which policies allowed or denied each flow. Watch the demo namespace while generating traffic:
hubble observe --namespace demo --verdict DROPPED
In another terminal, try the blocked frontend-to-database connection. Hubble shows the denied flow with the specific policy verdict:
Apr 2 14:31:07.892: demo/frontend:43210 (ID:17901) -> demo/database:80 (ID:29433) Policy denied DROPPED (TCP Flags: SYN)
Apr 2 14:31:08.903: demo/frontend:43210 (ID:17901) -> demo/database:80 (ID:29433) Policy denied DROPPED (TCP Flags: SYN)
Each flow entry includes the source pod and identity (ID:17901), destination pod and identity (ID:29433), the port, the verdict (DROPPED), the reason (Policy denied), and the TCP flags. The identity IDs are Cilium’s security identities, assigned based on pod labels. This is how Cilium evaluates policies without relying on IP addresses.
Now observe allowed flows:
hubble observe --namespace demo --verdict FORWARDED -l app=backend
When backend reaches database, Hubble shows the forwarded flow:
Apr 2 14:32:41.115: demo/backend:52840 (ID:22156) -> demo/database:80 (ID:29433) to-endpoint FORWARDED (TCP Flags: SYN)
Apr 2 14:32:41.116: demo/backend:52840 (ID:22156) -> demo/database:80 (ID:29433) to-endpoint FORWARDED (TCP Flags: ACK)
Compare this to debugging the same scenario with iptables. With Cilium and Hubble, you see the source, destination, verdict, and reason in one line. With iptables, you’d need to enable LOG targets, parse syslog, and correlate across multiple chain traversals. Not comparable.
Hubble UI
Hubble UI provides a graphical service map showing real-time traffic flows between pods. Since we enabled it during installation, expose it locally:
kubectl -n kube-system port-forward svc/hubble-ui 12000:80 &
Access the UI at http://localhost:12000. Select the demo namespace to see a visual service dependency map with green (allowed) and red (denied) flow lines between frontend, backend, and database pods. In production, expose this behind an ingress or Gateway API route with authentication.
Cilium vs Calico vs Flannel
Choosing a CNI depends on your cluster size, policy requirements, and operational complexity tolerance. Here’s how the three most common options compare based on production experience.
| Feature | Cilium | Calico | Flannel |
|---|---|---|---|
| Datapath | eBPF (kernel-level) | iptables or eBPF (beta) | VXLAN overlay only |
| kube-proxy replacement | Yes (production-ready) | Yes (with eBPF mode) | No |
| NetworkPolicy support | Full K8s + CiliumNetworkPolicy | Full K8s + Calico policies | None |
| L7 policy (HTTP/gRPC) | Yes (built-in via Envoy) | Limited (requires Istio) | No |
| Observability | Hubble (flows, metrics, UI) | Calico Enterprise only | None |
| Encryption (WireGuard) | Yes (native) | Yes (native) | No |
| Multi-cluster (ClusterMesh) | Yes | Yes (Calico Federation) | No |
| Minimum kernel | 5.10+ (5.15+ recommended) | 3.10+ | 3.10+ |
| Resource overhead | Higher (eBPF compilation) | Moderate | Low |
| Complexity | High (powerful but more to learn) | Medium | Low |
| Best for | Large clusters, security-focused, observability | General production, hybrid clouds | Dev/test, simple clusters |
Calico remains a solid choice for clusters that need standard network policies without the eBPF learning curve. Flannel is fine for development environments where you don’t need network policies at all. Cilium pulls ahead when you need kube-proxy replacement at scale, L7 policy enforcement, or built-in observability without deploying a separate service mesh.
Performance Considerations
Cilium’s eBPF datapath offers measurable advantages at scale, but it’s not a free upgrade. Understanding where the gains come from (and what the tradeoffs are) helps you make informed decisions.
eBPF vs iptables at Scale
The performance difference between eBPF and iptables grows with cluster size. With 100 Kubernetes services, iptables works fine because the rule chain is manageable. At 1,000 services, iptables creates roughly 10,000+ rules, and every packet walks through them sequentially until it finds a match. Rule updates require locking the iptables table, which can cause brief traffic drops during updates.
Cilium’s eBPF service map uses hash tables for O(1) lookups regardless of the number of services. Adding service number 5,000 has the same lookup cost as service number 1. Rule updates are atomic BPF map operations with no lock contention. In clusters with 2,000+ services, this translates to measurably lower latency on service routing and faster pod startup times (no waiting for iptables rule sync).
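The scaling claim above is just linear arithmetic, but making it concrete helps. A quick sketch (the ~10 rules per service is an assumed ballpark average for ClusterIP plus endpoint chains, not a measured constant):

```shell
# Per-packet work grows with rule count under iptables; stays at one hash
# probe under the eBPF service map, regardless of service count.
for services in 100 1000 5000; do
  printf '%5d services -> ~%6d iptables rules walked per packet, 1 eBPF map lookup\n' \
    "$services" "$((services * 10))"
done
```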
BPF Map Sizing
Cilium dynamically sizes BPF maps based on system memory, but for large clusters you may need to tune them. The key maps to watch:
- CT (connection tracking): default 524,288 entries for TCP. Each entry consumes about 100 bytes, so 50,000 concurrent connections use roughly 5 MB. Increase with --set bpf.ctTcpMax=1048576 for busy clusters
- NAT map: sized to match CT. If you increase CT, increase NAT to match
- Policy map: 65,536 entries per endpoint by default. Only needs tuning if a single pod has thousands of distinct policy rules (unusual)
- IP cache: 512,000 entries. Needs tuning only in very large multi-cluster setups
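The memory cost of raising the CT limit is easy to estimate up front. A quick sketch using the ~100 bytes/entry figure from the list above (an approximation, not an exact struct size):

```shell
# Rough CT map memory at the default and a doubled size.
for entries in 524288 1048576; do
  awk -v n="$entries" 'BEGIN { printf "%8d entries -> ~%d MiB\n", n, n * 100 / (1024 * 1024) }'
done
# prints: ~50 MiB for the default, ~100 MiB when doubled
```

Even a doubled CT map is cheap in absolute terms; the reason not to oversize blindly is that per-CPU map variants multiply this by core count.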
Monitor map utilization with:
kubectl -n kube-system exec ds/cilium -- cilium-dbg bpf ct list global -o json | python3 -c "import sys,json; entries=json.load(sys.stdin); print(f'Active entries: {len(entries)}')"
If active entries approach 80% of the map maximum, increase the limit before it fills up. A full BPF map means new connections get dropped silently.
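To make the 80% threshold actionable rather than a judgment call, wire it into a check. A hedged sketch with illustrative hardcoded numbers; in practice you would feed `active` from the cilium-dbg output above:

```shell
# Hypothetical capacity check: warn when CT utilization crosses 80%.
active=450000   # illustrative; read from cilium-dbg in production
max=524288      # current bpf.ctTcpMax
pct=$(awk -v a="$active" -v m="$max" 'BEGIN { printf "%.1f", a * 100 / m }')
echo "CT utilization: ${pct}% (${active}/${max})"
if awk -v p="$pct" 'BEGIN { exit !(p >= 80) }'; then
  echo "WARNING: approaching map capacity, raise bpf.ctTcpMax"
fi
```

With these sample numbers, utilization comes out at 85.8% and the warning fires. The same pattern drops cleanly into a Prometheus alert rule if you export the map sizes as metrics.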
Hubble Performance Impact
Hubble adds observability overhead. Each Cilium agent exports flow events to the Hubble relay, which aggregates them. The default ring buffer holds 4,095 flows per node. For production clusters with high traffic volume, increase the buffer:
cilium upgrade \
--set hubble.metrics.enabled="{dns,drop,tcp,flow,port-distribution,icmp,httpV2}" \
--set hubble.relay.replicas=3 \
--set hubble.eventBufferCapacity=65535
On clusters processing 10,000+ flows per second, Hubble relay should have dedicated resources (CPU and memory requests/limits) to avoid contention with other kube-system workloads.
When to Choose Cilium
Cilium is worth the operational complexity when your cluster has 500+ services where iptables rule churn becomes a bottleneck, when you need L7-aware network policies (HTTP path/method matching) without deploying Istio, when network policy debugging is eating your team’s time (Hubble pays for itself fast), or when you’re running multi-cluster setups that need ClusterMesh. For small clusters under 50 nodes with basic network policy needs, Calico is simpler to operate and has a lower resource footprint.
Whatever CNI you choose, make sure your etcd backups cover the cluster state, and that your monitoring stack captures CNI-level metrics. A misconfigured CNI can take down the entire cluster, and recovering without backups means rebuilding from scratch.
For more on Cilium’s architecture and advanced features like Cluster Mesh, BGP integration, and bandwidth management, refer to the upstream documentation.