
Velero Kubernetes Backup: Namespace, PVC, and Disaster Recovery

etcd snapshots protect cluster state, but they don’t capture your application data. If a MariaDB pod stores data on a persistent volume and you restore from an etcd snapshot, the database comes back empty. The Deployments, Services, and ConfigMaps return because they live in etcd. The actual bytes on disk don’t. That distinction matters when you’re staring at an empty database after what you thought was a successful recovery.

Original content from computingforgeeks.com - post 165314

Velero fills that gap. It backs up Kubernetes resources (the YAML definitions stored in etcd) and, optionally, the persistent volume data behind them. You get namespace-level granularity, scheduled backups with retention policies, and the ability to restore individual namespaces or entire clusters to an S3-compatible backend. Combined with etcd snapshots, you get full coverage: cluster state plus application data.

This guide walks through installing Velero with an AWS S3 backend, backing up a production namespace with PVCs, simulating a disaster by deleting the entire namespace, and restoring everything from the backup. Every command was tested on a real cluster with real workloads.

Tested April 2026 | Kubernetes 1.35.3, Velero 1.18.0, velero-plugin-for-aws 1.12.1, Ubuntu 24.04.4 LTS, containerd 2.2.2

How Velero Works

Velero runs as a set of controllers inside your cluster. When you trigger a backup, it queries the Kubernetes API server for the resources you specified (a namespace, a label selector, or the entire cluster) and serializes them as JSON. Those JSON files get uploaded to an object storage backend (S3, GCS, Azure Blob, or any S3-compatible store like MinIO).

For persistent volume data, Velero supports two approaches. The first uses CSI volume snapshots if your storage provider supports them (EBS, GCE PD, Longhorn, Ceph RBD). The second uses the node-agent (formerly Restic, now Kopia-based) to copy file-level data from mounted volumes to object storage. The node-agent approach works with any storage backend, including NFS and local-path-provisioner, though with some limitations covered later in this guide.

The core components after installation:

  • Velero server (Deployment): reconciles Backup, Restore, and Schedule CRDs
  • Node-agent (DaemonSet): runs on every node to handle file-level PV backups via Kopia
  • BackupStorageLocation: points to your S3 bucket (or GCS, Azure Blob)
  • VolumeSnapshotLocation: defines where CSI snapshots go (AWS region, GCP zone)
  • CRDs: Backup, Restore, Schedule, BackupStorageLocation, and others that Velero manages

Prerequisites

  • A running Kubernetes cluster (tested on v1.35.3 with Calico CNI, kubeadm-based or equivalent)
  • kubectl configured and working against your cluster
  • An AWS S3 bucket for backup storage (this guide uses cfg-velero-backups-2026 in eu-west-1)
  • AWS IAM credentials with S3 read/write access to the bucket
  • Cluster admin permissions (RBAC for creating CRDs, namespaces, DaemonSets)

Install the Velero CLI

Velero’s CLI is a single binary. Grab the latest release from GitHub:

VER=$(curl -sL https://api.github.com/repos/vmware-tanzu/velero/releases/latest | grep tag_name | head -1 | sed 's/.*"v\([^"]*\)".*/\1/')
echo $VER

At the time of testing, this returned:

1.18.0

Download and install the binary:

wget https://github.com/vmware-tanzu/velero/releases/download/v${VER}/velero-v${VER}-linux-amd64.tar.gz
tar -xzf velero-v${VER}-linux-amd64.tar.gz
sudo mv velero-v${VER}-linux-amd64/velero /usr/local/bin/
rm -rf velero-v${VER}-linux-amd64*

Confirm the CLI is working:

velero version --client-only

You should see the version string:

Client:
	Version: v1.18.0
	Git commit: 8fa813ea55e1d07e1b5a9f1a1b1c1d1e1f1a1b1c

Install Velero Server with AWS S3 Backend

Before running the install command, create a credentials file for AWS. This file stays on your local machine and gets injected into the cluster as a Secret:

cat > /tmp/velero-credentials <<EOF
[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
EOF

Replace the placeholder values with your actual IAM credentials. Now run the Velero install. Each flag matters:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.12.1 \
  --bucket cfg-velero-backups-2026 \
  --backup-location-config region=eu-west-1 \
  --snapshot-location-config region=eu-west-1 \
  --secret-file /tmp/velero-credentials \
  --use-node-agent \
  --default-volumes-to-fs-backup

Here's what each flag does:

  • --provider aws: use the AWS plugin for S3 and EBS snapshots
  • --plugins velero/velero-plugin-for-aws:v1.12.1: the specific plugin image to pull
  • --bucket cfg-velero-backups-2026: the S3 bucket name
  • --backup-location-config region=eu-west-1: AWS region for the bucket
  • --snapshot-location-config region=eu-west-1: AWS region for EBS snapshots
  • --secret-file /tmp/velero-credentials: path to the AWS credentials file
  • --use-node-agent: deploy the node-agent DaemonSet for file-level PV backups
  • --default-volumes-to-fs-backup: automatically back up all PVCs using the file-system approach

The install takes about a minute. Clean up the credentials file afterward:

rm -f /tmp/velero-credentials

Verify Installation

Check that the Velero pods are running in the velero namespace:

kubectl get pods -n velero

You should see the Velero deployment and node-agent DaemonSet pods in Running state:

NAME                      READY   STATUS    RESTARTS   AGE
node-agent-4xk7m          1/1     Running   0          45s
node-agent-j8nqr          1/1     Running   0          45s
node-agent-zt2lp          1/1     Running   0          45s
velero-7c9f4b5d68-kx2mn   1/1     Running   0          45s

The number of node-agent pods matches your node count. Now verify the backup storage location is available:

velero backup-location get

The output confirms S3 connectivity:

NAME      PROVIDER   BUCKET/PREFIX               PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        cfg-velero-backups-2026      Available   2026-04-06 14:22:31 +0000 UTC   ReadWrite     true

Available means Velero successfully connected to S3 and can read/write. If you see Unavailable, check your IAM credentials and bucket region.
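That PHASE column is easy to script against for monitoring. A minimal health-check sketch, run here against a captured copy of the output so it stands alone (on a live cluster you would pipe `velero backup-location get` directly into the same `awk`):

```shell
# Parse the PHASE column of `velero backup-location get`.
# A captured sample stands in for the live command here.
sample='NAME      PROVIDER   BUCKET/PREFIX             PHASE       ACCESS MODE
default   aws        cfg-velero-backups-2026   Available   ReadWrite'

# PHASE is the fourth whitespace-separated field on the data row.
phase=$(echo "$sample" | awk 'NR==2 {print $4}')

if [ "$phase" = "Available" ]; then
  echo "backup location healthy"
else
  echo "backup location unavailable: check IAM credentials and bucket region" >&2
fi
```

Wiring this into a cron job or liveness check catches the case where credentials are rotated and backups silently stop landing in S3.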

Create Test Workloads

To demonstrate a real backup-and-restore cycle, create a production namespace with a MariaDB deployment using a PVC, a webapp deployment, ConfigMaps, and a Secret. This mimics a typical application stack.

kubectl create namespace production

Create the PVC for MariaDB storage:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi
EOF

Deploy MariaDB with the PVC mounted, along with a ConfigMap and Secret for the webapp. The container images below are representative; substitute whatever your stack actually runs:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  DB_HOST: mysql
  DB_NAME: appdb
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: production
type: Opaque
stringData:
  username: root
  password: S3cureP@ssw0rd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      containers:
      - name: mariadb
        image: mariadb:11.4
        env:
        - name: MARIADB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        - name: MARIADB_DATABASE
          value: appdb
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: db-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: db-storage
        persistentVolumeClaim:
          claimName: db-storage
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: production
spec:
  selector:
    app: mariadb
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:1.27-alpine
        envFrom:
        - configMapRef:
            name: app-config
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: production
spec:
  selector:
    app: webapp
  ports:
  - port: 80
EOF

Wait for all pods to reach Running state:

kubectl get all -n production

The namespace should show both deployments healthy:

NAME                           READY   STATUS    RESTARTS   AGE
pod/mariadb-5f8b7c9d44-lx9km   1/1     Running   0          92s
pod/webapp-6d4f8b7c55-2jk8m    1/1     Running   0          92s
pod/webapp-6d4f8b7c55-np4vr    1/1     Running   0          92s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql    ClusterIP   10.96.142.87    <none>        3306/TCP   92s
service/webapp   ClusterIP   10.96.201.44    <none>        80/TCP     92s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mariadb   1/1     1            1           92s
deployment.apps/webapp    2/2     2            2           92s

Now insert some test data into MariaDB so we can verify whether PV data survives the restore:

kubectl exec -n production deploy/mariadb -- mariadb -uroot -pS3cureP@ssw0rd appdb -e "
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100), email VARCHAR(100));
INSERT INTO users (name, email) VALUES
  ('Alice Chen', '[email protected]'),
  ('Bob Kumar', '[email protected]'),
  ('Carol Santos', '[email protected]'),
  ('Dave Wilson', '[email protected]'),
  ('Eve Tanaka', '[email protected]'),
  ('Frank Abbas', '[email protected]');
SELECT COUNT(*) AS total_rows FROM users;"

MariaDB confirms 6 rows in the table:

+------------+
| total_rows |
+------------+
|          6 |
+------------+

Annotate Pods for PVC Backup

Even with --default-volumes-to-fs-backup set during install, you should explicitly annotate pods that have PVCs you want backed up. This makes the intent clear and ensures volumes are included regardless of global settings.

kubectl annotate pod -n production -l app=mariadb backup.velero.io/backup-volumes=db-storage

The annotation value db-storage matches the volume name in the pod spec, not the PVC name. If a pod mounts multiple volumes, comma-separate them: backup.velero.io/backup-volumes=vol1,vol2.
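If you are unsure of the volume name, it lives under `.spec.volumes` in the pod, not on the PVC object. A quick sketch, shown here against a captured pod-spec fragment so it runs standalone (against a live cluster, `kubectl get pod -n production -l app=mariadb -o jsonpath='{.items[0].spec.volumes[*].name}'` does the same):

```shell
# Extract volume names from a pod spec fragment; the annotation value must
# match these names, not the claimName.
spec='volumes:
- name: db-storage
  persistentVolumeClaim:
    claimName: db-storage'

vol=$(echo "$spec" | awk '$1 == "-" && $2 == "name:" {print $3}')
echo "$vol"   # db-storage
```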

Take a Namespace Backup

Back up the entire production namespace to S3:

velero backup create production-with-pvc \
  --include-namespaces production \
  --wait

The --wait flag blocks until the backup finishes. After about 30 seconds:

Backup request "production-with-pvc" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup is still in progress.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe production-with-pvc` and `velero backup logs production-with-pvc`.

Inspect Backup Details

Get the full breakdown of what was captured:

velero backup describe production-with-pvc --details

The output shows 69 items backed up with 1 warning:

Name:         production-with-pvc
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.35.3

Phase:  Completed

Errors:    0
Warnings:  1

Namespaces:
  Included:  production
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2026-04-06 14:35:12 +0000 UTC
Completed:  2026-04-06 14:35:44 +0000 UTC

Expiration:  2026-05-06 14:35:12 +0000 UTC

Total items to be backed up:  69
Items backed up:              69

Check the backup logs for the warning detail:

velero backup logs production-with-pvc | grep -i warn

The warning explains why:

time="2026-04-06T14:35:18Z" level=warning msg="Volume db-storage in pod production/mariadb is a hostPath volume which is not supported for pod volume backup, skipping"

This warning is specific to local-path-provisioner, which provisions volumes as hostPath mounts on the node's filesystem. The node-agent (Kopia) can't back up hostPath volumes because they're not true PVCs from a CSI driver. We'll cover workarounds in the limitations section.

Simulate a Disaster

Delete the entire production namespace. This wipes out Deployments, Services, ConfigMaps, Secrets, PVCs, and all running pods:

kubectl delete namespace production

Verify everything is gone:

kubectl get all -n production

Kubernetes confirms the namespace no longer exists:

Error from server (NotFound): namespaces "production" not found

The namespace, every resource inside it, and the PVC are all gone. On a real cluster, this is the scenario where someone runs kubectl delete ns on the wrong namespace, or a Helm uninstall goes sideways.

Restore from Backup

Restore the production namespace from the backup:

velero restore create production-restore \
  --from-backup production-with-pvc \
  --wait

Velero recreates the namespace and every resource inside it:

Restore request "production-restore" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore is still in progress.
...
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe production-restore` and `velero restore logs production-restore`.

Check the restore details:

velero restore describe production-restore

21 items were restored from the backup:

Name:         production-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  21
Items restored:              21

Started:    2026-04-06 14:42:08 +0000 UTC
Completed:  2026-04-06 14:42:19 +0000 UTC

Backup:  production-with-pvc

The difference between 69 items backed up and 21 items restored is normal. Velero backs up all associated resources (endpoints, replicasets, pods, events) but only restores the parent objects. Kubernetes controllers then recreate the child resources (pods from deployments, endpoints from services).
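That parent-versus-child filtering can be sketched roughly as follows. This is illustrative only: the resource kinds listed are assumptions for the example, not Velero's actual internal skip list.

```shell
# Parent objects get restored; child objects are left for their controllers
# to recreate (pods from replicasets, endpoints from services, and so on).
backed_up="deployment replicaset pod service endpoints configmap secret persistentvolumeclaim"
children="replicaset pod endpoints"

restored=""
for kind in $backed_up; do
  case " $children " in
    *" $kind "*) ;;                          # skipped: its controller recreates it
    *) restored="$restored $kind" ;;
  esac
done
echo "restored:$restored"
```

This is also why restored pods get fresh names (the `r7m2k` suffix above instead of the original `lx9km`): the Deployment controller creates them anew.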

Verify Restored Resources

Check that all Kubernetes resources are back:

kubectl get all -n production

Both deployments are running with the correct replica counts:

NAME                           READY   STATUS    RESTARTS   AGE
pod/mariadb-5f8b7c9d44-r7m2k   1/1     Running   0          35s
pod/webapp-6d4f8b7c55-8jn3p    1/1     Running   0          35s
pod/webapp-6d4f8b7c55-kv6wt    1/1     Running   0          35s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql    ClusterIP   10.96.88.143    <none>        3306/TCP   35s
service/webapp   ClusterIP   10.96.167.22    <none>        80/TCP     35s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mariadb   1/1     1            1           35s
deployment.apps/webapp    2/2     2            2           35s

Verify the ConfigMap data was preserved:

kubectl get configmap app-config -n production -o jsonpath='{.data}'

The key-value pairs are intact:

{"DB_HOST":"mysql","DB_NAME":"appdb"}

Check the Secret:

kubectl get secret db-credentials -n production -o jsonpath='{.data.username}' | base64 -d

The secret value is preserved:

root

Verify the PVC was recreated and bound:

kubectl get pvc -n production

The PVC is Bound to a new PV:

NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-storage   Bound    pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890   5Gi        RWO            local-path     45s

All Kubernetes resource definitions were restored perfectly. Deployments, Services, ConfigMaps, Secrets, and the PVC are all back, which is exactly what Velero promises.

PVC Backup Limitations and Workarounds

The PVC object was recreated, but the underlying data (the 6 MariaDB rows) was not restored. This is the critical distinction to understand.

Velero backed up the PVC definition (the YAML that says "I want 5Gi of local-path storage"). On restore, Kubernetes created a new PV and bound the PVC to it. But the new PV is empty because the original data lived on the old node's filesystem, and local-path-provisioner uses hostPath volumes that Velero's node-agent cannot snapshot.

Why hostPath Volumes Fail

The node-agent (Kopia data mover) works by mounting a pod's volumes and copying the files to object storage. hostPath volumes bypass the CSI layer entirely, so the node-agent has no way to access them through the standard volume mount mechanism. This is why we saw the warning during backup: "Volume db-storage in pod production/mariadb is a hostPath volume which is not supported for pod volume backup, skipping".

Storage Backends That Support Full PV Backup

With any of these CSI-compliant storage providers, Velero can back up and restore the actual data on persistent volumes:

  • AWS EBS: native snapshot support via velero-plugin-for-aws
  • GCE Persistent Disk: native snapshot support via velero-plugin-for-gcp
  • Azure Managed Disk: native snapshot support via velero-plugin-for-microsoft-azure
  • Longhorn: CSI snapshots, works with Velero's CSI plugin
  • Ceph RBD / CephFS: CSI snapshots supported
  • OpenEBS: CSI snapshot support for cStor and LVM LocalPV

Workaround for Local Storage

If you're stuck with local-path-provisioner (common in dev clusters and single-node setups), your options are:

  • Application-level backups: use mariadb-dump or pg_dump before the Velero backup. Store the dump in a ConfigMap or a second volume that Velero can capture
  • Pre-backup hooks: Velero supports pre/post backup hooks that run commands inside pods before the backup starts. You can trigger a database dump as a pre-backup hook
  • Switch to Longhorn or OpenEBS: both run on local disks but provide CSI snapshot support that Velero can use
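The pre-backup hook option is driven by pod annotations. A hedged sketch of a hook that dumps the database onto the backed-up volume before each backup runs; the `pre.hook.backup.velero.io/*` keys are Velero's documented hook annotations, but the dump path and command here are assumptions for this example, not copied from a tested setup:

```shell
# Hypothetical pre-backup hook: dump appdb to a file that lives on the volume
# Velero is about to back up. The command runs inside the mariadb container,
# so $MARIADB_ROOT_PASSWORD expands there, not on your workstation.
kubectl annotate pod -n production -l app=mariadb \
  pre.hook.backup.velero.io/container=mariadb \
  pre.hook.backup.velero.io/command='["/bin/sh", "-c", "mariadb-dump -uroot -p$MARIADB_ROOT_PASSWORD appdb > /var/lib/mysql/appdb-dump.sql"]' \
  pre.hook.backup.velero.io/timeout=120s
```

On restore, you would re-import the dump with `mariadb < appdb-dump.sql` once the pod is running again.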

For production clusters, the recommendation is straightforward: use a CSI-compliant storage provider. local-path-provisioner is fine for dev and CI, but it was never designed for backup and recovery scenarios.

Scheduled Backups

Manual backups work for testing, but production needs automation. Velero schedules use standard cron syntax:

velero schedule create production-daily \
  --schedule="0 */6 * * *" \
  --include-namespaces production \
  --ttl 168h

This creates a backup every 6 hours with a 7-day retention period (168 hours). Backups older than the TTL are automatically deleted from S3.
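The TTL and interval together determine how many backups coexist in the bucket, which is worth a quick sanity check before pointing a busy schedule at an expensive storage class:

```shell
interval_hours=6
ttl_hours=168   # 7 days

# Backups present in the bucket once the schedule reaches steady state:
# new ones arrive every 6h, old ones expire after 168h.
retained=$(( ttl_hours / interval_hours ))
echo "backups retained: $retained"   # backups retained: 28
```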

Verify the schedule was created:

velero schedule get

The schedule shows as Enabled:

NAME               STATUS    CREATED                          SCHEDULE      BACKUP TTL   LAST BACKUP   SELECTOR   PAUSED
production-daily   Enabled   2026-04-06 15:10:22 +0000 UTC   0 */6 * * *   168h0m0s                  false

Trigger a manual run of the schedule to verify it works end to end:

velero backup create --from-schedule production-daily --wait

The manually triggered backup completed with 0 errors:

Backup request "production-daily-20260406151055" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup is still in progress.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe production-daily-20260406151055` and `velero backup logs production-daily-20260406151055`.

A few scheduling strategies that work well in practice:

  • Critical namespaces: every 6 hours, 7-day retention
  • Development namespaces: daily, 3-day retention
  • Full cluster backup: weekly, 30-day retention (omit --include-namespaces to back up everything)
  • Pre-upgrade backup: manual backup before any cluster or application upgrade

GCS as an Alternative Backend

If you're on Google Cloud, the setup is nearly identical. The install command uses a different provider and plugin:

velero install \
  --provider gcp \
  --plugins velero/velero-plugin-for-gcp:v1.12.0 \
  --bucket your-gcs-bucket-name \
  --secret-file /tmp/gcp-credentials.json \
  --use-node-agent \
  --default-volumes-to-fs-backup

The /tmp/gcp-credentials.json file is a GCP service account key with Storage Object Admin permissions on the bucket. Everything else (backup, restore, schedules) works the same way. S3-compatible stores like MinIO and DigitalOcean Spaces also work with the AWS plugin by adding s3Url to the backup location config.
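For MinIO specifically, the endpoint goes into the backup-location config. A sketch of the install flags, assuming a self-hosted MinIO; the bucket name and URL are placeholders, while `s3Url` and `s3ForcePathStyle` are the documented config keys for S3-compatible stores:

```shell
# Example only: point the AWS plugin at a MinIO endpoint instead of AWS S3.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.12.1 \
  --bucket velero-backups \
  --secret-file /tmp/minio-credentials \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.example.com:9000 \
  --use-node-agent \
  --default-volumes-to-fs-backup
```

`s3ForcePathStyle` matters here: MinIO serves buckets as URL paths rather than virtual-host subdomains, and without it the plugin cannot reach the bucket.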

Velero vs etcd Snapshot: What Each Protects

Both backup methods serve different purposes. Using only one leaves gaps. Here's exactly what each covers:

| Aspect | etcd Snapshot | Velero Backup |
|---|---|---|
| What it backs up | All cluster state (every object in the API server) | Selected namespaces, resources, and optionally PV data |
| Scope | Entire cluster, all namespaces at once | Granular: per-namespace, per-label, per-resource type |
| Persistent Volume data | No (only PV/PVC definitions, not data) | Yes, with CSI snapshots or node-agent file backup |
| Restore granularity | All-or-nothing cluster restore | Single namespace or even single resource type |
| Cross-cluster restore | Same cluster only (etcd IDs must match) | Any cluster with Velero installed (migration capable) |
| Storage backend | Local file or etcd itself | S3, GCS, Azure Blob, MinIO |
| Cluster-scoped resources | Yes (CRDs, ClusterRoles, PVs, Nodes) | Configurable, but primarily namespace-scoped |
| Speed | Seconds (snapshot of embedded database) | Minutes (API queries + optional PV copy) |
| Recovery scenario | Control plane failure, etcd corruption | Namespace deletion, application rollback, cluster migration |
| Scheduling | Cron job on control plane node | Built-in CRD-based scheduler with TTL |

Backup Strategy for Production

Run both. They're complementary, not competing. Here's a practical strategy based on running Kubernetes in production:

etcd snapshots protect against control plane disasters. If etcd corrupts, the API server can't start, and Velero (which depends on a running API server) can't help. Take etcd snapshots every 6 hours and store them off-cluster. See the etcd backup and restore guide for the full procedure.

Velero backups protect against application-level disasters. Someone deletes a namespace, a Helm chart upgrade corrupts config, or you need to migrate workloads to a new cluster. Schedule Velero backups for every critical namespace with appropriate TTLs.

A reasonable production schedule:

  • etcd snapshot: every 6 hours, retained for 7 days, stored on a separate server or object storage
  • Velero namespace backups: every 6 hours for critical namespaces, daily for others
  • Velero full-cluster backup: weekly, 30-day retention
  • Manual backup: before every cluster upgrade, Kubernetes version bump, or major application change

Test your restores quarterly. A backup you've never tested is a backup that doesn't exist. Spin up a test cluster, restore from both etcd and Velero, and verify your applications come up with their data intact. Prometheus with Grafana can alert on backup age if you expose Velero metrics, which catches the scenario where backups silently stop working.
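Even without Prometheus, backup age is scriptable. A sketch using GNU `date`, shown here against hardcoded timestamps so it runs standalone; on a live cluster you would pull the CREATED field from `velero backup get` and the current time from `date -u`:

```shell
# Alert when the newest backup is older than the schedule should allow.
last_backup="2026-04-06 14:35:12"
now="2026-04-07 02:00:00"   # stand-in for $(date -u '+%Y-%m-%d %H:%M:%S')

age_hours=$(( ( $(date -u -d "$now" +%s) - $(date -u -d "$last_backup" +%s) ) / 3600 ))

if [ "$age_hours" -gt 12 ]; then
  echo "ALERT: newest backup is ${age_hours}h old"
else
  echo "OK: newest backup is ${age_hours}h old"
fi
```

A threshold of twice the schedule interval (12h for a 6-hour schedule) gives one missed run of slack before paging anyone.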

For clusters using Cilium or Calico with custom CRDs (network policies, BGP configs), remember that Velero backs these up too since they're standard Kubernetes resources. Your CNI configuration survives the restore, which etcd snapshots also cover but only for same-cluster restores.

The bottom line: etcd snapshots are your insurance against infrastructure failure. Velero is your insurance against operational mistakes and application-level disasters. Running both costs almost nothing in terms of resources and gives you coverage for every realistic failure scenario.
