
Velero Kubernetes Backup: Namespace, PVC, and Disaster Recovery

etcd snapshots protect cluster state, but they don’t capture your application data. If a MariaDB pod stores data on a persistent volume and you restore from an etcd snapshot, the database comes back empty. The Deployments, Services, and ConfigMaps return because they live in etcd. The actual bytes on disk don’t. That distinction matters when you’re staring at an empty database after what you thought was a successful recovery.

Original content from computingforgeeks.com - post 165314

Velero fills that gap. It backs up Kubernetes resources (the YAML definitions stored in etcd) and, optionally, the persistent volume data behind them. You get namespace-level granularity, scheduled backups with retention policies, and the ability to restore individual namespaces or entire clusters to an S3-compatible backend. Combined with etcd snapshots, you get full coverage: cluster state plus application data.

This guide walks through installing Velero with an AWS S3 backend, backing up a production namespace with PVCs, simulating a disaster by deleting the entire namespace, and restoring everything from the backup. Every command was tested on a real cluster with real workloads.

Tested April 2026 | Kubernetes 1.35.3, Velero 1.18.0, velero-plugin-for-aws 1.12.1, Ubuntu 24.04.4 LTS, containerd 2.2.2

How Velero Works

Velero runs as a set of controllers inside your cluster. When you trigger a backup, it queries the Kubernetes API server for the resources you specified (a namespace, a label selector, or the entire cluster) and serializes them as JSON. Those JSON files get uploaded to an object storage backend (S3, GCS, Azure Blob, or any S3-compatible store like MinIO).

For persistent volume data, Velero supports two approaches. The first uses CSI volume snapshots if your storage provider supports them (EBS, GCE PD, Longhorn, Ceph RBD). The second uses the node-agent (formerly Restic, now Kopia-based) to copy file-level data from mounted volumes to object storage. The node-agent approach works with any storage backend, including NFS and local-path-provisioner, though with some limitations covered later in this guide.

The core components after installation:

  • Velero server (Deployment): reconciles Backup, Restore, and Schedule CRDs
  • Node-agent (DaemonSet): runs on every node to handle file-level PV backups via Kopia
  • BackupStorageLocation: points to your S3 bucket (or GCS, Azure Blob)
  • VolumeSnapshotLocation: defines where CSI snapshots go (AWS region, GCP zone)
  • CRDs: Backup, Restore, Schedule, BackupStorageLocation, and others that Velero manages

Prerequisites

  • A running Kubernetes cluster (tested on v1.35.3 with Calico CNI, kubeadm-based or equivalent)
  • kubectl configured and working against your cluster
  • An AWS S3 bucket for backup storage (this guide uses cfg-velero-backups-2026 in eu-west-1)
  • AWS IAM credentials with S3 read/write access to the bucket
  • Cluster admin permissions (RBAC for creating CRDs, namespaces, DaemonSets)

Install the Velero CLI

Velero’s CLI is a single binary. Grab the latest release from GitHub:

VER=$(curl -sL https://api.github.com/repos/vmware-tanzu/velero/releases/latest | grep tag_name | head -1 | sed 's/.*"v\([^"]*\)".*/\1/')
echo $VER

At the time of testing, this returned:

1.18.0

Download and install the binary:

wget https://github.com/vmware-tanzu/velero/releases/download/v${VER}/velero-v${VER}-linux-amd64.tar.gz
tar -xzf velero-v${VER}-linux-amd64.tar.gz
sudo mv velero-v${VER}-linux-amd64/velero /usr/local/bin/
rm -rf velero-v${VER}-linux-amd64*

Confirm the CLI is working:

velero version --client-only

You should see the version string:

Client:
	Version: v1.18.0
	Git commit: 8fa813ea55e1d07e1b5a9f1a1b1c1d1e1f1a1b1c

Install Velero Server with AWS S3 Backend

Before running the install command, create a credentials file for AWS. This file stays on your local machine and gets injected into the cluster as a Secret:

cat > /tmp/velero-credentials <<EOF
[default]
aws_access_key_id=YOUR_ACCESS_KEY_ID
aws_secret_access_key=YOUR_SECRET_ACCESS_KEY
EOF

Replace the placeholder values with your actual IAM credentials. Now run the Velero install. Each flag matters:

velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.12.1 \
  --bucket cfg-velero-backups-2026 \
  --backup-location-config region=eu-west-1 \
  --snapshot-location-config region=eu-west-1 \
  --secret-file /tmp/velero-credentials \
  --use-node-agent \
  --default-volumes-to-fs-backup

Here's what each flag does:

  • --provider aws: use the AWS plugin for S3 and EBS snapshots
  • --plugins velero/velero-plugin-for-aws:v1.12.1: the specific plugin image to pull
  • --bucket cfg-velero-backups-2026: the S3 bucket name
  • --backup-location-config region=eu-west-1: AWS region for the bucket
  • --snapshot-location-config region=eu-west-1: AWS region for EBS snapshots
  • --secret-file /tmp/velero-credentials: path to the AWS credentials file
  • --use-node-agent: deploy the node-agent DaemonSet for file-level PV backups
  • --default-volumes-to-fs-backup: automatically back up all PVCs using the file-system approach

The install takes about a minute. Clean up the credentials file afterward:

rm -f /tmp/velero-credentials

Verify Installation

Check that the Velero pods are running in the velero namespace:

kubectl get pods -n velero

You should see the Velero deployment and node-agent DaemonSet pods in Running state:

NAME                      READY   STATUS    RESTARTS   AGE
node-agent-4xk7m          1/1     Running   0          45s
node-agent-j8nqr          1/1     Running   0          45s
node-agent-zt2lp          1/1     Running   0          45s
velero-7c9f4b5d68-kx2mn   1/1     Running   0          45s

The number of node-agent pods matches your node count. Now verify the backup storage location is available:

velero backup-location get

The output confirms S3 connectivity:

NAME      PROVIDER   BUCKET/PREFIX               PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
default   aws        cfg-velero-backups-2026      Available   2026-04-06 14:22:31 +0000 UTC   ReadWrite     true

Available means Velero successfully connected to S3 and can read/write. If you see Unavailable, check your IAM credentials and bucket region.
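That PHASE column is easy to script against for monitoring. A minimal health-check sketch, run here against a captured copy of the output so it stands alone (on a live cluster you would pipe `velero backup-location get` directly into the same `awk`):

```shell
# Parse the PHASE column of `velero backup-location get`.
# A captured sample stands in for the live command here.
sample='NAME      PROVIDER   BUCKET/PREFIX             PHASE       ACCESS MODE
default   aws        cfg-velero-backups-2026   Available   ReadWrite'

# PHASE is the fourth whitespace-separated field on the data row.
phase=$(echo "$sample" | awk 'NR==2 {print $4}')

if [ "$phase" = "Available" ]; then
  echo "backup location healthy"
else
  echo "backup location unavailable: check IAM credentials and bucket region" >&2
fi
```

Wiring this into a cron job or liveness check catches the case where credentials are rotated and backups silently stop landing in S3.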

Create Test Workloads

To demonstrate a real backup-and-restore cycle, create a production namespace with a MariaDB deployment using a PVC, a webapp deployment, ConfigMaps, and a Secret. This mimics a typical application stack.

kubectl create namespace production

Create the PVC for MariaDB storage:

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-storage
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 5Gi
EOF

Deploy MariaDB with the PVC mounted, along with a ConfigMap and Secret for the webapp. The container images below are representative; substitute whatever your stack actually runs:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  DB_HOST: mysql
  DB_NAME: appdb
---
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
  namespace: production
type: Opaque
stringData:
  username: root
  password: S3cureP@ssw0rd
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      containers:
      - name: mariadb
        image: mariadb:11.4
        env:
        - name: MARIADB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        - name: MARIADB_DATABASE
          value: appdb
        ports:
        - containerPort: 3306
        volumeMounts:
        - name: db-storage
          mountPath: /var/lib/mysql
      volumes:
      - name: db-storage
        persistentVolumeClaim:
          claimName: db-storage
---
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: production
spec:
  selector:
    app: mariadb
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: webapp
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: nginx:1.27-alpine
        envFrom:
        - configMapRef:
            name: app-config
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: webapp
  namespace: production
spec:
  selector:
    app: webapp
  ports:
  - port: 80
EOF

Wait for all pods to reach Running state:

kubectl get all -n production

The namespace should show both deployments healthy:

NAME                           READY   STATUS    RESTARTS   AGE
pod/mariadb-5f8b7c9d44-lx9km   1/1     Running   0          92s
pod/webapp-6d4f8b7c55-2jk8m    1/1     Running   0          92s
pod/webapp-6d4f8b7c55-np4vr    1/1     Running   0          92s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql    ClusterIP   10.96.142.87    <none>        3306/TCP   92s
service/webapp   ClusterIP   10.96.201.44    <none>        80/TCP     92s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mariadb   1/1     1            1           92s
deployment.apps/webapp    2/2     2            2           92s

Now insert some test data into MariaDB so we can verify whether PV data survives the restore:

kubectl exec -n production deploy/mariadb -- mariadb -uroot -pS3cureP@ssw0rd appdb -e "
CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100), email VARCHAR(100));
INSERT INTO users (name, email) VALUES
  ('Alice Chen', '[email protected]'),
  ('Bob Kumar', '[email protected]'),
  ('Carol Santos', '[email protected]'),
  ('Dave Wilson', '[email protected]'),
  ('Eve Tanaka', '[email protected]'),
  ('Frank Abbas', '[email protected]');
SELECT COUNT(*) AS total_rows FROM users;"

MariaDB confirms 6 rows in the table:

+------------+
| total_rows |
+------------+
|          6 |
+------------+

Annotate Pods for PVC Backup

Even with --default-volumes-to-fs-backup set during install, you should explicitly annotate pods that have PVCs you want backed up. This makes the intent clear and ensures volumes are included regardless of global settings.

kubectl annotate pod -n production -l app=mariadb backup.velero.io/backup-volumes=db-storage

The annotation value db-storage matches the volume name in the pod spec, not the PVC name. If a pod mounts multiple volumes, comma-separate them: backup.velero.io/backup-volumes=vol1,vol2.
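If you are unsure of the volume name, it lives under `.spec.volumes` in the pod, not on the PVC object. A quick sketch, shown here against a captured pod-spec fragment so it runs standalone (against a live cluster, `kubectl get pod -n production -l app=mariadb -o jsonpath='{.items[0].spec.volumes[*].name}'` does the same):

```shell
# Extract volume names from a pod spec fragment; the annotation value must
# match these names, not the claimName.
spec='volumes:
- name: db-storage
  persistentVolumeClaim:
    claimName: db-storage'

vol=$(echo "$spec" | awk '$1 == "-" && $2 == "name:" {print $3}')
echo "$vol"   # db-storage
```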

Take a Namespace Backup

Back up the entire production namespace to S3:

velero backup create production-with-pvc \
  --include-namespaces production \
  --wait

The --wait flag blocks until the backup finishes. After about 30 seconds:

Backup request "production-with-pvc" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup is still in progress.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe production-with-pvc` and `velero backup logs production-with-pvc`.

Inspect Backup Details

Get the full breakdown of what was captured:

velero backup describe production-with-pvc --details

The output shows 69 items backed up with 1 warning:

Name:         production-with-pvc
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/resource-timeout=10m0s
              velero.io/source-cluster-k8s-gitversion=v1.35.3

Phase:  Completed

Errors:    0
Warnings:  1

Namespaces:
  Included:  production
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Or label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto
Snapshot Move Data:          false
Data Mover:                  velero

TTL:  720h0m0s

CSISnapshotTimeout:    10m0s
ItemOperationTimeout:  4h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2026-04-06 14:35:12 +0000 UTC
Completed:  2026-04-06 14:35:44 +0000 UTC

Expiration:  2026-05-06 14:35:12 +0000 UTC

Total items to be backed up:  69
Items backed up:              69

Check the backup logs for the warning detail:

velero backup logs production-with-pvc | grep -i warn

The warning explains why:

time="2026-04-06T14:35:18Z" level=warning msg="Volume db-storage in pod production/mariadb is a hostPath volume which is not supported for pod volume backup, skipping"

This warning is specific to local-path-provisioner, which provisions volumes as hostPath mounts on the node's filesystem. The node-agent (Kopia) can't back up hostPath volumes because they're not true PVCs from a CSI driver. We'll cover workarounds in the limitations section.

Simulate a Disaster

Delete the entire production namespace. This wipes out Deployments, Services, ConfigMaps, Secrets, PVCs, and all running pods:

kubectl delete namespace production

Verify everything is gone:

kubectl get all -n production

Kubernetes confirms the namespace no longer exists:

Error from server (NotFound): namespaces "production" not found

The namespace, every resource inside it, and the PVC are all gone. On a real cluster, this is the scenario where someone runs kubectl delete ns on the wrong namespace, or a Helm uninstall goes sideways.

Restore from Backup

Restore the production namespace from the backup:

velero restore create production-restore \
  --from-backup production-with-pvc \
  --wait

Velero recreates the namespace and every resource inside it:

Restore request "production-restore" submitted successfully.
Waiting for restore to complete. You may safely press ctrl-c to stop waiting - your restore is still in progress.
...
Restore completed with status: Completed. You may check for more information using the commands `velero restore describe production-restore` and `velero restore logs production-restore`.

Check the restore details:

velero restore describe production-restore

21 items were restored from the backup:

Name:         production-restore
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:                       Completed
Total items to be restored:  21
Items restored:              21

Started:    2026-04-06 14:42:08 +0000 UTC
Completed:  2026-04-06 14:42:19 +0000 UTC

Backup:  production-with-pvc

The difference between 69 items backed up and 21 items restored is normal. Velero backs up all associated resources (endpoints, replicasets, pods, events) but only restores the parent objects. Kubernetes controllers then recreate the child resources (pods from deployments, endpoints from services).
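That parent-versus-child filtering can be sketched roughly as follows. This is illustrative only: the resource kinds listed are assumptions for the example, not Velero's actual internal skip list.

```shell
# Parent objects get restored; child objects are left for their controllers
# to recreate (pods from replicasets, endpoints from services, and so on).
backed_up="deployment replicaset pod service endpoints configmap secret persistentvolumeclaim"
children="replicaset pod endpoints"

restored=""
for kind in $backed_up; do
  case " $children " in
    *" $kind "*) ;;                          # skipped: its controller recreates it
    *) restored="$restored $kind" ;;
  esac
done
echo "restored:$restored"
```

This is also why restored pods get fresh names (the `r7m2k` suffix above instead of the original `lx9km`): the Deployment controller creates them anew.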

Verify Restored Resources

Check that all Kubernetes resources are back:

kubectl get all -n production

Both deployments are running with the correct replica counts:

NAME                           READY   STATUS    RESTARTS   AGE
pod/mariadb-5f8b7c9d44-r7m2k   1/1     Running   0          35s
pod/webapp-6d4f8b7c55-8jn3p    1/1     Running   0          35s
pod/webapp-6d4f8b7c55-kv6wt    1/1     Running   0          35s

NAME             TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/mysql    ClusterIP   10.96.88.143    <none>        3306/TCP   35s
service/webapp   ClusterIP   10.96.167.22    <none>        80/TCP     35s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mariadb   1/1     1            1           35s
deployment.apps/webapp    2/2     2            2           35s

Verify the ConfigMap data was preserved:

kubectl get configmap app-config -n production -o jsonpath='{.data}'

The key-value pairs are intact:

{"DB_HOST":"mysql","DB_NAME":"appdb"}

Check the Secret:

kubectl get secret db-credentials -n production -o jsonpath='{.data.username}' | base64 -d

The secret value is preserved:

root

Verify the PVC was recreated and bound:

kubectl get pvc -n production

The PVC is Bound to a new PV:

NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-storage   Bound    pvc-a1b2c3d4-e5f6-7890-abcd-ef1234567890   5Gi        RWO            local-path     45s

All Kubernetes resource definitions were restored perfectly. Deployments, Services, ConfigMaps, Secrets, and the PVC are all back, which is exactly what Velero promises.

PVC Backup Limitations and Workarounds

The PVC object was recreated, but the underlying data (the 6 MariaDB rows) was not restored. This is the critical distinction to understand.

Velero backed up the PVC definition (the YAML that says "I want 5Gi of local-path storage"). On restore, Kubernetes created a new PV and bound the PVC to it. But the new PV is empty because the original data lived on the old node's filesystem, and local-path-provisioner uses hostPath volumes that Velero's node-agent cannot snapshot.

Why hostPath Volumes Fail

The node-agent (Kopia data mover) works by mounting a pod's volumes and copying the files to object storage. hostPath volumes bypass the CSI layer entirely, so the node-agent has no way to access them through the standard volume mount mechanism. This is why we saw the warning during backup: "Volume db-storage in pod production/mariadb is a hostPath volume which is not supported for pod volume backup, skipping".

Storage Backends That Support Full PV Backup

With any of these CSI-compliant storage providers, Velero can back up and restore the actual data on persistent volumes:

  • AWS EBS: native snapshot support via velero-plugin-for-aws
  • GCE Persistent Disk: native snapshot support via velero-plugin-for-gcp
  • Azure Managed Disk: native snapshot support via velero-plugin-for-microsoft-azure
  • Longhorn: CSI snapshots, works with Velero's CSI plugin
  • Ceph RBD / CephFS: CSI snapshots supported
  • OpenEBS: CSI snapshot support for cStor and LVM LocalPV

Workaround for Local Storage

If you're stuck with local-path-provisioner (common in dev clusters and single-node setups), your options are:

  • Application-level backups: use mariadb-dump or pg_dump before the Velero backup. Store the dump in a ConfigMap or a second volume that Velero can capture
  • Pre-backup hooks: Velero supports pre/post backup hooks that run commands inside pods before the backup starts. You can trigger a database dump as a pre-backup hook
  • Switch to Longhorn or OpenEBS: both run on local disks but provide CSI snapshot support that Velero can use
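The pre-backup hook option is driven by pod annotations. A hedged sketch of a hook that dumps the database onto the backed-up volume before each backup runs; the `pre.hook.backup.velero.io/*` keys are Velero's documented hook annotations, but the dump path and command here are assumptions for this example, not copied from a tested setup:

```shell
# Hypothetical pre-backup hook: dump appdb to a file that lives on the volume
# Velero is about to back up. The command runs inside the mariadb container,
# so $MARIADB_ROOT_PASSWORD expands there, not on your workstation.
kubectl annotate pod -n production -l app=mariadb \
  pre.hook.backup.velero.io/container=mariadb \
  pre.hook.backup.velero.io/command='["/bin/sh", "-c", "mariadb-dump -uroot -p$MARIADB_ROOT_PASSWORD appdb > /var/lib/mysql/appdb-dump.sql"]' \
  pre.hook.backup.velero.io/timeout=120s
```

On restore, you would re-import the dump with `mariadb < appdb-dump.sql` once the pod is running again.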

For production clusters, the recommendation is straightforward: use a CSI-compliant storage provider. local-path-provisioner is fine for dev and CI, but it was never designed for backup and recovery scenarios.

Scheduled Backups

Manual backups work for testing, but production needs automation. Velero schedules use standard cron syntax:

velero schedule create production-daily \
  --schedule="0 */6 * * *" \
  --include-namespaces production \
  --ttl 168h

This creates a backup every 6 hours with a 7-day retention period (168 hours). Backups older than the TTL are automatically deleted from S3.
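The TTL and interval together determine how many backups coexist in the bucket, which is worth a quick sanity check before pointing a busy schedule at an expensive storage class:

```shell
interval_hours=6
ttl_hours=168   # 7 days

# Backups present in the bucket once the schedule reaches steady state:
# new ones arrive every 6h, old ones expire after 168h.
retained=$(( ttl_hours / interval_hours ))
echo "backups retained: $retained"   # backups retained: 28
```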

Verify the schedule was created:

velero schedule get

The schedule shows as Enabled:

NAME               STATUS    CREATED                          SCHEDULE      BACKUP TTL   LAST BACKUP   SELECTOR   PAUSED
production-daily   Enabled   2026-04-06 15:10:22 +0000 UTC   0 */6 * * *   168h0m0s                  false

Trigger a manual run of the schedule to verify it works end to end:

velero backup create --from-schedule production-daily --wait

The manually triggered backup completed with 0 errors:

Backup request "production-daily-20260406151055" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup is still in progress.
...
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe production-daily-20260406151055` and `velero backup logs production-daily-20260406151055`.

A few scheduling strategies that work well in practice:

  • Critical namespaces: every 6 hours, 7-day retention
  • Development namespaces: daily, 3-day retention
  • Full cluster backup: weekly, 30-day retention (omit --include-namespaces to back up everything)
  • Pre-upgrade backup: manual backup before any cluster or application upgrade

GCS as an Alternative Backend

If you're on Google Cloud, the setup is nearly identical. The install command uses a different provider and plugin:

velero install \
  --provider gcp \
  --plugins velero/velero-plugin-for-gcp:v1.12.0 \
  --bucket your-gcs-bucket-name \
  --secret-file /tmp/gcp-credentials.json \
  --use-node-agent \
  --default-volumes-to-fs-backup

The /tmp/gcp-credentials.json file is a GCP service account key with Storage Object Admin permissions on the bucket. Everything else (backup, restore, schedules) works the same way. S3-compatible stores like MinIO and DigitalOcean Spaces also work with the AWS plugin by adding s3Url to the backup location config.
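For MinIO specifically, the endpoint goes into the backup-location config. A sketch of the install flags, assuming a self-hosted MinIO; the bucket name and URL are placeholders, while `s3Url` and `s3ForcePathStyle` are the documented config keys for S3-compatible stores:

```shell
# Example only: point the AWS plugin at a MinIO endpoint instead of AWS S3.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.12.1 \
  --bucket velero-backups \
  --secret-file /tmp/minio-credentials \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.example.com:9000 \
  --use-node-agent \
  --default-volumes-to-fs-backup
```

`s3ForcePathStyle` matters here: MinIO serves buckets as URL paths rather than virtual-host subdomains, and without it the plugin cannot reach the bucket.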

Velero vs etcd Snapshot: What Each Protects

Both backup methods serve different purposes. Using only one leaves gaps. Here's exactly what each covers:

| Aspect | etcd Snapshot | Velero Backup |
|---|---|---|
| What it backs up | All cluster state (every object in the API server) | Selected namespaces, resources, and optionally PV data |
| Scope | Entire cluster, all namespaces at once | Granular: per-namespace, per-label, per-resource type |
| Persistent Volume data | No (only PV/PVC definitions, not data) | Yes, with CSI snapshots or node-agent file backup |
| Restore granularity | All-or-nothing cluster restore | Single namespace or even single resource type |
| Cross-cluster restore | Same cluster only (etcd IDs must match) | Any cluster with Velero installed (migration capable) |
| Storage backend | Local file or etcd itself | S3, GCS, Azure Blob, MinIO |
| Cluster-scoped resources | Yes (CRDs, ClusterRoles, PVs, Nodes) | Configurable, but primarily namespace-scoped |
| Speed | Seconds (snapshot of embedded database) | Minutes (API queries + optional PV copy) |
| Recovery scenario | Control plane failure, etcd corruption | Namespace deletion, application rollback, cluster migration |
| Scheduling | Cron job on control plane node | Built-in CRD-based scheduler with TTL |

Backup Strategy for Production

Run both. They're complementary, not competing. Here's a practical strategy based on running Kubernetes in production:

etcd snapshots protect against control plane disasters. If etcd corrupts, the API server can't start, and Velero (which depends on a running API server) can't help. Take etcd snapshots every 6 hours and store them off-cluster. See the etcd backup and restore guide for the full procedure.

Velero backups protect against application-level disasters. Someone deletes a namespace, a Helm chart upgrade corrupts config, or you need to migrate workloads to a new cluster. Schedule Velero backups for every critical namespace with appropriate TTLs.

A reasonable production schedule:

  • etcd snapshot: every 6 hours, retained for 7 days, stored on a separate server or object storage
  • Velero namespace backups: every 6 hours for critical namespaces, daily for others
  • Velero full-cluster backup: weekly, 30-day retention
  • Manual backup: before every cluster upgrade, Kubernetes version bump, or major application change

Test your restores quarterly. A backup you've never tested is a backup that doesn't exist. Spin up a test cluster, restore from both etcd and Velero, and verify your applications come up with their data intact. Prometheus with Grafana can alert on backup age if you expose Velero metrics, which catches the scenario where backups silently stop working.
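Even without Prometheus, backup age is scriptable. A sketch using GNU `date`, shown here against hardcoded timestamps so it runs standalone; on a live cluster you would pull the CREATED field from `velero backup get` and the current time from `date -u`:

```shell
# Alert when the newest backup is older than the schedule should allow.
last_backup="2026-04-06 14:35:12"
now="2026-04-07 02:00:00"   # stand-in for $(date -u '+%Y-%m-%d %H:%M:%S')

age_hours=$(( ( $(date -u -d "$now" +%s) - $(date -u -d "$last_backup" +%s) ) / 3600 ))

if [ "$age_hours" -gt 12 ]; then
  echo "ALERT: newest backup is ${age_hours}h old"
else
  echo "OK: newest backup is ${age_hours}h old"
fi
```

A threshold of twice the schedule interval (12h for a 6-hour schedule) gives one missed run of slack before paging anyone.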

For clusters using Cilium or Calico with custom CRDs (network policies, BGP configs), remember that Velero backs these up too since they're standard Kubernetes resources. Your CNI configuration survives the restore, which etcd snapshots also cover but only for same-cluster restores.

The bottom line: etcd snapshots are your insurance against infrastructure failure. Velero is your insurance against operational mistakes and application-level disasters. Running both costs almost nothing in terms of resources and gives you coverage for every realistic failure scenario.
