When it comes to backing up OpenShift Container Platform, the goal isn’t just “take backups”; it’s guaranteed recovery. That means combining etcd backups, application backups, a storage strategy, and regular restore testing.
Here’s a clear, real-world best-practice guide 👇
1. Understand what you MUST back up
OpenShift has two critical layers:
Cluster state (control plane)
- Stored in etcd
- Includes:
- API objects (pods, deployments, routes, secrets, etc.)
- cluster config
Without this → cluster is gone
Application data
- Persistent Volumes (databases, files)
- App-specific configs
Without this → apps restore empty/broken
2. Back up etcd (CRITICAL)
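Use OpenShift’s built-in backup script from a debug shell on a control plane node:
oc debug node/<master-node>
chroot /host
/usr/local/bin/cluster-backup.sh /backup/location
The script writes an etcd snapshot (snapshot_<timestamp>.db) and an archive of the static pod resources (static_kuberesources_<timestamp>.tar.gz). Keep both files together.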
Best practices:
- Run daily (or more frequently)
- Store backups off-cluster
- Encrypt backups (contains secrets!)
- Keep multiple copies (rotation)
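As a rough illustration of the rotation point above, here is a minimal sketch of a pruning helper. The function name prune_backups, the directory, and the retention count are hypothetical. It assumes the backup files carry sortable timestamps in their names, so a reverse name sort lists the newest first.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest KEEP files in DIR whose names
# start with PREFIX, and delete the older ones.
# Assumes timestamped names (e.g. snapshot_<timestamp>.db), so sorting the
# names in reverse order puts the newest file first.
prune_backups() {
  dir="$1"; prefix="$2"; keep="$3"
  ls -1 "$dir"/"$prefix"* 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r f; do
      rm -f -- "$f"
    done
}

# Example usage (not run here): keep the last 7 of each file type.
#   prune_backups /mnt/offcluster/etcd snapshot_ 7
#   prune_backups /mnt/offcluster/etcd static_kuberesources_ 7
```

Run something like this only on the off-cluster copy, and only after the newest backup has been verified, so a failed backup never pushes out the last good one.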
3. Use Velero for app-level backups
Use Velero for:
- Namespaces
- Kubernetes resources
- Persistent volumes
Best practices:
- Back up per namespace/app, not always the full cluster
- Use labels:
velero backup create app-backup --selector app=myapp
- Schedule backups:
velero schedule create daily --schedule="0 1 * * *"
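To make the per-namespace idea concrete, here is a minimal sketch that generates one Velero schedule per namespace. The helper names, the namespace names, and the 720h retention are illustrative assumptions. It only prints the commands unless DRY_RUN=0 is set, so they can be reviewed first.

```shell
#!/bin/sh
# Build one "velero schedule create" command for a single namespace.
# --schedule, --include-namespaces and --ttl are standard Velero flags;
# the "<namespace>-daily" naming convention is an assumption of this sketch.
schedule_cmd() {
  ns="$1"; cron="$2"; ttl="$3"
  printf 'velero schedule create %s-daily --schedule="%s" --include-namespaces %s --ttl %s\n' \
    "$ns" "$cron" "$ns" "$ttl"
}

# Print (default) or run the schedule command for every namespace given.
create_schedules() {
  cron="$1"; ttl="$2"; shift 2
  for ns in "$@"; do
    cmd=$(schedule_cmd "$ns" "$cron" "$ttl")
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"      # dry run: show what would be executed
    else
      eval "$cmd"      # real run: requires the velero CLI and cluster access
    fi
  done
}

# Example usage (prints two commands):
#   create_schedules "0 1 * * *" 720h shop billing
```

Separate schedules per namespace keep each backup small and let you restore one app without touching the others.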
4. Handle persistent volumes properly
Choose one strategy:
Option A: Storage snapshots
- Fast, crash-consistent (flush or quiesce databases first if you need application consistency)
- Cloud-native (AWS EBS, etc.)
Option B: File-level backups (Restic via Velero; newer Velero releases also support Kopia)
- Works everywhere
- Slower but portable
👉 Best practice:
- Use snapshots for databases
- Use Velero + Restic for portability
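The two strategies map onto different Velero backup flags. The sketch below shows one way to pick between them. The helper name pv_backup_flags and the strategy labels are hypothetical. It assumes Velero 1.10 or later, where the file-level flag is --default-volumes-to-fs-backup (older releases used --default-volumes-to-restic).

```shell
#!/bin/sh
# Map a PV backup strategy to the matching "velero backup create" flag.
#   snapshot   -> storage-level volume snapshots
#   filesystem -> file-level backup of pod volumes (Restic/Kopia)
pv_backup_flags() {
  case "$1" in
    snapshot)   echo "--snapshot-volumes" ;;
    filesystem) echo "--default-volumes-to-fs-backup" ;;
    *)          echo "unknown strategy: $1" >&2; return 1 ;;
  esac
}

# Example usage (not run here):
#   velero backup create db-backup --include-namespaces db $(pv_backup_flags snapshot)
#   velero backup create files-backup --include-namespaces cms $(pv_backup_flags filesystem)
```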
5. Back up important namespaces
Focus on:
- openshift-* (critical configs)
- app namespaces
- operators (stateful ones)
Avoid blindly backing up everything unless needed.
6. Secure your backups
- Encrypt at rest (S3, etc.)
- Restrict access (IAM roles)
- Never expose etcd backups publicly
Remember:
etcd backup contains ALL secrets
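Because the snapshot holds every secret, it is worth encrypting the file before it leaves the node. The sketch below is one possible approach and not the only one. It assumes OpenSSL 1.1.1 or later (for -pbkdf2) and a passphrase stored in a separate key file. The function names and paths are hypothetical.

```shell
#!/bin/sh
# Encrypt a backup file with AES-256, writing <file>.enc next to it.
# The passphrase is read from the first line of KEYFILE.
encrypt_backup() {
  in="$1"; keyfile="$2"
  openssl enc -aes-256-cbc -pbkdf2 -salt -in "$in" -out "$in.enc" -pass "file:$keyfile"
}

# Decrypt a file produced by encrypt_backup into OUT.
decrypt_backup() {
  in="$1"; keyfile="$2"; out="$3"
  openssl enc -d -aes-256-cbc -pbkdf2 -in "$in" -out "$out" -pass "file:$keyfile"
}

# Example usage (not run here):
#   encrypt_backup /backup/location/snapshot_<timestamp>.db /root/backup.key
```

Keep the key file somewhere other than the backup bucket. An encrypted backup stored next to its key protects nothing.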
7. Test restores regularly (MOST IMPORTANT)
A backup is useless if restore fails.
Test:
velero restore create --from-backup <backup-name>
Also test:
- full cluster rebuild from etcd
- namespace restore
Do this in a staging cluster, not in production.
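A restore that completes is not the same as a restore that brought everything back. One simple check, sketched below, is to compare a resource list captured before the backup with one captured after the restore. The function name and file names are hypothetical. The lists could come from something like oc get deployments,services,pvc -n <namespace> -o name.

```shell
#!/bin/sh
# Print every resource name that appears in BEFORE but not in AFTER.
# Both files hold one resource name per line, in any order.
missing_after_restore() {
  before="$1"; after="$2"
  tmp_b=$(mktemp); tmp_a=$(mktemp)
  sort "$before" > "$tmp_b"
  sort "$after" > "$tmp_a"
  comm -23 "$tmp_b" "$tmp_a"   # lines unique to the "before" list
  rm -f "$tmp_b" "$tmp_a"
}

# Example usage (not run here):
#   oc get deployments,services,pvc -n myapp -o name > before.txt   # before backup
#   oc get deployments,services,pvc -n myapp -o name > after.txt    # after restore
#   missing_after_restore before.txt after.txt
```

Empty output means every listed object came back. It says nothing about the data inside the volumes, so still open the app and check real records.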
8. Use off-cluster storage
Never store backups only inside the cluster.
Use:
- S3 / object storage
- external NFS
- backup systems
9. Define RPO / RTO
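- RPO (Recovery Point Objective): how much data loss you can tolerate
- RTO (Recovery Time Objective): how long recovery is allowed to take
Your backup interval sets your worst-case RPO: with a backup every 6 hours, you can lose up to 6 hours of changes.
Example:
- etcd backup every 6 hours
- app backup every 1 hour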
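An RPO only holds if the backups keep arriving on time. The sketch below checks whether the newest backup file is younger than the RPO, which could feed a monitoring alert. The function name and the paths are hypothetical. It relies on find -mmin, which is available in GNU and BSD find.

```shell
#!/bin/sh
# Succeed (exit 0) if FILE was modified less than RPO_MINUTES ago,
# fail (non-zero) if it is older or missing.
backup_within_rpo() {
  file="$1"; rpo_minutes="$2"
  [ -n "$(find "$file" -mmin "-$rpo_minutes" 2>/dev/null)" ]
}

# Example usage (not run here), 6-hour RPO for etcd = 360 minutes:
#   backup_within_rpo /mnt/offcluster/etcd/snapshot_<timestamp>.db 360 \
#     || echo "ALERT: newest etcd backup is older than the RPO"
```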
10. Common mistakes to avoid
❌ Only backing up etcd
❌ Not backing up PV data
❌ Never testing restore
❌ Storing backups inside cluster
❌ No encryption
❌ Backing up everything blindly (slow, noisy)
Recommended architecture
OpenShift Cluster
├── etcd backup → secure storage (daily)
├── Velero backups → object storage
└── PV snapshots → cloud storage
External Storage
├── S3 bucket
├── encrypted + versioned
└── lifecycle policies
Pro-level setup (what enterprises do)
- Velero + S3 + IAM roles
- Automated schedules (hourly/daily)
- Separate backup account
- Cross-region replication
- Periodic DR drills
Key takeaway
- etcd backup = cluster brain
- Velero backup = workloads
- PV backup = actual data
You need all three for real disaster recovery.