OCP backup

This is a breakdown of backup on OpenShift Container Platform (OCP), covering the two distinct layers you need to protect.


The two backup layers in OCP

OCP backup is not a single mechanism. You need two separate strategies working together:

Layer | What it protects | Tool
Control plane (etcd) | Cluster state: all Kubernetes/OCP objects, CRDs, configs, RBAC | cluster-backup.sh / EtcdBackup CR
Application data | Namespaces, workloads, PVs/PVCs, images | OADP (OpenShift API for Data Protection)

Use etcd backups with automated snapshots to protect and recover the cluster itself. Use OADP to protect and recover your applications and their data on top of a healthy cluster. The two are complementary, not interchangeable: OADP will not successfully back up and restore Operators or etcd.


Layer 1 — etcd backup (control plane)

etcd is the key-value store for OpenShift Container Platform, which persists the state of all resource objects. An etcd backup plays a crucial role in disaster recovery.

What the backup produces

Running cluster-backup.sh on a control plane node generates two files:

  • snapshot_<timestamp>.db — the etcd snapshot (all cluster state)
  • static_kuberesources_<timestamp>.tar.gz — static pod manifests + encryption keys (if etcd encryption is enabled)
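
Before trusting a backup directory, it can help to confirm that both files are actually there. The following is a minimal sketch, assuming only the two file-name patterns listed above; check_backup_dir is a hypothetical helper, not something shipped with OCP.

```shell
# Hypothetical helper: succeed only if DIR holds a non-empty etcd snapshot
# and a static-resources archive, using the file-name patterns listed above.
check_backup_dir() (
    dir=$1
    snap='' res=''
    for f in "$dir"/snapshot_*.db; do
        if [ -f "$f" ]; then snap=$f; break; fi
    done
    for f in "$dir"/static_kuberesources_*.tar.gz; do
        if [ -f "$f" ]; then res=$f; break; fi
    done
    if [ -z "$snap" ] || [ -z "$res" ]; then
        echo "incomplete backup in $dir" >&2
        exit 1
    fi
    # A zero-byte snapshot means the backup did not finish.
    if [ ! -s "$snap" ]; then
        echo "empty snapshot: $snap" >&2
        exit 1
    fi
    echo "ok: $snap $res"
)
```

Calling check_backup_dir /home/core/backup right after the backup script finishes gives an early signal that something went wrong, before the files are copied anywhere.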

How to take a manual backup

# SSH into any control plane node
ssh core@master-0.example.com
# Run the built-in backup script
sudo /usr/local/bin/cluster-backup.sh /home/core/backup
# From your workstation (not the node), copy the backup off-cluster immediately
scp 'core@master-0.example.com:/home/core/backup/*' /safe/offsite/location/
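
A copy is only useful if it arrives intact, so one option is to record checksums on the source side and re-check them at the destination. This is a sketch under assumptions: it relies on the sha256sum utility (GNU coreutils or compatible), needs write access to the backup directory, and the function names and the SHA256SUMS file name are illustrative choices, not part of OCP.

```shell
# Write a checksum manifest for the backup files in DIR (source side).
# SHA256SUMS is an illustrative file name, not an OCP convention.
write_checksums() (
    cd "$1" || exit 1
    sha256sum snapshot_*.db static_kuberesources_*.tar.gz > SHA256SUMS
)

# Re-check the files in DIR against the manifest (destination side).
# Succeeds only if every listed file is present and unchanged.
verify_checksums() (
    cd "$1" || exit 1
    sha256sum -c SHA256SUMS >/dev/null 2>&1
)
```

Run write_checksums before the copy so the manifest travels with the backup, then run verify_checksums on the offsite directory.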

Automated scheduled backup (OCP 4.14+)

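You can create a Backup custom resource (CR) to define the schedule and retention type of automated backups. In the releases where it was introduced this is a Technology Preview feature that requires the TechPreviewNoUpgrade feature set, so check the documentation for your version before enabling it: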

# 1. Create a PVC for backup storage
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: etcd-backup-pvc
  namespace: openshift-etcd
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
---
# 2. Schedule recurring backups
apiVersion: config.openshift.io/v1alpha1
kind: Backup
metadata:
  name: etcd-recurring-backup
spec:
  etcd:
    schedule: "20 4 * * *"   # Daily at 04:20 UTC
    timeZone: "UTC"
    pvcName: etcd-backup-pvc
    retentionPolicy:
      retentionType: RetentionNumber
      retentionNumber:
        maxNumberOfBackups: 15
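
Where the automated feature is not available, the same count-based retention can be approximated for manually taken backups. A minimal sketch, assuming the backups sit in a directory you control, keep their snapshot_<timestamp>.db and static_kuberesources_<timestamp>.tar.gz names, and that those timestamps sort chronologically; prune_backups is a hypothetical helper, not an OCP tool.

```shell
# Hypothetical helper: keep only the newest KEEP backups in DIR, deleting
# each older snapshot together with its matching static-resources archive.
# Assumes the timestamp in the file name sorts chronologically.
prune_backups() (
    dir=$1
    keep=$2
    for snap in "$dir"/snapshot_*.db; do
        if [ -f "$snap" ]; then printf '%s\n' "$snap"; fi
    done | sort -r | tail -n +"$((keep + 1))" | while IFS= read -r snap; do
        ts=${snap##*/snapshot_}   # strip directory and file-name prefix
        ts=${ts%.db}              # strip extension, leaving the timestamp
        rm -f -- "$snap" "$dir/static_kuberesources_${ts}.tar.gz"
        echo "pruned $ts"
    done
)
```

For example, prune_backups /safe/offsite/location 15 mirrors the maxNumberOfBackups: 15 setting above.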

Key rules for etcd backups

Do not take an etcd backup before the first certificate rotation completes, which occurs 24 hours after installation; otherwise the backup will contain expired certificates. It is also recommended to take etcd backups during non-peak usage hours, because taking the snapshot has a high I/O cost.

  • Backups only need to be taken from one control plane node; there is no need to run the script on every one. Store backups off the node, ideally in an offsite location.
  • Be sure to take an etcd backup after you upgrade your cluster. When you restore your cluster, you must use an etcd backup that was taken from the same z-stream release. For example, an OCP 4.14.2 cluster must use a backup taken from 4.14.2.
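
Two of these rules are easy to check mechanically: the z-stream match and the 24-hour wait. The sketch below is illustrative only. The function names are hypothetical, and it assumes you recorded the cluster version next to each backup yourself (here in a made-up file called cluster-version.txt), since the rule needs a version to compare against.

```shell
# Extract the x.y.z part of a version string such as "4.14.2".
zstream_of() {
    printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Succeed only if both versions parse and share the same x.y.z release.
same_zstream() {
    [ -n "$(zstream_of "$1")" ] && [ "$(zstream_of "$1")" = "$(zstream_of "$2")" ]
}

# Succeed only if at least 24 hours separate two epoch timestamps:
# $1 = cluster installation time, $2 = planned backup time.
past_first_rotation() {
    [ "$(( $2 - $1 ))" -ge 86400 ]
}

# Example (needs a logged-in oc session; cluster-version.txt is hypothetical):
#   current=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
#   same_zstream "$current" "$(cat /home/core/backup/cluster-version.txt)" \
#       || echo "backup was taken on a different z-stream" >&2
```

Comparing the full x.y.z string, rather than just x.y, is what makes 4.14.2 and 4.14.3 count as different releases here.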

Restore procedure (high level)

# On the designated recovery control plane node:
sudo -E /usr/local/bin/cluster-restore.sh /home/core/backup
# After restore completes, force etcd redeployment.
# forceRedeploymentReason sits directly under spec and must be a new value each time:
oc patch etcd cluster --type=merge \
  -p '{"spec": {"forceRedeploymentReason": "recovery-'"$(date --rfc-3339=ns)"'"}}'
# Monitor etcd pods coming back up
oc get pods -n openshift-etcd | grep -v quorum

Layer 2 — OADP (application backup)

OADP uses Velero to back up and restore cluster resources and internal images. It can also protect persistent volumes, either through file-system backup with Kopia or Restic or through volume snapshots.

Install OADP via OperatorHub

Operators → OperatorHub → search "OADP" → Install

Configure a backup location (S3 example)

apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-cluster
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift          # Required for OCP-specific resources
        - aws
    nodeAgent:
      enable: true
      uploaderType: kopia    # Preferred over restic in OADP 1.3+
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        objectStorage:
          bucket: my-ocp-backups
          prefix: cluster-1
        credential:
          name: cloud-credentials
          key: cloud
  snapshotLocations:
    - name: default
      velero:
        provider: aws
        config:
          region: ca-central-1

Taking an application backup

# Back up specific namespaces
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
    - my-app
    - my-app-db
  defaultVolumesToFsBackup: true   # Use kopia/restic for PVs
  storageLocation: default
  ttl: 720h0m0s                    # 30-day retention
---
# Scheduled backup (daily at 2am)
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-app-backup
  namespace: openshift-adp
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - "*"                # All namespaces
    excludedNamespaces:
      - "openshift-*"      # Exclude platform namespaces
      - "kube-*"
    defaultVolumesToFsBackup: true
    storageLocation: default
    ttl: 168h0m0s          # 7-day retention
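
The ttl values are simply days multiplied by 24 hours (30 days = 720h, 7 days = 168h). As a rough sketch, a small script could compute the ttl and print a Backup manifest for any set of namespaces. The function names are hypothetical; the openshift-adp namespace and the default storage location are carried over from the examples above.

```shell
# Convert a retention period in days to the hour-based ttl format.
ttl_from_days() {
    echo "$(( $1 * 24 ))h0m0s"
}

# Print a Backup manifest. $1 = backup name, $2 = retention in days,
# remaining arguments = namespaces to include.
render_backup() (
    name=$1
    days=$2
    shift 2
    cat <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: $name
  namespace: openshift-adp
spec:
  includedNamespaces:
EOF
    for ns in "$@"; do
        printf '    - %s\n' "$ns"
    done
    cat <<EOF
  defaultVolumesToFsBackup: true
  storageLocation: default
  ttl: $(ttl_from_days "$days")
EOF
)
```

The output can be reviewed first and then piped to the cluster, for example: render_backup my-app-backup 30 my-app my-app-db | oc apply -f -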

Restoring from OADP

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-app-restore
  namespace: openshift-adp
spec:
  backupName: my-app-backup
  includedNamespaces:
    - my-app
  restorePVs: true
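
Creating the Restore object only starts the operation; the result shows up in its status.phase field. One way to wait for it is a small polling loop, sketched below. wait_for_velero is a hypothetical helper, the retry count and interval are arbitrary defaults, and the loop treats Completed as success and Failed, PartiallyFailed, and FailedValidation as failure, with every other phase considered still in progress.

```shell
# Hypothetical helper: poll a Velero Backup or Restore in openshift-adp
# until it reaches a terminal phase.
# $1 = kind (backup|restore), $2 = name, $3 = max polls, $4 = seconds between polls
# Exit status: 0 = Completed, 1 = failed phase, 2 = gave up waiting.
wait_for_velero() (
    kind=$1
    name=$2
    tries=${3:-60}
    interval=${4:-10}
    i=0
    while [ "$i" -lt "$tries" ]; do
        phase=$(oc get "$kind.velero.io" "$name" -n openshift-adp \
            -o jsonpath='{.status.phase}') || phase=''
        case $phase in
            Completed)
                echo "$kind/$name: $phase"
                exit 0 ;;
            Failed|PartiallyFailed|FailedValidation)
                echo "$kind/$name: $phase" >&2
                exit 1 ;;
        esac
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$kind/$name: still not finished after $tries polls" >&2
    exit 2
)
```

For example, wait_for_velero restore my-app-restore polls for up to ten minutes with the defaults; the same function works for the my-app-backup object by passing backup as the kind.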

PV backup methods

Method | How it works | Best for
CSI Snapshots | Point-in-time volume snapshot via storage driver | Cloud PVs (AWS EBS, Azure Disk, Ceph RBD)
Kopia/Restic (fs backup) | File-level copy streamed to object storage | Any PV, slower but universal

Supported backup storage targets

OADP supports AWS, Microsoft Azure, GCP, Multicloud Object Gateway, and S3-compatible object storage (MinIO, NooBaa, etc.). Snapshot backups can be performed for AWS, Azure, GCP, and CSI snapshot-enabled cloud storage such as CephFS and Ceph RBD.


Best practices summary

Practice | Detail
3-2-1 rule | 3 copies, 2 media types, 1 offsite; etcd snapshots must be stored outside the cluster
Test restores | Regularly restore to a test cluster; an untested backup is not a backup
Version lock | etcd restores must use a backup from the same OCP z-stream version
Frequency | etcd: at minimum daily, and before every upgrade; OADP: daily or per RPO requirement
Exclude platform namespaces | Don't include openshift-* in OADP; OADP doesn't restore operators or etcd
Encryption | Encrypt backup storage at rest; the etcd backup includes the encryption keys if etcd encryption is on
Monitor backup jobs | Set up alerts on failed Schedule or EtcdBackup CRs
