A solid Terraform + Velero + OCP automated DR setup usually splits into three lanes:
- Terraform rebuilds the cluster infrastructure and base OCP install.
- OADP/Velero backs up and restores applications, namespaces, and PV data.
- etcd backup/restore protects the control plane state and must use a backup from the same OCP z-stream when restoring. (Red Hat Documentation)
Recommended architecture
```text
Git / CI
├─ Terraform
│  ├─ network, subnets, DNS, LB, IAM
│  ├─ OCP install prerequisites
│  └─ optional object storage + KMS
│
├─ OCP bootstrap/install
│
├─ Post-install automation
│  ├─ OADP Operator
│  ├─ DataProtectionApplication
│  ├─ BackupStorageLocation
│  └─ VolumeSnapshotLocation
│
├─ Scheduled protection
│  ├─ etcd snapshots
│  ├─ Velero/OADP schedules
│  └─ CSI snapshots or file-system backup
│
└─ DR pipeline
   ├─ Terraform recreate infra
   ├─ reinstall OCP
   ├─ etcd restore if doing full cluster rollback
   └─ Velero restore for apps/data
```
That layout matches Red Hat’s split between control plane backup/restore and application backup/restore via OADP, and OADP exposes the main objects you automate: Backup, Restore, Schedule, BackupStorageLocation, and VolumeSnapshotLocation. (Red Hat Documentation)
What each piece should own
Terraform should manage
- cloud network, subnets, routes, load balancers, DNS, IAM, object storage, encryption, and the repeatable OCP install scaffolding. This keeps rebuilds deterministic. The OCP install docs cover cluster-wide installation configuration, while backup guidance expects you to recover onto working infrastructure. (Red Hat Documentation)
OADP/Velero should manage
- namespace-scoped app backups, cluster resources related to apps, and PV protection. Red Hat recommends OADP for application backup/restore on OpenShift, and Velero supports both CSI snapshots and file-system backup. (Red Hat Documentation)
etcd should be separate
- use OpenShift’s control-plane backup flow for etcd. Red Hat explicitly says a restore must use an etcd backup from the same z-stream release, and OpenShift provides the cluster-restore.sh and quorum-restore.sh scripts to simplify recovery. (Red Hat Documentation)
Best-practice deployment pattern
Use Terraform for infra, then GitOps or post-install automation to apply OADP resources. I would not use Terraform to micromanage every backup object forever; it is better for bootstrap and guardrails than for day-to-day backup lifecycle.
A practical pattern is:
- Terraform creates bucket, IAM, KMS, DNS, LB, install config, and optional cluster manifests.
- OCP comes up.
- A post-install job applies:
  - OADP Operator
  - cloud credentials secret
  - one DataProtectionApplication
  - one or more BackupStorageLocation / VolumeSnapshotLocation
  - Schedule objects per app tier.
This lines up with Red Hat’s OADP install flow and Velero’s native schedule model. (Red Hat Documentation)
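The post-install step can be sketched as the manifests that job would apply before any DataProtectionApplication exists. This is a minimal sketch assuming AWS static credentials and OLM installation from the redhat-operators catalog; the channel and the credential values are placeholders to match to your OCP and OADP versions.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-adp
---
# OADP installs into its own namespace (OwnNamespace install mode).
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-adp
  namespace: openshift-adp
spec:
  targetNamespaces:
    - openshift-adp
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: redhat-oadp-operator
  namespace: openshift-adp
spec:
  channel: stable-1.4              # assumption: pick the channel supported on your OCP release
  name: redhat-oadp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
---
# For the AWS plugin, OADP looks for a Secret named cloud-credentials with key "cloud" by default.
apiVersion: v1
kind: Secret
metadata:
  name: cloud-credentials
  namespace: openshift-adp
type: Opaque
stringData:
  cloud: |
    [default]
    aws_access_key_id=<AWS_ACCESS_KEY_ID>
    aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
```

Apply these first and wait for the operator's CSV to succeed (oc get csv -n openshift-adp) before creating the DataProtectionApplication, because the DPA CRD only exists once the operator is installed.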
Reference implementation
1) Terraform: object storage and IAM
This is the part Terraform is best at. Exact provider blocks vary by cloud, but the minimum is:
- object storage bucket for backups
- encryption
- versioning / lifecycle
- IAM role or credentials for Velero/OADP
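```hcl
variable "velero_bucket_name" {
  type        = string
  description = "Name of the S3 bucket that stores OADP/Velero backups"
}

variable "kms_key_arn" {
  type        = string
  description = "ARN of the KMS key used to encrypt backup objects"
}

# Bucket that OADP/Velero writes backups to
resource "aws_s3_bucket" "velero" {
  bucket = var.velero_bucket_name
}

# Versioning protects backup objects from accidental overwrite or deletion
resource "aws_s3_bucket_versioning" "velero" {
  bucket = aws_s3_bucket.velero.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt every object with the customer-managed KMS key by default
resource "aws_s3_bucket_server_side_encryption_configuration" "velero" {
  bucket = aws_s3_bucket.velero.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.kms_key_arn
    }
  }
}
```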
2) OADP install on OpenShift
On current OpenShift, app backup/restore is done through the OADP Operator, which provides the main backup objects and integrates Velero with supported storage providers. (Red Hat Documentation)
3) DataProtectionApplication
This is the core OADP object that wires backup and snapshot locations.
```yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        provider: aws
        default: true
        objectStorage:
          bucket: infra-cloud-velero-prod
          prefix: ocp-prod
        config:
          region: us-east-1
  snapshotLocations:
    - velero:
        provider: aws
        config:
          region: us-east-1
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
        - csi
```
OADP’s API surface includes BackupStorageLocation and VolumeSnapshotLocation, and CSI snapshot support is the preferred volume path when your storage supports it. (Red Hat Documentation)
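On the CSI path, Velero has to know which VolumeSnapshotClass to use for each CSI driver, and it selects the class that carries the velero.io/csi-volumesnapshot-class label. A minimal sketch, assuming the AWS EBS CSI driver (ebs.csi.aws.com); the class name is hypothetical.

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: velero-ebs-snapclass              # hypothetical name
  labels:
    velero.io/csi-volumesnapshot-class: "true"   # marks this class for Velero's CSI support
driver: ebs.csi.aws.com                   # assumption: replace with your storage's CSI driver
deletionPolicy: Retain                    # keep the storage-side snapshot if the VolumeSnapshot object is deleted
```

Label only one class per driver so the choice stays unambiguous.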
4) Scheduled backups
Velero schedules are cron-based repeatable backup requests. (Velero)
Example for critical apps:
```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: apps-hourly
  namespace: openshift-adp
spec:
  schedule: "0 * * * *"     # top of every hour
  template:
    includedNamespaces:
      - payments
      - customer-api
    snapshotVolumes: true
    ttl: 720h               # keep each backup for 30 days
```
Example for lower-priority namespaces:
```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: apps-daily
  namespace: openshift-adp
spec:
  schedule: "0 2 * * *"     # once a day at 02:00
  template:
    includedNamespaces:
      - reporting
      - internal-tools
    snapshotVolumes: true
    ttl: 2160h              # keep each backup for 90 days
```
Velero also supports filtering by namespace, labels, and resource type, which is useful for separating critical workloads from everything else. (Velero)
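As an illustration of label-based filtering, the Schedule below backs up only resources labeled as tier 1, in any namespace. The backup-tier label key and its value are a hypothetical convention, not something OADP defines.

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: tier1-labeled-hourly
  namespace: openshift-adp
spec:
  schedule: "30 * * * *"    # offset from apps-hourly so the two do not start together
  template:
    includedNamespaces:
      - "*"
    labelSelector:
      matchLabels:
        backup-tier: tier-1 # hypothetical label convention
    snapshotVolumes: true
    ttl: 720h
```

Velero's labelSelector matches individual resources rather than whole namespaces, so the label must be on the workloads, PVCs, and other objects you want captured.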
5) etcd backup automation
Keep this outside Velero. OpenShift’s backup docs separate control-plane backup from OADP app backup, and Red Hat says you only need to save the etcd backup from a single control plane host. (Red Hat Documentation)
Typical automation pattern:
- privileged automation job or external runner
- SSH to one control plane node
- run cluster-backup.sh
- copy backup artifacts off-cluster to encrypted object storage
- tag with OCP version and timestamp
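One way to implement the job option is a CronJob that drives cluster-backup.sh through oc debug, the same way Red Hat's manual procedure reaches a control plane node. Treat this as a hedged sketch rather than a supported Red Hat component: the namespace, ServiceAccount, and image tag are assumptions, the ServiceAccount needs enough privilege to start node debug pods (in practice close to cluster-admin), and the off-cluster copy step is deliberately left out.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: etcd-backup
  namespace: ocp-etcd-backup                 # hypothetical namespace
spec:
  schedule: "0 1 * * *"                      # daily
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      backoffLimit: 0
      template:
        spec:
          serviceAccountName: etcd-backup    # hypothetical SA allowed to run node debug pods
          restartPolicy: Never
          containers:
            - name: backup
              image: registry.redhat.io/openshift4/ose-cli:latest   # assumption: any image with oc
              command: ["/bin/bash", "-c"]
              args:
                - |
                  set -euo pipefail
                  # A backup from a single control plane node is sufficient.
                  NODE=$(oc get nodes -l node-role.kubernetes.io/master \
                    -o jsonpath='{.items[0].metadata.name}')
                  # Put the exact OCP version in the path so restores can match the z-stream.
                  VERSION=$(oc get clusterversion version -o jsonpath='{.status.desired.version}')
                  DEST="/home/core/assets/backup/${VERSION}-$(date -u +%Y%m%dT%H%M%SZ)"
                  oc debug node/"${NODE}" -- chroot /host /usr/local/bin/cluster-backup.sh "${DEST}"
                  # Not shown: copying ${DEST} from the node to the encrypted off-cluster bucket.
```

For the pre-upgrade snapshot, the same CronJob can be triggered on demand with oc create job pre-upgrade-etcd --from=cronjob/etcd-backup -n ocp-etcd-backup.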
Recovery workflow
App-only DR
Use this when the cluster still exists:
- Reinstall missing operator/app prerequisites if needed.
- Run Velero/OADP restore for selected namespaces or apps.
- CSI-backed PV restore happens through the CSI plugin during PVC restore. (Velero)
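For app-only recovery, the restore itself is a single Restore object that points at an existing backup. A minimal sketch; the backup name is hypothetical and follows the schedule-name-plus-timestamp pattern Velero uses for scheduled backups (list real names with oc get backups.velero.io -n openshift-adp).

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: payments-restore
  namespace: openshift-adp
spec:
  backupName: apps-hourly-20240101120000   # hypothetical backup name
  includedNamespaces:
    - payments
  restorePVs: true                         # recreate volumes from the backed-up snapshots
```

By default Velero skips objects that already exist in the cluster, so remove the damaged resources first or restore into a clean namespace.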
Full-cluster DR
Use this when the cluster is gone:
- Terraform recreates infra.
- Reinstall OCP.
- Restore etcd from a same-z-stream backup.
- Reconcile operators.
- Use OADP/Velero to restore PV data, which etcd does not contain, plus app resources that fall outside or postdate the control-plane restore point. (Red Hat Documentation)
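In a full-cluster DR pipeline it helps not to depend on looking up timestamped backup names. Velero's Restore accepts scheduleName, which restores from the most recent successful backup created by that schedule. This sketch assumes the rebuilt cluster's DataProtectionApplication points at the same bucket and prefix, so Velero can sync the existing backups from object storage before the restore runs, and that the installed Velero version supports existingResourcePolicy.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: dr-restore-tier1
  namespace: openshift-adp
spec:
  scheduleName: apps-hourly          # latest successful backup from this schedule
  includedNamespaces:
    - payments
    - customer-api
  restorePVs: true
  existingResourcePolicy: update     # update objects that already exist instead of skipping them
```

Because the manifest never changes between runs, the pipeline can apply it as a fixed file once OADP reports the backup storage location as available.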
Practical backup policy
A good production baseline is:
- etcd: daily plus pre-upgrade snapshot
- tier-1 apps: hourly schedule
- tier-2 apps: daily schedule
- PVs: CSI snapshots where supported, file-system backup where snapshots are unavailable or portability matters. Velero documents both CSI snapshot support and file-system backup, including snapshot-data movement options. (Red Hat Documentation)
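Where a storage class has no CSI snapshot support, or backups must be portable across storage providers, file-system backup copies volume contents through the OADP node agent. A sketch assuming OADP 1.2 or later with the node agent enabled in the DataProtectionApplication; the schedule name is illustrative.

```yaml
# Assumed DataProtectionApplication settings (not shown in full):
#   spec.configuration.nodeAgent.enable: true
#   spec.configuration.nodeAgent.uploaderType: kopia
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: apps-daily-fsb
  namespace: openshift-adp
spec:
  schedule: "0 3 * * *"
  template:
    includedNamespaces:
      - reporting
    defaultVolumesToFsBackup: true   # copy PV contents with the node agent instead of snapshotting
    ttl: 2160h
```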
Guardrails to add
- Encrypt the backup bucket.
- Keep backups off-cluster.
- Tag every etcd backup with OCP z-stream.
- Separate schedules by business tier, not “back up everything hourly.”
- Test both restore and full rebuild regularly. Red Hat’s backup docs are explicitly framed around recovering from disaster scenarios, not just creating backups. (Red Hat Documentation)
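A low-risk way to exercise app-level restores regularly is to restore a recent backup into a scratch namespace with namespaceMapping, leaving production untouched. A hedged sketch; the backup name and the payments-drill namespace are hypothetical.

```yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: drill-payments
  namespace: openshift-adp
spec:
  backupName: apps-hourly-20240101120000   # hypothetical backup name
  includedNamespaces:
    - payments
  namespaceMapping:
    payments: payments-drill               # restore a copy under a different namespace name
  restorePVs: true
```

Afterwards, check status.phase on the Restore (Completed versus PartiallyFailed) and run application checks against the drill namespace before deleting it. Objects with fixed external identifiers, such as Routes with explicit hostnames, can collide with production.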
What I would automate first
If you want the highest payoff with the least complexity, automate this order:
- Terraform for infra + bucket + IAM
- OCP install
- OADP Operator + DataProtectionApplication
- namespace-based Schedule objects
- etcd backup job to off-cluster storage
- one restore drill for app-only recovery
- one restore drill for full cluster rebuild
My recommendation
For OpenShift, use OADP on top of Velero rather than installing raw Velero by hand unless you have a very specific reason. That is the supported OpenShift path for application backup/restore, while etcd remains a separate control-plane backup stream. (Red Hat Documentation)