This is a classic way for interviewers to see if you have actually managed a cluster in production. Day 1 is about getting the cluster alive; Day 2 is about keeping it from dying.
In a senior interview, interviewers expect you to spend most of your time talking about Day 2, as that represents 99% of a cluster’s lifespan.
Day 1: Installation & Provisioning
Focus: Automation, Infrastructure, and “Getting to Green.”
| Task | On-Premise Reality Check |
| --- | --- |
| DNS Setup | Creating the critical records: `api`, `api-int`, and `*.apps`. Without these, the bootstrap will fail. |
| Load Balancing | Setting up an external HAProxy or F5 (for UPI) or ensuring VIPs are reserved (for IPI). |
| Ignition Configs | Using the installer to generate `.ign` files and serving them via HTTP/PXE to the bare metal/VM nodes. |
| Certificate Approval | Manually running `oc get csr` and approving the pending requests (`oc adm certificate approve`) so that nodes can join the cluster. |
| Registry Mirroring | (If air-gapped) Setting up the local Quay/Nexus registry and the `ImageContentSourcePolicy` (replaced by `ImageDigestMirrorSet` on newer releases). |
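The DNS row is the one that most often breaks an install, so a pre-flight check is worth having. The following is a minimal sketch, assuming the usual `<record>.<cluster_name>.<base_domain>` naming; the cluster name, base domain, and the `dns-check` probe label are hypothetical, and the resolver is injectable so the logic can be exercised without a live DNS server.

```python
import socket
from typing import Callable, Dict, List, Optional


def required_records(cluster_name: str, base_domain: str) -> List[str]:
    """Names that must resolve before the bootstrap is started."""
    suffix = f"{cluster_name}.{base_domain}"
    # A wildcard cannot be queried directly, so probe an arbitrary
    # (hypothetical) label underneath *.apps instead.
    return [f"api.{suffix}", f"api-int.{suffix}", f"dns-check.apps.{suffix}"]


def system_resolver(name: str) -> Optional[str]:
    """Resolve a name with the system resolver; None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None


def check_dns(
    cluster_name: str,
    base_domain: str,
    resolve: Callable[[str], Optional[str]] = system_resolver,
) -> Dict[str, Optional[str]]:
    """Map each required record to its resolved address (or None)."""
    return {name: resolve(name) for name in required_records(cluster_name, base_domain)}


def missing_records(results: Dict[str, Optional[str]]) -> List[str]:
    """Records that failed to resolve, sorted for stable output."""
    return sorted(name for name, addr in results.items() if addr is None)
```

Calling `missing_records(check_dns("ocp", "example.com"))` on the bastion host before launching the installer would list any record that still needs to be created.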
Day 2: Maintenance & Operations
Focus: Stability, Compliance, and Scaling.
1. Lifecycle Management
- Cluster Upgrades: Navigating the “Update Graph.” Choosing between the `stable` and `fast` channels.
- Certificate Rotation: Monitoring the expiration of the internal API and Ingress CA (though OpenShift now automates most of this, an admin must know how to fix a “stuck” rotation).
- Node Scaling: Adding new Bare Metal workers via the Assisted Installer or expanding VMware Resource Pools.
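Certificate monitoring comes down to date arithmetic on the certificate's `notAfter` field. The sketch below assumes the timestamp arrives in the format printed by `openssl x509 -enddate` (for example `Jan  5 09:34:43 2030 GMT`); the 30-day warning window is an illustrative assumption, not an OpenShift default.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and the certificate's notAfter timestamp.

    `now` should be timezone-aware so that .timestamp() is unambiguous.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def needs_attention(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within the warning window."""
    return days_until_expiry(not_after, now) < warn_days


if __name__ == "__main__":
    sample = "Jan  5 09:34:43 2030 GMT"  # hypothetical notAfter value
    print(needs_attention(sample, datetime.now(timezone.utc)))
```

If automatic rotation is working, this check never fires; a certificate that does enter the warning window is a hint that a rotation is stuck.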
2. Performance & Health
- Etcd Maintenance: Performing periodic defragmentation and manual snapshots before any major change.
- Logging Stack Management: Tuning the Elasticsearch/Fluentd (or Loki) stack. On-premise, this often means managing “PVC full” issues when logs grow too fast.
- Pruning: Running `oc adm prune` to clean up old builds, images, and deployments that are cluttering the etcd database.
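For the etcd item, the decision to defragment can be driven by how much of the database file is actually in use. This is a rough sketch based on the etcd metrics `etcd_mvcc_db_total_size_in_bytes` and `etcd_mvcc_db_total_size_in_use_in_bytes`; the 45% threshold and the 100 MiB minimum size are assumptions chosen for illustration and should be tuned to your own environment.

```python
def fragmentation_ratio(db_total_bytes: int, db_in_use_bytes: int) -> float:
    """Fraction of the etcd database file that is allocated but unused."""
    if db_total_bytes <= 0:
        return 0.0
    return (db_total_bytes - db_in_use_bytes) / db_total_bytes


def should_defragment(
    db_total_bytes: int,
    db_in_use_bytes: int,
    threshold: float = 0.45,                  # assumed trigger, not a fixed rule
    min_db_bytes: int = 100 * 1024 * 1024,    # skip very small databases
) -> bool:
    """True when the database is large enough and fragmented enough."""
    if db_total_bytes < min_db_bytes:
        return False
    return fragmentation_ratio(db_total_bytes, db_in_use_bytes) >= threshold
```

A 2 GiB database with only 1 GiB in use has a ratio of 0.5 and would be flagged; defragment one member at a time and take a snapshot first.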
3. Security & Governance
- RBAC Auditing: Ensuring developers aren’t using `cluster-admin` for daily tasks.
- SCC Policy: Managing exceptions for specialized workloads (e.g., giving a monitoring agent `privileged` access).
- Quota Management: Defining `ResourceQuotas` per Project to prevent a single “noisy neighbor” from consuming all physical RAM on your ESXi hosts.
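To make the quota item concrete, here is a small sketch that builds a `ResourceQuota` manifest as JSON, which `oc apply -f` accepts as well as YAML. The project name and the sizes are hypothetical; pick values that match the capacity of your hosts.

```python
import json
from typing import Any, Dict


def resource_quota(
    project: str,
    requests_cpu: str,
    requests_memory: str,
    limits_memory: str,
    pods: int,
) -> Dict[str, Any]:
    """Build a ResourceQuota object capping a project's total consumption."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{project}-quota", "namespace": project},
        "spec": {
            "hard": {
                "requests.cpu": requests_cpu,
                "requests.memory": requests_memory,
                "limits.memory": limits_memory,
                "pods": str(pods),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical project and limits, for illustration only.
    quota = resource_quota("team-a", "8", "16Gi", "32Gi", 50)
    print(json.dumps(quota, indent=2))
```

Saving the printed output to a file and applying it with `oc apply -f` would cap the project; note that once a quota covers memory or CPU, pods in that project need explicit requests and limits (or a `LimitRange` that supplies defaults) to be admitted.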
The “Senior Admin” Bonus: Disaster Recovery (DR)
An interviewer will almost certainly ask: “What is your DR strategy for on-prem?”
A high-quality answer includes:
- Etcd Backups: Stored outside the cluster (e.g., on an external S3 bucket or NAS).
- Velero: Using Velero (packaged on OpenShift as the OADP operator) to back up application metadata and Persistent Volumes (using CSI snapshots).
- Multi-Cluster: Having a second “Passive” cluster in a different data center, managed alongside the primary with Red Hat Advanced Cluster Management (RHACM), and shifting traffic via DNS if the primary DC goes dark.
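A backup strategy is only as good as its newest backup, so it helps to verify freshness against a recovery point objective (RPO). The sketch below assumes etcd snapshots land as `*.db` files in a directory on the external store; the directory layout, file pattern, and RPO value are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def backup_mtimes(backup_dir: str, pattern: str = "*.db") -> List[float]:
    """Modification times (epoch seconds) of snapshot files in a directory."""
    return [p.stat().st_mtime for p in Path(backup_dir).glob(pattern) if p.is_file()]


def newest_backup_age(mtimes: Iterable[float], now: float) -> Optional[float]:
    """Age in seconds of the most recent backup, or None if there are none."""
    times = list(mtimes)
    if not times:
        return None
    return now - max(times)


def meets_rpo(mtimes: Iterable[float], now: float, rpo_seconds: float) -> bool:
    """True when the newest backup is no older than the RPO."""
    age = newest_backup_age(mtimes, now)
    return age is not None and age <= rpo_seconds
```

For example, `meets_rpo(backup_mtimes("/mnt/nas/etcd"), time.time(), 24 * 3600)` (after `import time`, with a hypothetical mount point) would confirm that a snapshot from the last 24 hours exists. It says nothing about whether the snapshot restores cleanly, which still needs a periodic restore test.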
Final Interview Tip: The “Why”
When answering, don’t just say what you did; say why it matters for the business:
- Wrong: “I configured the MTU to 1400.”
- Right: “I lowered the MTU to 1400 to prevent packet fragmentation over our Geneve tunnels, which reduced our application latency by 30%.”
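The arithmetic behind that MTU answer is simple enough to show. This sketch assumes the encapsulation overheads documented for OpenShift's network plugins (100 bytes for Geneve with OVN-Kubernetes, 50 bytes for VXLAN with OpenShift SDN); confirm the figure for your own plugin and release before relying on it.

```python
# Bytes consumed by the overlay headers on every packet (assumed values).
OVERLAY_OVERHEAD_BYTES = {"geneve": 100, "vxlan": 50}


def overlay_mtu(physical_mtu: int, encapsulation: str = "geneve") -> int:
    """Largest cluster-network MTU that avoids fragmentation on the underlay."""
    return physical_mtu - OVERLAY_OVERHEAD_BYTES[encapsulation]


if __name__ == "__main__":
    print(overlay_mtu(1500))  # standard Ethernet underlay with Geneve
    print(overlay_mtu(9000))  # jumbo-frame underlay with Geneve
```

With a standard 1500-byte underlay this gives 1400 for Geneve, which is where the number in the “Right” answer comes from.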