Mastering OpenShift on VMware and Bare Metal: Key Insights

Administering OpenShift on VMware vSphere or Bare Metal is significantly more complex than cloud environments because you are responsible for the “underlay” (the physical or virtual infrastructure) as well as the “overlay” (OpenShift).

In a 2026 interview, expect a focus on automation, connectivity in restricted environments, and hardware lifecycle.


1. Installation & Provisioning (The Foundation)

Q1: Compare IPI vs. UPI in the context of VMware vSphere.

  • IPI (Installer-Provisioned Infrastructure): The installer has the vCenter credentials. It automatically creates the Folder, Virtual Machines, and Resource Pools. It also handles the VIPs (Virtual IPs) for the API and Ingress via Keepalived.
  • UPI (User-Provisioned Infrastructure): You manually create the VMs, set up the Load Balancers (F5, HAProxy), and configure DNS.
  • Interview Tip: Mention that IPI is preferred for speed and “automated scaling,” but UPI is often mandatory in “Brownfield” environments where the networking team won’t give the installer full control over the VLANs.
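To make the IPI flow concrete, here is a minimal sketch of the vSphere section of an install-config.yaml. All hostnames, credentials, and IPs are placeholders, and the exact field layout (e.g. the vcenters list versus older flat fields) varies between OpenShift versions:

```yaml
apiVersion: v1
baseDomain: example.com          # placeholder domain
metadata:
  name: ocp-vsphere              # placeholder cluster name
platform:
  vsphere:
    apiVIPs:
      - 192.168.10.5             # floating IP for the API (placeholder)
    ingressVIPs:
      - 192.168.10.6             # floating IP for application Ingress (placeholder)
    vcenters:
      - server: vcenter.example.com
        user: installer@vsphere.local
        password: "<redacted>"
        datacenters:
          - DC1
```

With these VIPs supplied, the installer configures Keepalived so the API and Ingress addresses float between nodes without an external load balancer.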

Q2: How does OpenShift interact with physical hardware for Bare Metal?

Answer: It uses the Metal3 project and the Bare Metal Operator (BMO).

  • The admin provides the BMC (Baseboard Management Controller) details—like IPMI, iDRAC (Dell), or iLO (HP)—to OpenShift.
  • OpenShift uses these to remotely power on the server, PXE boot it, and install RHCOS (Red Hat Enterprise Linux CoreOS).
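As a sketch of how those BMC details are expressed, a BareMetalHost resource (from the Metal3 API) typically pairs a credentials Secret with the BMC address. All names, MAC addresses, and IPs below are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: worker-0-bmc-secret      # placeholder secret name
type: Opaque
stringData:
  username: admin                # BMC login (placeholder)
  password: changeme
---
apiVersion: metal3.io/v1alpha1
kind: BareMetalHost
metadata:
  name: worker-0                 # placeholder host name
spec:
  online: true                   # power the server on
  bootMACAddress: "00:11:22:33:44:55"
  bmc:
    # Redfish virtual-media address for a Dell iDRAC (placeholder path)
    address: idrac-virtualmedia://10.0.0.50/redfish/v1/Systems/System.Embedded.1
    credentialsName: worker-0-bmc-secret
```

The Bare Metal Operator reconciles this object: it powers the machine on via the BMC, boots it, and hands it over to the cluster as a node.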

2. Infrastructure Operations

Q3: What is a “Disconnected” (Air-Gapped) Installation?

Answer: Common in on-prem data centers with high security.

  • The Problem: OpenShift usually pulls images from quay.io.
  • The Solution: You must set up a Local Mirror Registry (like Red Hat Quay or Sonatype Nexus).
  • Process: You use the oc mirror plugin to download all required images to portable media, move the media into the secure zone, and push the images to your local registry. You then configure the cluster with an ImageContentSourcePolicy (or, on newer releases, its successor ImageDigestMirrorSet) to redirect all image pulls to your local registry.
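A hedged example of the redirect step: an ImageContentSourcePolicy that points release-image pulls at a local mirror. The internal registry hostname and repository paths are placeholders:

```yaml
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: mirror-ocp               # arbitrary name
spec:
  repositoryDigestMirrors:
    - mirrors:
        # local mirror registry inside the air gap (placeholder hostname)
        - registry.example.internal:5000/ocp4/openshift4
      source: quay.io/openshift-release-dev/ocp-release
```

Once applied, every node's container runtime tries the mirror first for any image referenced by digest from the listed source.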

Q4: How do you handle storage on VMware vs. Bare Metal?

  • VMware: Use the vSphere CSI Driver. This allows OpenShift to talk to vCenter and dynamically provision .vmdk files as Persistent Volumes (PVs).
  • Bare Metal: You typically use the Local Storage Operator or LVM Storage (LVMS) for fast local SSDs, or OpenShift Data Foundation (ODF), which is based on Ceph. ODF is the de facto standard for on-prem because it provides S3-compatible object, block, and file storage within the cluster itself.
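On the VMware side, dynamic provisioning hinges on a StorageClass backed by the vSphere CSI driver. A minimal sketch, assuming a vCenter SPBM storage policy named "openshift-storage-policy" (a placeholder):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-csi              # arbitrary name
provisioner: csi.vsphere.vmware.com
parameters:
  # SPBM policy controlling datastore placement (placeholder name)
  storagepolicyname: "openshift-storage-policy"
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```

Any PVC referencing this class causes the driver to ask vCenter for a new virtual disk and attach it to the node running the pod.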

3. High Availability & Networking

Q5: On Bare Metal, how do you handle Load Balancing for the API and Ingress?

Answer: Since there is no “AWS ELB” on-prem, you have two choices:

  1. External: Use a physical appliance like an F5 Big-IP or a pair of HAProxy nodes managed by your team.
  2. Internal (MetalLB): Use the MetalLB Operator. It allows you to assign a range of IPs from your corporate network to the OpenShift Router so it can act like a cloud load balancer.
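A minimal sketch of the MetalLB configuration in Layer 2 mode; the pool name and address range are placeholders from a hypothetical data-center VLAN:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: datacenter-pool          # arbitrary name
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.200-192.168.10.220   # placeholder range from your VLAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv                   # arbitrary name
  namespace: metallb-system
spec:
  ipAddressPools:
    - datacenter-pool
```

In Layer 2 mode, MetalLB answers ARP for the assigned IPs; for Layer 3 you would define a BGPAdvertisement and peer with your routers instead.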

Q6: What happens if a Master (Control Plane) node dies in a Bare Metal cluster?

Answer:

  • Quorum: You must have 3 Masters to maintain etcd quorum. If one dies, the cluster survives. If two die, etcd loses quorum and the API becomes read-only or unavailable.
  • Recovery: On Bare Metal, recovery is largely manual. You must reinstall the OS on replacement hardware, use etcdctl to remove the dead member, and then add the new node back so the cluster-etcd-operator can scale etcd to three members again.
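The quorum arithmetic behind those numbers is worth being able to recite: a cluster of N members needs floor(N/2) + 1 votes. This small illustrative snippet (plain shell, not an oc command) works it out:

```shell
# Quorum for an etcd cluster of N members is floor(N/2) + 1;
# fault tolerance is N minus quorum.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

This is why 3 masters tolerate exactly one failure, and why a 4th master would add no fault tolerance (quorum rises to 3, still tolerating only one loss).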

4. Advanced Troubleshooting

Q7: A worker node is “NotReady” on VMware. What is your first check?

Answer: Beyond the logs, I check the VMware Tools status and Time Sync.

  • If the ESXi host and the VM have a clock drift (common if NTP is misconfigured), the certificates for the Kubelet will fail to validate, and the node will go NotReady.
  • I would also check the MachineConfigPool (MCP). If the node is stuck in “Updating,” it might be failing to pull an OS image from the internal registry.

Q8: What is “Assisted Installer”?

Answer: It’s the modern way to install OpenShift on-prem. It provides a web-based GUI that generates a “Discovery ISO.” You boot your physical servers with this ISO; they “check in” to the portal, and you can then click “Install” to deploy the whole cluster without writing complex YAML files.


Technical “Buzzwords” for 2026:

  • OVN-Kubernetes: The default network plugin (replaces OpenShift SDN).
  • LVM Storage: Used for high-performance databases on bare metal.
  • Red Hat Advanced Cluster Management (RHACM): If the company has multiple on-prem clusters, they will use this to manage them all from one dashboard.

Debugging etcd is the highest level of OpenShift administration. If etcd is healthy, the cluster is healthy; if etcd is failing, the API will be sluggish or completely unresponsive.

Here is the technical deep-dive on how to diagnose and fix etcd on-premise.


1. Checking the High-Level Status

Before diving into logs, check if the Etcd Operator is happy. If the operator is degraded, it usually means it’s struggling to manage the quorum.

Bash

# Check the status of the etcd cluster operator
oc get clusteroperator etcd
# Check the status of the individual etcd pods
oc get pods -n openshift-etcd -l app=etcd

2. Testing Quorum and Health (The etcdctl way)

In OpenShift 4.x, etcd runs as Static Pods on the master nodes. To run diagnostic commands, you must use a helper script or exec into the container.

The “Is it alive?” check:

Bash

# Get a list of etcd members and their health
oc rsh -n openshift-etcd etcd-master-0 etcdctl endpoint health --cluster -w table

The Performance check (Disk Latency):

On-premise (especially VMware), Disk I/O latency is the #1 killer of etcd. If your storage is slow, etcd will lose quorum.

Bash

# Check the sync duration
oc rsh -n openshift-etcd etcd-master-0 etcdctl check perf

Interview Pro-Tip: Mention that etcd requires a 99th-percentile WAL fsync latency of less than 10 ms. If it’s higher, your underlying VMware datastore or Bare Metal disks are too slow for an enterprise cluster.


3. Investigating Logs

If a pod is crashing, check the logs specifically for “leader” issues or “wal” (Write Ahead Log) errors.

Bash

# View the last 100 lines of logs from a specific member
oc logs -n openshift-etcd etcd-master-0 -c etcd --tail=100

What to look for:

  • "lost leader": Indicates network instability between master nodes.
  • "apply entries took too long": Indicates slow disk or high CPU pressure on the master node.
  • "database space exceeded": The 8GB quota has been reached (requires a defrag).

4. Critical Recovery: The “Master Node Replacement”

If a master node (e.g., master-1) hardware fails permanently on Bare Metal, you must follow these steps to restore the cluster health:

  1. Remove the ghost member: Tell etcd to stop looking for the dead node.

Bash

oc rsh -n openshift-etcd etcd-master-0 etcdctl member list
oc rsh -n openshift-etcd etcd-master-0 etcdctl member remove <dead-member-id>

  2. Clean up the Node object:

Bash

oc delete node master-1

  3. Re-provision: Boot the new hardware with the RHCOS ISO. If using IPI, the Machine API might do this for you. If UPI, you must manually trigger the CSR (Certificate Signing Request) approval.
  4. Approve CSRs: The new master won’t join until you approve its certificates:

Bash

oc get csr | grep Pending | awk '{print $1}' | xargs oc adm certificate approve

5. Compaction and Defragmentation

Over time, etcd keeps versions of objects. If the database grows too large, the cluster will stop accepting writes (Error: mvcc: database space exceeded).

The Fix:

OpenShift normally handles this automatically, but as an admin, you might need to force it:

Bash

# Defragment the endpoint
oc rsh -n openshift-etcd etcd-master-0 etcdctl defrag --cluster

The “Final Boss” Interview Question:

“We lost 2 out of 3 master nodes. The API is down. How do you recover?”

The Answer:

  1. Since quorum is lost (a majority of floor(n/2) + 1 members is required), you must perform a single-member recovery.
  2. Stop the etcd static pods on the remaining healthy master.
  3. Run the cluster-restore.sh script (shipped with OpenShift) against a previous etcd backup.
  4. This forces the remaining master to start as a “New Cluster” of one.
  5. Once the API is back up, you re-provision the other two masters and re-join them as brand-new members.

Since OpenShift 4.12+, OVN-Kubernetes has become the default network provider, replacing the older OpenShift SDN. For an on-premise administrator, understanding this is vital because it changes how traffic flows from your physical switches into your pods.


1. OVN-Kubernetes Architecture

Unlike the old SDN, which used Open vSwitch (OVS) mainly as a simple per-node datapath, OVN (Open Virtual Network) brings a distributed logical router and logical switches to every node.

  • Geneve Encap: OVN uses Geneve (Generic Network Virtualization Encapsulation) instead of VXLAN to tunnel traffic between nodes. It’s more flexible and allows for more metadata.
  • The Gateway: Every node has a “Gateway” that handles traffic entering and exiting the cluster. On-premise, this is where your physical network interface (e.g., eno1 or ens192) meets the virtual world.

2. On-Premise Networking Challenges

Q1: How does OpenShift handle “External” IPs on-prem?

In the cloud, you have a LoadBalancer service. On-prem, you don’t.

The Admin Solution: MetalLB.

As an admin, you configure a MetalLB Operator with an IP address pool from your actual data center VLAN. When a developer creates a Service of type LoadBalancer, MetalLB uses ARP (Layer 2) or BGP (Layer 3) to announce that IP address to your physical routers.
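From the developer’s side, nothing MetalLB-specific is needed: a standard LoadBalancer Service triggers the IP assignment. Names and ports here are arbitrary examples:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                      # arbitrary service name
spec:
  type: LoadBalancer             # MetalLB assigns an IP from its pool
  selector:
    app: web                     # matches hypothetical app pods
  ports:
    - port: 80                   # externally announced port
      targetPort: 8080           # container port
```

After creation, the assigned external IP appears under the Service’s status, and MetalLB announces it to the physical network.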

Q2: What is the “Ingress VIP” and “API VIP”?

During a VMware/Bare Metal IPI install, you are asked for two IPs:

  1. API VIP: The floating IP used to talk to the control plane (Port 6443).
  2. Ingress VIP: The floating IP for all application traffic (Ports 80/443).

Mechanism: OpenShift uses Keepalived and HAProxy internally to float these IPs between the master nodes (for the API) or worker nodes (for Ingress). If the node holding the IP fails, it floats to another node in seconds.

3. Troubleshooting the Network

If pods can’t talk to each other, follow this “inside-out” path:

Step 1: Check the Cluster Network Operator (CNO)

If the CNO is degraded, the entire network is unstable.

Bash

oc get clusteroperator network

Step 2: Check Pod-to-Pod Connectivity

The Cluster Network Operator continuously runs connectivity checks between nodes, the API, and services; the results are recorded as PodNetworkConnectivityCheck objects that you can inspect:

Bash

oc get podnetworkconnectivitycheck -n openshift-network-diagnostics

Step 3: Inspect the OVN Database

Since OVN stores the network state in a database (Northbound and Southbound DBs), you can check if the logical flows are actually created.

Bash

# Get the logs of the ovnkube-master
oc logs -n openshift-ovn-kubernetes -l app=ovnkube-master -c ovnkube-master

4. Key Concepts for Interview Scenarios

Scenario: “Applications are slow only when talking to external databases.”

  • Likely Culprit: MTU Mismatch.
  • Explanation: Geneve encapsulation adds roughly 100 bytes of overhead to every packet. If your physical network is set to the standard MTU (1500) and OpenShift is also sending 1500-byte packets, the encapsulated packets get fragmented, causing a massive performance hit.
  • The Fix: Ensure the cluster MTU is set to 1400 (1500 – 100) or enable Jumbo Frames (9000) on your physical switches.
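The overhead subtraction is simple enough to sanity-check in a shell; 100 bytes is the OVN-Kubernetes overhead figure assumed throughout this article:

```shell
# Cluster MTU = physical (host) MTU minus the Geneve/OVN overhead (~100 bytes).
OVERHEAD=100
for host_mtu in 1500 9000; do
  echo "host MTU ${host_mtu} -> cluster MTU $(( host_mtu - OVERHEAD ))"
done
```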

Scenario: “How do you isolate traffic between two departments on the same cluster?”

  • The Answer: NetworkPolicies.
  • OVN-Kubernetes supports standard Kubernetes NetworkPolicy objects. By default, all pods can talk to all pods. I would implement a “Deny-All” default policy and then explicitly allow traffic only between required microservices.
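A minimal default-deny sketch for one namespace (the policy name is arbitrary); you would then layer explicit allow policies on top of it per microservice:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all         # arbitrary name
spec:
  podSelector: {}                # empty selector = every pod in this namespace
  policyTypes:
    - Ingress                    # deny all inbound traffic by default
    - Egress                     # deny all outbound traffic by default
```

Because NetworkPolicies are additive, each subsequent allow policy only punches the specific holes you define.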

Summary for Administrator Interview

| Feature | OpenShift SDN (Old) | OVN-Kubernetes (New/Standard) |
| --- | --- | --- |
| Encapsulation | VXLAN | Geneve |
| Network Policy | Limited | Fully featured (Egress/Ingress) |
| Hybrid Cloud | Hard to implement | Designed for it (IPsec support) |
| Windows Support | No | Yes |
