To ace a senior-level Red Hat OpenShift Container Platform (OCP) interview, you need to be prepared for deep architectural questions, scenario-based troubleshooting, and Day-2 operational challenges.
Here is a compilation of high-impact OCP interview questions, categorized by domain, complete with the “Architect-Level” answers interviewers look for.
1. Architecture & Control Plane Internals
Q1: What happens under the hood when you run an OpenShift cluster upgrade? How does it differ from upstream Kubernetes?
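- The Expected Answer: On upstream Kubernetes (for example, a kubeadm-built cluster), you upgrade the pieces yourself, step by step: the control plane components and etcd first, then the kubelet on each node, with the host operating system patched separately. OpenShift automates the whole process, including the OS, via the Cluster Version Operator (CVO) and the Machine Config Operator (MCO).
- When an upgrade is triggered, the CVO applies the manifests from the new release payload, updating the cluster operators in a defined order (grouped into run levels). The MCO then rolls the new Red Hat Enterprise Linux CoreOS (RHCOS) image and rendered machine configuration out to the nodes. Within each Machine Config Pool, nodes are cordoned, drained, updated at the OS level, rebooted, and uncordoned, one at a time by default. As long as workloads run multiple replicas and define PodDisruptionBudgets, this happens without application downtime.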
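The per-pool rollout can be sketched as a small simulation. This is a hypothetical model, not MCO code: the `Node` class, the `rolling_update` function, and the rendered-config names are illustrative assumptions. Only the cordon, drain, update/reboot, uncordon order and the batching by a pool's `maxUnavailable` setting (default 1) are meant to mirror real behavior.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of a node in a Machine Config Pool.
    name: str
    current_config: str  # stands in for the node's current rendered MachineConfig
    schedulable: bool = True


def rolling_update(nodes: list[Node], desired_config: str,
                   max_unavailable: int = 1) -> tuple[list[str], int]:
    """Roll desired_config out in batches of at most max_unavailable nodes.

    Returns the ordered action log and the peak number of unschedulable nodes.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    log: list[str] = []
    peak = 0
    # Nodes already on the desired config are skipped entirely.
    pending = [n for n in nodes if n.current_config != desired_config]
    for start in range(0, len(pending), max_unavailable):
        batch = pending[start:start + max_unavailable]
        for node in batch:
            node.schedulable = False  # cordon: no new pods are scheduled here
            log.append(f"cordon {node.name}")
            # drain: evict pods (a real drain honors PodDisruptionBudgets)
            log.append(f"drain {node.name}")
        peak = max(peak, sum(not n.schedulable for n in nodes))
        for node in batch:
            node.current_config = desired_config  # apply new OS image/config, then reboot
            log.append(f"update+reboot {node.name}")
            node.schedulable = True  # uncordon once the node is back
            log.append(f"uncordon {node.name}")
    return log, peak


if __name__ == "__main__":
    pool = [Node("worker-0", "rendered-worker-old"),
            Node("worker-1", "rendered-worker-old"),
            Node("worker-2", "rendered-worker-new")]
    actions, peak = rolling_update(pool, "rendered-worker-new")
    print("\n".join(actions))
    print("peak unavailable:", peak)
```

Raising `max_unavailable` speeds up large pools at the cost of temporarily removing more capacity, which is exactly the trade-off the real pool setting controls.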
Q2: How does OpenShift manage node configurations declaratively?
- The Expected Answer: OpenShift uses the Machine Config Operator (MCO). You do not SSH into RHCOS nodes to change settings. Instead, you create a MachineConfig custom resource (or, for kernel and sysctl tuning, a Tuned custom resource handled by the Node Tuning Operator). The MCO renders these changes into the pool's Ignition configuration, and the machine-config-daemon on each node writes files to disk or modifies kernel arguments, performing rolling reboots across the designated Machine Config Pools (e.g., worker or master) when required.
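A minimal MachineConfig can be generated as follows. This is a sketch: the object name, the sysctl file, and the `loglevel=7` kernel argument are hypothetical examples. The field layout follows the `machineconfiguration.openshift.io/v1` MachineConfig API with an Ignition 3.2.0 config, and the `machineconfiguration.openshift.io/role` label selects the target pool.

```python
import base64
import json


def machine_config(name: str, role: str, kernel_args: list, files: dict) -> dict:
    """Build a MachineConfig that adds kernel arguments and plain-text files for one pool."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfig",
        "metadata": {
            "name": name,
            # This label decides which Machine Config Pool (e.g. worker or master) picks it up.
            "labels": {"machineconfiguration.openshift.io/role": role},
        },
        "spec": {
            "kernelArguments": list(kernel_args),
            "config": {
                "ignition": {"version": "3.2.0"},
                "storage": {
                    "files": [
                        {
                            "path": path,
                            "mode": 0o644,  # serialized as the decimal integer 420, as Ignition expects
                            "overwrite": True,
                            # File contents are inlined as a base64 data URL.
                            "contents": {
                                "source": "data:text/plain;charset=utf-8;base64,"
                                + base64.b64encode(body.encode()).decode()
                            },
                        }
                        for path, body in files.items()
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    mc = machine_config(
        "99-worker-example",  # hypothetical name
        "worker",
        ["loglevel=7"],  # hypothetical kernel argument
        {"/etc/sysctl.d/99-example.conf": "vm.max_map_count = 262144\n"},
    )
    # JSON is valid YAML, so this output can be piped to `oc apply -f -`.
    print(json.dumps(mc, indent=2))
```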
2. Networking & Traffic Management
Q3: Explain the difference between an OpenShift Route and a standard Kubernetes Ingress.
- The Expected Answer: A Route is an OpenShift-native API object that predates the Kubernetes Ingress API. It is implemented by the OpenShift Ingress Controller (an HAProxy-based router). Routes support advanced features out of the box, such as explicit hostnames (FQDNs), weighted traffic splitting across backend services for canary deployments, and simple edge, passthrough, or re-encrypt TLS termination, without the controller-specific annotations that generic Ingress controllers typically require.
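A weighted canary Route of the kind described above might look like the following, generated from Python as JSON (which `oc apply -f -` accepts, since JSON is valid YAML). The route name, hostname, and service names are hypothetical; `to`, `alternateBackends`, `weight`, and `tls.termination` are fields of the `route.openshift.io/v1` Route API.

```python
import json


def canary_route(name: str, host: str, stable_svc: str, canary_svc: str,
                 canary_percent: int) -> dict:
    """Build an edge-terminated Route sending canary_percent of requests to canary_svc."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "apiVersion": "route.openshift.io/v1",
        "kind": "Route",
        "metadata": {"name": name},
        "spec": {
            "host": host,  # explicit FQDN served by the Ingress Controller
            # Weights are relative; here they are chosen to sum to 100 for readability.
            "to": {"kind": "Service", "name": stable_svc, "weight": 100 - canary_percent},
            "alternateBackends": [
                {"kind": "Service", "name": canary_svc, "weight": canary_percent}
            ],
            # Terminate TLS at the router and redirect plain HTTP to HTTPS.
            "tls": {"termination": "edge", "insecureEdgeTerminationPolicy": "Redirect"},
        },
    }


if __name__ == "__main__":
    route = canary_route("shop", "shop.apps.example.com", "shop-v1", "shop-v2", 10)
    print(json.dumps(route, indent=2))
```

Shifting traffic is then a matter of re-applying the Route with a different `canary_percent`, with no annotation syntax involved.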
Q4: What is the purpose of the Multus CNI in OpenShift, and when would you architect a solution around it?
- The Expected Answer: By default, a Kubernetes pod has only one network interface (besides loopback), connected to the cluster’s default pod network (OVN-Kubernetes in current OpenShift releases). Multus CNI is a meta-plugin that delegates that primary interface to the default network plugin and lets a pod attach additional network interfaces.
- This is critical for telecommunications (NFV workloads), low-latency databases, or legacy applications that need one interface for standard cluster service communication and a secondary interface attached directly to a physical corporate VLAN or storage network (via SR-IOV or macvlan).
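A macvlan secondary network can be sketched as a NetworkAttachmentDefinition plus a pod that requests it through the `k8s.v1.cni.cncf.io/networks` annotation. The host interface `ens5`, the `192.168.10.0/24` range, and all names are hypothetical, and `whereabouts` is just one IPAM choice. SR-IOV interfaces are normally provisioned through the SR-IOV Network Operator rather than hand-written like this.

```python
import json


def macvlan_network(name: str, namespace: str, master_iface: str, cidr: str) -> dict:
    """NetworkAttachmentDefinition for a macvlan secondary network on a host NIC."""
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "macvlan",
        "master": master_iface,  # host interface on the corporate/storage VLAN (assumed name)
        "mode": "bridge",
        # whereabouts hands out cluster-wide unique IPs from this range.
        "ipam": {"type": "whereabouts", "range": cidr},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name, "namespace": namespace},
        # The CNI config is embedded as a JSON string, not as a nested object.
        "spec": {"config": json.dumps(cni_config)},
    }


def pod_with_secondary_network(pod_name: str, image: str, network: str) -> dict:
    """Pod keeping its default cluster interface and requesting one extra interface via Multus."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": network},
        },
        "spec": {"containers": [{"name": "app", "image": image}]},
    }


if __name__ == "__main__":
    nad = macvlan_network("storage-net", "legacy-app", "ens5", "192.168.10.0/24")
    pod = pod_with_secondary_network("legacy-db", "registry.example.com/db:1.0", "storage-net")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [nad, pod]}, indent=2))
```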
3. Performance & Day-2 Operations
Q5: If etcd metrics show that disk sync latencies (WAL fsync or backend commit) or peer round-trip times are spiking, what are the architectural implications and how do you remediate the problem?
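- The Expected Answer: Spiking etcd latency destabilizes the control plane. Raft requires the leader to persist log entries to disk and replicate them to a quorum quickly, so slow fsyncs or slow peer round trips cause missed heartbeats, repeated leader elections, and request timeouts. The API server and controllers then see errors and timeouts, and in severe cases the control plane becomes unavailable.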
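- Remediation: First, ensure etcd is backed by dedicated, low-latency NVMe/SSD storage; as a rule of thumb, the p99 of etcd_disk_wal_fsync_duration_seconds should stay under 10 ms (etcd's hardware guidance recommends at least 500 sequential IOPS for heavily loaded clusters). Second, keep the database compact: defragmentation reclaims the space left behind by compaction. The OpenShift etcd operator defragments automatically in OpenShift 4.9 and later; a manual defragmentation should be run one member at a time, leader last. Third, implement object-count ResourceQuotas (for example, count/configmaps) and history limits so applications cannot flood the API server with ephemeral objects (like excessive ConfigMaps or build histories).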
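A commonly cited etcd guideline is that p99 WAL fsync latency should stay under 10 ms. To make that check concrete, the sketch below reproduces, in simplified form, how Prometheus's `histogram_quantile()` estimates a quantile from cumulative buckets such as `etcd_disk_wal_fsync_duration_seconds_bucket` (in PromQL you would typically write `histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))`). The bucket counts are invented sample data, and the function ignores rates over time and malformed input.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets, Prometheus-style.

    Buckets must be sorted by upper bound and end with math.inf.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("the last bucket must have an upper bound of +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Like Prometheus, fall back to the highest finite upper bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


WAL_FSYNC_P99_LIMIT_S = 0.010  # commonly cited etcd guideline: p99 WAL fsync under 10 ms

if __name__ == "__main__":
    # Invented sample: cumulative counts of WAL fsyncs at or below each bound (seconds).
    sample = [(0.001, 500), (0.002, 800), (0.004, 950), (0.008, 980),
              (0.016, 995), (0.032, 1000), (math.inf, 1000)]
    p99 = histogram_quantile(0.99, sample)
    print(f"p99 WAL fsync: {p99 * 1000:.1f} ms, healthy: {p99 < WAL_FSYNC_P99_LIMIT_S}")
```

Note that the estimate is only as precise as the bucket boundaries: the p99 here lands somewhere between 8 ms and 16 ms, and interpolation picks a point inside that bucket.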
Q6: How do you optimize OpenShift for real-time or ultra-low-latency workloads?
- The Expected Answer: I use the Node Tuning Operator (NTO) and declare a PerformanceProfile. This allows me to implement:
- CPU Pinning/Isolation: Dedicating specific CPU cores strictly to application workloads while reserving a small set (for example, cores 0-3) for OS and kubelet housekeeping.
- HugePages: Allocating 1G or 2M pages so large memory regions need far fewer translation lookaside buffer (TLB) entries, reducing TLB misses and page-table walks.
- Real-time Kernel (kernel-rt): Enabling the real-time kernel, shipped as an RHCOS extension, to reduce scheduling latency and jitter.
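Such a profile could be generated like this. The sketch assumes a node with 16 CPUs (0-15), a hypothetical `worker-cnf` node role, and eight 1 GiB hugepages; `cpu.reserved`, `cpu.isolated`, `hugepages`, `realTimeKernel`, and `nodeSelector` are fields of the `performance.openshift.io/v2` PerformanceProfile API. The `parse_cpuset` helper is hypothetical and only exists to catch overlapping CPU sets before the manifest is applied.

```python
import json


def parse_cpuset(spec: str) -> set:
    """Expand a Linux cpuset string such as "0-3,8" into {0, 1, 2, 3, 8}."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range {part!r}")
            cpus.update(range(first, last + 1))
        else:
            cpus.add(int(part))
    return cpus


def performance_profile(name: str, reserved: str, isolated: str,
                        hugepages_1g: int, node_role: str) -> dict:
    """Build a PerformanceProfile after checking reserved and isolated CPUs do not overlap."""
    overlap = parse_cpuset(reserved) & parse_cpuset(isolated)
    if overlap:
        raise ValueError(f"CPUs {sorted(overlap)} are both reserved and isolated")
    return {
        "apiVersion": "performance.openshift.io/v2",
        "kind": "PerformanceProfile",
        "metadata": {"name": name},
        "spec": {
            # Reserved CPUs run OS/kubelet housekeeping; isolated CPUs go to pinned workloads.
            "cpu": {"reserved": reserved, "isolated": isolated},
            "hugepages": {
                "defaultHugepagesSize": "1G",
                "pages": [{"size": "1G", "count": hugepages_1g}],
            },
            "realTimeKernel": {"enabled": True},
            # Only nodes carrying this role label receive the profile.
            "nodeSelector": {f"node-role.kubernetes.io/{node_role}": ""},
        },
    }


if __name__ == "__main__":
    profile = performance_profile("rt-profile", "0-3", "4-15", 8, "worker-cnf")
    print(json.dumps(profile, indent=2))
```

Isolated cores are only handed out exclusively to Guaranteed QoS pods that request whole CPUs, so the workload's resource requests matter as much as the profile itself.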
4. Security & Fleet Governance
Q7: What is Red Hat Advanced Cluster Management (ACM), and how does it manage application delivery at scale?
- The Expected Answer: ACM is a multi-cluster control plane built on a hub-and-spoke model, in which a hub cluster manages a fleet of spoke (managed) clusters. It manages cluster lifecycles, policy-based configuration compliance, and application placement across a hybrid-cloud fleet.
- For application delivery, ACM uses label-based Placement resources (the successor to the older PlacementRule API). Instead of deploying an app to a single cluster, you tell ACM: “Deploy this payload to all clusters labeled environment=production and region=eu.” It integrates with GitOps tools like Argo CD (OpenShift GitOps), so applications roll out automatically when new matching clusters are created or imported.
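Label-based selection can be sketched in a few lines. The `select_clusters` helper is a hypothetical, simplified model of `matchLabels` evaluation, and the cluster names and labels are invented; the `placement` function emits a `cluster.open-cluster-management.io/v1beta1` Placement with the same selector. A real Placement also only considers clusters from ManagedClusterSets bound to its namespace, and supports taints, tolerations, and prioritizers that this sketch ignores.

```python
import json


def matches(cluster_labels: dict, match_labels: dict) -> bool:
    """True if every key/value in match_labels is present on the cluster (matchLabels semantics)."""
    return all(cluster_labels.get(k) == v for k, v in match_labels.items())


def select_clusters(clusters: dict, match_labels: dict) -> list:
    """Names of managed clusters that a matchLabels-only selector would pick."""
    return sorted(name for name, labels in clusters.items() if matches(labels, match_labels))


def placement(name: str, namespace: str, match_labels: dict) -> dict:
    """Placement manifest using the same label selector."""
    return {
        "apiVersion": "cluster.open-cluster-management.io/v1beta1",
        "kind": "Placement",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predicates": [
                {"requiredClusterSelector": {"labelSelector": {"matchLabels": dict(match_labels)}}}
            ]
        },
    }


if __name__ == "__main__":
    fleet = {  # invented fleet inventory
        "prod-eu-1": {"environment": "production", "region": "eu"},
        "prod-us-1": {"environment": "production", "region": "us"},
        "dev-eu-1": {"environment": "development", "region": "eu"},
    }
    selector = {"environment": "production", "region": "eu"}
    print(select_clusters(fleet, selector))
    print(json.dumps(placement("prod-eu", "apps", selector), indent=2))
```

Because selection is re-evaluated as clusters are labeled, a newly imported cluster carrying both labels joins the rollout without any change to the application definition.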
Q8: How do you enforce a Zero-Trust architecture across developer-facing namespaces?
- The Expected Answer:
1. Authentication: Connect the OpenShift OAuth server to a corporate OIDC provider (like Okta/Entra ID) that enforces MFA.
2. Authorization: Implement strict RBAC, removing cluster-admin bindings from developer accounts and granting namespace-scoped RoleBindings per tenant.
3. Network Isolation: Apply a default-deny-all NetworkPolicy in each tenant namespace, explicitly allowing only known ingress/egress flows.
4. Runtime Security: Enforce the restricted Pod Security Admission (PSA) profile, alongside OpenShift's SecurityContextConstraints, to prevent containers from running as root or sharing host namespaces such as IPC.
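The namespace-level pieces of this answer (authorization, network isolation, and runtime security) can be expressed as plain manifests. The sketch below, with a hypothetical namespace `team-a` and group `team-a-devs`, emits a `List` containing a Namespace labeled for the `restricted` PSA profile, a RoleBinding that grants the built-in `edit` ClusterRole only inside that namespace, and a default-deny NetworkPolicy. Default-deny also blocks egress to cluster DNS, so a real tenant would need explicit allow rules on top of it; those are omitted here.

```python
import json


def tenant_namespace(name: str) -> dict:
    """Namespace that enforces (and audits/warns on) the 'restricted' PSA profile."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "labels": {
                "pod-security.kubernetes.io/enforce": "restricted",
                "pod-security.kubernetes.io/audit": "restricted",
                "pod-security.kubernetes.io/warn": "restricted",
            },
        },
    }


def tenant_edit_binding(namespace: str, group: str) -> dict:
    """RoleBinding granting the built-in 'edit' ClusterRole to one group, in one namespace only."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-edit", "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole", "name": "edit"},
        "subjects": [{"apiGroup": "rbac.authorization.k8s.io", "kind": "Group", "name": group}],
    }


def default_deny_all(namespace: str) -> dict:
    """NetworkPolicy selecting every pod and allowing no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so both directions are denied
        },
    }


def tenant_bundle(namespace: str, group: str) -> dict:
    """All tenant objects wrapped in a v1 List for a single `oc apply -f -`."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [tenant_namespace(namespace),
                  tenant_edit_binding(namespace, group),
                  default_deny_all(namespace)],
    }


if __name__ == "__main__":
    print(json.dumps(tenant_bundle("team-a", "team-a-devs"), indent=2))
```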
Interview Technique Tip:
When answering these, use the STAR Method (Situation, Task, Action, Result) whenever possible. Don’t just explain what the component is—explain a real scenario where you configured or troubleshot it to save money, reduce latency, or stop a security incident.