OpenShift (OCP) interview

For an OpenShift (OCP) interview in 2026, you should expect questions that move beyond basic Kubernetes concepts and focus on enterprise operations, automation (Operators), and security.

Here is a curated list of high-value interview questions categorized by role and complexity.


1. Architectural Concepts

  • What is the role of the Cluster Version Operator (CVO)?
    • Answer: The CVO is the heart of OCP 4.x upgrades. It monitors the “desired state” of the cluster’s operators (the “payload”) and ensures the cluster is updated in a safe, coordinated manner across all components.
  • Explain the difference between an Infrastructure Node and a Worker Node.
    • Answer: Infrastructure nodes host “cluster-level” services like the Router (Ingress Controller), the Monitoring stack (Prometheus/Alertmanager), and the internal Registry. By labeling nodes as infra, companies can often save on Red Hat subscription costs, because nodes running only infrastructure workloads don’t require the same entitlements as nodes running application workloads.
  • What is the “Etcd Quorum” and why is it important in OCP?
    • Answer: etcd requires a majority (quorum) of its members to agree before committing writes, which is why OpenShift runs an odd number of Control Plane nodes (usually 3). If you lose more than half of the etcd members, etcd stops accepting writes and the cluster API becomes effectively read-only, which prevents data corruption.
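The quorum arithmetic behind that answer can be sketched as a toy calculation (this is plain majority math, not an OpenShift API):

```python
# Quorum math for etcd: writes commit only when a majority of members agree.
def quorum(members: int) -> int:
    """Minimum number of healthy members needed to accept writes."""
    return members // 2 + 1

def failures_tolerated(members: int) -> int:
    """How many members can fail before the cluster loses quorum."""
    return members - quorum(members)

for n in (1, 3, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

Note that an even member count adds no fault tolerance: a 4-member cluster needs 3 healthy members and still only tolerates 1 failure, which is why 3 or 5 control plane nodes are the norm.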

2. Networking & Traffic (The Gateway API Era)

  • Explain Ingress vs. Route vs. Gateway API.
    • Key Focus: Interviewers want to know if you understand that Routes are OCP-native, Ingress is K8s-standard, and Gateway API is the future standard for advanced traffic management (canary, mirroring, etc.).
  • How does “Service Serving Certificate Secrets” work in OCP?
    • Answer: OCP can automatically generate a TLS certificate for a Service. You annotate a Service with service.beta.openshift.io/serving-cert-secret-name. OCP then creates a secret containing a cert/key signed by the internal Cluster CA, allowing for easy end-to-end encryption.
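As a sketch, a Service requesting a serving certificate looks like this (the name myapp and the port are placeholders):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
  annotations:
    # OCP creates the secret "myapp-serving-cert" with a cert/key
    # signed by the internal Cluster CA
    service.beta.openshift.io/serving-cert-secret-name: myapp-serving-cert
spec:
  selector:
    app: myapp
  ports:
  - port: 8443
    targetPort: 8443
```

The pod then mounts the generated secret and serves TLS with it; in-cluster clients trust it via the service CA bundle.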

3. Security (The “Hardest” Category)

  • Scenario: A developer says their pod won’t start because of a “Security Context” error. What do you check?
    • Answer: I would check the Security Context Constraints (SCC). By default, OCP runs pods with the restricted-v2 SCC, which prevents running as root. If the pod requires root or host access, I’d check if the ServiceAccount has been granted a more permissive SCC like anyuid or privileged.
  • What are NetworkPolicies vs. EgressFirewalls?
    • Answer: NetworkPolicies control traffic between pods inside the cluster (East-West). EgressFirewalls (part of OCP’s OVN-Kubernetes) control traffic leaving the cluster to external IPs or CIDR blocks (North-South).
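A minimal EgressFirewall sketch for OVN-Kubernetes (the namespace and CIDR are placeholders; rules are evaluated top-down, and the object must be named default):

```yaml
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default          # one EgressFirewall per namespace, named "default"
  namespace: myproject
spec:
  egress:
  - type: Allow
    to:
      cidrSelector: 192.0.2.0/24   # allow this external range
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0      # deny everything else leaving the cluster
```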

4. Troubleshooting & Operations

  • How do you recover a cluster if the Control Plane certificates have expired?
    • Answer: OCP 4.x normally auto-rotates these certificates, but a long cluster shutdown can leave the kubelet client/serving certificates expired. Recovery usually means approving the pending CSRs (Certificate Signing Requests) with oc adm certificate approve so the nodes can rejoin the cluster; severe clock drift can also break rotation and must be corrected at the source.
  • Describe the Source-to-Image (S2I) workflow.
    • Answer: S2I takes source code from Git, injects it into a “builder image” (like Node.js or Java), and outputs a ready-to-run container image. It simplifies the CI/CD process for developers who don’t want to write Dockerfiles.

5. Advanced / 2026 Trends

  • What is OpenShift Virtualization (KubeVirt)?
    • Answer: It allows you to run legacy Virtual Machines (VMs) as pods on OpenShift. This is critical for “modernizing” apps where one part is a container and the other is a legacy Windows or Linux VM that can’t be containerized yet.
  • How does Red Hat Advanced Cluster Management (RHACM) help in a multi-cluster setup?
    • Answer: RHACM provides a single pane of glass to manage security policies, application placement, and cluster lifecycle (creation/deletion) across multiple OCP clusters on AWS, Azure, and on-prem.

Quick Tip for the Interview

Whenever you answer, use the phrase “Operator-led design.” OpenShift 4 is built entirely on Operators. If the interviewer asks, “How do I fix the registry?” the best answer starts with, “I would check the status of the Image Registry Operator using oc get clusteroperator.” This shows you understand the fundamental architecture of the platform.

As an OpenShift Administrator, your interview will focus heavily on cluster stability, lifecycle management (upgrades), security enforcement, and the “Day 2” operations that keep an enterprise cluster running.

Here are the top admin-focused interview questions for 2026, divided by functional area.


1. Cluster Lifecycle & Maintenance

  • How does the Cluster Version Operator (CVO) manage upgrades, and what do you check if an upgrade hangs at 57%?
    • Answer: The CVO coordinates with all other cluster operators to reach a specific “desired version.” If it hangs, I check oc get clusteroperators to see which specific operator is degraded. Usually, it’s the Machine Config Operator (MCO) waiting for nodes to drain or the Authentication Operator having issues with etcd.
  • What is the “Must-Gather” tool, and when would you use it?
    • Answer: oc adm must-gather is the primary diagnostic tool. It launches a pod that collects logs, CRD states, and operating system debugging info. I use it before opening a Red Hat support ticket or when a complex issue involves multiple operators.
  • Explain how to back up and restore the etcd database.
    • Answer: I run the /usr/local/bin/cluster-backup.sh script on a control plane node, which produces an etcd snapshot plus the static pod resources. For restoration, I use cluster-restore.sh on a single control plane node (stopping the static pods for the API server and etcd and restoring the data directory), then bring the other members back in to re-establish quorum.

2. Node & Infrastructure Management

  • What is a MachineConfigPool (MCP), and why would you pause it?
    • Answer: An MCP groups nodes (like master or worker) so the MCO can apply configurations to them. I would pause an MCP during a sensitive maintenance window or when troubleshooting a configuration change that I don’t want to roll out to all nodes at once.
  • How do you add a custom SSH key or a CronJob to the underlying RHCOS nodes?
    • Answer: You don’t log into the nodes manually. You create a MachineConfig YAML. The MCO then detects this, reboots the nodes (if necessary), and applies the change to the immutable filesystem.
  • What happens if a node enters a NotReady state?
    • Answer: First, I check node pressure (CPU/Memory/Disk). Then I check the kubelet and crio services on the node using oc debug node/<node-name>. I also check for network reachability between the node and the Control Plane.
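A hedged sketch of the MachineConfig approach described above, adding an SSH key for the core user on worker nodes (the key itself is a placeholder; the MCO applies it per the worker role label):

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-custom-ssh
  labels:
    machineconfiguration.openshift.io/role: worker   # targets the worker MCP
spec:
  config:
    ignition:
      version: 3.2.0
    passwd:
      users:
      - name: core                                   # the only login user on RHCOS
        sshAuthorizedKeys:
        - ssh-ed25519 AAAA... user@example.com       # placeholder key
```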

3. Networking & Security

  • What is the benefit of OVN-Kubernetes over the legacy OpenShift SDN?
    • Answer: OVN-Kubernetes is the default CNI plugin in OCP 4.x. It supports modern features like IPsec encryption for pod-to-pod traffic, smarter load balancing, and Egress IPs that let specific projects exit the cluster via a fixed IP address for firewall allow-listing.
  • A user is complaining they can’t reach a service in another project. What do you check?
    • Answer:
      1. NetworkPolicies: Is there a policy blocking “Cross-Namespace” traffic?
      2. Service/Endpoints: Does the Service have active Endpoints (oc get endpoints)?
      3. Namespace labels: If using a high-isolation network plugin, do the namespaces have the correct labels to “talk” to each other?
  • How do you restrict a specific group of users from creating LoadBalancer type services?
    • Answer: RBAC alone can’t distinguish Service types, since the create verb applies to all Services equally. The clean solution is an admission-time policy: a Policy Engine like Gatekeeper/OPA or Kyverno (or a Kubernetes ValidatingAdmissionPolicy) that denies Services with type: LoadBalancer for that group of users.
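For the cross-namespace NetworkPolicy check above, a typical fix is a policy in the target project that admits traffic from the caller’s namespace — a sketch (namespace names team-a/team-b are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-team-b
  namespace: team-a            # the namespace hosting the service
spec:
  podSelector: {}              # applies to all pods in team-a
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: team-b   # built-in label, K8s 1.21+
```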

4. Storage & Capacity Planning

  • How do you handle “Volume Expansion” if a database runs out of space?
    • Answer: If the underlying StorageClass supports allowVolumeExpansion: true, I simply edit the PersistentVolumeClaim (PVC) and increase the storage value. OpenShift and the CSI driver handle the resizing of the file system on the fly.
  • What is the difference between ReadWriteOnce (RWO) and ReadWriteMany (RWX)?
    • Answer: RWO allows only one node to mount the volume (good for databases). RWX allows multiple nodes/pods to mount it simultaneously (required for shared file storage like NFS or ODF).
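To make both answers concrete, here is a sketch of an expandable StorageClass plus an RWX PVC (the provisioner and the RWX class name are assumptions — ODF’s CephFS class is one common RWX option; check what your cluster actually offers):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable-rwo
provisioner: ebs.csi.aws.com          # example CSI driver; assumption
allowVolumeExpansion: true            # lets you grow PVCs after creation
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-assets
spec:
  accessModes:
  - ReadWriteMany                     # multiple nodes mount simultaneously
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-cephfs   # example RWX class; assumption
```

To expand later, you would simply raise spec.resources.requests.storage on the PVC (e.g. via oc edit pvc or oc patch) and let the CSI driver resize the filesystem.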

5. Scenario-Based: “The Midnight Call”

  • Scenario: The Web Console is down, and oc commands are timing out. Where do you start?
    • Answer: This sounds like an API Server or etcd failure. I would:
      1. Log into a Control Plane node directly via SSH.
      2. Check the status of static pods in /etc/kubernetes/manifests.
      3. Run crictl ps to see if the kube-apiserver or etcd containers are crashing.
      4. Check the node’s disk space (etcd often fails if the disk is 100% full).

Pro-Tip for Admin Interviews:

In 2026, emphasize GitOps. Mention that you prefer managing cluster configurations (like HTPasswd providers or Quota objects) via ArgoCD rather than manual oc apply commands. This shows you are an admin who values idempotency and disaster recovery.

cert-manager in Kubernetes

cert-manager in Kubernetes is an add-on that automates TLS certificate management.

It helps you:

  • request certificates
  • renew them before they expire
  • store them as Kubernetes Secrets
  • connect them to Ingresses, gateways, and apps

Common uses:

  • Let’s Encrypt certificates for public apps
  • internal PKI certificates for cluster services
  • webhook, mTLS, and service cert automation

Core pieces:

  • Issuer / ClusterIssuer: tells cert-manager where certs come from
    • Issuer = namespaced
    • ClusterIssuer = cluster-wide
  • Certificate: declares the cert you want
  • Secret: where the cert and private key get stored
  • Challenge / Order: used mainly with ACME providers like Let’s Encrypt

Typical flow:

  1. You install cert-manager.
  2. Create an Issuer or ClusterIssuer.
  3. Create a Certificate resource or annotate an Ingress.
  4. cert-manager gets the cert and saves it into a Secret.
  5. Your Ingress/controller uses that Secret for HTTPS.

Very simple example with Let’s Encrypt:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: you@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: nginx

Then a certificate:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-cert
  namespace: default
spec:
  secretName: myapp-tls
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
  - example.com

Why people use it:

  • avoids manual certificate renewal
  • reduces downtime from expired certs
  • works natively with Kubernetes resources
  • supports public and private certificate authorities

Alright — let’s walk through cert-manager in Kubernetes step by step, focusing on the three things you asked for:


1️⃣ How to install cert-manager

The easiest and most reliable way is using Helm.

Add the Helm repo

helm repo add jetstack https://charts.jetstack.io
helm repo update

Install cert-manager (with CRDs)

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --set installCRDs=true

Verify installation

kubectl get pods -n cert-manager

You should see pods like:

  • cert-manager
  • cert-manager-webhook
  • cert-manager-cainjector

2️⃣ How cert-manager works with Ingress

This is the most common real-world setup.

🔁 Flow (important concept)

  1. You create an Ingress
  2. Add a special annotation → tells cert-manager to issue a cert
  3. cert-manager:
    • requests cert from Let’s Encrypt
    • completes challenge (HTTP-01 or DNS-01)
    • stores cert in a Secret
  4. Ingress uses that Secret → HTTPS works automatically

🧩 Example Ingress with cert-manager

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - example.com
    secretName: myapp-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp-service
            port:
              number: 80

👉 Key things:

  • annotation → tells cert-manager which issuer to use
  • tls.secretName → where the cert will be stored
  • cert-manager fills that Secret automatically

3️⃣ Issuer vs ClusterIssuer vs Certificate

This is where people usually get confused — here’s the clean breakdown:


Issuer

  • Namespace-scoped
  • Only works inside one namespace
kind: Issuer
metadata:
  name: my-issuer
  namespace: default

Use when:

  • small apps
  • strict namespace isolation

ClusterIssuer

  • Cluster-wide
  • Can be used from any namespace
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod

Use when:

  • shared infrastructure
  • most production setups ✅

🟣 Certificate

  • Declares what cert you want
  • cert-manager turns this into a real cert
kind: Certificate
metadata:
  name: my-cert
spec:
  secretName: my-tls
  dnsNames:
  - example.com
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer

How they connect together

Think of it like this:

  • Issuer / ClusterIssuer → “Who gives me the cert?”
  • Certificate → “I want a cert for this domain”
  • Secret → “Here’s your actual cert + key”
  • Ingress → “Use this cert for HTTPS”

Typical production setup

Most teams use:

  • ClusterIssuer (Let’s Encrypt)
  • ✅ Ingress annotation (auto certs)
  • ❌ Usually no manual Certificate objects (the Ingress annotation creates them for you)

Bonus: HTTP-01 vs DNS-01

HTTP-01 (most common)

  • cert-manager creates a temporary endpoint
  • Let’s Encrypt verifies via HTTP
  • works with Ingress

DNS-01

  • adds a DNS record to prove ownership
  • needed for:
    • wildcard certs (*.example.com)
    • internal services
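A DNS-01 ClusterIssuer sketch using the Route53 solver (credentials wiring is omitted; the region and zone are placeholders, and other DNS providers have analogous solver blocks):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    email: you@example.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-dns
    solvers:
    - dns01:
        route53:
          region: us-east-1        # credentials via IAM role or secret, omitted
      selector:
        dnsZones:
        - example.com              # lets this solver handle *.example.com too
```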

Great question — this is where cert-manager becomes really powerful.

At a high level:

👉 cert-manager = certificate lifecycle automation
👉 Service mesh (Istio / Linkerd) = uses certificates for mTLS between services

So cert-manager can act as the certificate authority (or CA manager) for your mesh.


🧠 Big picture: how they fit together

cert-manager → issues certificates
service mesh → uses them for mTLS
result → secure pod-to-pod communication

🔐 What mTLS in a service mesh actually means

In both Istio and Linkerd:

  • Every pod gets a certificate + private key
  • Pods authenticate each other using certs
  • Traffic is:
    • encrypted ✅
    • authenticated ✅
    • tamper-proof ✅

⚙️ Option 1: Built-in CA (default behavior)

Istio / Linkerd by default:

  • run their own internal CA
  • automatically issue certs to pods
  • rotate certs

👉 This works out-of-the-box and is easiest.


🧩 Option 2: Using cert-manager as the CA

This is where integration happens.

Instead of mesh managing certs itself:

👉 cert-manager becomes the source of truth for certificates


🧱 Architecture with cert-manager

cert-manager (Issuer / ClusterIssuer)
        ↓ issues CA / issuer certs
Mesh control plane (Istio / Linkerd)
        ↓ issues workload certs
Sidecars / proxies in pods

🔵 Istio + cert-manager

Default Istio:

  • uses istiod as CA

With cert-manager:

  • you replace Istio’s CA with:
    • cert-manager + external CA (Vault, Let’s Encrypt, internal PKI)

Common approach: Istio + cert-manager + external CA

cert-manager:

  • manages root/intermediate certs

Istio:

  • requests workload certs from that CA

Why do this?

  • centralized certificate management
  • enterprise PKI integration (e.g. HashiCorp Vault)
  • compliance requirements

Linkerd + cert-manager

Linkerd has cleaner native integration.

👉 Linkerd actually recommends using cert-manager.


How it works:

  • cert-manager issues:
    • trust anchor (root cert)
    • issuer cert
  • Linkerd uses those to:
    • issue certs to proxies
    • rotate automatically

Example flow:

  1. Create a ClusterIssuer (e.g. self-signed or Vault)
  2. cert-manager generates:
    • root cert
    • intermediate cert
  3. Linkerd control plane uses them
  4. Sidecars get short-lived certs
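Based on Linkerd’s documented cert-manager pattern, the issuer certificate from step 2 can be declared roughly like this (a sketch; the Issuer named linkerd-trust-anchor is an assumption — it would be an Issuer backed by your root CA secret):

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: linkerd-identity-issuer
  namespace: linkerd
spec:
  secretName: linkerd-identity-issuer   # Linkerd identity reads this secret
  duration: 48h
  renewBefore: 25h                      # cert-manager rotates it automatically
  issuerRef:
    name: linkerd-trust-anchor          # assumed Issuer backed by the root CA
    kind: Issuer
  commonName: identity.linkerd.cluster.local
  dnsNames:
  - identity.linkerd.cluster.local
  isCA: true                            # this cert signs the proxies' certs
  privateKey:
    algorithm: ECDSA
  usages:
  - cert sign
  - crl sign
  - server auth
  - client auth
```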

🔁 Certificate lifecycle in mesh (with cert-manager)

  1. cert-manager creates CA certs
  2. mesh control plane uses them
  3. sidecars request short-lived certs
  4. certs rotate automatically

When to use cert-manager with a mesh

✅ Use cert-manager if:

  • you need custom CA / PKI
  • you want centralized certificate control
  • you’re integrating with:
    • Vault
    • enterprise PKI
  • compliance/security requirements

❌ Skip it if:

  • you just want simple mTLS
  • default mesh CA is enough

Important distinction

👉 cert-manager does NOT handle:

  • traffic encryption itself
  • service-to-service routing

👉 service mesh does NOT handle:

  • external certificate issuance (well)
  • complex PKI integrations (alone)

Simple mental model

  • cert-manager = certificate factory
  • Istio / Linkerd = security + traffic engine

Interview-style summary

If you need a sharp answer:

“cert-manager integrates with service meshes by acting as an external certificate authority. While Istio and Linkerd can issue certificates internally, cert-manager enables centralized PKI management, supports external CAs like Vault, and provides automated rotation, making it useful for production-grade mTLS setups.”


Here’s a real-world debugging checklist for cert-manager + service mesh / mTLS, organized in the order that usually finds the issue fastest.

1. Start with the symptom, not the YAML

First sort the failure into one of these buckets:

  • Certificate issuance problem: Secrets are missing, Certificate is not Ready, ACME challenges fail, or issuer/webhook errors appear. cert-manager’s troubleshooting flow centers on the Certificate, CertificateRequest, Order, and Challenge resources. (cert-manager)
  • Mesh identity / mTLS problem: certificates exist, but workloads still fail handshakes, sidecars can’t get identities, or mesh health checks fail. Istio and Linkerd both separate certificate management from runtime identity distribution. (Istio)

That split matters because cert-manager can be healthy while the mesh is broken, and vice versa. (cert-manager)

2. Confirm the control planes are healthy

Check the obvious first:

kubectl get pods -n cert-manager
kubectl get pods -n istio-system
kubectl get pods -n linkerd

For cert-manager, the important core components are the controller, webhook, and cainjector; webhook issues are a documented source of certificate failures. (cert-manager)

For Linkerd, run:

linkerd check

Linkerd’s official troubleshooting starts with linkerd check, and many identity and certificate problems show up there directly. (Linkerd)

For Istio, check control-plane health and then inspect config relevant to CA integration if you are using istio-csr or another external CA path. Istio’s cert-manager integration for workload certificates requires specific CA-server changes. (cert-manager)

3. Check the certificate objects before the Secrets

If cert-manager is involved, do this before anything else:

kubectl get certificate -A
kubectl describe certificate <name> -n <ns>
kubectl get certificaterequest -A
kubectl describe certificaterequest <name> -n <ns>

cert-manager’s own troubleshooting guidance points to these resources first because they expose the reason issuance or renewal failed. (cert-manager)

What you’re looking for:

  • Ready=False
  • issuer not found
  • permission denied
  • webhook validation errors
  • failed renewals
  • pending requests that never progress

If you’re using ACME, continue with:

kubectl get order,challenge -A
kubectl describe order <name> -n <ns>
kubectl describe challenge <name> -n <ns>

ACME failures are usually visible at the Order / Challenge level. (cert-manager)

4. Verify the issuer chain and secret contents

Typical failure pattern: the Secret exists, but it is the wrong Secret, wrong namespace, missing keys, or signed by the wrong CA.

Check:

kubectl get issuer,clusterissuer -A
kubectl describe issuer <name> -n <ns>
kubectl describe clusterissuer <name>
kubectl get secret <secret-name> -n <ns> -o yaml

For mesh-related certs, validate:

  • the Secret name matches what the mesh expects
  • the Secret is in the namespace the mesh component actually reads
  • the chain is correct
  • the certificate has not expired
  • the issuer/trust anchor relationship is the intended one
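To check the chain and expiry concretely, pull the cert out of the Secret and inspect it with openssl. The sketch below is self-contained — it generates a throwaway cert just to demonstrate the inspection commands; against a real cluster you would extract tls.crt from the Secret first (the commented kubectl line):

```shell
# Generate a throwaway self-signed cert so the inspection step is runnable here.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo.key -out /tmp/demo.crt -days 1 \
  -subj "/CN=identity.linkerd.cluster.local" 2>/dev/null

# Against a live cluster, extract the cert from the Secret instead:
#   kubectl get secret <secret-name> -n <ns> \
#     -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/demo.crt

# Inspect subject, issuer, and expiry — the fields you validate in step 4.
openssl x509 -in /tmp/demo.crt -noout -subject -issuer -enddate
```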

In Linkerd specifically, the trust anchor and issuer certificate are distinct, and Linkerd documents that workload certs rotate automatically but the control-plane issuer/trust-anchor credentials do not unless you set up rotation. (Linkerd)

5. Check expiration and rotation next

A lot of “random” mesh outages are just expired identity material.

For Linkerd, verify:

  • trust anchor validity
  • issuer certificate validity
  • whether rotation was automated or done manually

Linkerd’s docs are explicit that proxy workload certs rotate automatically, but issuer and trust anchor rotation require separate handling; expired root or issuer certs are a known failure mode. (Linkerd)

For Istio, if using a custom CA or Kubernetes CSR integration, verify the configured CA path and signing certs are still valid and match the active mesh configuration. (cert-manager)

6. If this is Istio, verify whether the mesh is using its built-in CA or an external one

This is a very common confusion point.

If you use cert-manager with Istio workloads, you are typically not just “adding cert-manager”; you are replacing or redirecting the CA flow, often through istio-csr or Kubernetes CSR integration. cert-manager’s Istio integration docs call out changes like disabling the built-in CA server and setting the CA address. (cert-manager)

So check:

  • Is istiod acting as CA, or is an external CA path configured?
  • Is caAddress pointing to the expected service?
  • If istio-csr is used, is it healthy and reachable?
  • Are workload cert requests actually reaching the intended signer?

If that split-brain exists, pods may get no certs or certs from the wrong signer. That is an inference from how Istio’s custom CA flow is wired. (cert-manager)

7. If this is Linkerd, run the identity checks early

For Linkerd, do not guess. Run:

linkerd check
linkerd check --proxy

The Linkerd troubleshooting docs center on linkerd check, and certificate / identity issues often surface there more quickly than raw Kubernetes inspection. (Linkerd)

Then look for:

  • identity component failures
  • issuer/trust-anchor mismatch
  • certificate expiration warnings
  • injected proxies missing identity

If linkerd check mentions expired identity material, go straight to issuer/trust-anchor rotation docs. (Linkerd)

8. Verify sidecar or proxy injection happened

If the pod is not meshed, mTLS debugging is a distraction.

Check:

kubectl get pod <pod> -n <ns> -o yaml

Look for the expected sidecar/proxy containers and mesh annotations. If they are absent, the issue is injection or policy, not certificate issuance. Istio and Linkerd both rely on the dataplane proxy to actually use workload identities for mTLS. (Istio)

9. Check policy mismatches after identities are confirmed

Once certificates and proxies look correct, inspect whether the traffic policy demands mTLS where the peer does not support it.

For Istio, check authentication policy objects such as PeerAuthentication and any destination-side expectations. Istio’s authentication docs cover how mTLS policy is applied. (Istio)

Classic symptom:

  • one side is strict mTLS
  • the other side is plaintext, outside mesh, or not injected

That usually produces handshake/reset errors even when cert-manager is completely fine. This is an inference from Istio’s mTLS policy model. (Istio)

10. Read the logs in this order

When the issue is still unclear, the best signal usually comes from logs in this order:

  1. cert-manager controller
  2. cert-manager webhook
  3. mesh identity/CA component (istiod, istio-csr, or Linkerd identity)
  4. the source and destination proxy containers

Use:

kubectl logs -n cert-manager deploy/cert-manager
kubectl logs -n cert-manager deploy/cert-manager-webhook
kubectl logs -n istio-system deploy/istiod
kubectl logs -n <istio-csr-namespace> deploy/istio-csr
kubectl logs -n linkerd deploy/linkerd-identity
kubectl logs <pod> -n <ns> -c <proxy-container>

cert-manager specifically documents webhook and issuance troubleshooting as core paths. Linkerd and Istio docs likewise center on their identity components for mesh cert issues. (cert-manager)

11. For ingress or gateway TLS, separate north-south from east-west

A lot of teams mix up:

  • ingress/gateway TLS
  • service-to-service mTLS

With Istio, cert-manager integration for gateways is straightforward and separate from workload identity. Istio’s docs show cert-manager managing gateway TLS credentials, while workload certificate management is handled through different CA mechanisms. (Istio)

So ask:

  • Is the failure only at ingress/gateway?
  • Or only pod-to-pod?
  • Or both?

If only ingress is broken, inspect the gateway Secret and gateway config, not mesh identity. (Istio)

12. Fast triage map

Use this shortcut:

  • Certificate not Ready → inspect CertificateRequest, Order, Challenge, issuer, webhook. (cert-manager)
  • Secret exists but mesh still fails → inspect trust chain, expiry, namespace, and mesh CA configuration. (cert-manager)
  • Linkerd only → run linkerd check, then inspect issuer/trust anchor status. (Linkerd)
  • Istio + cert-manager for workloads → verify external CA wiring, especially CA server disablement and caAddress. (cert-manager)
  • Handshake failures with healthy certs → inspect mesh policy and whether both endpoints are actually meshed. (Istio)

13. The three most common root causes

In practice, the big ones are:

  1. Expired or non-rotated issuer / trust anchor, especially in Linkerd. (Linkerd)
  2. Istio external CA miswiring, especially when using cert-manager for workloads rather than just gateway TLS. (cert-manager)
  3. Policy/injection mismatch, where strict mTLS is enabled but one side is not part of the mesh. (Istio)

14. Minimal command pack to keep handy

kubectl get certificate,certificaterequest,issuer,clusterissuer -A
kubectl describe certificate <name> -n <ns>
kubectl get order,challenge -A
kubectl logs -n cert-manager deploy/cert-manager
kubectl logs -n cert-manager deploy/cert-manager-webhook
linkerd check
linkerd check --proxy
kubectl logs -n istio-system deploy/istiod
kubectl get pods -A -o wide
kubectl get secret -A

Flux (or FluxCD)

Flux (or FluxCD) is a GitOps continuous delivery tool for Kubernetes. Here’s a concise breakdown:


What it does

Flux is an operator that runs in your Kubernetes cluster, constantly comparing the cluster’s live state to the state defined in your Git repo. If they differ, Flux automatically makes changes to the cluster to match the repo. In other words, Git is the single source of truth — you push a change to Git, Flux detects it and applies it to the cluster automatically, with no manual kubectl apply needed.


How it works — core components

Core components of FluxCD (the GitOps Toolkit) include the Source Controller, Kustomize Controller, Helm Controller, and Notification Controller. Each is a separate Kubernetes controller responsible for one concern:

  • Source Controller — watches Git repos, Helm repos, OCI registries, and S3 buckets for changes
  • Kustomize Controller — applies raw YAML and Kustomize overlays to the cluster
  • Helm Controller — manages HelmRelease objects (declarative Helm chart deployments)
  • Notification Controller — sends alerts to Slack, Teams, etc. when syncs succeed or fail
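To make the controller roles concrete, a minimal source-plus-sync pair might look like this (the repository URL and path are placeholders; the Source Controller fetches the repo, the Kustomize Controller applies what it finds):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m                        # how often to poll the repo
  url: https://github.com/my-org/my-app   # placeholder repo URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 10m                       # how often to re-reconcile
  sourceRef:
    kind: GitRepository
    name: my-app
  path: ./deploy                      # placeholder path inside the repo
  prune: true                         # delete resources removed from Git
```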

Key characteristics

  • Pull-based model: Flux enables pure pull-based GitOps application deployments — no access to clusters is needed by the source repo or by any other cluster. This is more secure than push-based pipelines where your CI system needs cluster credentials.
  • Drift detection: If your live cluster diverges from Git (e.g., due to manual edits), Flux will detect the drift and revert it, ensuring deterministic deployments.
  • Kubernetes-native: Flux v2 is built from the ground up to use Kubernetes’ API extension system. Everything is a CRD — GitRepository, Kustomization, HelmRelease, etc.
  • Security-first: Flux uses true Kubernetes RBAC via impersonation and supports multiple Git repositories. It follows a pull vs. push model, least amount of privileges, and adheres to Kubernetes security policies with tight integration with security tools.
  • Multi-cluster: Flux can use one Kubernetes cluster to manage apps in either the same or other clusters, spin up additional clusters, and manage cluster fleets.

CNCF standing & adoption

Flux is a Cloud Native Computing Foundation (CNCF) graduated project, used in production by various organisations and cloud providers. Notable users include Deutsche Telekom (managing 200+ clusters with just 10 engineers), the US Department of Defense, and Microsoft Azure (which uses Flux natively in AKS and Azure Arc).


Flux vs. Argo CD (the main alternative)

Flux CD is highly composable — use only the controllers you need. It’s preferred by teams who already think in CRDs and reconciliation loops, and is excellent for infrastructure-as-code and complex dependency handling. The main trade-off is that Flux has some drawbacks such as lack of a native UI and a steep learning curve. Argo CD is the better choice if your team wants a rich visual dashboard out of the box.


Relation to OCP

Flux is commonly used with OpenShift as the GitOps engine for managing cluster configuration and application deployments. Red Hat also ships OpenShift GitOps (based on Argo CD) as an official operator, so in OCP environments you’ll encounter both — Flux tends to be chosen by platform engineering teams who want tighter Kubernetes-native control, while OpenShift GitOps is the supported out-of-the-box option from Red Hat.

Here’s a thorough breakdown of how Flux integrates with OCP:


Installation — two options

Option 1: Flux Operator via OperatorHub (recommended)

Flux can be installed on a Red Hat OpenShift cluster directly from OperatorHub using the Flux Operator, an open-source project in the Flux ecosystem that provides a declarative API for lifecycle management of the Flux controllers on OpenShift.

Once installed, you declare a FluxInstance CR with cluster.type: openshift:

apiVersion: fluxcd.controlplane.io/v1
kind: FluxInstance
metadata:
  name: flux
  namespace: flux-system
spec:
  distribution:
    version: "2.x"
    registry: "ghcr.io/fluxcd"
  cluster:
    type: openshift # ← tells Flux it's on OCP
    multitenant: true
    networkPolicy: true
  sync:
    kind: GitRepository
    url: "https://my-git-server.com/my-org/my-fleet.git"
    ref: "refs/heads/main"
    path: "clusters/my-cluster"

Option 2: flux bootstrap CLI

The best way to install Flux on OpenShift via CLI is to use the flux bootstrap command. This command works with GitHub, GitLab, as well as generic Git providers. You require cluster-admin privileges to install Flux on OpenShift.


The OCP-specific challenge: SCCs

OCP’s default restricted-v2 SCC blocks containers from running as root — and Flux controllers, like many Kubernetes tools, need specific adjustments to run cleanly. The official integration handles this by:

  • Shipping a scc.yaml manifest that grants Flux controllers the correct non-root SCC permissions
  • Patching the Kustomization to remove the default SecComp profile and enforce the correct UID expected by Flux images, preventing OCP from altering the container user

The cluster.type: openshift flag in the FluxInstance spec automatically applies these adjustments — no manual SCC patching needed when using the Flux Operator.


What the integration looks like end-to-end

Git Repository (clusters/my-cluster/)
 ├── flux-system/   (Flux bootstrap manifests)
 ├── namespaces/    (OCP Projects)
 ├── rbac/          (Roles, RoleBindings, SCCs)
 └── apps/          (Deployments, Routes, etc.)
         │  pull (every ~1 min)
         ▼
OCP Cluster (flux-system namespace)
 ├── source-controller        → watches Git/OCI/Helm repos
 ├── kustomize-controller     → applies YAML/Kustomize
 ├── helm-controller          → manages HelmReleases
 └── notification-controller  → sends alerts to Slack etc.

Multi-tenancy on OCP

When multitenant: true is set, Flux uses true Kubernetes RBAC via impersonation — meaning each tenant’s Kustomization runs under its own service account, scoped to its own namespace. This maps naturally to OCP Projects, where each team or app gets an isolated namespace with its own SCC and RBAC policies.

The pattern looks like this in Git:

# tenants/team-a/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: team-a-apps
  namespace: flux-system
spec:
  serviceAccountName: team-a-reconciler   # impersonates this SA
  targetNamespace: team-a                 # deploys into this OCP Project
  path: ./tenants/team-a/apps
  sourceRef:
    kind: GitRepository
    name: fleet-repo

The team-a-reconciler service account only has permissions within the team-a namespace, enforced by both RBAC and the namespace's SCC policies.
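A minimal sketch of that tenant wiring follows. The names match the example above, but the exact role is an assumption and would be tailored per team:

```yaml
# Hypothetical tenant RBAC — adjust the role to the team's actual needs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: team-a-reconciler
  namespace: team-a
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: team-a-reconciler-admin
  namespace: team-a              # the binding is namespace-scoped
subjects:
- kind: ServiceAccount
  name: team-a-reconciler
  namespace: team-a
roleRef:
  kind: ClusterRole
  name: admin                    # default aggregated role, scoped here by the RoleBinding
  apiGroup: rbac.authorization.k8s.io
```

Because the RoleBinding lives in team-a, the broad `admin` ClusterRole grants nothing outside that namespace.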


Key considerations for OCP + Flux

| Topic | Detail |
| --- | --- |
| Testing | Flux v2.3 was the first release end-to-end tested on OpenShift. |
| Operator lifecycle | When a subscription is applied, OpenShift's Operator Lifecycle Manager (OLM) automatically handles upgrading Flux. |
| Enterprise support | Vendors such as ControlPlane provide enterprise support for Flux, including backwards compatibility with older Kubernetes and OpenShift versions. |
| vs. OpenShift GitOps | Red Hat ships its own GitOps operator (based on Argo CD) as the officially supported option. Flux on OCP is community/third-party supported, preferred by teams who want a more Kubernetes-native, CLI-driven approach. |
| NetworkPolicy | Setting networkPolicy: true in the FluxInstance spec automatically creates NetworkPolicies for the flux-system namespace, restricting controller-to-controller traffic. |

OCP (OpenShift Container Platform) Security Best Practices


Identity & Access Control

  • RBAC & Least Privilege: Every user, service account, and process should possess only the absolute minimum permissions needed. Isolate workloads using distinct service accounts, each bound to Roles containing relevant permissions, and avoid attaching sensitive permissions directly to user accounts.
  • Strong Authentication: Implement robust authentication mechanisms such as multi-factor authentication (MFA) or integrate with existing identity management systems to prevent unauthorized access.
  • Audit Regularly: Regularly audit Roles, ClusterRoles, RoleBindings, and SCC usage to ensure they remain aligned with the principle of least privilege and current needs.
  • Avoid kubeadmin: Don’t use the default kubeadmin superuser account in production — integrate with an enterprise identity provider instead.
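As one example of replacing kubeadmin logins, the cluster OAuth resource can point at an enterprise OIDC provider. The issuer URL, client ID, and secret name below are placeholders:

```yaml
# Sketch: wiring an OIDC identity provider into the cluster OAuth config.
# The referenced client secret must exist in the openshift-config namespace.
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: corp-sso
    mappingMethod: claim
    type: OpenID
    openID:
      clientID: openshift-console          # placeholder client ID
      clientSecret:
        name: corp-sso-client-secret       # placeholder Secret name
      issuer: https://sso.example.com/realms/corp
      claims:
        preferredUsername: [preferred_username]
        email: [email]
```

Once a real identity provider works, the kubeadmin secret can be removed so the local superuser can no longer log in.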

Cluster & Node Hardening

  • Use RHCOS for nodes: Use the most recent Red Hat Enterprise Linux CoreOS (RHCOS) for all OCP cluster nodes. RHCOS is designed to be as immutable as possible: changes to a node are applied through the Machine Config Operator rather than through direct user access.
  • Control plane HA: Configure a minimum of three control-plane nodes so that the cluster (and etcd quorum) remains available if a node fails.
  • Network isolation: Strict network isolation prevents unauthorized external ingress to OpenShift cluster API endpoints, nodes, or pod containers. The DNS, Ingress Controller, and API server can be set to private after installation.

Container Image Security

  • Scan images continuously: Use image scanning tools to detect vulnerabilities and malware within container images. Use trusted container images from reputable sources and regularly update them to include the latest security patches.
  • Policy enforcement: Define and enforce security policies for container images, ensuring that only images meeting specific criteria — such as being signed by trusted sources or containing no known vulnerabilities — are deployed.
  • No root containers: OpenShift has stricter security policies than vanilla Kubernetes — running a container as root is forbidden by default.

Security Context Constraints (SCCs)

OpenShift uses Security Context Constraints (SCCs) that give your cluster a strong security base. By default, OpenShift prevents cluster containers from accessing protected Linux features such as shared file systems, root access, and certain kernel capabilities such as KILL. Always use the most restrictive SCC that still allows your workload to function.
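A few `oc` commands are useful when working with SCCs in practice. The SCC name, service account, and namespace below are placeholders:

```shell
# List the SCCs available on the cluster and inspect the default.
oc get scc
oc describe scc restricted-v2

# Which SCC was actually applied to a running pod?
oc get pod my-pod -o jsonpath='{.metadata.annotations.openshift\.io/scc}'

# Grant a custom SCC to a dedicated service account (not to a user).
oc adm policy add-scc-to-user my-custom-scc -z my-app-sa -n my-project
```

The `openshift.io/scc` annotation on a pod records which SCC admitted it, which is handy when debugging "why is my pod mutated" questions.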


Network Security

  • Zero-trust networking: Apply granular access controls between individual pods, namespaces, and services in Kubernetes clusters and external resources, including databases, internal applications, and third-party cloud APIs.
  • Use NetworkPolicies to restrict east-west traffic between namespaces and pods by default.
  • Egress control: Use Egress Gateways or policies to control outbound traffic from pods.
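The "default deny, then explicitly allow" pattern above might be expressed as a pair of policies. The namespace name and router label are assumptions to adapt to your cluster:

```yaml
# Deny all ingress to every pod in this namespace; allow rules are layered on top.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-project
spec:
  podSelector: {}          # selects all pods in the namespace
  policyTypes:
  - Ingress
---
# Then explicitly allow traffic from the OpenShift router (ingress) namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-openshift-ingress
  namespace: my-project
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          policy-group.network.openshift.io/ingress: ""
```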

Compliance & Monitoring

  • Compliance Operator: The OpenShift Compliance Operator supports profiles for standards including PCI-DSS versions 3.2.1 and 4.0, enabling automated compliance scanning across the cluster.
  • Continuous monitoring: Use robust logging and monitoring solutions to gain visibility into container behavior, network flows, and resource utilization. Set up alerts for abnormalities like unusually high memory or CPU usage that could indicate compromise.
  • Track CVEs proactively: Security, bug fix, and enhancement updates for OCP are released as asynchronous errata through the Red Hat Network. Registry images should be scanned upon notification and patched if affected by new vulnerabilities.

Namespace & Project Isolation

Using projects and namespaces simplifies management and enhances security by limiting the potential impact of compromised applications, segregating resources based on application/team/environment, and ensuring users can only access the resources they are authorized to use.


Key tools to leverage: Advanced Cluster Security (ACS/StackRox), Compliance Operator, OpenShift built-in image registry with scanning, and NetworkPolicy/Calico for zero-trust networking.

SCCs (Security Context Constraints) are OpenShift’s pod-level security gate — separate from RBAC. The golden rules are: always start from restricted-v2, never modify built-in SCCs, create custom ones when needed, assign them to dedicated service accounts (not users), and never grant anyuid or privileged to app workloads.

RBAC controls what users and service accounts can do via the API. The key principle is deny-by-default — bind roles to groups rather than individuals, keep bindings namespace-scoped unless cross-namespace is genuinely needed, audit regularly with oc auth can-i and oc policy who-can, and never touch default system ClusterRoles.
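The audit commands mentioned above look like this in practice (the namespace and service account are placeholders):

```shell
# Can this service account create deployments in team-a?
oc auth can-i create deployments -n team-a \
  --as=system:serviceaccount:team-a:team-a-reconciler

# Who currently has permission to delete pods in team-a?
oc policy who-can delete pods -n team-a

# Review the bindings that grant those permissions.
oc get rolebindings -n team-a
```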

Network Policies implement microsegmentation at the pod level. The pattern is always: default-deny first, then explicitly open only what’s needed — ingress from the router, traffic from the same namespace, and specific app-to-app flows. For egress, use EgressFirewall (the OVN-Kubernetes successor to EgressNetworkPolicy) to allow specific CIDRs or DNS names and block everything else.

All three layers work together: RBAC controls the API plane, SCCs control the node plane, and NetworkPolicies control the network plane. A strong OCP security posture needs all three.

AKS – Security Best Practice

For a brand-new microservices project in 2026, security isn’t just a “layer” you add at the end—it’s baked into the infrastructure. AKS has introduced several “secure-by-default” features that simplify this.

Here are the essential security best practices for your new setup:


1. Identity over Secrets (Zero Trust)

In 2026, storing connection strings or client secrets in Kubernetes “Secrets” is considered an anti-pattern.

  • Best Practice: Use Microsoft Entra Workload ID.
  • Why: Instead of your app having a password to access a database, your Pod is assigned a “Managed Identity.” Azure confirms the Pod’s identity via a signed token, granting it access without any static secrets that could be leaked.
  • New in 2026: Enable Conditional Access for Workload Identities to ensure a microservice can only connect to your database if it’s running inside your specific VNet.
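A sketch of what workload identity looks like at the Kubernetes level follows. The client ID, names, and image are placeholders; the annotation and label are the standard Azure Workload Identity markers:

```yaml
# The service account is federated with a managed identity via its annotation.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-api
  namespace: orders
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"  # placeholder
---
# The label tells the webhook to inject the federated token into the pod.
apiVersion: v1
kind: Pod
metadata:
  name: orders-api
  namespace: orders
  labels:
    azure.workload.identity/use: "true"
spec:
  serviceAccountName: orders-api
  containers:
  - name: app
    image: myregistry.azurecr.io/orders-api:1.0   # placeholder image
```

The app then authenticates to Azure services with the injected token, so no connection string or client secret is ever stored in the cluster.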

2. Harden the Host (Azure Linux 3.0)

The operating system running your nodes is part of your attack surface.

  • Best Practice: Standardize on Azure Linux 3.0 (CBL-Mariner).
  • Why: It is a “distroless-adjacent” host OS. It contains ~500 packages compared to the thousands in Ubuntu, drastically reducing the number of vulnerabilities (CVEs) you have to patch.
  • Advanced Isolation: For sensitive services (like payment processing), enable Pod Sandboxing. This uses Kata Containers to run the service in a dedicated hardware-isolated micro-VM, preventing “container breakout” attacks where a hacker could jump from your app to the node.

3. Network “Blast Radius” Control

If one microservice is compromised, you don’t want the attacker to move laterally through your entire cluster.

  • Best Practice: Use Cilium for Network Policy.
  • Why: As of 2026, Cilium is the gold standard for AKS networking. It uses eBPF technology to filter traffic at the kernel level.
  • Strategy: Implement a Default Deny policy. By default, no service should be able to talk to any other service unless you explicitly write a rule allowing it.
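With the default-deny baseline in place, each permitted flow gets an explicit rule. Cilium enforces standard Kubernetes NetworkPolicy, so a sketch can stay vendor-neutral (namespace, labels, and port are placeholders):

```yaml
# After default-deny: allow only checkout -> payment on one TCP port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-checkout-to-payment
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: payment            # the protected service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: checkout       # the only permitted caller
    ports:
    - protocol: TCP
      port: 8443
```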

4. API Server Protection

The Kubernetes API server is the “front door” to your cluster. If someone gets in here, they own everything.

  • Best Practice: Use API Server VNet Integration (Private Clusters).
  • Why: This ensures your cluster’s management endpoint is not reachable from the public internet. It exists only inside your private network.
  • Access Control: Use Microsoft Entra RBAC (Role-Based Access Control). Never use the “cluster-admin” local account. Link permissions to your team’s Entra ID groups so that when an employee leaves the company, their cluster access is revoked instantly.

5. Continuous Supply Chain Security

Security starts before the code even reaches AKS.

  • Best Practice: Enable Defender for Cloud and Binary Authorization.
  • Why: Defender for Cloud scans your images in the Azure Container Registry (ACR) for malware and secrets.
  • Enforcement: Use Azure Policy for Kubernetes to block any deployment that hasn’t been scanned or isn’t coming from your trusted registry.

Summary Security Checklist

| Area | 2026 Standard |
| --- | --- |
| Identity | Microsoft Entra Workload ID (no secrets) |
| OS | Azure Linux 3.0 with OS Guard |
| Network | Cilium with mTLS (service mesh) |
| Access | Private cluster + Entra RBAC |
| Governance | Azure Policy “Restricted” baseline |

Pro-Tip: Check your Secure Score in Microsoft Defender for Cloud weekly. It will give you a prioritized list of “quick fixes” for your specific AKS cluster based on real-time threats.

With a Service Mesh (specifically the Istio-based add-on for AKS), you are moving toward a “Zero Trust” network architecture. In this setup, the network is no longer trusted by default; every connection must be verified and encrypted.

Here is the 2026 security blueprint for running microservices with Istio on AKS.


1. Automated mTLS (Encryption in Transit)

By default, traffic between Kubernetes Pods is unencrypted. With Istio, you can enforce Strict Mutual TLS (mTLS) without changing a single line of application code.

  • The Best Practice: Apply a PeerAuthentication policy at the namespace level set to STRICT.
  • The Result: Any service that tries to connect via plain text will be instantly rejected by the sidecar proxy. This ensures that even if an attacker gains access to your internal network, they cannot “sniff” sensitive data (like headers or tokens) passing between services.
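The namespace-wide STRICT policy described above is a short manifest. The namespace name is a placeholder:

```yaml
# Enforce strict mutual TLS for every workload in this namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: shop
spec:
  mtls:
    mode: STRICT
```

Applying the same resource in the Istio root namespace (typically istio-system) would make STRICT the mesh-wide default instead.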

2. Identity-Based Authorization

IP addresses are ephemeral in Kubernetes and shouldn’t be used for security. Istio uses SPIFFE identities based on the service’s Kubernetes Service Account.

  • The Best Practice: Use AuthorizationPolicy to define “Who can talk to Whom.”
  • Example: You can create a rule that says the Email Service can only receive requests from the Orders Service, and only if the request is a POST to the /send-receipt endpoint. Everything else is blocked at the source.
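That Orders-to-Email rule might be written as follows; the namespace, labels, and service account name are assumptions:

```yaml
# Only the orders service account may POST /send-receipt to the email service.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: email-allow-orders
  namespace: shop
spec:
  selector:
    matchLabels:
      app: email              # applies to the email workload's sidecars
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/shop/sa/orders"]   # SPIFFE identity
    to:
    - operation:
        methods: ["POST"]
        paths: ["/send-receipt"]
```

Because an ALLOW policy exists for the workload, any request not matching a rule is denied.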

3. Secure the “Front Door” (Ingress Gateway)

In 2026, the Kubernetes Gateway API has reached full GA (General Availability) for the AKS Istio add-on.

  • The Best Practice: Use the Gateway and HTTPRoute resources instead of the older Ingress objects.
  • Security Benefit: It allows for better separation of concerns. Your platform team can manage the physical load balancer (the Gateway), while your developers manage the routing rules (HTTPRoute) for their specific microservices.

4. Dapr + Istio: The “Power Couple”

Since you are building microservices, you might also use Dapr for state and messaging. In 2026, these two work together seamlessly but require one key configuration:

  • The Best Practice: If both are present, let Istio handle the mTLS and Observability, and disable mTLS in Dapr.
  • Why: Having two layers of encryption (“double wrapping” packets) adds significant latency and makes debugging network issues a nightmare.

5. Visualizing the “Blast Radius”

The biggest security risk in microservices is lateral movement.

  • The Best Practice: Use the Kiali dashboard (integrated with AKS) to view your service graph in real-time.
  • The Security Win: If you see a weird line of communication between your Public Web Frontend and your Internal Payment Database that shouldn’t exist, you’ve found a security hole or a misconfiguration before it becomes a breach.

Summary Security Checklist for Istio on AKS

| Task | 2026 Recommended Tool |
| --- | --- |
| Transport security | PeerAuthentication (mode: STRICT) |
| Service permissions | Istio AuthorizationPolicy |
| External traffic | Kubernetes Gateway API (managed Istio ingress) |
| Egress (outgoing) | ServiceEntry (block all traffic to external sites except specific approved domains) |
| Auditing | Azure Monitor for Containers + Istio access logs |

Warning for 2026: Ensure your worker nodes have enough “headroom.” Istio sidecars (Envoy proxies) consume roughly 0.5 to 1.0 vCPU and several hundred MBs of RAM per Pod. For a project with many small microservices, this “sidecar tax” can add up quickly.
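The egress row in the checklist above can be sketched as a ServiceEntry. This assumes the mesh is configured with outboundTrafficPolicy: REGISTRY_ONLY, so only declared hosts are reachable; the domain is a placeholder:

```yaml
# With REGISTRY_ONLY, external hosts must be declared before the mesh allows them.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: allow-payments-provider
  namespace: shop
spec:
  hosts:
  - api.payments.example.com     # placeholder external domain
  location: MESH_EXTERNAL
  ports:
  - number: 443
    name: https
    protocol: TLS
  resolution: DNS
```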

Ingress and API Gateways

In the world of Kubernetes and OpenShift, both Ingress and API Gateways serve as the entry point for external traffic. While they overlap in functionality, they operate at different levels of the networking stack and offer different “intelligence” regarding how they handle requests.

Think of Ingress as a simple receptionist directing people to the right room, while an API Gateway is a concierge who also checks IDs, translates languages, and limits how many people enter at once.


1. What is Ingress?

Ingress is a native Kubernetes resource (Layer 7) that manages external access to services, typically HTTP and HTTPS.

  • Primary Job: Simple routing based on the URL path (e.g., /api) or the hostname (e.g., app.example.com).
  • Implementation: In OCP, this is usually handled by the OpenShift Ingress Controller (based on HAProxy) using Routes.
  • Pros: Lightweight, standard across Kubernetes, and built-in.
  • Cons: Limited “logic.” It’s hard to do complex things like rate limiting, authentication, or request transformation without custom annotations.

2. What is an API Gateway?

An API Gateway is a more sophisticated proxy that sits in front of your microservices to provide “cross-cutting concerns.”

  • Primary Job: API Management. It handles security, monitoring, and orchestration.
  • Key Features:
    • Authentication/Authorization: Validating JWT tokens or API keys before the request hits the service.
    • Rate Limiting: Ensuring one user doesn’t spam your backend.
    • Payload Transformation: Changing an XML request to JSON for a modern backend.
    • Circuit Breaking: Stopping traffic to a failing service to prevent a total system crash.
  • Examples: Kong, Tyk, Apigee, or the Red Hat 3scale API Management platform.

Key Comparison Table

| Feature | Ingress / Route | API Gateway |
| --- | --- | --- |
| OSI Layer | Layer 7 (HTTP/S) | Layer 7 + application logic |
| Main Goal | Expose services to the internet | Protect and manage APIs |
| Complexity | Low | High |
| Security | Basic SSL/TLS termination | JWT, OAuth, mTLS, IP whitelisting |
| Traffic Control | Simple load balancing | Rate limiting, quotas, retries |
| Cost | Usually free (built into OCP) | Often requires licensing or extra infra |

When to use which?

  • Use Ingress/Routes when: You have a web application and just need to point a domain name to a service. It’s the “plumbing” of the cluster.
  • Use an API Gateway when: You are exposing APIs to third parties, need strict usage tracking (monetization), or want to centralize security logic so your developers don’t have to write auth code for every single microservice.

The “Modern” Middle Ground: Gateway API

There is a newer Kubernetes standard called the Gateway API. It is designed to replace Ingress by providing the power of an API Gateway (like header-based routing and traffic splitting) while remaining a standard part of the Kubernetes ecosystem. In OpenShift, you can enable the Gateway API through the Operator.

To help you see the evolution, here is how the “old” standard (Ingress) compares to the “new” standard (Gateway API).

1. The Traditional Ingress

Ingress is a single, “flat” resource. It’s simple but limited because the person who owns the app (the developer) and the person who owns the network (the admin) have to share the same file.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
  • The Problem: If you want to do something fancy like a “Canary deployment” (sending 10% of traffic to a new version), you usually have to use messy, vendor-specific annotations.

2. The Modern Gateway API

The Gateway API breaks the configuration into pieces. This allows the Cluster Admin to define the entry point (the Gateway) and the Developer to define how their specific app is reached (the HTTPRoute).

The Admin’s Part (The Infrastructure):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: http
    protocol: HTTP
    port: 80

The Developer’s Part (The Logic & Traffic Splitting):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
spec:
  parentRefs:
  - name: external-gateway
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path: { type: PathPrefix, value: /api }
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90   # 90% of traffic here
    - name: api-v2
      port: 80
      weight: 10   # 10% of traffic to the new version!

Summary of Differences

| Feature | Ingress | Gateway API |
| --- | --- | --- |
| Structure | Monolithic (one file for everything) | Role-based (separate resources for admin vs. dev) |
| Traffic Splitting | Requires non-standard annotations | Built-in (weights/canary) |
| Extensibility | Limited | High (supports TCP, UDP, TLS, gRPC) |
| Portability | High (but annotations are not) | Very high (standardized across vendors) |

Why OpenShift is moving this way

OpenShift has adopted the Gateway API (initially as a Technology Preview, reaching GA in 4.19) because it solves the “annotation hell” that occurred when users tried to make basic Ingress act like a full API Gateway. It gives you the power of a professional Gateway (like Kong or Istio) but stays within the native Kubernetes language.

In OpenShift 4.15 and later (reaching General Availability in 4.19), the Gateway API is managed by the Cluster Ingress Operator. Unlike standard Kubernetes where you might have to install many CRDs manually, OpenShift streamlines this by bundling the controller logic into its existing operators.

Here is the step-by-step process to enable and use it.


1. Enable the Gateway API CRDs

In newer versions of OCP, the CRDs are often present but “dormant” until a GatewayClass is created. The Ingress Operator watches for a specific controllerName to trigger the installation of the underlying proxy (which is Istio/Envoy in the Red Hat implementation).

Create the GatewayClass:

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller/v1

What happens next? The Ingress Operator will automatically detect this and start a deployment called istiod-openshift-gateway in the openshift-ingress namespace.


2. Set up a Wildcard Certificate (Required)

Unlike standard Routes, the Gateway API in OCP does not automatically generate a default certificate. You need to provide a TLS secret in the openshift-ingress namespace.

Bash

# Example: Creating a self-signed wildcard for testing
oc -n openshift-ingress create secret tls gwapi-wildcard \
--cert=wildcard.crt --key=wildcard.key

3. Deploy the Gateway

The Gateway represents the actual “entry point” or load balancer.

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.apps.mycluster.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: gwapi-wildcard

4. Create an HTTPRoute (Developer Task)

Now that the “door” (Gateway) is open, a developer in a different namespace can “attach” their application to it.

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
  namespace: my-app-project
spec:
  parentRefs:
  - name: my-gateway
    namespace: openshift-ingress
  hostnames:
  - "myapp.apps.mycluster.com"
  rules:
  - backendRefs:
    - name: my-app-service
      port: 8080

Summary Checklist for the Interview

If you are asked how to set this up in an interview, remember these four pillars:

  1. Operator-Led: It’s managed by the Ingress Operator; no separate “Gateway Operator” is needed for the default Red Hat implementation.
  2. Implementation: OpenShift uses Envoy (via a lightweight Istio control plane) as the engine behind the Gateway API.
  3. Namespace: The Gateway object itself almost always lives in openshift-ingress.
  4. Service Type: Creating a Gateway usually triggers the creation of a Service type: LoadBalancer automatically.

Ingress vs Service mesh

Ingress and a service mesh solve different networking problems.

Ingress
Ingress is a Kubernetes API object for managing external access into the cluster, typically HTTP/HTTPS. It routes inbound requests based on hosts and paths to backend Services. Kubernetes now says Ingress is stable but frozen, and recommends the newer Gateway API for future development. (Kubernetes)

Service mesh
A service mesh is an infrastructure layer for service-to-service communication inside and around your app, adding things like traffic policy, observability, and zero-trust security without changing app code. In Istio, this includes traffic routing, retries, timeouts, fault injection, mTLS, authentication, and authorization. (Istio)

Practical difference
Think of it like this:

  • Ingress = the front door to your cluster
  • Service mesh = the road system and security checkpoints between services inside the cluster

Use Ingress when
You need:

  • a public endpoint for your app
  • host/path routing like api.example.com or /shop
  • TLS termination for incoming web traffic

That is the classic “internet → cluster → service” problem. (Kubernetes)

Use a service mesh when
You need:

  • service-to-service observability
  • mutual TLS between workloads
  • canary / weighted routing between versions
  • retries, timeouts, circuit breaking
  • policy and identity for east-west traffic
  • control over some outbound traffic too

Istio’s docs specifically describe percentage routing, version-aware routing, external service entries, retries, timeouts, and circuit breakers. (Istio)

Do they overlap?
A little. Both can influence traffic routing, but at different scopes:

  • Ingress mainly handles north-south traffic: outside users coming in
  • Service mesh mainly handles east-west traffic: service-to-service traffic inside the platform

A mesh can also handle ingress/egress via its own gateways, but that is a broader and heavier solution than plain Kubernetes Ingress. (Kubernetes)

Which should you choose?

  • For a simple web app exposing a few services: Ingress is usually enough.
  • For microservices that need security, tracing, traffic shaping, and resilience: service mesh is worth considering.
  • Many teams use both: one for external entry, one for internal communication.

One current note: for new Kubernetes edge-routing designs, Gateway API is the direction Kubernetes recommends over Ingress. (Kubernetes)

Here’s a concrete example.

Example app

Imagine an e-commerce app running on Kubernetes:

  • web-frontend
  • product-api
  • cart-api
  • checkout-api
  • payment-service
  • user-service

Customers come from the internet. The services call each other inside the cluster.

With Ingress only

Traffic flow:

Internet → Ingress controller → Kubernetes Service → Pods

Example:

  • shop.example.com goes to web-frontend
  • shop.example.com/api/* goes to product-api

What Ingress is doing here:

  • expose the app publicly
  • terminate TLS
  • route by host/path
  • maybe do some basic load balancing

So a request might go:

  1. User opens https://shop.example.com
  2. Ingress sends / to web-frontend
  3. web-frontend calls cart-api
  4. cart-api calls user-service
  5. checkout-api calls payment-service

The key point: Ingress mostly helps with step 1, the outside-in entry point. It does not, by itself, give you rich control/security/telemetry for steps 3–5. Ingress is for external access, and the Kubernetes project notes the API is stable but frozen, with Gateway API recommended for newer traffic-management work. (Kubernetes)

With Ingress + service mesh

Now add a mesh like Istio.

Traffic flow becomes:

Internet → Ingress/Gateway → web-frontend → mesh-controlled service-to-service traffic

Now you still have an entry point, but inside the cluster the mesh handles communication between services.

What the mesh adds:

  • mTLS between services
  • retries/timeouts
  • canary routing
  • traffic splitting
  • telemetry/tracing
  • authz policies between workloads

Example:

  • checkout-api sends 95% of traffic to payment-service v1 and 5% to payment-service v2
  • calls from cart-api to user-service get a 2-second timeout and one retry
  • only checkout-api is allowed to call payment-service
  • all service-to-service traffic is encrypted with mutual TLS

Those are standard service-mesh capabilities described in Istio’s traffic-management and security docs. (Istio)
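As one illustration, the 95/5 payment split above might be written as an Istio VirtualService. A DestinationRule defining the v1/v2 subsets is assumed to exist, and the names are placeholders:

```yaml
# Hypothetical 95/5 canary split between payment-service subsets.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: payment-service
  namespace: shop
spec:
  hosts:
  - payment-service             # the in-cluster service name
  http:
  - route:
    - destination:
        host: payment-service
        subset: v1              # subsets defined in a DestinationRule
      weight: 95
    - destination:
        host: payment-service
        subset: v2
      weight: 5
```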

Simple diagram

Ingress only

[User on Internet]
        |
        v
    [Ingress]
        |
        v
 [web-frontend]
        |
        v
 [product-api] -> [cart-api] -> [checkout-api] -> [payment-service]

Ingress + service mesh

[User on Internet]
        |
        v
[Ingress / Gateway]
        |
        v
 [web-frontend]
        |
        v
+---------------------------------------------------+
| Service mesh inside cluster                       |
|                                                   |
|  [product-api] <-> [cart-api] <-> [checkout-api]  |
|          \             |              /           |
|           +---> [user-service] <-----+            |
|                        |                          |
|                [payment-service]                  |
|                                                   |
|  mTLS, retries, tracing, canaries, policy         |
+---------------------------------------------------+

Real-world way teams choose

Use just Ingress when:

  • you have a small app
  • you mostly need public routing
  • internal service communication is simple
  • you do not need per-service security/policy

Add a service mesh when:

  • you have many microservices
  • debugging internal calls is hard
  • you need zero-trust service identity
  • you do canaries/traffic shaping often
  • you want consistent retries/timeouts/policies

One important 2026 note

For brand-new Kubernetes edge-routing setups, many teams are moving toward Gateway API instead of classic Ingress. Kubernetes recommends Gateway over Ingress for future-facing work, and Istio also supports Gateway API for traffic management. (Kubernetes)

Rule of thumb

  • Ingress/Gateway API: “How does traffic get into my cluster?”
  • Service mesh: “How do services inside my platform talk securely and reliably?”

Kubernetes networking

Kubernetes networking is often considered the most complex part of the system because it operates on a “flat network” model. In a traditional setup, you might worry about port conflicts or how to reach a specific VM. In Kubernetes, every Pod gets its own unique IP address, and every Pod can talk to every other Pod without NAT (Network Address Translation).

To understand how traffic flows, we break it down into four distinct “layers” of communication.


1. Pod-to-Pod Communication

Every Pod in a cluster has its own internal IP. Kubernetes mandates that Pods on one node must be able to reach Pods on another node without any special configuration.

  • The Container Network Interface (CNI): This is the plugin (like Calico, Cilium, or OpenShift SDN) that actually builds the “pipes” between nodes.
  • The Experience: From the perspective of a container, it feels like it’s on a standard Ethernet network. It doesn’t care if the target Pod is on the same physical server or one across the data center.

2. Pod-to-Service Communication

Pods are “ephemeral”—they die and get replaced constantly, and their IP addresses change every time. You can’t hardcode a Pod IP into your app.

  • The Service: A Service is a stable “virtual IP” (ClusterIP) that sits in front of a group of Pods.
  • Kube-Proxy: This is a process running on every node that watches the API server. When you try to hit a Service IP, kube-proxy intercepts that traffic and redirects it to one of the healthy backend Pods.
  • CoreDNS: Kubernetes includes a built-in DNS service. Instead of an IP, your app just connects to http://my-database-service.

3. External-to-Service (Ingress & Egress)

How do people outside the cluster reach your apps?

  • NodePort: Opens a specific port (30000–32767) on every single node. High maintenance and insecure for production.
  • LoadBalancer: Tells your cloud provider (AWS/Azure/GCP) to spin up a physical Load Balancer that points to your nodes.
  • Ingress/Route: As we discussed earlier, this is a Layer 7 “smart router” that lets you use hostnames (like api.example.com) to direct traffic to different services using a single entry point.

4. Network Policies (The Firewall)

By default, Kubernetes is “open”—any Pod can talk to any other Pod. In production, you usually want to restrict this for security.

  • NetworkPolicy: These are the firewall rules of Kubernetes. You can define rules like: “Only the ‘Frontend’ Pod is allowed to talk to the ‘Database’ Pod on port 5432.”
  • Implementation: These rules are enforced by the CNI (e.g., Calico or Cilium) at the packet level.
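The “only the Frontend may talk to the Database on port 5432” rule above could be written as follows; the pod labels are assumptions:

```yaml
# Allow only frontend pods to reach the database pods on PostgreSQL's port.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: database           # the pods being protected
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend       # the only permitted client
    ports:
    - protocol: TCP
      port: 5432
```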

Summary of Components

| Component | Level | Purpose |
| --- | --- | --- |
| CNI | Layer 3 (IP) | Connects Pods across different nodes. |
| Service | Layer 4 (TCP/UDP) | Provides a stable IP/DNS for a group of Pods. |
| Ingress | Layer 7 (HTTP) | Handles routing, SSL, and hostnames. |
| NetworkPolicy | Security | Defines which Pods are allowed to talk to each other. |

The Path of a Packet

If a user types example.com into their browser:

  1. DNS resolves the name to the Ingress Controller’s IP.
  2. The Ingress Controller looks at the host header and finds the correct Service.
  3. The Service (via kube-proxy) picks a healthy Pod and sends the traffic there.
  4. The CNI routes that packet across the internal network to the node where the Pod is living.

To understand how data actually moves between nodes, we have to look at the CNI (Container Network Interface).

Since every Pod has a unique IP but lives on a host (Node) with its own separate IP, the CNI’s job is to “tunnel” the Pod’s traffic across the physical network. The two most common ways it does this are Encapsulation (Overlay) and Direct Routing.


1. Encapsulation (The “Envelope” Method)

This is the most common approach (used by Flannel (VXLAN) and OpenShift SDN). It treats the physical network as a “carrier” for a private, virtual network.

  • How it works: When Pod A (on Node 1) sends a packet to Pod B (on Node 2), the CNI takes that entire packet and wraps it inside a new UDP packet.
  • The “Outer” Header: Points from Node 1’s IP to Node 2’s IP.
  • The “Inner” Header: Points from Pod A’s IP to Pod B’s IP.
  • Arrival: When the packet hits Node 2, the CNI “unwraps” the outer envelope and delivers the original inner packet to Pod B.

The Downside: This adds a small amount of overhead (usually about 50 bytes per packet) because of the extra headers. This is why you often see the MTU (Maximum Transmission Unit) set slightly lower in Kubernetes (e.g., 1450 instead of 1500).
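The 50-byte figure is simple arithmetic: VXLAN over an IPv4 underlay adds an outer IP header (20 bytes), an outer UDP header (8), the VXLAN header (8), and the encapsulated inner Ethernet header (14):

```shell
# VXLAN encapsulation overhead for an IPv4 underlay (bytes per header)
OUTER_IP=20; OUTER_UDP=8; VXLAN_HDR=8; INNER_ETH=14
OVERHEAD=$((OUTER_IP + OUTER_UDP + VXLAN_HDR + INNER_ETH))
POD_MTU=$((1500 - OVERHEAD)) # Physical MTU minus the encapsulation tax
echo "overhead=${OVERHEAD}B pod_mtu=${POD_MTU}"
```

This prints `overhead=50B pod_mtu=1450`, matching the 1450 MTU commonly seen on Pod interfaces. (Geneve, used by OVN-Kubernetes, has a variable-length header, so its overhead can be slightly larger.)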


2. Direct Routing (The “BGP” Method)

Used by Calico (in non-overlay mode) and Cilium, this method avoids the “envelope” entirely for better performance.

  • How it works: The nodes act like standard network routers. They use BGP (Border Gateway Protocol) to tell each other: “Hey, if you want to reach the 10.244.1.0/24 subnet, send those packets to me (Node 1).”
  • The Experience: Packets travel “naked” across the wire with no extra headers.
  • The Requirement: Your physical network routers must be able to handle these extra routes, or the nodes must all be on the same Layer 2 segment (the same VLAN/Switch).

3. The Role of the “veth” Pair

Regardless of how the data moves between nodes, getting data out of a container uses a Virtual Ethernet (veth) pair.

Think of a veth pair as a virtual patch cable:

  1. One end is plugged into the container (usually named eth0).
  2. The other end is plugged into the host’s network namespace (often named something like vethabc123).
  3. The host end is usually connected to a Bridge (like cni0 or br0), which acts like a virtual switch for all Pods on that specific node.

4. Comparing Popular CNIs

| CNI Plugin | Primary Method | Best Use Case |
| --- | --- | --- |
| Flannel | VXLAN (Overlay) | Simple clusters; works almost anywhere. |
| Calico | BGP or IP-in-IP | High performance and advanced Network Policies. |
| Cilium | eBPF | Next-gen performance, deep security, and observability. |
| OpenShift SDN | VXLAN | Default for older OCP; very stable and integrated. |
| OVN-Kubernetes | Geneve (Overlay) | Modern OCP default; supports massive scale and Windows nodes. |

Which one should you choose?

  • If you are on OpenShift 4.x, you are likely using OVN-Kubernetes. It’s powerful and handles complex routing for you.
  • If you are building a Vanilla K8s cluster and want the absolute fastest networking, Cilium is the current industry gold standard because it uses eBPF to bypass much of the traditional kernel networking path.

To understand why eBPF (Extended Berkeley Packet Filter) is the “holy grail” of modern Kubernetes networking, we first have to look at how the “old way” (iptables) works.

1. The Old Way: iptables (The “List of Rules”)

For years, Kubernetes used iptables (a standard Linux kernel feature) to route traffic.

  • How it works: Imagine a giant printed list of instructions. Every time a packet arrives, the CPU has to read the list from top to bottom: “Is it for Service A? No. Service B? No. Service C? Yes.”
  • The Problem: As you add more Services and Pods, this list grows to thousands of lines. If a packet is destined for the 5,000th service on the list, the CPU has to perform 5,000 checks for every single packet.
  • Result: High latency and high CPU “overhead” just to move data.

2. The New Way: eBPF (The “Direct Shortcut”)

eBPF allows you to run small, sandboxed programs directly inside the Linux kernel without changing the kernel code.

  • How it works: Instead of a long list of rules, eBPF creates a high-speed “lookup table” (a Hash Map) in the kernel’s memory.
  • The Benefit: When a packet arrives, the eBPF program looks at the destination and instantly knows where it goes. It doesn’t matter if you have 10 services or 10,000—the lookup time is exactly the same (O(1) complexity).
  • Bypassing the Stack: eBPF can catch a packet the moment it hits the Network Interface Card (NIC) and send it straight to the Pod, bypassing almost the entire Linux networking stack.

3. Why Cilium + eBPF is a Game Changer

Cilium is the most popular CNI that uses eBPF. It provides three massive advantages over traditional networking:

| Feature | iptables / Standard CNI | Cilium (eBPF) |
| --- | --- | --- |
| Performance | Slows down as the cluster grows. | Consistently fast at any scale. |
| Observability | Hard to see “who is talking to who” without sidecars. | Hubble (Cilium’s UI) shows every flow, drop, and latency in real-time. |
| Security | IP-based filtering (hard to manage). | Identity-based filtering. It knows a packet belongs to “Service-Frontend” regardless of its IP. |

4. Why OpenShift is Moving to OVN (Geneve)

While Cilium is the “shiny new toy,” Red Hat chose OVN-Kubernetes (using the Geneve protocol) as the default for OCP 4.

  • Scale: OVN is built on Open vSwitch, which was designed for massive telco-grade clouds.
  • Feature Parity: It handles complex things like “Egress IPs” (giving a specific namespace a static IP for exiting the cluster) and Hybrid networking (Linux + Windows nodes) much more maturely than basic eBPF implementations did a few years ago.

Summary: The “Speed” Evolution

  1. iptables: Standard, but slow at scale.
  2. IPVS: A middle ground that uses hash tables but is still tied to the old kernel stack.
  3. eBPF (Cilium): The fastest possible way to move data in Linux today.

In OpenShift, the modern way to see these network flows is through the Network Observability Operator. This tool uses the eBPF technology we discussed to capture traffic data without slowing down your pods.

Here is how you can access and use these views.


1. Using the Web Console (The GUI Way)

Once the operator is installed, a new menu appears in your OpenShift Console.

  1. Navigate to Observe -> Network Traffic in the Administrator perspective.
  2. Overview Tab: This gives you a high-level “Sankey” diagram or graph showing which namespaces are talking to each other. It’s perfect for spotting “top talkers” (apps using the most bandwidth).
  3. Traffic Flows Tab: This is like a “Wireshark-lite” for your cluster. You can see every individual connection, including:
    • Source/Destination: Which pod is talking to which service.
    • Byte Rate: How much data is moving.
    • RTT (Latency): Exactly how many milliseconds a packet takes to travel between pods.
  4. Topology Tab: This provides a visual map of your network. You can group by “Namespace” or “Node” to see how traffic crosses physical boundaries.

2. Using the CLI (The “oc netobserv” plugin)

If you prefer the terminal, there is a specific plugin called oc netobserv. This is incredibly useful for live debugging when you don’t want to leave your shell.

Capture live flows:

Bash

oc netobserv flows --protocol=TCP --port=80

This will stream live traffic data directly to your terminal.

Filter for specific issues:

You can filter by namespace or even look for dropped packets (great for debugging firewall/NetworkPolicy issues):

Bash

oc netobserv flows --namespace=my-app --action=Drop

3. The “Loki” Backend

Behind the scenes, the Network Observability Operator stores these flows in Loki (a log aggregation system). This allows you to “go back in time.”

If an application crashed at 2:00 AM, you can go to the Network Traffic page, set the time filter to 2:00 AM, and see if there was a sudden spike in traffic or if a connection was being blocked by a security policy at that exact moment.


4. Advanced Debugging: Packet Drops

One of the best features of the eBPF-based observer is Packet Drop tracking. Traditional tools often can’t tell you why a packet disappeared. With this tool, the kernel can report the exact reason:

  • Filter Drop: A NetworkPolicy blocked it.
  • TCP Timeout: The other side didn’t respond.
  • Congestion: The network interface was overloaded.

Summary: What can you find?

  • Security: Is my database receiving traffic from an unauthorized namespace?
  • Performance: Which pods have the highest latency (RTT)?
  • Cost: Which services are sending the most data to external (Internet) IPs?

In Kubernetes, a NetworkPolicy is your cluster’s internal firewall. By default, Kubernetes has a “non-isolated” policy—meaning every pod can talk to every other pod.

To secure your app, you should follow the “Principle of Least Privilege”: block everything, then specifically allow only what is necessary.


1. The “Default Deny” (The Foundation)

Before you write specific rules, it is a best practice to create a “Default Deny” policy for your namespace. This locks all doors so that nothing can enter or leave unless you explicitly say so.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-secure-app
spec:
  podSelector: {} # Matches all pods in this namespace
  policyTypes:
  - Ingress
  - Egress

2. Allowing Specific Traffic (The “Rule”)

Now that everything is blocked, let’s say you have a Database pod and you only want your Frontend pod to talk to it on port 5432.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: my-secure-app
spec:
  podSelector:
    matchLabels:
      app: database # This policy applies to pods labeled 'app: database'
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend # Only allow pods labeled 'role: frontend'
    ports:
    - protocol: TCP
      port: 5432

3. Three Ways to Target Traffic

You can control traffic based on three different criteria:

  1. podSelector: Target pods within the same namespace (e.g., “Frontend to Backend”).
  2. namespaceSelector: Target entire namespaces (e.g., “Allow everything from the ‘Monitoring’ namespace”).
  3. ipBlock: Target specific IP ranges outside the cluster (e.g., “Allow traffic from our corporate VPN range 10.0.0.0/24”).
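The three selectors can be mixed in a single rule. A sketch (the labels, namespace name, and CIDR are placeholders):

```yaml
ingress:
- from:
  - podSelector:            # 1. Pods in THIS namespace
      matchLabels:
        role: frontend
  - namespaceSelector:      # 2. Any pod in the 'monitoring' namespace
      matchLabels:
        kubernetes.io/metadata.name: monitoring
  - ipBlock:                # 3. External IPs, e.g. a corporate VPN range
      cidr: 10.0.0.0/24
```

Note that separate entries in the same `from:` list are OR’ed: traffic matching any one of the three selectors is allowed.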

4. Troubleshooting NetworkPolicies

If you apply a policy and your app stops working, here is how to debug:

  • Check Labels: NetworkPolicies rely 100% on labels. If your Frontend pod is labeled app: front-end but your policy looks for role: frontend, it will fail silently.
  • The “Blind” Policy: Standard Kubernetes doesn’t “log” when a policy blocks a packet. This is why we use the Network Observability Operator (as we discussed) to see the “Drop” events.
  • CNI Support: Remember, the CNI (Calico, OVN, etc.) is what actually enforces these rules. If your CNI doesn’t support NetworkPolicies (like basic Flannel), the YAML will be accepted but it won’t actually block anything!

Summary: Ingress vs. Egress

  • Ingress: Controls traffic coming into the pod (Who can talk to me?).
  • Egress: Controls traffic leaving the pod (Who can I talk to?).

A Zero Trust architecture in Kubernetes means that no pod is trusted by default. Even if a pod is inside your cluster, it shouldn’t be allowed to talk to anything else unless you specifically permit it.

In this scenario, we have a 3-tier app: Frontend, Backend, and Database.


1. The “Lockdown” (Default Deny)

First, we apply this to the entire namespace. This ensures that any new pod you deploy in the future is “secure by default” and cannot communicate until you add a rule for it.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app-stack
spec:
  podSelector: {} # Matches ALL pods
  policyTypes:
  - Ingress
  - Egress

2. Tier 1: The Frontend

The Frontend needs to receive traffic from the Internet (via the Ingress Controller) and send traffic only to the Backend.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress # Allows OpenShift Ingress Controller
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend # ONLY allowed to talk to Backend

3. Tier 2: The Backend

The Backend should only accept traffic from the Frontend and is only allowed to talk to the Database.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend # ONLY accepts Frontend traffic
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: database # ONLY allowed to talk to DB

4. Tier 3: The Database

The Database is the most sensitive. It should never initiate a connection (no Egress) and only accept traffic from the Backend.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress # We include Egress to ensure it's blocked by default
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432 # Postgres port

Important: Don’t Forget DNS!

When you apply a “Default Deny” Egress policy, your pods can no longer talk to CoreDNS, which means they can’t resolve service names like http://backend-service.

To fix this, you must add one more policy to allow UDP Port 53 to the openshift-dns namespace:

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress # Illustrative name
  namespace: my-app-stack
spec:
  podSelector: {} # Applies to all pods in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP # DNS falls back to TCP for large responses
      port: 53


Summary of the Strategy

  • Labels are everything: If you typo tier: backend as tier: back-end, the wall stays up and the app breaks.
  • Layered Security: Even if a hacker compromises your Frontend pod, they cannot “scan” your network or reach your Database directly; they are stuck only being able to talk to the Backend.

To test your Zero Trust setup, we will perform a “Penetration Test” from inside the cluster. We’ll use a temporary debug pod to see if our firewall rules actually block unauthorized access.

1. The “Attacker” Pod

We will spin up a temporary pod with no labels, based on the alpine image (its built-in BusyBox tools include nc; curl and dig can be added with apk add curl bind-tools). Since our “Default Deny” policy targets all pods, this pod should be completely isolated the moment it starts.

Bash

# Run a temporary alpine pod
oc run network-tester --image=alpine --restart=Never -- /bin/sh -c "sleep 3600"

2. Test 1: Can an unknown pod talk to the Database?

Even if this pod is in the same namespace, it should not be able to reach the database because it doesn’t have the tier: backend label.

Bash

# Try to connect to the database on port 5432
oc exec network-tester -- nc -zv database-service 5432

Expected Result: Connection timed out (The packet is dropped by the NetworkPolicy).


3. Test 2: Can the Frontend “Scan” the Database?

Now, let’s pretend a hacker compromised your Frontend pod. Can they bypass the Backend and talk directly to the Database?

Bash

# Exec into your existing Frontend pod
oc exec deployment/frontend -- nc -zv database-service 5432

Expected Result: Connection timed out.

Even though the Frontend is “trusted” to talk to the Backend, it is not trusted to talk to the Database.


4. Test 3: Can the Database reach the Internet? (Egress Test)

A common hacker tactic is to steal data and send it to an external server (Exfiltration). Let’s see if the Database can “phone home.”

Bash

# Try to ping Google from the Database pod
oc exec deployment/database -- curl -I google.com

Expected Result: Could not resolve host or Timeout.

Since we didn’t add an Egress rule for the Database, it is physically unable to send data out of its own pod.


5. How to see the “Deny” in real-time

If you want to prove the policy is working without just guessing based on timeouts, use the Network Observability tool we set up earlier:

  1. Go to Observe -> Network Traffic.
  2. Filter by Action: Deny or Action: Drop.
  3. You will see a red entry showing:
    • Source: network-tester
    • Destination: database
    • Reason: NetworkPolicy

6. Cleaning Up

When you’re done testing, don’t forget to remove the tester pod:

Bash

oc delete pod network-tester

Summary of the Lab

| Test | Source | Target | Status | Why? |
| --- | --- | --- | --- | --- |
| Unauthorized | Debug Pod | Database | BLOCKED | Missing tier: backend label. |
| Lateral Movement | Frontend | Database | BLOCKED | Policy only allows Frontend -> Backend. |
| Exfiltration | Database | Internet | BLOCKED | No Egress rules defined for DB. |

Ingress

In Kubernetes, Ingress is an API object that acts as a “smart router” for your cluster. While a standard Service (like a LoadBalancer) simply opens a hole in the firewall for one specific app, Ingress allows you to consolidate many services behind a single entry point and route traffic based on the URL or path.

Think of it as the receptionist of an office building: instead of every employee having their own front door, everyone uses one main entrance, and the receptionist directs visitors to the correct room based on who they are looking for.


1. How Ingress Works

There are two distinct parts required to make this work:

  1. Ingress Resource: A YAML file where you define your “rules” (e.g., “Send all traffic for myapp.com/api to the api-service”).
  2. Ingress Controller: The actual software (like NGINX, HAProxy, or Traefik) that sits at the edge of your cluster, reads those rules, and physically moves the traffic. Kubernetes does not come with a controller by default; you must install one.

2. Key Capabilities

Ingress is much more powerful than a simple NodePort or LoadBalancer Service because it operates at Layer 7 (HTTP/HTTPS).

  • Host-based Routing: Route blue.example.com to the Blue Service and green.example.com to the Green Service using a single IP.
  • Path-based Routing: Route example.com/login to the Auth service and example.com/search to the Search service.
  • SSL/TLS Termination: You can handle your SSL certificates at the Ingress level so your individual application pods don’t have to deal with encryption/decryption.
  • Name-based Virtual Hosting: Supporting multiple domain names on the same IP address.

3. Ingress vs. LoadBalancer vs. NodePort

Choosing how to expose your app is a common point of confusion. Here is the breakdown:

| Method | Best For | Pros/Cons |
| --- | --- | --- |
| NodePort | Testing/Dev | Opens a high-range port (30000+) on every node. Very insecure for production. |
| LoadBalancer | Single Apps | Standard way to get an external IP. Expensive if you have 50 apps, as you pay for 50 cloud load balancers. |
| Ingress | Production | Multiple apps share one Load Balancer. Extremely cost-effective and feature-rich. |

4. What about OpenShift “Routes”?

Since you were asking about OCP earlier, it’s important to note that OpenShift has its own version called a Route.

  • Routes were created by Red Hat before the Kubernetes Ingress standard existed.
  • In modern OpenShift, an Ingress is automatically converted into a Route behind the scenes.
  • The main difference: Routes natively support advanced HAProxy features (such as TLS passthrough and re-encrypt termination), while Ingress is the industry-standard, portable way to do it.
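For comparison, a minimal OpenShift Route with edge TLS termination might look like this (the host, Service name, and port are illustrative):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: my-app-route
spec:
  host: myapp.example.com
  to:
    kind: Service
    name: my-web-service
  port:
    targetPort: 8080
  tls:
    termination: edge # TLS ends at the router; 'passthrough' and 'reencrypt' also exist
```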

5. Basic Ingress Example

This YAML routes traffic to two different services based on the URL path:

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /billing
        pathType: Prefix
        backend:
          service:
            name: billing-service
            port:
              number: 80
      - path: /inventory
        pathType: Prefix
        backend:
          service:
            name: inventory-service
            port:
              number: 80

To set up HTTPS for an Ingress, you need a TLS Secret. This secret acts as a container for your SSL certificate and private key, which the Ingress Controller (like NGINX or HAProxy) uses to encrypt the traffic before it reaches your pods.


1. Create the TLS Secret

First, you need your certificate files (tls.crt and tls.key). You can create a Kubernetes Secret from these files using the CLI:

Bash

kubectl create secret tls my-app-tls \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key

Note: If you are using OpenShift, you can often skip this manual step by using Service Serving Certificates, which allow OpenShift to generate and manage the certificates for you automatically.


2. Update the Ingress YAML

Now, you tell the Ingress resource to use that secret for a specific hostname.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    # This annotation tells the controller to redirect HTTP to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-app-tls # This must match the secret name created in Step 1
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 80

3. Automation with Cert-Manager

Manually updating certificates before they expire is a headache. Most production clusters use Cert-Manager.

Cert-Manager is an operator that talks to certificate authorities like Let’s Encrypt. You simply add an annotation to your Ingress, and Cert-Manager handles the rest:

The “Magic” Annotation:

YAML

metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"

Once you add this, Cert-Manager will:

  1. See the Ingress request.
  2. Reach out to Let’s Encrypt to verify you own the domain.
  3. Generate the tls.crt and tls.key.
  4. Create the Secret for you and renew it every 90 days automatically.

Summary Checklist for HTTPS

| Step | Action |
| --- | --- |
| 1. Certificate | Obtain a CA-signed cert or use Let’s Encrypt. |
| 2. Secret | Store the cert/key in a kind: Secret (type kubernetes.io/tls). |
| 3. Ingress Spec | Add the tls: section to your Ingress YAML. |
| 4. DNS | Ensure your domain points to the Ingress Controller’s IP. |

To automate SSL certificates with Cert-Manager, you need a ClusterIssuer. This is a cluster-wide resource that tells Cert-Manager how to talk to a Certificate Authority (CA) like Let’s Encrypt.

Before you start, ensure the Cert-Manager Operator is installed in your cluster (in OpenShift, you can find this in the OperatorHub).


1. Create a ClusterIssuer (The “Account”)

This YAML defines your identity with Let’s Encrypt. It uses the ACME (Automated Certificate Management Environment) protocol.

YAML

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server address for Let's Encrypt production
    server: https://acme-v02.api.letsencrypt.org/directory
    # Your email address for expiration notices
    email: admin@yourdomain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx # Or 'openshift-default' depending on your ingress controller

2. Update your Ingress to “Request” the Cert

Once the ClusterIssuer is created, you don’t need to manually create secrets anymore. You just “tag” your Ingress with an annotation. Cert-Manager will see this, perform the challenge, and create the secret for you.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-secure-app
  annotations:
    # THIS IS THE TRIGGER: It links the Ingress to your ClusterIssuer
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: app-tls-cert # Cert-Manager will create this secret automatically
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80

3. How to verify it’s working

After you apply the Ingress, Cert-Manager creates a Certificate object and a Challenge object. You can track the progress:

  • Check the certificate status: kubectl get certificate (look for READY: True).
  • Check the order status (if it’s stuck): kubectl get challenges.
  • Check the secret: kubectl get secret app-tls-cert (if this exists, your site is now HTTPS!).

Why use Let’s Encrypt?

  1. Cost: It is 100% free.
  2. Trust: It is recognized by all major browsers (unlike self-signed certs).
  3. No Maintenance: Cert-Manager automatically renews the cert 30 days before it expires.

A Small Warning:

Let’s Encrypt has rate limits. If you are just testing, use the “Staging” URL (https://acme-staging-v02.api.letsencrypt.org/directory) first. Browsers will show a warning for staging certs, but you won’t get blocked for hitting limit thresholds while debugging.
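A staging issuer is typically just a second ClusterIssuer that differs only in its name and server URL. A sketch (email and secret name are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Let's Encrypt STAGING endpoint: lenient rate limits, untrusted certs
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
```

To switch an app between environments, change its Ingress annotation to cert-manager.io/cluster-issuer: "letsencrypt-staging" (and back to "letsencrypt-prod" once everything works).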

When Cert-Manager fails to issue a certificate, it usually gets stuck in the Challenge phase. Let’s look at how to diagnose and fix the most common “Let’s Encrypt” roadblocks.


1. The Troubleshooting Command Chain

If your certificate isn’t appearing, follow this hierarchy to find where the “handshake” broke:

  • Check the Certificate status: oc get certificate. If READY is False, move to the next step.
  • Check the Order: oc get order. The Order is the request sent to Let’s Encrypt; look at the STATE column.
  • Check the Challenge (the most important step): oc get challenges. If a challenge exists, it means Let’s Encrypt is trying to verify your domain but can’t.
  • Describe the Challenge for the error message: oc describe challenge <challenge-name>

2. Common Failure Reasons

A. The “I Can’t See You” (Firewall/Network)

Let’s Encrypt uses the HTTP-01 challenge. It tries to reach http://yourdomain.com/.well-known/acme-challenge/<TOKEN>.

  • The Problem: Your firewall, Security Group (AWS/Azure), or OpenShift Ingress Controller is blocking Port 80.
  • The Fix: Ensure Port 80 is open to the public internet. Let’s Encrypt cannot verify your domain over Port 443 (HTTPS) because the certificate doesn’t exist yet!

B. DNS Record Mismatch

  • The Problem: Your DNS A record or CNAME for app.yourdomain.com hasn’t propagated yet or is pointing to the wrong Load Balancer IP.
  • The Fix: Use dig app.yourdomain.com or nslookup to ensure the domain points exactly to your Ingress Controller’s external IP.

C. Rate Limiting

  • The Problem: You’ve tried to issue the same certificate too many times in one week (Let’s Encrypt has a limit of 5 duplicate certs per week).
  • The Fix: Switch your ClusterIssuer to use the Staging URL (mentioned in the previous step) until your configuration is 100% correct, then switch back to Production.

3. Dealing with Internal/Private Clusters

If your OpenShift cluster is behind a VPN and not accessible from the public internet, the HTTP-01 challenge will always fail because Let’s Encrypt can’t “see” your pods.

The Solution: DNS-01 Challenge

Instead of a web check, Cert-Manager proves ownership by adding a temporary TXT record to your DNS provider (Route53, Cloudflare, Azure DNS).

Example DNS-01 Issuer (Route53):

YAML

spec:
  acme:
    solvers:
    - dns01:
        route53:
          region: us-east-1
          hostedZoneID: Z123456789
Summary Checklist

  1. Is Port 80 open?
  2. Does DNS point to the cluster?
  3. Are you hitting Rate Limits?
  4. Is your Ingress Class correct in the Issuer?

Persistent Volumes (PV) and Persistent Volume Claims (PVC)

In Kubernetes, storage is handled separately from your application’s logic. To understand Persistent Volumes (PV) and Persistent Volume Claims (PVC), it helps to use the “Electricity” analogy:

  • PV (The Infrastructure): This is like the power plant and the grid. It’s the actual physical storage (a disk, a cloud drive, or a network share).
  • PVC (The Request): This is like the power outlet in your wall. Your application “plugs in” to the PVC to get what it needs without needing to know where the power plant is.

1. Persistent Volume (PV)

A PV is a piece of storage in the cluster that has been provisioned by an administrator or by a storage class. It is a cluster-level resource (like a Node) and exists independently of any individual Pod.

  • Capacity: How much space is available (e.g., 5Gi, 100Gi).
  • Access Modes:
    • ReadWriteOnce (RWO): Can be mounted by one node at a time.
    • ReadOnlyMany (ROX): Many nodes can read it simultaneously.
    • ReadWriteMany (RWX): Many nodes can read and write at the same time (requires specific hardware like NFS or ODF).
  • Reclaim Policy: What happens to the data when you delete the PVC? (Retain it for manual cleanup or Delete it immediately).

2. Persistent Volume Claim (PVC)

A PVC is a request for storage by a user. If a Pod needs a “hard drive,” it doesn’t look for a specific disk; it creates a PVC asking for “10Gi of storage with ReadWriteOnce access.”

  • The “Binding” Process: Kubernetes looks at all available PVs. If it finds a PV that matches the PVC’s request, it “binds” them together.
  • Namespace Scoped: Unlike PVs, PVCs live inside a specific Namespace.

3. Dynamic Provisioning (StorageClasses)

In modern clusters (like OpenShift), admins don’t manually create 100 different PVs. Instead, they use a StorageClass.

  1. The user creates a PVC.
  2. The StorageClass notices the request.
  3. It automatically talks to the cloud provider (AWS/Azure/GCP) to create a new disk.
  4. It automatically creates the PV and binds it to the PVC.
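Under the hood, a StorageClass is itself just a small object. Here is a sketch for the AWS EBS CSI driver; the name is illustrative and the parameters vary by provisioner:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-gp3 # Illustrative name; this is what PVCs reference
provisioner: ebs.csi.aws.com # The driver that actually creates the disks
parameters:
  type: gp3 # Provisioner-specific: EBS volume type
reclaimPolicy: Delete             # Delete the disk when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer # Create the disk in the Pod's AZ
allowVolumeExpansion: true        # Permit growing the PVC later
```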

4. How a Pod uses it

Once the PVC is bound to a PV, you tell your Pod to use that “outlet.”

YAML

spec:
  containers:
  - name: my-db
    image: postgres
    volumeMounts:
    - mountPath: "/var/lib/postgresql/data"
      name: my-storage
  volumes:
  - name: my-storage
    persistentVolumeClaim:
      claimName: task-pv-claim # This matches the name of your PVC

Summary Comparison

| Feature | Persistent Volume (PV) | Persistent Volume Claim (PVC) |
| --- | --- | --- |
| Who creates it? | Administrator or Storage System | Developer / Application |
| Scope | Cluster-wide | Namespace-specific |
| Analogy | The actual Hard Drive | The request for a Hard Drive |
| Lifecycle | Exists even if no one uses it | Tied to the application’s needs |

Here is a standard YAML example for a Persistent Volume Claim (PVC).

In this scenario, we aren’t manually creating a disk. Instead, we are telling OpenShift/Kubernetes: “I need 10Gi of fast storage. Please go talk to the cloud provider or storage backend and create it for me.”

1. The PVC Definition

This is the “request” for storage.

YAML

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc-example
  namespace: my-app-project
spec:
  storageClassName: gp3-csi # Or 'thin', 'ocs-storagecluster-ceph-rbd', etc.
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

2. How the Binding Works

When you apply this YAML, the following chain reaction happens:

  1. The Claim: You submit the PVC.
  2. The Provisioner: The StorageClass (e.g., AWS EBS, Azure Disk, or OpenShift Data Foundation) sees the request.
  3. The Asset: The storage backend creates a physical 10Gi volume.
  4. The Volume: Kubernetes automatically creates a PersistentVolume (PV) object to represent that physical disk.
  5. The Binding: The PVC status changes from Pending to Bound.

3. Attaching the PVC to a Pod

A PVC is useless until a Pod “claims” it. Here is how you mount that 10Gi disk into a container:

YAML

apiVersion: v1
kind: Pod
metadata:
  name: storage-test-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: my-data-volume
      mountPath: /usr/share/nginx/html # Where the disk appears inside the container
  volumes:
  - name: my-data-volume
    persistentVolumeClaim:
      claimName: dynamic-pvc-example # Must match the name in the PVC YAML

Important “Gotchas” with PVCs

  • Access Modes:
    • ReadWriteOnce (RWO): Most common. If Pod A is using the disk on Node 1, Pod B cannot use it from Node 2.
    • ReadWriteMany (RWX): Required if you want multiple Pods across different nodes to share the same files (common for web servers sharing an uploads folder).
  • Expansion: Many modern StorageClasses allow you to increase the storage size in the PVC YAML after it’s created, and Kubernetes will expand the disk on the fly (provided the underlying storage supports it).
  • Sticky Nodes: If you use a cloud-based RWO disk (like AWS EBS), your Pod becomes “stuck” to the availability zone where that disk was created.

Checking for available StorageClasses is one of the most common tasks for an OpenShift administrator or developer. It tells you exactly what “flavors” of storage are available for your apps.

1. Using the CLI (Recommended)

Run the following command to see a list of all storage providers configured in your cluster:

Bash

oc get storageclass

(Or use the shorthand: oc get sc)

Example Output:

Plaintext

NAME                PROVISIONER                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp3-csi (default)   ebs.csi.aws.com            Delete          WaitForFirstConsumer   true                   45d
thin                kubernetes.io/vsphere      Delete          Immediate              false                  102d
ocs-storage-rbd     openshift-storage.rbd...   Delete          Immediate              true                   12d

2. How to Read the Results

  • NAME: This is what you put in the storageClassName field of your PVC YAML.
  • (default): If you see this next to a name, it means any PVC that doesn’t specify a class will automatically get this one.
  • PROVISIONER: This tells you the underlying technology (e.g., AWS EBS, VMware vSphere, or Ceph/ODF).
  • RECLAIMPOLICY:
    • Delete: When you delete the PVC, the physical disk is also deleted.
    • Retain: When you delete the PVC, the physical disk stays (so you can recover data manually).
  • VOLUMEBINDINGMODE:
    • Immediate: The disk is created the second you create the PVC.
    • WaitForFirstConsumer: The disk isn’t created until a Pod actually tries to use it. This is smarter because it ensures the disk is created in the same Availability Zone as the Pod.

3. Inspecting a Specific StorageClass

If you want to see the “fine print” (like encryption settings or IOPS), you can look at the YAML of a specific class:

Bash

oc get sc gp3-csi -o yaml

4. Using the Web Console

If you prefer the GUI:

  1. Log in to the OpenShift Web Console.
  2. Ensure you are in the Administrator perspective.
  3. Navigate to Storage -> StorageClasses.
  4. Here, you can see all classes, which one is the default, and even create new ones using a wizard.

Pro-Tip: If your cluster has no default storage class, your PVCs will stay in a Pending state forever unless you explicitly name one in your YAML.
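Which class is the default is controlled by a single annotation on the StorageClass object (a YAML fragment):

```yaml
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
```

In practice you can set it in place with a patch, e.g. oc patch storageclass gp3-csi -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}' (the class name here is illustrative). Make sure only one class carries the annotation, or behavior becomes ambiguous.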