Azure data ecosystem

In the Azure data ecosystem, four services (ADLS Gen2, Azure Data Factory, Azure Databricks, and Azure SQL) commonly form a “Modern Data Stack.” They work together to move, store, process, and serve data. If you think of your data as water, this ecosystem is the plumbing, the reservoir, the filtration plant, and the tap.


1. ADLS Gen2 (The Reservoir)

Azure Data Lake Storage Gen2 is the foundation. It is a highly scalable, cost-effective storage space where you keep all your data—structured (tables), semi-structured (JSON/Logs), and unstructured (PDFs/Images).

  • Role: The single source of truth (Data Lake).
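  • Key Feature: Hierarchical Namespace. Unlike standard “flat” object storage, it organizes files into real directories, so operations such as renaming or deleting a folder are single metadata operations instead of one operation per file. This makes big data analytics jobs faster and simpler.
  • 2026 Context: It typically holds the “Bronze” (Raw) and “Silver” (Filtered) layers of a Medallion Architecture, and often the “Gold” (Aggregated) layer as well.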

2. ADF (The Plumbing & Orchestrator)

Azure Data Factory is the glue. It doesn’t “own” the data; it moves it from point A to point B and tells other services when to start working.

  • Role: ETL/ELT Orchestration. It pulls data from on-premises servers or APIs and drops it into ADLS.
  • Key Feature: Low-code UI. You build “Pipelines” using a drag-and-drop interface.
  • Integration: A pipeline typically includes a Databricks Notebook (or Job) activity that tells Databricks: “I just finished moving the raw files to ADLS, now go clean them.”

3. Azure Databricks (The Filtration Plant)

Azure Databricks is where the heavy lifting happens. It is an Apache Spark-based platform used for massive-scale data processing, data science, and machine learning.

  • Role: Transformation & Analytics. It takes the messy data from ADLS and turns it into clean, aggregated “Gold” data.
  • Key Feature: Notebooks. Engineers write code (Python, SQL, Scala) in a collaborative environment.
  • 2026 Context: It is commonly used to generate embeddings (Vectorization) for RAG systems, turning your internal documents into mathematical vectors for AI Search.

4. Azure SQL (The Tap)

Azure SQL Database (or Azure Synapse) is the final destination for business users. While ADLS is great for “big data,” it’s not the best for a quick dashboard or a mobile app.

  • Role: Data Serving. It stores the final, “Gold” level data that has been cleaned and structured.
  • Key Feature: High Performance for Queries. It is optimized for Power BI reports and standard business applications.
  • Usage: After Databricks cleans the data, the final results are loaded into Azure SQL (written directly by Databricks or copied by ADF) so the CEO can see a dashboard the next morning.

How they work together (The Flow)

Step | Service | Action
1. Ingest | ADF | Copies raw sales data from an on-prem server to the cloud.
2. Store | ADLS | Holds the raw .csv files in a “Raw” folder.
3. Process | Databricks | Reads the .csv, removes duplicates, and calculates monthly totals.
4. Serve | Azure SQL | The cleaned totals are loaded into a SQL table.
5. Visualize | Power BI | Connects to Azure SQL to show a “Sales Revenue” chart.
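Step 3 is the only step in this flow that needs custom code. Below is a minimal sketch of that logic in plain Python so it runs anywhere. In a Databricks notebook you would more likely use PySpark (for example, dropDuplicates on the key column followed by a groupBy on the month). The row layout (order_id, order_date, amount) and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows as they might arrive in the "Raw" folder:
# (order_id, order_date, amount). Order 1002 appears twice (a duplicate load).
raw_rows = [
    ("1001", date(2026, 1, 15), 120.0),
    ("1002", date(2026, 1, 20), 80.0),
    ("1002", date(2026, 1, 20), 80.0),
    ("1003", date(2026, 2, 3), 200.0),
]


def remove_duplicates(rows):
    """Keep the first occurrence of each order_id, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        order_id = row[0]
        if order_id not in seen:
            seen.add(order_id)
            unique.append(row)
    return unique


def monthly_totals(rows):
    """Sum the amount column per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for _order_id, order_date, amount in rows:
        totals[order_date.strftime("%Y-%m")] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(monthly_totals(remove_duplicates(raw_rows)))  # {'2026-01': 200.0, '2026-02': 200.0}
```

Without the deduplication step, January would be overstated as 280.0. This is why duplicates are removed before anything is aggregated.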

Summary Table

Service | Primary Skill Needed | Best For…
ADF | Logic / Drag-and-Drop | Moving data & scheduling tasks.
ADLS | Folder Organization | Storing massive amounts of any data type.
Databricks | Python / SQL / Spark | Complex math, AI, and cleaning big data.
Azure SQL | Standard SQL | Powering apps and BI dashboards.

To explain how data flows through these four services, we use the Medallion Architecture. This is a widely used pattern for moving data from a “raw” state to an “AI-ready” or “Business-ready” state.


Phase 1: Ingestion (The “Collector”)

  • Services: ADF + ADLS Gen2 (Bronze Folder)
  • The Action: ADF starts the process, usually on a schedule trigger. It connects to your external source (such as an internal SAP system, a REST API, or an on-premises SQL Server).
  • The Result: ADF “copies” the data exactly as it is—warts and all—into the Bronze container of your ADLS.
  • Why? You always keep a raw copy. If your logic fails later, you don’t have to go back to the source; you just restart from the Bronze folder.

Phase 2: Transformation (The “Refinery”)

  • Services: Databricks + ADLS Gen2 (Silver Folder)
  • The Action: ADF sends a signal to Databricks to start a “Job.” Databricks opens the raw files from the Bronze folder.
    • It filters out null values.
    • It fixes date formats (e.g., converting a DD-MM-YY value like 01-03-26 to ISO 8601, 2026-03-01).
    • It joins tables together.
  • The Result: Databricks writes this “clean” data into the Silver container of your ADLS, usually in Delta format (Parquet files plus a transaction log that adds ACID transactions and versioning).
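The three Silver-layer steps can be sketched in plain Python as follows. The sketch makes two hypothetical assumptions: raw dates arrive as DD-MM-YY strings (matching the 01-03-26 example), and a small customers lookup table exists to join against. In Databricks, the same work would typically use PySpark's dropna, to_date, and join on DataFrames, with the result written in Delta format.

```python
from datetime import datetime

# Hypothetical Bronze rows: (order_id, customer_id, raw_date, amount).
bronze_orders = [
    ("1001", "C1", "01-03-26", 120.0),
    ("1002", None, "02-03-26", 80.0),   # missing customer_id: filtered out
    ("1003", "C2", "15-03-26", 200.0),
]

# Hypothetical lookup table to join against.
customers = {"C1": "Contoso", "C2": "Fabrikam"}


def to_iso_date(raw):
    """Convert a DD-MM-YY string (e.g. '01-03-26') to ISO 8601 ('2026-03-01')."""
    return datetime.strptime(raw, "%d-%m-%y").date().isoformat()


def clean_orders(rows, customer_names):
    silver = []
    for order_id, customer_id, raw_date, amount in rows:
        # 1. Filter out rows containing null values.
        if None in (order_id, customer_id, raw_date, amount):
            continue
        # 2. Standardize the date format.
        iso_date = to_iso_date(raw_date)
        # 3. Join with the customers table (inner join: unknown customers are dropped).
        name = customer_names.get(customer_id)
        if name is None:
            continue
        silver.append((order_id, customer_id, name, iso_date, amount))
    return silver


if __name__ == "__main__":
    for row in clean_orders(bronze_orders, customers):
        print(row)
```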

Phase 3: Aggregation & Logic (The “Chef”)

  • Services: Databricks + ADLS Gen2 (Gold Folder)
  • The Action: Databricks runs a second set of logic. Instead of just cleaning data, it calculates things. It creates “Gold” tables like Monthly_Sales_Summary or Employee_Vector_Embeddings.
  • The Result: These high-value tables are stored in the Gold container. This data is now ready for dashboards, applications, and AI workloads.

Phase 4: Serving (The “Storefront”)

  • Services: Azure SQL
  • The Action: ADF runs one final “Copy Activity.” It takes the small, aggregated tables from the Gold folder in ADLS and pushes them into Azure SQL Database.
  • The Result: Your internal dashboard (Power BI) or your Chatbot’s metadata storage connects to Azure SQL. Because the data is already cleaned and summarized, the dashboard loads quickly.

The Complete Workflow Summary

Stage | Data State | Tool in Charge | Where it Sits
Ingest | Raw / Messy | ADF | ADLS (Bronze)
Clean | Filtered / Standardized | Databricks | ADLS (Silver)
Compute | Aggregated / Business Logic | Databricks | ADLS (Gold)
Serve | Final Tables / Ready for UI | ADF | Azure SQL

How this connects to your RAG Chatbot:

In your specific case, Databricks is the MVP. It reads the internal PDFs from the Silver folder, uses an AI model to turn the text into Vectors, and then you can either store those vectors in Azure SQL (if they are small) or send them straight to Azure AI Search.
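Before a model can turn a PDF into vectors, the extracted text is usually split into overlapping chunks, because embedding models accept limited input. The sketch below shows that chunking step and a cosine-similarity comparison in plain Python. The chunk size, the overlap, and the toy vectors are illustrative assumptions. The actual embedding call, to whichever model you use, is deliberately left out.

```python
import math


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc = "abcdefghij" * 5  # 50 characters of stand-in "document" text
    print([len(c) for c in chunk_text(doc, chunk_size=20, overlap=5)])  # [20, 20, 20]
    # Toy 3-dimensional vectors; real embeddings have many more dimensions.
    print(round(cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 0.0]), 3))  # 0.707
```

The overlap keeps a sentence that straddles a chunk boundary intact in at least one chunk. Each chunk's vector would then be stored in Azure SQL or Azure AI Search, and a similarity measure like the one above is used to rank chunks against a query vector.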

Building a data pipeline in Azure

Building a data pipeline in Azure using Azure Data Factory (ADF) and Azure Data Lake Storage (ADLS) is the “bread and butter” of modern cloud data engineering. Think of ADLS as your massive digital warehouse and ADF as the conveyor belts and robotic arms moving things around.

Here is the high-level workflow and the steps to get it running.


1. The Architecture

In a typical scenario, you move data from a source (like an on-premises SQL DB or an API) into ADLS, then process it.

Key Components:

  • Linked Services: Your “Connection Strings.” These store the credentials to talk to ADLS or your source.
  • Datasets: These point to specific folders or files within your Linked Service.
  • Pipelines: The logical grouping of activities (the workflow).
  • Activities: The individual actions (e.g., Copy Data, Databricks Notebook, Lookup).

2. Step-by-Step Implementation

Step 1: Set up the Storage (ADLS Gen2)

  1. In the Azure Portal, create a Storage Account.
  2. Crucial: Under the “Advanced” tab, ensure Hierarchical Namespace is enabled. This turns standard Blob storage into ADLS Gen2.
  3. Create a Container (e.g., raw-data).

Step 2: Create the Linked Service in ADF

  1. Open Azure Data Factory Studio.
  2. Go to the Manage tab (toolbox icon) > Linked Services > New.
  3. Search for Azure Data Lake Storage Gen2.
  4. Select your subscription and the storage account you created. Test the connection and click Create.

Step 3: Define your Datasets

You need a “Source” dataset (where data comes from) and a “Sink” dataset (where data goes).

  1. Go to the Author tab (pencil icon) > Datasets > New Dataset.
  2. Select Azure Data Lake Storage Gen2.
  3. Choose the format (Parquet and Delimited Text/CSV are most common).
  4. Point it to the specific file path in your ADLS container.

Step 4: Build the Pipeline

  1. In the Author tab, click the + icon > Pipeline.
  2. From the Activities menu, drag and drop the Copy Data activity onto the canvas.
  3. Source Tab: Select your source dataset.
  4. Sink Tab: Select your ADLS dataset.
  5. Mapping Tab: Click “Import Schemas” to ensure the columns align correctly.

3. Best Practices for ADLS Pipelines

  • Folder Structure: Use a “Medallion Architecture” (Bronze/Raw, Silver/Cleaned, Gold/Aggregated) within your ADLS containers to keep data organized.
  • Triggering: Don’t just run things manually. Use Schedule Triggers (time-based) or Storage Event Triggers (which run automatically when a file lands in ADLS).
  • Parameters: Avoid hardcoding file names. Use Parameters and Dynamic Content so one pipeline can handle multiple different files.

4. Example Formula for Dynamic Paths

If you want to organize your data by date automatically in ADLS, give the dataset a parameter (for example, Directory), reference it in the dataset's folder path as @dataset().Directory, and have the pipeline pass it this dynamic content:

@concat('raw/', formatDateTime(utcNow(), 'yyyy/MM/dd'))

This ensures that each run writes into a folder for the current UTC date (for example, raw/2026/03/01), so each day's data gets its own folder.
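To check which folder a given run will write to, you can reproduce the path logic outside ADF. The sketch below is a plain-Python equivalent, not ADF code. It maps ADF's 'yyyy/MM/dd' format string to Python's '%Y/%m/%d' and takes the run time as an argument in place of utcNow(), so the result is predictable.

```python
from datetime import datetime, timedelta, timezone


def dated_raw_directory(run_time):
    """Mirror concat('raw/', formatDateTime(utcNow(), 'yyyy/MM/dd')) in Python.

    ADF's 'yyyy/MM/dd' (year/month/day) corresponds to strftime's '%Y/%m/%d'.
    The timestamp is converted to UTC first, because utcNow() is in UTC.
    """
    return "raw/" + run_time.astimezone(timezone.utc).strftime("%Y/%m/%d")


if __name__ == "__main__":
    print(dated_raw_directory(datetime(2026, 3, 1, 6, 30, tzinfo=timezone.utc)))  # raw/2026/03/01
    # 23:30 at UTC-5 is already 04:30 the next day in UTC, so it lands in the 03/02 folder.
    utc_minus_5 = timezone(timedelta(hours=-5))
    print(dated_raw_directory(datetime(2026, 3, 1, 23, 30, tzinfo=utc_minus_5)))  # raw/2026/03/02
```

The second case matters if your source system works in local time: late-evening loads land in the next day's folder.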

Ingress and API Gateways

In the world of Kubernetes and OpenShift, both Ingress and API Gateways serve as the entry point for external traffic. While they overlap in functionality, they operate at different levels of the networking stack and offer different “intelligence” regarding how they handle requests.

Think of Ingress as a simple receptionist directing people to the right room, while an API Gateway is a concierge who also checks IDs, translates languages, and limits how many people enter at once.


1. What is Ingress?

Ingress is a native Kubernetes resource (Layer 7) that manages external access to services, typically HTTP and HTTPS.

  • Primary Job: Simple routing based on the URL path (e.g., /api) or the hostname (e.g., app.example.com).
  • Implementation: In OCP, this is usually handled by the OpenShift Ingress Controller (based on HAProxy) using Routes.
  • Pros: Lightweight, standard across Kubernetes, and built-in.
  • Cons: Limited “logic.” It’s hard to do complex things like rate limiting, authentication, or request transformation without custom annotations.
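The host-and-path routing described above can be sketched in a few lines. This is a simplified model, not any controller's actual code. It follows the Kubernetes rule for pathType: Prefix, which matches whole path segments (so /api matches /api/orders but not /apis), and it prefers the longest matching prefix. The hostnames and service names are hypothetical.

```python
# Hypothetical routing table: host -> list of (path prefix, backend service).
ROUTES = {
    "app.example.com": [
        ("/", "web-frontend"),
        ("/api", "api-service"),
    ],
}


def prefix_matches(prefix, path):
    """Segment-wise prefix match, as for Ingress pathType: Prefix."""
    prefix_parts = [p for p in prefix.split("/") if p]
    path_parts = [p for p in path.split("/") if p]
    return path_parts[: len(prefix_parts)] == prefix_parts


def route(host, path):
    """Return the backend for host+path, preferring the longest matching prefix."""
    candidates = [
        (prefix, backend)
        for prefix, backend in ROUTES.get(host, [])
        if prefix_matches(prefix, path)
    ]
    if not candidates:
        return None  # a real controller would fall back to its default backend
    return max(candidates, key=lambda item: len(item[0]))[1]


if __name__ == "__main__":
    print(route("app.example.com", "/api/orders"))  # api-service
    print(route("app.example.com", "/apis"))        # web-frontend
    print(route("other.example.com", "/"))          # None
```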

2. What is an API Gateway?

An API Gateway is a more sophisticated proxy that sits in front of your microservices to provide “cross-cutting concerns.”

  • Primary Job: API Management. It handles security, monitoring, and orchestration.
  • Key Features:
    • Authentication/Authorization: Validating JWT tokens or API keys before the request hits the service.
    • Rate Limiting: Ensuring one user doesn’t spam your backend.
    • Payload Transformation: Changing an XML request to JSON for a modern backend.
    • Circuit Breaking: Stopping traffic to a failing service to prevent a total system crash.
  • Examples: Kong, Tyk, Apigee, or the Red Hat 3scale API Management platform.
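Rate limiting is often implemented with a token bucket. Each client gets a bucket that refills at a fixed rate, and a request is allowed only if a token is available. Products such as Kong or 3scale have their own configurable algorithms, so treat this as a generic sketch of the idea, not any gateway's code. The capacity and refill rate below are arbitrary, and time is passed in explicitly so the behavior is easy to test.

```python
class TokenBucket:
    """Generic token-bucket rate limiter (illustrative only)."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last_time = 0.0

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) may pass."""
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_time = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(capacity=3, refill_per_second=1)
    # Five requests in the same instant: the burst of 3 passes, the rest are rejected.
    print([bucket.allow(now=0.0) for _ in range(5)])  # [True, True, True, False, False]
    # One second later, one token has been refilled.
    print(bucket.allow(now=1.0))  # True
```

A gateway would keep one bucket per API key or client IP, so one noisy user exhausts only their own tokens.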

Key Comparison Table

Feature | Ingress / Route | API Gateway
OSI Layer | Layer 7 (HTTP/S) | Layer 7 + Application Logic
Main Goal | Expose services to the internet | Protect and manage APIs
Complexity | Low | High
Security | Basic SSL/TLS termination | JWT, OAuth, mTLS, IP Whitelisting
Traffic Control | Simple Load Balancing | Rate Limiting, Quotas, Retries
Cost | Usually free (built into OCP) | Often requires licensing or extra infra

When to use which?

  • Use Ingress/Routes when: You have a web application and just need to point a domain name to a service. It’s the “plumbing” of the cluster.
  • Use an API Gateway when: You are exposing APIs to third parties, need strict usage tracking (monetization), or want to centralize security logic so your developers don’t have to write auth code for every single microservice.

The “Modern” Middle Ground: Gateway API

There is a newer Kubernetes standard called the Gateway API. It is designed to replace Ingress by providing the power of an API Gateway (like header-based routing and traffic splitting) while remaining a standard part of the Kubernetes ecosystem. In OpenShift, the Gateway API is enabled through the Cluster Ingress Operator.

To help you see the evolution, here is how the “old” standard (Ingress) compares to the “new” standard (Gateway API).

1. The Traditional Ingress

Ingress is a single, “flat” resource. It’s simple but limited because the person who owns the app (the developer) and the person who owns the network (the admin) have to share the same file.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
  • The Problem: If you want to do something fancy like a “Canary deployment” (sending 10% of traffic to a new version), you usually have to use messy, vendor-specific annotations.

2. The Modern Gateway API

The Gateway API breaks the configuration into pieces. This allows the Cluster Admin to define the entry point (the Gateway) and the Developer to define how their specific app is reached (the HTTPRoute).

The Admin’s Part (The Infrastructure):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: http
    protocol: HTTP
    port: 80

The Developer’s Part (The Logic & Traffic Splitting):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
spec:
  parentRefs:
  - name: external-gateway
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path: { type: PathPrefix, value: /api }
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90 # 90% of traffic here
    - name: api-v2
      port: 80
      weight: 10 # 10% of traffic to the new version!
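The weight fields work proportionally. Each backend receives roughly weight / (sum of all weights) of the requests, so weights of 90 and 10 give a 90%/10% split. The Python sketch below is a simplified model of that selection; real data planes such as Envoy implement their own weighted load balancing. The random number is passed in so the outcome can be made deterministic in tests.

```python
import random

# Backends and weights taken from the HTTPRoute above.
BACKENDS = [("api-v1", 90), ("api-v2", 10)]


def pick_backend(backends, r):
    """Pick a backend given r in [0, 1), with probability proportional to its weight."""
    total = sum(weight for _, weight in backends)
    threshold = r * total
    cumulative = 0
    for name, weight in backends:
        cumulative += weight
        if threshold < cumulative:
            return name
    return backends[-1][0]  # only reached through floating-point rounding


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    counts = {"api-v1": 0, "api-v2": 0}
    for _ in range(10_000):
        counts[pick_backend(BACKENDS, rng.random())] += 1
    print(counts)  # roughly 9,000 vs 1,000
```

To finish a canary rollout, you would shift the weights step by step (for example, 50/50 and then 0/100) without touching the Gateway itself.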

Summary of Differences

Feature | Ingress | Gateway API
Structure | Monolithic (One file for everything) | Role-based (Separated for Admin vs Dev)
Traffic Splitting | Requires non-standard annotations | Built-in (Weights/Canary)
Extensibility | Limited | High (Supports TCP, UDP, TLS, gRPC)
Portability | High (but annotations are not) | Very High (Standardized across vendors)

Why OpenShift is moving this way

Recent OpenShift releases support the Gateway API because it solves the “annotation hell” that occurred when users tried to make basic Ingress act like a full API Gateway. It gives you the power of a professional Gateway (like Kong or Istio) but stays within the native Kubernetes language.

In OpenShift 4.15 and later (reaching General Availability in 4.19), the Gateway API is managed by the Cluster Ingress Operator. Unlike standard Kubernetes where you might have to install many CRDs manually, OpenShift streamlines this by bundling the controller logic into its existing operators.

Here is the step-by-step process to enable and use it.


1. Enable the Gateway API CRDs

In newer versions of OCP, the CRDs are often present but “dormant” until a GatewayClass is created. The Ingress Operator watches for a specific controllerName to trigger the installation of the underlying proxy (which is Istio/Envoy in the Red Hat implementation).

Create the GatewayClass:

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller/v1

What happens next? The Ingress Operator will automatically detect this and start a deployment called istiod-openshift-gateway in the openshift-ingress namespace.


2. Set up a Wildcard Certificate (Required)

Unlike standard Routes, the Gateway API in OCP does not automatically generate a default certificate. You need to provide a TLS secret in the openshift-ingress namespace.

Bash

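# Example: self-signed wildcard cert for testing (-addext needs OpenSSL 1.1.1+)
openssl req -x509 -newkey rsa:2048 -nodes -days 365 -keyout wildcard.key -out wildcard.crt \
  -subj "/CN=*.apps.mycluster.com" -addext "subjectAltName=DNS:*.apps.mycluster.com"
oc -n openshift-ingress create secret tls gwapi-wildcard \
  --cert=wildcard.crt --key=wildcard.key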

3. Deploy the Gateway

The Gateway represents the actual “entry point” or load balancer.

YAML

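apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.apps.mycluster.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: gwapi-wildcard
    # Routes may only attach from the Gateway's own namespace by default;
    # this lets HTTPRoutes in application namespaces attach to the listener.
    allowedRoutes:
      namespaces:
        from: All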

4. Create an HTTPRoute (Developer Task)

Now that the “door” (Gateway) is open, a developer in a different namespace can “attach” their application to it.

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
  namespace: my-app-project
spec:
  parentRefs:
  - name: my-gateway
    namespace: openshift-ingress
  hostnames:
  - "myapp.apps.mycluster.com"
  rules:
  - backendRefs:
    - name: my-app-service
      port: 8080

Summary Checklist for the Interview

If you are asked how to set this up in an interview, remember these four pillars:

  1. Operator-Led: It’s managed by the Ingress Operator; no separate “Gateway Operator” is needed for the default Red Hat implementation.
  2. Implementation: OpenShift uses Envoy (via a lightweight Istio control plane) as the engine behind the Gateway API.
  3. Namespace: The Gateway object itself almost always lives in openshift-ingress.
  4. Service Type: Creating a Gateway usually triggers the creation of a Service type: LoadBalancer automatically.

Ingress vs Service mesh

Ingress and a service mesh solve different networking problems.

Ingress
Ingress is a Kubernetes API object for managing external access into the cluster, typically HTTP/HTTPS. It routes inbound requests based on hosts and paths to backend Services. Kubernetes now says Ingress is stable but frozen, and recommends the newer Gateway API for future development. (Kubernetes)

Service mesh
A service mesh is an infrastructure layer for service-to-service communication inside and around your app, adding things like traffic policy, observability, and zero-trust security without changing app code. In Istio, this includes traffic routing, retries, timeouts, fault injection, mTLS, authentication, and authorization. (Istio)

Practical difference
Think of it like this:

  • Ingress = the front door to your cluster
  • Service mesh = the road system and security checkpoints between services inside the cluster

Use Ingress when
You need:

  • a public endpoint for your app
  • host/path routing like api.example.com or /shop
  • TLS termination for incoming web traffic

That is the classic “internet → cluster → service” problem. (Kubernetes)

Use a service mesh when
You need:

  • service-to-service observability
  • mutual TLS between workloads
  • canary / weighted routing between versions
  • retries, timeouts, circuit breaking
  • policy and identity for east-west traffic
  • control over some outbound traffic too

Istio’s docs specifically describe percentage routing, version-aware routing, external service entries, retries, timeouts, and circuit breakers. (Istio)

Do they overlap?
A little. Both can influence traffic routing, but at different scopes:

  • Ingress mainly handles north-south traffic: outside users coming in
  • Service mesh mainly handles east-west traffic: service-to-service traffic inside the platform

A mesh can also handle ingress/egress via its own gateways, but that is a broader and heavier solution than plain Kubernetes Ingress. (Kubernetes)

Which should you choose?

  • For a simple web app exposing a few services: Ingress is usually enough.
  • For microservices that need security, tracing, traffic shaping, and resilience: service mesh is worth considering.
  • Many teams use both: one for external entry, one for internal communication.

One current note: for new Kubernetes edge-routing designs, Gateway API is the direction Kubernetes recommends over Ingress. (Kubernetes)

Here’s a concrete example.

Example app

Imagine an e-commerce app running on Kubernetes:

  • web-frontend
  • product-api
  • cart-api
  • checkout-api
  • payment-service
  • user-service

Customers come from the internet. The services call each other inside the cluster.

With Ingress only

Traffic flow:

Internet → Ingress controller → Kubernetes Service → Pods

Example:

  • shop.example.com goes to web-frontend
  • shop.example.com/api/* goes to product-api

What Ingress is doing here:

  • expose the app publicly
  • terminate TLS
  • route by host/path
  • maybe do some basic load balancing

So a request might go:

  1. User opens https://shop.example.com
  2. Ingress sends / to web-frontend
  3. web-frontend calls cart-api
  4. cart-api calls user-service
  5. checkout-api calls payment-service

The key point: Ingress mostly helps with steps 1 and 2, the outside-in entry point. It does not, by itself, give you rich control, security, or telemetry for steps 3–5. Ingress is for external access, and the Kubernetes project notes the API is stable but frozen, with Gateway API recommended for newer traffic-management work. (Kubernetes)

With Ingress + service mesh

Now add a mesh like Istio.

Traffic flow becomes:

Internet → Ingress/Gateway → web-frontend → mesh-controlled service-to-service traffic

Now you still have an entry point, but inside the cluster the mesh handles communication between services.

What the mesh adds:

  • mTLS between services
  • retries/timeouts
  • canary routing
  • traffic splitting
  • telemetry/tracing
  • authz policies between workloads

Example:

  • checkout-api sends 95% of traffic to payment-service v1 and 5% to payment-service v2
  • calls from cart-api to user-service get a 2-second timeout and one retry
  • only checkout-api is allowed to call payment-service
  • all service-to-service traffic is encrypted with mutual TLS

Those are standard service-mesh capabilities described in Istio’s traffic-management and security docs. (Istio)
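To make the second rule concrete, the sketch below models what a “2-second timeout and one retry” policy means. It allows at most two attempts, each bounded by the timeout (modeled here as a per-attempt limit), and the caller sees an error only if both attempts fail. In Istio this is configured declaratively (for example, the timeout and retries fields of a VirtualService) and enforced by the proxy, not by application code. The Python here, including the flaky user-service stand-in, is a hypothetical model that lets you test the behavior without a cluster.

```python
class UpstreamTimeout(Exception):
    """Raised by the stand-in service when a call exceeds its time budget."""


def call_with_policy(call, timeout_s=2.0, retries=1):
    """Try `call` up to 1 + retries times, passing the per-attempt timeout.

    Returns (result, attempts_used). Re-raises the last timeout if every attempt fails.
    """
    last_error = None
    for attempt in range(1, retries + 2):
        try:
            return call(timeout_s), attempt
        except UpstreamTimeout as err:
            last_error = err
    raise last_error


def make_flaky_user_service(failures_before_success):
    """Hypothetical user-service stand-in that times out a set number of times."""
    state = {"calls": 0}

    def user_service(timeout_s):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise UpstreamTimeout(f"no response within {timeout_s}s")
        return {"user": "u-123"}

    return user_service


if __name__ == "__main__":
    # One timeout, then success: the single retry rescues the request.
    print(call_with_policy(make_flaky_user_service(1)))  # ({'user': 'u-123'}, 2)
    # Two timeouts: both attempts are used up and the error reaches cart-api.
    try:
        call_with_policy(make_flaky_user_service(2))
    except UpstreamTimeout as err:
        print("failed:", err)  # failed: no response within 2.0s
```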

Simple diagram

Ingress only

[User on Internet]
        |
        v
    [Ingress]
        |
        v
  [web-frontend]
        |
        v
[product-api] -> [cart-api] -> [checkout-api] -> [payment-service]

Ingress + service mesh

[User on Internet]
        |
        v
[Ingress / Gateway]
        |
        v
  [web-frontend]
        |
        v
 ---------------------------------------------------
| Service Mesh inside cluster                       |
|                                                   |
|  [product-api] <-> [cart-api] <-> [checkout-api]  |
|         \              |              /           |
|          \             v             /            |
|           -------> [user-service]                 |
|                   [payment-service]               |
|                                                   |
|  mTLS, retries, tracing, canaries, policy         |
 ---------------------------------------------------

Real-world way teams choose

Use just Ingress when:

  • you have a small app
  • you mostly need public routing
  • internal service communication is simple
  • you do not need per-service security/policy

Add a service mesh when:

  • you have many microservices
  • debugging internal calls is hard
  • you need zero-trust service identity
  • you do canaries/traffic shaping often
  • you want consistent retries/timeouts/policies

One important 2026 note

For brand-new Kubernetes edge-routing setups, many teams are moving toward Gateway API instead of classic Ingress. Kubernetes recommends Gateway over Ingress for future-facing work, and Istio also supports Gateway API for traffic management. (Kubernetes)

Rule of thumb

  • Ingress/Gateway API: “How does traffic get into my cluster?”
  • Service mesh: “How do services inside my platform talk securely and reliably?”

Kubernetes networking

Kubernetes networking is often considered the most complex part of the system because it operates on a “flat network” model. In a traditional setup, you might worry about port conflicts or how to reach a specific VM. In Kubernetes, every Pod gets its own unique IP address, and every Pod can talk to every other Pod without NAT (Network Address Translation).

To understand how traffic flows, we break it down into four distinct “layers” of communication.


1. Pod-to-Pod Communication

Every Pod in a cluster has its own internal IP. Kubernetes requires that Pods on one node be able to reach Pods on any other node directly, without NAT.

  • The Container Network Interface (CNI): This is the plugin (like Calico, Cilium, or OpenShift SDN) that actually builds the “pipes” between nodes.
  • The Experience: From the perspective of a container, it feels like it’s on a standard Ethernet network. It doesn’t care if the target Pod is on the same physical server or one across the data center.

2. Pod-to-Service Communication

Pods are “ephemeral”—they die and get replaced constantly, and their IP addresses change every time. You can’t hardcode a Pod IP into your app.

  • The Service: A Service is a stable “virtual IP” (ClusterIP) that sits in front of a group of Pods.
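  • Kube-Proxy: This is a process running on every node that watches the API server for Services and their endpoints. It programs iptables or IPVS rules on the node so that traffic sent to a Service IP is redirected (via DNAT) to one of the healthy backend Pods.
  • CoreDNS: Kubernetes includes a built-in DNS service. Instead of an IP, your app just connects to http://my-database-service (or my-database-service.<namespace>.svc.cluster.local from another namespace, assuming the default cluster domain).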
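The Service idea can be modeled as a stable name in front of a changing list of ready endpoints. The sketch below is a toy model, not kube-proxy. It builds the Service's DNS name (assuming the default cluster domain cluster.local) and hands out ready Pod IPs round-robin, which resembles the default scheduler in IPVS mode; iptables mode picks a backend at random instead. All names and IPs are hypothetical.

```python
import itertools


class ToyService:
    """Stable Service name/IP in front of Pods whose IPs keep changing."""

    def __init__(self, name, namespace, cluster_ip):
        self.name = name
        self.namespace = namespace
        self.cluster_ip = cluster_ip  # stays the same for the Service's lifetime
        self._ready_pod_ips = []
        self._cycle = iter(())

    def fqdn(self, cluster_domain="cluster.local"):
        return f"{self.name}.{self.namespace}.svc.{cluster_domain}"

    def set_endpoints(self, ready_pod_ips):
        """Called when Pods come and go; clients keep using the same name/IP."""
        self._ready_pod_ips = list(ready_pod_ips)
        self._cycle = itertools.cycle(self._ready_pod_ips)

    def pick_backend(self):
        if not self._ready_pod_ips:
            raise RuntimeError("no ready endpoints")
        return next(self._cycle)


if __name__ == "__main__":
    svc = ToyService("my-database-service", "shop", "10.96.0.20")
    svc.set_endpoints(["10.244.1.5", "10.244.2.7"])
    print(svc.fqdn())  # my-database-service.shop.svc.cluster.local
    print([svc.pick_backend() for _ in range(3)])  # ['10.244.1.5', '10.244.2.7', '10.244.1.5']
    svc.set_endpoints(["10.244.3.9"])  # a Pod was replaced and got a new IP
    print(svc.pick_backend(), svc.cluster_ip)  # 10.244.3.9 10.96.0.20
```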

3. External-to-Service (Ingress & Egress)

How do people outside the cluster reach your apps?

  • NodePort: Opens a specific port (by default in the 30000–32767 range) on every node. Fine for testing, but awkward as a production entry point because clients must know node IPs and nonstandard ports.
  • LoadBalancer: Tells your cloud provider (AWS/Azure/GCP) to provision a managed load balancer that forwards traffic to your nodes.
  • Ingress/Route: As we discussed earlier, this is a Layer 7 “smart router” that lets you use hostnames (like api.example.com) to direct traffic to different services using a single entry point.

4. Network Policies (The Firewall)

By default, Kubernetes is “open”—any Pod can talk to any other Pod. In production, you usually want to restrict this for security.

  • NetworkPolicy: These are the firewall rules of Kubernetes. You can define rules like: “Only the ‘Frontend’ Pod is allowed to talk to the ‘Database’ Pod on port 5432.”
  • Implementation: These rules are enforced by the CNI (e.g., Calico or Cilium) at the packet level.
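The example rule can be modeled to show how NetworkPolicy semantics work. A Pod that no policy selects accepts all traffic. Once any policy selects it for ingress, only traffic matching at least one policy's rules is allowed: rules are additive allow-lists, and the core API has no “deny” rule. This plain-Python model uses hypothetical app labels and ignores namespaces, CIDR blocks, and egress, so it sketches the semantics rather than what a CNI actually executes.

```python
from dataclasses import dataclass, field


@dataclass
class IngressPolicy:
    """Simplified ingress NetworkPolicy: selects Pods by app label, allows listed peers/ports."""
    pod_app: str                                      # like spec.podSelector
    allowed_from: set = field(default_factory=set)   # app labels of allowed client Pods
    ports: set = field(default_factory=set)          # allowed TCP ports


def is_allowed(policies, src_app, dst_app, port):
    selecting = [p for p in policies if p.pod_app == dst_app]
    if not selecting:
        return True  # destination is not isolated: default allow
    # Isolated: allowed only if some selecting policy permits this peer and port.
    return any(src_app in p.allowed_from and port in p.ports for p in selecting)


if __name__ == "__main__":
    policies = [IngressPolicy(pod_app="database", allowed_from={"frontend"}, ports={5432})]
    print(is_allowed(policies, "frontend", "database", 5432))   # True
    print(is_allowed(policies, "batch-job", "database", 5432))  # False
    print(is_allowed(policies, "frontend", "database", 22))     # False
    print(is_allowed(policies, "batch-job", "frontend", 80))    # True (no policy selects frontend)
```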

Summary of Components

Component | Level | Purpose
CNI | Layer 3 (IP) | Connects Pods across different nodes.
Service | Layer 4 (TCP/UDP) | Provides a stable IP/DNS for a group of Pods.
Ingress | Layer 7 (HTTP) | Handles routing, SSL, and hostnames.
NetworkPolicy | Security | Defines which Pods are allowed to talk to each other.

The Path of a Packet

If a user types example.com into their browser:

  1. DNS resolves the name to the Ingress Controller’s IP.
  2. The Ingress Controller looks at the host header and finds the correct Service.
  3. The Service (via kube-proxy) picks a healthy Pod and sends the traffic there.
  4. The CNI routes that packet across the internal network to the node where the Pod is living.

To understand how data actually moves between nodes, we have to look at the CNI (Container Network Interface).

Since every Pod has a unique IP but lives on a host (Node) with its own separate IP, the CNI’s job is to “tunnel” the Pod’s traffic across the physical network. The two most common ways it does this are Encapsulation (Overlay) and Direct Routing.


1. Encapsulation (The “Envelope” Method)

This is the most common approach (used by Flannel in VXLAN mode and by OpenShift SDN). It treats the physical network as a “carrier” for a private, virtual network.

  • How it works: When Pod A (on Node 1) sends a packet to Pod B (on Node 2), the CNI takes that entire packet and wraps it inside a new UDP packet.
  • The “Outer” Header: Points from Node 1’s IP to Node 2’s IP.
  • The “Inner” Header: Points from Pod A’s IP to Pod B’s IP.
  • Arrival: When the packet hits Node 2, the CNI “unwraps” the outer envelope and delivers the original inner packet to Pod B.

The Downside: This adds a small amount of overhead (usually about 50 bytes per packet) because of the extra headers. This is why you often see the MTU (Maximum Transmission Unit) set slightly lower in Kubernetes (e.g., 1450 instead of 1500).
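The 50-byte figure can be broken down as follows. Measured against the physical network's IP MTU, VXLAN encapsulation adds an outer IPv4 header, a UDP header, the VXLAN header, and the Pod's own Ethernet header, which now travels as payload. The arithmetic assumes IPv4 with no VLAN tag and no IP options. Other encapsulations have different overheads; Geneve, for example, can carry variable-length options.

```python
# Bytes that VXLAN adds inside the physical network's IP MTU
# (assumes IPv4, no VLAN tag, no IP options).
VXLAN_OVERHEAD = {
    "outer IPv4 header": 20,
    "outer UDP header": 8,
    "VXLAN header": 8,
    "inner Ethernet header (the Pod's original frame)": 14,
}


def pod_mtu(physical_mtu, overhead):
    """Largest Pod IP packet that still fits in one physical packet after encapsulation."""
    return physical_mtu - sum(overhead.values())


if __name__ == "__main__":
    print(sum(VXLAN_OVERHEAD.values()))     # 50
    print(pod_mtu(1500, VXLAN_OVERHEAD))   # 1450
    print(pod_mtu(9000, VXLAN_OVERHEAD))   # 8950 with jumbo frames
```

If the Pod MTU is left at 1500, full-size packets no longer fit after encapsulation, and they must be fragmented or dropped.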


2. Direct Routing (The “BGP” Method)

Used by Calico (in non-overlay mode) and Cilium, this method avoids the “envelope” entirely for better performance.

  • How it works: The nodes act like standard network routers. They use BGP (Border Gateway Protocol) to tell each other: “Hey, if you want to reach the 10.244.1.0/24 subnet, send those packets to me (Node 1).”
  • The Experience: Packets travel “naked” across the wire with no extra headers.
  • The Requirement: Your physical network routers must be able to handle these extra routes, or the nodes must all be on the same Layer 2 segment (the same VLAN/Switch).
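Once BGP has distributed the Pod subnets, each node forwards packets using ordinary longest-prefix-match routing. The sketch below uses Python's standard ipaddress module to model a node's route table. The subnets, node IPs, and the /16 fallback route are hypothetical, in the style of the 10.244.1.0/24 example.

```python
import ipaddress

# Hypothetical routes learned via BGP: Pod subnet -> IP of the node that owns it.
ROUTES = {
    ipaddress.ip_network("10.244.1.0/24"): "192.168.0.11",  # Node 1
    ipaddress.ip_network("10.244.2.0/24"): "192.168.0.12",  # Node 2
    ipaddress.ip_network("10.244.0.0/16"): "192.168.0.1",   # hypothetical fallback gateway
}


def next_hop(dst_ip):
    """Longest-prefix match: the most specific route containing dst_ip wins."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        return None
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    print(next_hop("10.244.2.15"))  # 192.168.0.12 (Node 2 owns 10.244.2.0/24)
    print(next_hop("10.244.9.3"))   # 192.168.0.1 (only the /16 matches)
    print(next_hop("172.16.0.1"))   # None (no route)
```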

3. The Role of the “veth” Pair

Regardless of how the data moves between nodes, getting data out of a container uses a Virtual Ethernet (veth) pair.

Think of a veth pair as a virtual patch cable:

  1. One end is plugged into the container (usually named eth0).
  2. The other end is plugged into the host’s network namespace (often named something like vethabc123).
  3. The host end is usually connected to a Bridge (like cni0 or br0), which acts like a virtual switch for all Pods on that specific node.

4. Comparing Popular CNIs

CNI Plugin | Primary Method | Best Use Case
Flannel | VXLAN (Overlay) | Simple clusters; works almost anywhere.
Calico | BGP or IP-in-IP | High performance and advanced Network Policies.
Cilium | eBPF | Next-gen performance, deep security, and observability.
OpenShift SDN | VXLAN | Default for older OCP; very stable and integrated.
OVN-Kubernetes | Geneve (Overlay) | Modern OCP default; supports massive scale and Windows nodes.

Which one should you choose?

  • If you are on OpenShift 4.x, you are likely using OVN-Kubernetes. It’s powerful and handles complex routing for you.
  • If you are building a Vanilla K8s cluster and want the fastest networking, Cilium is a popular choice because it uses eBPF to bypass parts of the Linux networking stack, such as long iptables rule chains.

To understand why eBPF (Extended Berkeley Packet Filter) is the “holy grail” of modern Kubernetes networking, we first have to look at how the “old way” (iptables) works.

1. The Old Way: iptables (The “List of Rules”)

For years, Kubernetes used iptables (a standard Linux kernel feature) to route traffic.

  • How it works: Imagine a giant printed list of instructions. Every time a packet arrives, the CPU has to read the list from top to bottom: “Is it for Service A? No. Service B? No. Service C? Yes.”
  • The Problem: As you add more Services and Pods, this list grows to thousands of lines. If a packet is destined for the 5,000th service on the list, the CPU has to perform 5,000 checks for every single packet.
  • Result: High latency and high CPU “overhead” just to move data.

2. The New Way: eBPF (The “Direct Shortcut”)

eBPF allows you to run small, sandboxed programs directly inside the Linux kernel without changing the kernel code.

  • How it works: Instead of a long list of rules, eBPF creates a high-speed “lookup table” (a Hash Map) in the kernel’s memory.
  • The Benefit: When a packet arrives, the eBPF program looks up the destination in that map. Whether you have 10 services or 10,000, the lookup takes roughly the same time (O(1) on average for a hash map).
  • Bypassing the Stack: Using hooks such as XDP (at the network driver) or TC, eBPF can handle a packet very early and forward it toward the Pod, skipping much of the Linux networking stack.
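You can see the difference between the two lookup strategies by counting comparisons. This is a deliberately simplified model of rule matching, not how the kernel stores iptables chains or eBPF maps. What it illustrates is the scaling behavior: work that grows with the number of Services versus work that stays roughly constant. The Service IPs are generated hypothetically.

```python
def build_service_table(n):
    """Hypothetical Service ClusterIPs (10.96.x.y) mapped to backend chain names."""
    return {f"10.96.{i // 256}.{i % 256}": f"svc-{i}" for i in range(n)}


def linear_lookup(rules, dst_ip):
    """iptables-style: walk the rule list top to bottom. Returns (target, comparisons)."""
    comparisons = 0
    for ip, target in rules:
        comparisons += 1
        if ip == dst_ip:
            return target, comparisons
    return None, comparisons


def hash_lookup(table, dst_ip):
    """eBPF-map-style: one hash-table lookup (counted as 1), independent of table size."""
    return table.get(dst_ip), 1


if __name__ == "__main__":
    table = build_service_table(5000)
    rules = list(table.items())          # the same data as an ordered rule list
    last_service_ip = rules[-1][0]       # the 5,000th Service in the list
    print(linear_lookup(rules, last_service_ip))  # ('svc-4999', 5000)
    print(hash_lookup(table, last_service_ip))    # ('svc-4999', 1)
```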

3. Why Cilium + eBPF is a Game Changer

Cilium is the most popular CNI that uses eBPF. It provides three massive advantages over traditional networking:

Feature | iptables / Standard CNI | Cilium (eBPF)
Performance | Slows down as the cluster grows. | Consistently fast at any scale.
Observability | Hard to see “who is talking to whom” without sidecars. | Hubble (Cilium’s UI) shows every flow, drop, and latency in real time.
Security | IP-based filtering (hard to manage). | Identity-based filtering. It knows a packet belongs to “Service-Frontend” regardless of its IP.

4. Why OpenShift is Moving to OVN (Geneve)

While Cilium is the “shiny new toy,” Red Hat chose OVN-Kubernetes (using the Geneve protocol) as the default for OCP 4.

  • Scale: OVN is built on Open vSwitch, which was designed for massive telco-grade clouds.
  • Feature Parity: It handles complex things like “Egress IPs” (giving a specific namespace a static IP for exiting the cluster) and Hybrid networking (Linux + Windows nodes) much more maturely than basic eBPF implementations did a few years ago.

Summary: The “Speed” Evolution

  1. iptables: Standard, but slow at scale.
  2. IPVS: A middle ground that uses hash tables but is still tied to the old kernel stack.
  3. eBPF (Cilium): Generally the fastest of the three, because lookups are map-based and much of the legacy stack is bypassed.

In OpenShift, the modern way to see these network flows is through the Network Observability Operator. This tool uses the eBPF technology we discussed to capture traffic data with low overhead on your pods.

Here is how you can access and use these views.


1. Using the Web Console (The GUI Way)

Once the operator is installed, a new menu appears in your OpenShift Console.

  1. Navigate to Observe -> Network Traffic in the Administrator perspective.
  2. Overview Tab: This gives you a high-level “Sankey” diagram or graph showing which namespaces are talking to each other. It’s perfect for spotting “top talkers” (apps using the most bandwidth).
  3. Traffic Flows Tab: This is like a “Wireshark-lite” for your cluster. You can see every individual connection, including:
    • Source/Destination: Which pod is talking to which service.
    • Byte Rate: How much data is moving.
    • RTT (Latency): The TCP round-trip time between pods, in milliseconds (this requires the flow RTT feature to be enabled in the FlowCollector).
  4. Topology Tab: This provides a visual map of your network. You can group by “Namespace” or “Node” to see how traffic crosses physical boundaries.
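Spotting “top talkers” comes down to summing bytes per namespace pair. Below is a minimal Python sketch over exported flow records. The field names SrcK8S_Namespace, DstK8S_Namespace and Bytes are assumptions modeled on the Network Observability flow format, so check them against your own export before relying on this.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple


def top_talkers(flows: Iterable[Mapping], n: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Sum bytes per (source namespace, destination namespace) and return the n largest."""
    totals: Counter = Counter()
    for flow in flows:
        # Assumed field names; a missing namespace (e.g. an Internet IP) is labeled "external".
        src = flow.get("SrcK8S_Namespace") or "external"
        dst = flow.get("DstK8S_Namespace") or "external"
        totals[(src, dst)] += int(flow.get("Bytes", 0))
    return totals.most_common(n)


if __name__ == "__main__":
    sample = [  # made-up records for illustration
        {"SrcK8S_Namespace": "shop", "DstK8S_Namespace": "db", "Bytes": 1200},
        {"SrcK8S_Namespace": "shop", "DstK8S_Namespace": "db", "Bytes": 800},
        {"SrcK8S_Namespace": "monitoring", "DstK8S_Namespace": "shop", "Bytes": 300},
        {"SrcK8S_Namespace": "shop", "Bytes": 5000},  # traffic leaving the cluster
    ]
    for pair, total in top_talkers(sample):
        print(pair, total)
```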

2. Using the CLI (The “oc netobserv” plugin)

If you prefer the terminal, there is a plugin called oc netobserv. It is useful for live debugging when you don’t want to leave your shell. Its flag names have changed between releases, so check the plugin’s help output for the version you installed.

Capture live flows:

Bash

oc netobserv flows --protocol=TCP --port=80

This will stream live traffic data directly to your terminal.

Filter for specific issues:

You can filter by namespace or even look for dropped packets (great for debugging firewall/NetworkPolicy issues):

Bash

oc netobserv flows --namespace=my-app --action=Drop

3. The “Loki” Backend

Behind the scenes, the Network Observability Operator stores these flows in Loki (a log aggregation system). This allows you to “go back in time.”

If an application crashed at 2:00 AM, you can go to the Network Traffic page, set the time filter to 2:00 AM, and see if there was a sudden spike in traffic or if a connection was being blocked by a security policy at that exact moment.


4. Advanced Debugging: Packet Drops

One of the best features of the eBPF-based observer is Packet Drop tracking. Traditional tools often can’t tell you why a packet disappeared. With this feature enabled, the agent can report the drop cause recorded by the kernel (or by OVS), for example:

  • Filter drop: A firewall rule, such as a NetworkPolicy enforced by the CNI, discarded it.
  • No socket: Nothing was listening on the destination port.
  • Queue full: The network interface or its queue was overloaded.

Summary: What can you find?

  • Security: Is my database receiving traffic from an unauthorized namespace?
  • Performance: Which pods have the highest latency (RTT)?
  • Cost: Which services are sending the most data to external (Internet) IPs?

In Kubernetes, a NetworkPolicy is your cluster’s internal firewall. By default, Kubernetes pods are “non-isolated”: every pod can talk to every other pod.

To secure your app, you should follow the “Principle of Least Privilege”: block everything, then specifically allow only what is necessary.


1. The “Default Deny” (The Foundation)

Before you write specific rules, it is a best practice to create a “Default Deny” policy for your namespace. This locks all doors so that nothing can enter or leave unless you explicitly say so.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-secure-app
spec:
  podSelector: {} # Matches all pods in this namespace
  policyTypes:
  - Ingress
  - Egress

2. Allowing Specific Traffic (The “Rule”)

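Now that everything is blocked, suppose you have a Database pod and you only want your Frontend pod to talk to it on port 5432. The policy below opens the Database side; because the default-deny policy also blocks Egress, the Frontend pods need a matching Egress rule as well.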

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: my-secure-app
spec:
  podSelector:
    matchLabels:
      app: database # This policy applies to pods labeled 'app: database'
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend # Only allow pods labeled 'role: frontend'
    ports:
    - protocol: TCP
      port: 5432

3. Three Ways to Target Traffic

You can control traffic based on three different criteria:

  1. podSelector: Target pods within the same namespace (e.g., “Frontend to Backend”).
  2. namespaceSelector: Target entire namespaces (e.g., “Allow everything from the ‘Monitoring’ namespace”).
  3. ipBlock: Target specific IP ranges outside the cluster (e.g., “Allow traffic from our corporate VPN range 10.0.0.0/24”).

4. Troubleshooting NetworkPolicies

If you apply a policy and your app stops working, here is how to debug:

  • Check Labels: NetworkPolicies rely 100% on labels. If your Frontend pod is labeled app: front-end but your policy looks for role: frontend, it will fail silently.
  • The “Blind” Policy: Standard Kubernetes doesn’t “log” when a policy blocks a packet. This is why we use the Network Observability Operator (as we discussed) to see the “Drop” events.
  • CNI Support: Remember, the CNI (Calico, OVN, etc.) is what actually enforces these rules. If your CNI doesn’t support NetworkPolicies (like basic Flannel), the YAML will be accepted but it won’t actually block anything!
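A toy Python model of ingress evaluation shows why a label typo fails silently. Once at least one policy selects a pod for Ingress, that pod accepts traffic only if some rule matches the sender, and policies only ever add allowances. This sketch ignores namespaces, ports and ipBlock, and the policy dicts are a simplified stand-in for the YAML (each "from" peer is reduced to its podSelector matchLabels).

```python
from typing import Dict, List

Labels = Dict[str, str]


def selects(match_labels: Labels, pod_labels: Labels) -> bool:
    """matchLabels semantics: every key/value must be present; {} selects every pod."""
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def ingress_allowed(policies: List[dict], src: Labels, dst: Labels) -> bool:
    """True if a pod labeled `src` may open a connection to a pod labeled `dst`."""
    applicable = [p for p in policies
                  if "Ingress" in p["policyTypes"] and selects(p["podSelector"], dst)]
    if not applicable:
        return True  # no policy selects dst for Ingress, so it is non-isolated
    # Additive: any "from" peer in any applicable policy is enough to allow the traffic.
    # (Assumes every rule has a "from" list; an empty rule meaning "allow all" is not modeled.)
    return any(selects(peer, src)
               for p in applicable
               for rule in p.get("ingress", [])
               for peer in rule["from"])


if __name__ == "__main__":
    policies = [
        {"podSelector": {}, "policyTypes": ["Ingress"]},  # default deny
        {"podSelector": {"app": "database"}, "policyTypes": ["Ingress"],
         "ingress": [{"from": [{"role": "frontend"}]}]},
    ]
    db = {"app": "database"}
    print(ingress_allowed(policies, {"role": "frontend"}, db))  # correctly labeled Frontend
    print(ingress_allowed(policies, {"app": "front-end"}, db))  # label typo: silently denied
```

The first call prints True and the second prints False, even though both pods are “the Frontend” to a human reader.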

Summary: Ingress vs. Egress

  • Ingress: Controls traffic coming into the pod (Who can talk to me?).
  • Egress: Controls traffic leaving the pod (Who can I talk to?).

A Zero Trust architecture in Kubernetes means that no pod is trusted by default. Even if a pod is inside your cluster, it shouldn’t be allowed to talk to anything else unless you specifically permit it.

In this scenario, we have a 3-tier app: Frontend, Backend, and Database.


1. The “Lockdown” (Default Deny)

First, we apply this to the entire namespace. This ensures that any new pod you deploy in the future is “secure by default” and cannot communicate until you add a rule for it.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app-stack
spec:
  podSelector: {} # Matches ALL pods
  policyTypes:
  - Ingress
  - Egress

2. Tier 1: The Frontend

The Frontend needs to receive traffic from the Internet (via the Ingress Controller) and send traffic only to the Backend.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress # Allows OpenShift Ingress Controller
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend # ONLY allowed to talk to Backend

3. Tier 2: The Backend

The Backend should only accept traffic from the Frontend and is only allowed to talk to the Database.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend # ONLY accepts Frontend traffic
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: database # ONLY allowed to talk to DB

4. Tier 3: The Database

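The Database is the most sensitive tier. It should never initiate a connection (no Egress rules) and should only accept traffic from the Backend. NetworkPolicy enforcement is stateful, so replies to connections the Backend opens are still delivered.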

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress # We include Egress to ensure it's blocked by default
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432 # Postgres port

Important: Don’t Forget DNS!

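When you apply a “Default Deny” Egress policy, your pods can no longer reach the cluster DNS service (CoreDNS), which means they can’t resolve service names like backend-service.

To fix this, add one more policy (apply it in the my-app-stack namespace) that allows DNS traffic to the openshift-dns namespace. In OpenShift the DNS pods listen on port 5353 (the dns-default Service maps port 53 to it), and NetworkPolicy rules are matched against the pod port, so the rule allows 5353:

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {} # Every pod in the namespace may query DNS
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-dns
    ports:
    - protocol: UDP
      port: 5353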


Summary of the Strategy

  • Labels are everything: If you typo tier: backend as tier: back-end, the wall stays up and the app breaks.
  • Layered Security: Even if a hacker compromises your Frontend pod, they cannot “scan” your network or reach your Database directly; they are stuck only being able to talk to the Backend.

To test your Zero Trust setup, we will perform a “Penetration Test” from inside the cluster. We’ll use a temporary debug pod to see if our firewall rules actually block unauthorized access.

1. The “Attacker” Pod

We will spin up a temporary pod with basic networking tools (the Alpine image ships BusyBox utilities such as nc and wget) that carries none of our tier labels. Since our “Default Deny” policy targets all pods, this pod should be completely isolated the moment it starts.

Bash

# Run a temporary alpine pod
oc run network-tester --image=alpine --restart=Never -- /bin/sh -c "sleep 3600"

2. Test 1: Can an unknown pod talk to the Database?

Even if this pod is in the same namespace, it should not be able to reach the database because it doesn’t have the tier: backend label.

Bash

# Try to connect to the database on port 5432
oc exec network-tester -- nc -zv database-service 5432

Expected Result: Connection timed out (The packet is dropped by the NetworkPolicy).


3. Test 2: Can the Frontend “Scan” the Database?

Now, let’s pretend a hacker compromised your Frontend pod. Can they bypass the Backend and talk directly to the Database?

Bash

# Exec into your existing Frontend pod
oc exec deployment/frontend -- nc -zv database-service 5432

Expected Result: Connection timed out.

Even though the Frontend is “trusted” to talk to the Backend, it is not trusted to talk to the Database.


4. Test 3: Can the Database reach the Internet? (Egress Test)

A common hacker tactic is to steal data and send it to an external server (Exfiltration). Let’s see if the Database can “phone home.”

Bash

# Try to reach Google from the Database pod
oc exec deployment/database -- curl -I google.com

Expected Result: Could not resolve host or Timeout.

Since we didn’t add an Egress rule for the Database, the CNI drops any new connection it tries to open, so it cannot send data out of its own pod.


5. How to see the “Deny” in real-time

If you want to prove the policy is working without just guessing based on timeouts, use the Network Observability tool we set up earlier:

  1. Go to Observe -> Network Traffic.
  2. Filter by Action: Deny or Action: Drop.
  3. You will see a red entry showing:
    • Source: network-tester
    • Destination: database
    • Reason: NetworkPolicy

6. Cleaning Up

When you’re done testing, don’t forget to remove the tester pod:

Bash

oc delete pod network-tester

Summary of the Lab

Test | Source | Target | Status | Why?
--- | --- | --- | --- | ---
Unauthorized | Debug Pod | Database | BLOCKED | Missing tier: backend label.
Lateral Movement | Frontend | Database | BLOCKED | Policy only allows Frontend -> Backend.
Exfiltration | Database | Internet | BLOCKED | No Egress rules defined for DB.
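The lab results can be reproduced with a small Python model, assuming a new connection succeeds only when the source’s Egress rules and the destination’s Ingress rules both allow it. The policies are simplified copies of the four YAML files (peers reduced to podSelector labels, the Frontend’s namespaceSelector rule for the Ingress Controller left out), and the tester’s run label is an assumption; only the absence of tier labels matters.

```python
from typing import Dict, List, Optional

Labels = Dict[str, str]
EXTERNAL = None  # stands for a peer outside the cluster, such as an Internet host

POLICIES: List[dict] = [
    {"pod": {}, "types": ["Ingress", "Egress"]},  # default-deny-all
    {"pod": {"tier": "frontend"}, "types": ["Ingress", "Egress"],
     "Egress": [{"tier": "backend"}]},
    {"pod": {"tier": "backend"}, "types": ["Ingress", "Egress"],
     "Ingress": [{"tier": "frontend"}], "Egress": [{"tier": "database"}]},
    {"pod": {"tier": "database"}, "types": ["Ingress", "Egress"],
     "Ingress": [{"tier": "backend"}]},
]


def selects(match: Labels, labels: Optional[Labels]) -> bool:
    """matchLabels semantics; an external peer never matches a podSelector."""
    if labels is None:
        return False
    return all(labels.get(k) == v for k, v in match.items())


def direction_allowed(pod: Labels, peer: Optional[Labels], kind: str) -> bool:
    """Check one direction for `pod`; kind is "Ingress" or "Egress"."""
    applicable = [p for p in POLICIES if kind in p["types"] and selects(p["pod"], pod)]
    if not applicable:
        return True  # the pod is not isolated in this direction
    return any(selects(allowed, peer) for p in applicable for allowed in p.get(kind, []))


def connection_allowed(src: Optional[Labels], dst: Optional[Labels]) -> bool:
    """A new connection needs Egress permission at the source and Ingress at the destination."""
    egress_ok = src is None or direction_allowed(src, dst, "Egress")
    ingress_ok = dst is None or direction_allowed(dst, src, "Ingress")
    return egress_ok and ingress_ok


if __name__ == "__main__":
    tester = {"run": "network-tester"}  # assumed label: no tier label at all
    frontend = {"tier": "frontend"}
    backend = {"tier": "backend"}
    database = {"tier": "database"}
    checks = [
        ("Unauthorized", tester, database),
        ("Lateral Movement", frontend, database),
        ("Exfiltration", database, EXTERNAL),
        ("Frontend -> Backend", frontend, backend),
        ("Backend -> Database", backend, database),
    ]
    for name, src, dst in checks:
        print(f"{name}: {'ALLOWED' if connection_allowed(src, dst) else 'BLOCKED'}")
```

The three lab cases come out BLOCKED, while the two intended hops stay ALLOWED.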

Ingress

In Kubernetes, Ingress is an API object that acts as a “smart router” for your cluster. While a standard Service (like a LoadBalancer) simply opens a hole in the firewall for one specific app, Ingress allows you to consolidate many services behind a single entry point and route traffic based on the URL or path.

Think of it as the receptionist of an office building: instead of every employee having their own front door, everyone uses one main entrance, and the receptionist directs visitors to the correct room based on who they are looking for.


1. How Ingress Works

There are two distinct parts required to make this work:

  1. Ingress Resource: A YAML file where you define your “rules” (e.g., “Send all traffic for myapp.com/api to the api-service“).
  2. Ingress Controller: The actual software (like NGINX, HAProxy, or Traefik) that sits at the edge of your cluster, reads those rules, and physically moves the traffic. Kubernetes does not come with a controller by default; you must install one.

2. Key Capabilities

Ingress is much more powerful than a simple Port or LoadBalancer because it operates at Layer 7 (HTTP/HTTPS).

  • Host-based Routing: Route blue.example.com to the Blue Service and green.example.com to the Green Service using a single IP.
  • Path-based Routing: Route example.com/login to the Auth service and example.com/search to the Search service.
  • SSL/TLS Termination: You can handle your SSL certificates at the Ingress level so your individual application pods don’t have to deal with encryption/decryption.
  • Name-based Virtual Hosting: Supporting multiple domain names on the same IP address.

3. Ingress vs. LoadBalancer vs. NodePort

Choosing how to expose your app is a common point of confusion. Here is the breakdown:

Method | Best For | Pros/Cons
--- | --- | ---
NodePort | Testing/Dev | Opens a high-range port (30000-32767 by default) on every node. Generally a poor fit for production.
LoadBalancer | Single Apps | Standard way to get an external IP. Expensive if you have 50 apps, because you pay for 50 cloud load balancers.
Ingress | Production | Multiple apps share one Load Balancer. Cost-effective and feature-rich.

4. What about OpenShift “Routes”?

If you work with OpenShift (OCP), note that it has its own version called a Route.

  • Routes were created by Red Hat before the Kubernetes Ingress standard existed.
  • In modern OpenShift, an Ingress is automatically converted into a Route behind the scenes.
  • The main difference: Routes expose OpenShift router (HAProxy) features natively, such as edge, passthrough, and re-encrypt TLS termination, while Ingress is the industry-standard “portable” way to do it.

5. Basic Ingress Example

This YAML routes traffic to two different services based on the URL path:

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /billing
        pathType: Prefix
        backend:
          service:
            name: billing-service
            port:
              number: 80
      - path: /inventory
        pathType: Prefix
        backend:
          service:
            name: inventory-service
            port:
              number: 80
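Here is a rough Python sketch of the matching an Ingress Controller performs for rules like these: an exact host match, then `pathType: Prefix` matching on whole path elements (so /billing matches /billing/invoices but not /billingx), with the longest matching prefix winning. Wildcard hosts, `Exact` paths and controller-specific annotations such as rewrite-target are not modeled.

```python
from typing import List, Optional, Tuple

Rule = Tuple[str, str, str]  # (host, prefix path, service name)


def prefix_matches(prefix: str, path: str) -> bool:
    """pathType: Prefix compares '/'-separated elements, ignoring a trailing slash."""
    want = [p for p in prefix.split("/") if p]
    got = [p for p in path.split("/") if p]
    return got[:len(want)] == want


def route(rules: List[Rule], host: str, path: str) -> Optional[str]:
    """Return the backend service for host + path; the longest matching prefix wins."""
    candidates = [(len(prefix), svc) for h, prefix, svc in rules
                  if h == host and prefix_matches(prefix, path)]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    rules = [
        ("myapp.example.com", "/billing", "billing-service"),
        ("myapp.example.com", "/inventory", "inventory-service"),
    ]
    print(route(rules, "myapp.example.com", "/billing/invoices/42"))  # billing-service
    print(route(rules, "myapp.example.com", "/billingx"))             # None
```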

To set up HTTPS for an Ingress, you need a TLS Secret. This secret acts as a container for your SSL certificate and private key, which the Ingress Controller (like NGINX or HAProxy) uses to encrypt the traffic before it reaches your pods.


1. Create the TLS Secret

First, you need your certificate files (tls.crt and tls.key). You can create a Kubernetes Secret from these files using the CLI:

Bash

kubectl create secret tls my-app-tls \
--cert=path/to/tls.crt \
--key=path/to/tls.key

Note: OpenShift can generate certificates automatically with Service Serving Certificates, but these are signed by the cluster’s internal service CA. They suit traffic inside the cluster; browsers won’t trust them for public hostnames.


2. Update the Ingress YAML

Now, you tell the Ingress resource to use that secret for a specific hostname.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    # This annotation tells the controller to redirect HTTP to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-app-tls # This must match the secret name created in Step 1
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 80

3. Automation with Cert-Manager

Manually updating certificates before they expire is a headache. Most production clusters use Cert-Manager.

Cert-Manager is an operator that talks to certificate authorities like Let’s Encrypt. You simply add an annotation to your Ingress, and Cert-Manager handles the rest:

The “Magic” Annotation:

YAML

metadata:
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"

Once you add this, Cert-Manager will:

  1. See the Ingress request.
  2. Reach out to Let’s Encrypt to verify you own the domain.
  3. Generate the tls.crt and tls.key.
  4. Create the Secret for you and renew the certificate automatically before it expires (Let’s Encrypt certificates are valid for 90 days).

Summary Checklist for HTTPS

Step | Action
--- | ---
1. Certificate | Obtain a CA-signed cert or use Let’s Encrypt.
2. Secret | Store the cert/key in a kind: Secret (type kubernetes.io/tls).
3. Ingress Spec | Add the tls: section to your Ingress YAML.
4. DNS | Ensure your domain points to the Ingress Controller’s IP.

To automate SSL certificates with Cert-Manager, you need a ClusterIssuer. This is a cluster-wide resource that tells Cert-Manager how to talk to a Certificate Authority (CA) like Let’s Encrypt.

Before you start, ensure the Cert-Manager Operator is installed in your cluster (in OpenShift, you can find this in the OperatorHub).


1. Create a ClusterIssuer (The “Account”)

This YAML defines your identity with Let’s Encrypt. It uses the ACME (Automated Certificate Management Environment) protocol.

YAML

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server address for Let's Encrypt production
    server: https://acme-v02.api.letsencrypt.org/directory
    # Contact email registered with your ACME account
    email: admin@yourdomain.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx # Or 'openshift-default' depending on your ingress controller

2. Update your Ingress to “Request” the Cert

Once the ClusterIssuer is created, you don’t need to manually create secrets anymore. You just “tag” your Ingress with an annotation. Cert-Manager will see this, perform the challenge, and create the secret for you.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-secure-app
  annotations:
    # THIS IS THE TRIGGER: It links the Ingress to your ClusterIssuer
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: app-tls-cert # Cert-Manager will create this secret automatically
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80

3. How to verify it’s working

After you apply the Ingress, Cert-Manager creates a Certificate object and, behind it, Order and Challenge objects. You can track the progress:

  • Check the certificate status: kubectl get certificate (look for READY: True)
  • Check the challenge status (if it’s stuck): kubectl get challenges
  • Check the secret: kubectl get secret app-tls-cert (if this exists, your site is now HTTPS!)

Why use Let’s Encrypt?

  1. Cost: It is 100% free.
  2. Trust: It is recognized by all major browsers (unlike self-signed certs).
  3. No Maintenance: Cert-Manager automatically renews the cert 30 days before it expires.
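The “30 days before expiry” figure follows from cert-manager’s default renewal window, which (as we understand its documentation) is one third of the certificate’s lifetime when spec.renewBefore is not set. A small Python sketch of that arithmetic, with made-up dates:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def renewal_time(not_before: datetime, not_after: datetime,
                 renew_before: Optional[timedelta] = None) -> datetime:
    """When renewal would be attempted; defaults to 1/3 of the lifetime before expiry (assumed default)."""
    lifetime = not_after - not_before
    if renew_before is None:
        renew_before = lifetime / 3
    return not_after - renew_before


if __name__ == "__main__":
    issued = datetime(2025, 1, 1, tzinfo=timezone.utc)  # example issue date
    expires = issued + timedelta(days=90)               # Let's Encrypt lifetime
    renew_at = renewal_time(issued, expires)
    print(renew_at.date(), (expires - renew_at).days, "days before expiry")
```

For a 90-day certificate, renewal lands 60 days after issuance, leaving a 30-day safety margin.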

A Small Warning:

Let’s Encrypt has rate limits. If you are just testing, use the “Staging” URL (https://acme-staging-v02.api.letsencrypt.org/directory) first. Browsers will show a warning for staging certs, but staging has much higher rate limits, so you are unlikely to get blocked while debugging.

When Cert-Manager fails to issue a certificate, it usually gets stuck in the Challenge phase. Let’s look at how to diagnose and fix the most common “Let’s Encrypt” roadblocks.


1. The Troubleshooting Command Chain

If your certificate isn’t appearing, follow this hierarchy to find where the “handshake” broke:

  • Check the Certificate status: oc get certificate. If READY is False, move to the next step.
  • Check the Order: oc get order. The Order is the request sent to Let’s Encrypt. Look at the STATE column.
  • Check the Challenge (the most important step): oc get challenges. A challenge that lingers means Let’s Encrypt is trying to verify your domain but can’t.
  • Describe the Challenge for the error message: oc describe challenge <challenge-name>

2. Common Failure Reasons

A. The “I Can’t See You” (Firewall/Network)

Let’s Encrypt uses the HTTP-01 challenge. It tries to reach http://yourdomain.com/.well-known/acme-challenge/<TOKEN>.

  • The Problem: Your firewall, Security Group (AWS/Azure), or OpenShift Ingress Controller is blocking Port 80.
  • The Fix: Ensure Port 80 is open to the public internet. Let’s Encrypt cannot verify your domain over Port 443 (HTTPS) because the certificate doesn’t exist yet!

B. DNS Record Mismatch

  • The Problem: Your DNS A record or CNAME for app.yourdomain.com hasn’t propagated yet or is pointing to the wrong Load Balancer IP.
  • The Fix: Use dig app.yourdomain.com or nslookup to ensure the domain points exactly to your Ingress Controller’s external IP.

C. Rate Limiting

  • The Problem: You’ve tried to issue the same certificate too many times in one week (Let’s Encrypt has a limit of 5 duplicate certs per week).
  • The Fix: Switch your ClusterIssuer to use the Staging URL (mentioned in the previous step) until your configuration is 100% correct, then switch back to Production.

3. Dealing with Internal/Private Clusters

If your OpenShift cluster is behind a VPN and not accessible from the public internet, the HTTP-01 challenge will always fail because Let’s Encrypt can’t “see” your pods.

The Solution: DNS-01 Challenge

Instead of a web check, Cert-Manager proves ownership by adding a temporary TXT record to your DNS provider (Route53, Cloudflare, Azure DNS).

Example DNS-01 Issuer (Route53):

YAML

spec:
  acme:
    solvers:
    - dns01:
        route53: # AWS credentials come from an IAM role or a referenced secret
          region: us-east-1
          hostedZoneID: Z123456789

Summary Checklist

  1. Is Port 80 open?
  2. Does DNS point to the cluster?
  3. Are you hitting Rate Limits?
  4. Is your Ingress Class correct in the Issuer?

Persistent Volumes (PV) and Persistent Volume Claims (PVC)

In Kubernetes, storage is handled separately from your application’s logic. To understand Persistent Volumes (PV) and Persistent Volume Claims (PVC), it helps to use the “Electricity” analogy:

  • PV (The Infrastructure): This is like the power plant and the grid. It’s the actual physical storage (a disk, a cloud drive, or a network share).
  • PVC (The Request): This is like the power outlet in your wall. Your application “plugs in” to the PVC to get what it needs without needing to know where the power plant is.

1. Persistent Volume (PV)

A PV is a piece of storage in the cluster that has been provisioned by an administrator or by a storage class. It is a cluster-level resource (like a Node) and exists independently of any individual Pod.

  • Capacity: How much space is available (e.g., 5Gi, 100Gi).
  • Access Modes:
    • ReadWriteOnce (RWO): Can be mounted read-write by one node at a time.
    • ReadOnlyMany (ROX): Many nodes can read it simultaneously.
    • ReadWriteMany (RWX): Many nodes can read and write at the same time (requires a storage backend that supports it, such as NFS or ODF).
  • Reclaim Policy: What happens to the data when you delete the PVC? (Retain it for manual cleanup or Delete it immediately).

2. Persistent Volume Claim (PVC)

A PVC is a request for storage by a user. If a Pod needs a “hard drive,” it doesn’t look for a specific disk; it creates a PVC asking for “10Gi of storage with ReadWriteOnce access.”

  • The “Binding” Process: Kubernetes looks at all available PVs. If it finds a PV that matches the PVC’s request, it “binds” them together.
  • Namespace Scoped: Unlike PVs, PVCs live inside a specific Namespace.
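The binding step can be sketched in Python. This roughly mirrors the volume controller’s rule of choosing the smallest unbound PV that satisfies the claim’s size, access modes and storage class; selectors, volume modes and node affinity are left out, and the PV names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class PV:
    name: str
    capacity_gi: int
    access_modes: FrozenSet[str]
    storage_class: str
    bound: bool = False


def find_best_match(pvs: List[PV], request_gi: int, modes: FrozenSet[str],
                    storage_class: str) -> Optional[PV]:
    """Smallest unbound PV with enough capacity, every requested mode, and the same class."""
    candidates = [pv for pv in pvs
                  if not pv.bound
                  and pv.storage_class == storage_class
                  and modes <= pv.access_modes          # PV must offer all requested modes
                  and pv.capacity_gi >= request_gi]
    return min(candidates, key=lambda pv: pv.capacity_gi, default=None)


if __name__ == "__main__":
    pvs = [
        PV("pv-100", 100, frozenset({"ReadWriteOnce"}), "standard"),
        PV("pv-20", 20, frozenset({"ReadWriteOnce", "ReadOnlyMany"}), "standard"),
        PV("pv-5", 5, frozenset({"ReadWriteOnce"}), "standard"),
    ]
    match = find_best_match(pvs, 10, frozenset({"ReadWriteOnce"}), "standard")
    print(match.name if match else "no match: the PVC stays Pending")
```

Picking the smallest fit keeps large volumes free for large requests; note that a 10Gi claim bound to pv-20 still gets the whole 20Gi volume.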

3. Dynamic Provisioning (StorageClasses)

In modern clusters (like OpenShift), admins don’t manually create 100 different PVs. Instead, they use a StorageClass.

  1. The user creates a PVC.
  2. The provisioner named in the StorageClass notices the request.
  3. It automatically talks to the cloud provider (AWS/Azure/GCP) to create a new disk.
  4. It automatically creates the PV and binds it to the PVC.

4. How a Pod uses it

Once the PVC is bound to a PV, you tell your Pod to use that “outlet.”

YAML

spec:
  containers:
  - name: my-db
    image: postgres
    volumeMounts:
    - mountPath: "/var/lib/postgresql/data"
      name: my-storage
  volumes:
  - name: my-storage
    persistentVolumeClaim:
      claimName: task-pv-claim # This matches the name of your PVC

Summary Comparison

Feature | Persistent Volume (PV) | Persistent Volume Claim (PVC)
--- | --- | ---
Who creates it? | Administrator or Storage System | Developer / Application
Scope | Cluster-wide | Namespace-specific
Analogy | The actual Hard Drive | The request for a Hard Drive
Lifecycle | Exists even if no one uses it | Tied to the application’s needs

Here is a standard YAML example for a Persistent Volume Claim (PVC).

In this scenario, we aren’t manually creating a disk. Instead, we are telling OpenShift/Kubernetes: “I need 10Gi of fast storage. Please go talk to the cloud provider or storage backend and create it for me.”

1. The PVC Definition

This is the “request” for storage.

YAML

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc-example
  namespace: my-app-project
spec:
  storageClassName: gp3-csi # Or 'thin', 'ocs-storagecluster-ceph-rbd', etc.
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

2. How the Binding Works

When you apply this YAML, the following chain reaction happens:

  1. The Claim: You submit the PVC.
  2. The Provisioner: The StorageClass (e.g., AWS EBS, Azure Disk, or OpenShift Data Foundation) sees the request.
  3. The Asset: The storage backend creates a physical 10Gi volume.
  4. The Volume: Kubernetes automatically creates a PersistentVolume (PV) object to represent that physical disk.
  5. The Binding: The PVC status changes from Pending to Bound.

3. Attaching the PVC to a Pod

A PVC is useless until a Pod “claims” it. Here is how you mount that 10Gi disk into a container:

YAML

apiVersion: v1
kind: Pod
metadata:
  name: storage-test-pod
  namespace: my-app-project # Must be the same namespace as the PVC
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: my-data-volume
      mountPath: /usr/share/nginx/html # Where the disk appears inside the container
  volumes:
  - name: my-data-volume
    persistentVolumeClaim:
      claimName: dynamic-pvc-example # Must match the name in the PVC YAML

Important “Gotchas” with PVCs

  • Access Modes:
    • ReadWriteOnce (RWO): Most common. If Pod A is using the disk on Node 1, Pod B cannot use it if Pod B is on Node 2.
    • ReadWriteMany (RWX): Required if you want multiple Pods across different nodes to share the same files (common for web servers that share an uploads folder).
  • Expansion: Many modern StorageClasses allow you to increase the storage size in the PVC YAML after it’s created, and Kubernetes will expand the disk on the fly (provided the underlying storage supports it).
  • Sticky Zones: If you use a cloud-based RWO disk (like AWS EBS), any Pod that uses it is “stuck” to the availability zone where that disk was created.

Checking for available StorageClasses is one of the most common tasks for an OpenShift administrator or developer. It tells you exactly what “flavors” of storage are available for your apps.

1. Using the CLI (Recommended)

Run the following command to see a list of all storage providers configured in your cluster:

Bash

oc get storageclass

(Or use the shorthand: oc get sc)

Example Output:

Plaintext

NAME                PROVISIONER                 RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
gp3-csi (default)   ebs.csi.aws.com             Delete          WaitForFirstConsumer   true                   45d
thin                kubernetes.io/vsphere       Delete          Immediate              false                  102d
ocs-storage-rbd     openshift-storage.rbd...    Delete          Immediate              true                   12d

2. How to Read the Results

  • NAME: This is what you put in the storageClassName field of your PVC YAML.
  • (default): If you see this next to a name, it means any PVC that doesn’t specify a class will automatically get this one.
  • PROVISIONER: This tells you the underlying technology (e.g., AWS EBS, VMware vSphere, or Ceph/ODF).
  • RECLAIMPOLICY:
    • Delete: When you delete the PVC, the physical disk is also deleted.
    • Retain: When you delete the PVC, the physical disk stays (so you can recover data manually).
  • VOLUMEBINDINGMODE:
    • Immediate: The disk is created the second you create the PVC.
    • WaitForFirstConsumer: The disk isn’t created until a Pod actually tries to use it. This is smarter because it ensures the disk is created in the same Availability Zone as the Pod.
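The “(default)” marker comes from an annotation on the StorageClass, so a script can find the default from `oc get storageclass -o json`. A minimal Python sketch follows: it checks storageclass.kubernetes.io/is-default-class and, for older clusters, the legacy beta annotation. The cluster call is wrapped in a helper that is defined but never invoked, so the demo runs without a cluster.

```python
import json
import subprocess
from typing import List

DEFAULT_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",  # legacy key on older clusters
)


def default_classes(sc_list: dict) -> List[str]:
    """Names of StorageClasses annotated as default in a `kind: List` JSON document."""
    names = []
    for item in sc_list.get("items", []):
        annotations = item.get("metadata", {}).get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_ANNOTATIONS):
            names.append(item["metadata"]["name"])
    return names


def load_from_cluster() -> dict:
    """Run `oc get storageclass -o json`; needs a logged-in oc session."""
    out = subprocess.run(["oc", "get", "storageclass", "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


if __name__ == "__main__":
    sample = {"items": [
        {"metadata": {"name": "gp3-csi", "annotations": {DEFAULT_ANNOTATIONS[0]: "true"}}},
        {"metadata": {"name": "thin"}},
    ]}
    print(default_classes(sample))  # on a real cluster: default_classes(load_from_cluster())
```

An empty result corresponds to the Pro-Tip below: without a default class, PVCs that omit storageClassName stay Pending.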

3. Inspecting a Specific StorageClass

If you want to see the “fine print” (like encryption settings or IOPS), you can look at the YAML of a specific class:

Bash

oc get sc gp3-csi -o yaml

4. Using the Web Console

If you prefer the GUI:

  1. Log in to the OpenShift Web Console.
  2. Ensure you are in the Administrator perspective.
  3. Navigate to Storage -> StorageClasses.
  4. Here, you can see all classes, which one is the default, and even create new ones using a wizard.

Pro-Tip: If your cluster has no default storage class, your PVCs will stay in a Pending state forever unless you explicitly name one in your YAML.

Autoscaling in Kubernetes

Autoscaling in Kubernetes is the process of automatically adjusting your resources to match the current demand. Instead of guessing how many servers or how much memory you need, Kubernetes monitors your traffic and “flexes” the infrastructure in real-time.

There are three main “layers” of autoscaling. Think of them as a chain: if one layer can’t handle the load, the next one kicks in.


1. Horizontal Pod Autoscaler (HPA)

The Concept: Adding more “lanes” to the highway.

HPA is the most common form of scaling. It increases or decreases the number of pod replicas based on metrics like CPU usage, memory, or custom traffic data.

  • How it works: It checks your pods every 15 seconds. If the average CPU across all pods is above your target (e.g., 70%), it tells the Deployment to spin up more pods.
  • Best for: Stateless services like web APIs or microservices that can handle traffic by simply having more copies running.

2. Vertical Pod Autoscaler (VPA)

The Concept: Making the “cars” bigger.

VPA doesn’t add more pods; instead, it looks at a single pod and decides if it needs more CPU or Memory. It “right-sizes” your containers.

  • How it works: It observes your app’s actual usage over time. If a pod is constantly hitting its memory limit, VPA will recommend (or automatically apply) a higher limit.
  • The Catch: Currently, in most versions of Kubernetes, changing a pod’s size requires restarting the pod.
  • Best for: Stateful apps (like databases) that can’t easily be “split” into multiple copies, or apps where you aren’t sure what the resource limits should be.

3. Cluster Autoscaler (CA)

The Concept: Adding more “pavement” to the highway.

HPA and VPA scale Pods, but eventually, you will run out of physical space on your worker nodes (VMs). This is where the Cluster Autoscaler comes in.

  • How it works: It watches for “Pending” pods—pods that want to run but can’t because no node has enough free CPU/RAM. When it sees this, it calls your cloud provider (AWS, Azure, GCP) and asks for a new VM to be added to the cluster.
  • Downscaling: It also watches for underutilized nodes. If a node is mostly empty, it will move those pods elsewhere and delete the node to save money.

The “Scaling Chain” in Action

Imagine a sudden surge of users hits your website:

  1. HPA sees high CPU usage and creates 10 new Pods.
  2. The cluster is full, so those 10 Pods stay in Pending status.
  3. Cluster Autoscaler sees the Pending pods and provisions 2 new Worker Nodes.
  4. The Pods finally land on the new nodes, and your website stays online.

Comparison Summary

Feature | HPA | VPA | Cluster Autoscaler
--- | --- | --- | ---
What it scales | Number of Pods | Size of Pods (CPU/RAM) | Number of Nodes (VMs)
Primary Goal | Handle traffic spikes | Optimize resource efficiency | Provide hardware capacity
Impact | Fast, no downtime | Usually requires pod restart | Slower (minutes to boot VM)

Pro-Tip: Never run HPA and VPA on the same metric (like CPU) for the same app. They will “fight” each other—HPA will try to add pods while VPA tries to make them bigger, leading to a “flapping” state where your app is constantly restarting.

To set up a Horizontal Pod Autoscaler (HPA), you need two things: a Deployment (your app) and an HPA resource that watches it.

Here is a breakdown of how to configure this in a way that actually works.

1. The Deployment

First, your pods must have resources.requests defined. If the HPA doesn’t know how much CPU a pod should use, it can’t calculate the percentage.

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m # HPA uses this as the baseline
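The percentage the HPA compares against its target is measured usage divided by the request, aggregated over the pods. A minimal sketch of that arithmetic, assuming every pod has the same CPU request (true for this single-container Deployment) and using invented per-pod usage figures in millicores:

```python
def cpu_utilization_percent(usage_m: list, request_m: int) -> int:
    """HPA-style utilization: total usage / total requests across pods, as a whole percent."""
    if request_m <= 0:
        raise ValueError("no resources.requests.cpu: nothing to compute a percentage against")
    if not usage_m:
        raise ValueError("no pod metrics available")
    return (sum(usage_m) * 100) // (request_m * len(usage_m))


# Three php-apache pods, each requesting 200m, currently using 150m, 250m and 320m
print(cpu_utilization_percent([150, 250, 320], 200))  # -> 120
```

At 120% against a 50% target, the autoscaler would add pods; note that the 500m limit plays no part in this calculation.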

2. The HPA Resource

This YAML tells Kubernetes: “Keep the average CPU usage of these pods at 50%. If it goes higher, spin up more pods (up to 10). If it goes lower, scale back down to 1.”

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50

3. How to Apply and Test

You can apply these using oc apply -f <filename>.yaml (in OpenShift) or kubectl apply.

Once applied, you can watch the autoscaler in real-time:

  • View status: oc get hpa
  • Watch it live: oc get hpa php-apache-hpa --watch

The Calculation Logic:

The HPA uses a specific formula to decide how many replicas to run:

$$\text{Desired Replicas} = \lceil \text{Current Replicas} \times \frac{\text{Current Metric Value}}{\text{Desired Metric Value}} \rceil$$
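The formula translates directly into code. This sketch also applies two behaviors of the real controller: it leaves the replica count alone while the ratio is within a tolerance of 1.0 (0.1 by default, set by kube-controller-manager's --horizontal-pod-autoscaler-tolerance flag), and it clamps the result to minReplicas/maxReplicas. Stabilization windows and scaling policies are left out.

```python
import math


def desired_replicas(current: int, current_value: float, target_value: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """ceil(current * current_value / target_value), with tolerance and min/max clamping."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to target: no change
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# php-apache example: 1 replica at 250% CPU against the 50% target
print(desired_replicas(1, 250, 50, 1, 10))   # -> 5
# 4 replicas at 53% vs 50%: ratio 1.06 is inside the tolerance
print(desired_replicas(4, 53, 50, 1, 10))    # -> 4
# A huge spike is capped at maxReplicas
print(desired_replicas(4, 400, 50, 1, 10))   # -> 10
```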

Quick Tip: You can also create the same HPA instantly from the CLI without a YAML file (this is the OpenShift form; kubectl autoscale works the same way):

oc autoscale deployment/php-apache --cpu-percent=50 --min=1 --max=10

To make your autoscaling more robust, you can combine CPU and Memory metrics in a single HPA. Kubernetes computes a desired replica count for each metric and uses the largest, so whichever metric is under the most pressure drives scaling.

Here is the updated YAML including both resource types and a “Scale Down” stabilization period to prevent your cluster from “flapping” (rapidly adding and removing pods).

1. Advanced HPA YAML (CPU + Memory)

YAML

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: advanced-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # act on the highest recommendation from the last 5 mins, so a brief dip doesn't remove pods
      policies:
      - type: Percent
        value: 10 # remove at most 10% of the pods...
        periodSeconds: 60 # ...per 60-second period
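The behavior block can be modeled in a few lines. This is a simplified sketch, not the controller's code: it assumes one recommendation per 15-second sync (the controller's default interval), uses fixed 60-second blocks where the controller uses a sliding window, and its rounding of the 10% cap is an assumption. It shows the two effects together: nothing happens until the high recommendations age out of the 300-second window, and then pods are removed gradually.

```python
from collections import deque


def simulate_scale_down(recommendations, start_replicas, sync_s=15, window_s=300,
                        percent=10, period_s=60):
    """Replica count after each sync, given the raw per-sync recommendations.

    - stabilization: act on the highest recommendation seen in the last window_s seconds
    - Percent policy: within each fixed period_s block, never go below
      (100 - percent)% of the replicas present when the block started
    """
    history = deque()  # (timestamp, recommendation) pairs inside the window
    replicas = start_replicas
    period_start, period_start_replicas = 0, start_replicas
    timeline = []
    for step, rec in enumerate(recommendations):
        now = step * sync_s
        history.append((now, rec))
        while history[0][0] <= now - window_s:
            history.popleft()  # drop recommendations older than the window
        stabilized = max(r for _, r in history)
        if now - period_start >= period_s:
            period_start, period_start_replicas = now, replicas
        floor_replicas = period_start_replicas * (100 - percent) // 100
        if stabilized < replicas:
            replicas = max(stabilized, floor_replicas)  # scale down, but respect the policy
        else:
            replicas = stabilized  # scale-up limits are not modeled here
        timeline.append(replicas)
    return timeline


# Traffic drops after the first minute: 4 syncs recommend 15 pods, then 56 recommend 2
timeline = simulate_scale_down([15] * 4 + [2] * 56, start_replicas=15)
first_drop = next(i for i, r in enumerate(timeline) if r < 15)
print(first_drop * 15, timeline[first_drop])  # -> 345 13
print(timeline[-1])                           # -> 2
```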

2. Scaling on Custom Metrics (e.g., HTTP Requests)

Sometimes CPU doesn’t tell the whole story. If your app is waiting on a database, CPU might stay low while users experience lag. In these cases, you can scale based on Requests Per Second (RPS).

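The fragment below goes under spec.metrics, alongside or instead of the Resource entries. It needs an adapter that serves the custom metrics API, such as the Prometheus Adapter, configured to expose a per-pod http_requests_per_second metric. On OpenShift, the supported route for custom-metric autoscaling is the Custom Metrics Autoscaler Operator (based on KEDA).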

YAML

  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 500 # Scale up if pods average more than 500 requests/sec
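With an AverageValue target, the replica count works out to the summed metric across pods divided by the per-pod target, which is the same as the ratio formula applied to the average. A sketch with made-up per-pod request rates; tolerance, unready pods and min/max clamping are ignored here:

```python
import math


def replicas_for_average_value(per_pod_values: list, target_average: float) -> int:
    """Pods metric with an AverageValue target: ceil(sum of values / target per pod)."""
    return math.ceil(sum(per_pod_values) / target_average)


# 3 pods serving 800, 600 and 700 requests/sec against averageValue: 500
print(replicas_for_average_value([800, 600, 700], 500))  # -> 5
```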


Pro-Tips for Memory Scaling

  1. Memory is “Sticky”: Unlike CPU, which drops the moment a process finishes, many runtimes (like Java/JVM or Node.js) do not immediately release memory back to the OS.
  2. The Danger: If your app doesn’t have a good Garbage Collector configuration, the HPA might see high memory usage, spin up 10 pods, and never scale back down because the memory stays “reserved” by the app.
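  3. The Fix: Set memory.requests in the Deployment to the app's realistic steady-state usage (not its startup footprint or its peak limit), so the utilization percentage reflects real pressure, and cap the runtime's heap (for example -Xmx for the JVM or --max-old-space-size for Node.js) so memory can't simply grow until the HPA reads it as load.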

Summary Table: Which metric to use?

| Scenario | Recommended Metric | Why? |
| --- | --- | --- |
| Calculation heavy | CPU | Directly maps to processing power. |
| Caching/Large Data | Memory | Prevents OOM (Out of Memory) kills. |
| Web APIs | Requests Per Second | Scales with actual user load. |
| Message Queue | Queue Depth | Scales based on “work to be done.” |

When an HPA isn’t behaving as expected—maybe it’s not scaling up during a spike, or it’s “stuck” at the minimum replicas—you need to look at the Controller Manager’s internal logic.

Here is how you can perform a “health check” on your HPA’s decision-making process.


1. The “Describe” Command (Most Useful)

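The describe command shows the HPA's current metrics, its conditions, and its recent events: every scaling action and, more importantly, why a scaling attempt failed. Events are short-lived; by default the API server keeps them for one hour.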

Bash

oc describe hpa advanced-app-hpa

What to look for in the “Events” section:

  • SuccessfulRescale: The HPA successfully changed the replica count.
  • FailedComputeMetricsReplicas: The HPA couldn't turn its metrics into a replica count. Common causes are an unreachable metrics API (Metrics Server) or pods without resources.requests (check those first!).
  • FailedGetResourceMetric: The HPA couldn't read CPU/Memory usage for the pods, often because they are crashing, not yet Ready, or too new to have been sampled.

2. Checking the “Conditions”

In the output of the describe command, look for the Conditions section. It tells you the current “brain state” of the autoscaler:

| Condition | Status | Meaning |
| --- | --- | --- |
| AbleToScale | True | The HPA is healthy and can talk to the Deployment. |
| ScalingActive | True | Metrics are being received and scaling logic is running. |
| ScalingLimited | True | Warning: you've hit your maxReplicas or minReplicas. It wants to scale further but you've capped it. |

3. Real-time Metric Monitoring

If you want to see exactly what numbers the HPA is seeing right now compared to your target, use:

Bash

oc get hpa advanced-app-hpa -w

Example Output:

Plaintext

NAME               REFERENCE                 TARGETS            MINPODS   MAXPODS   REPLICAS   AGE
advanced-app-hpa   Deployment/advanced-app   75%/60%, 40%/80%   2         15        5          10m

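In this example, CPU is at 75%, above its 60% target, so with 5 replicas the HPA will scale up again: ceil(5 × 75 / 60) = ceil(6.25) = 7 replicas. Memory (40% against an 80% target) would allow fewer pods, but the HPA follows whichever metric asks for the most.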
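With more than one metric, the HPA computes a replica count per metric and acts on the largest. This small sketch parses a TARGETS string like the one above (current%/target% pairs separated by commas; the regex also tolerates a metric-name prefix in case your kubectl version prints one) and applies that rule with the default 0.1 tolerance:

```python
import math
import re


def next_replicas(targets: str, replicas: int, min_r: int, max_r: int,
                  tolerance: float = 0.1) -> int:
    """Predict the next replica count from a TARGETS string like '75%/60%, 40%/80%'."""
    proposals = []
    for current, target in re.findall(r"(\d+)%/(\d+)%", targets):
        ratio = int(current) / int(target)
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(replicas)  # this metric is within tolerance: no change
        else:
            proposals.append(math.ceil(replicas * ratio))
    if not proposals:
        raise ValueError("no numeric targets found (is TARGETS <unknown>?)")
    return max(min_r, min(max_r, max(proposals)))  # largest proposal wins, then clamp


print(next_replicas("75%/60%, 40%/80%", replicas=5, min_r=2, max_r=15))  # -> 7
```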


4. Debugging Common “Stuck” Scenarios

Scenario A: TARGETS shows <unknown>

If the TARGETS column shows <unknown>, it almost always means:

  1. Missing Requests: You forgot to set resources.requests in your Deployment YAML.
  2. Metrics Server Down: The cluster-wide metrics service is having issues.
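  3. Wrong Target: scaleTargetRef names a kind or name that doesn't match your Deployment. The HPA has no selector of its own; it reads the pod selector from the target's scale subresource, so a typo here means it finds no pods to measure.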
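Missing requests and a wrong scaleTargetRef can be caught before you apply anything. The sketch below is a hypothetical pre-flight check (check_hpa is an illustrative name, not a real tool) that works on manifests you have already parsed into Python dicts, for example with a YAML library; it only inspects the fields used in this guide.

```python
def check_hpa(hpa: dict, deployment: dict) -> list:
    """Hypothetical pre-flight check for an HPA and the Deployment it targets."""
    problems = []
    ref = hpa["spec"]["scaleTargetRef"]
    dep_kind, dep_name = deployment.get("kind"), deployment["metadata"]["name"]
    if (ref.get("kind"), ref.get("name")) != (dep_kind, dep_name):
        problems.append(f"scaleTargetRef {ref.get('kind')}/{ref.get('name')} "
                        f"does not match {dep_kind}/{dep_name}")
    # Utilization targets are a percentage of requests, so each container needs one
    needed = sorted({m["resource"]["name"] for m in hpa["spec"].get("metrics", [])
                     if m.get("type") == "Resource"
                     and m["resource"]["target"].get("type") == "Utilization"})
    for c in deployment["spec"]["template"]["spec"]["containers"]:
        requests = c.get("resources", {}).get("requests", {})
        for res in needed:
            if res not in requests:
                problems.append(f"container {c['name']} has no resources.requests.{res}")
    return problems


deployment = {
    "kind": "Deployment",
    "metadata": {"name": "php-apache"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "php-apache", "resources": {"requests": {"cpu": "200m"}}}]}}},
}
hpa = {"spec": {
    "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "php-apache"},
    "metrics": [{"type": "Resource", "resource": {
        "name": "memory", "target": {"type": "Utilization", "averageUtilization": 80}}}],
}}
print(check_hpa(hpa, deployment))
# -> ['container php-apache has no resources.requests.memory']
```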

Scenario B: High CPU but No Scaling

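Check whether the pods are Ready. When computing a scale-up, the HPA sets aside pods that aren't Ready yet and counts them as using 0% of the target, so it doesn't react to the CPU burst many containers show during startup. With several pods still starting, that can pull the average back under the threshold.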
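A worked example of why a busy deployment can stay at the same size when some pods aren't Ready. This is a simplified model of the documented rule: if a scale-up is indicated, pods that aren't Ready are counted as using 0% of the target, and no action is taken if the recomputed ratio falls within the 0.1 tolerance. The utilization figures are invented.

```python
import math


def scale_up_with_unready(ready_util: list, unready_count: int,
                          target: float, tolerance: float = 0.1) -> int:
    """Replica recommendation for a scale-up when some pods are not Ready (simplified)."""
    current = len(ready_util) + unready_count
    if not ready_util:
        return current  # no usable metrics at all
    naive_ratio = (sum(ready_util) / len(ready_util)) / target
    if naive_ratio <= 1.0 + tolerance:
        return current  # no scale-up indicated even from the Ready pods alone
    # Conservative pass: unready pods contribute 0% to the average
    ratio = (sum(ready_util) / current) / target
    if ratio <= 1.0 + tolerance:
        return current  # recomputed ratio is within tolerance: hold
    return math.ceil(current * ratio)


# 3 Ready pods at 90% CPU, 2 pods still starting, target 50%
print(scale_up_with_unready([90, 90, 90], 2, 50))  # -> 5 (no change)
# Same load with only 1 pod still starting
print(scale_up_with_unready([90, 90, 90], 1, 50))  # -> 6
```

In the first case the conservative average is 270 / 5 = 54%, a ratio of 1.08, so the HPA holds even though every Ready pod is at 90%.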


Pro-Tip: The “Cooldown” Period

If you just stopped a load test and the pods are still running, don’t panic! By default, Kubernetes has a 5-minute stabilization window for scaling down. This prevents the “Flapping” effect where pods are deleted and then immediately recreated because of a small traffic blip.

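MCP for Kubernetes, Terraform, and Observability

Here is how MCP (Model Context Protocol) applies across three domains: Kubernetes management, Terraform pipelines, and infrastructure observability.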


MCP + Kubernetes Management

What it looks like: An LLM agent connects to a Kubernetes MCP server that exposes kubectl operations as tools. The agent can then:

  • list_pods(namespace) → find failing pods
  • get_pod_logs(pod, namespace) → fetch logs
  • describe_deployment(name) → inspect rollout status
  • scale_deployment(name, replicas) → change the replica count
  • apply_manifest(yaml) → deploy changes

Real implementations:

  • kubectl-ai — natural language to kubectl commands
  • Robusta — AI-powered Kubernetes troubleshooting with MCP support
  • k8s-mcp-server — open-source MCP server wrapping the Kubernetes API
  • OpenShift + ACM — Red Hat is building AI-assisted cluster management leveraging MCP for tool standardization

Example agent workflow:

User: “Why is the payments service degraded?”

Agent →  list_pods(namespace="payments")
      →  get_pod_logs(pod="payments-7f9b", tail=100)
      →  describe_deployment("payments")
      →  LLM reasons: "OOMKilled: memory limit too low"
      →  Proposes: patch_deployment(memory_limit="1Gi")
      →  HITL: "Approve this change?" → Engineer approves
      →  apply_patch() → monitors rollout → confirms healthy


MCP + Terraform Pipelines

What it looks like: A Terraform MCP server exposes infrastructure operations. The agent can plan, review, and apply infrastructure changes conversationally.

MCP tools exposed:

  • terraform_plan(module, vars) → generate and review a plan
  • terraform_apply(plan_id) → apply approved changes
  • terraform_state_show(resource) → inspect current state
  • terraform_output(name) → read output values
  • detect_drift() → compare actual vs declared state

Key use cases:

  • Drift detection agent: continuously checks for infrastructure drift and auto-raises PRs to correct it
  • Cost optimization agent: analyzes Terraform state, identifies oversized resources, proposes rightsizing
  • Compliance agent: scans Terraform plans against OPA/Sentinel policies before apply
  • PR review agent: reviews Terraform PRs, flags security misconfigs, suggests improvements
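A drift check does not need anything exotic: terraform plan -detailed-exitcode exits 0 when the plan is empty, 1 on error, and 2 when there are changes. A sketch of how a hypothetical detect_drift() tool could wrap that, assuming the terraform binary is on PATH and the working directory is already initialized. Exit code 2 means drift only if the checked-out configuration is what was last applied (for example the main branch); otherwise it may just be unapplied code changes.

```python
import subprocess

# Exit codes of `terraform plan -detailed-exitcode`
PLAN_EXIT_STATUS = {0: "in-sync", 1: "error", 2: "changes-pending"}


def interpret_plan_exit(code: int) -> str:
    """Translate the plan exit code; anything unexpected is treated as an error."""
    return PLAN_EXIT_STATUS.get(code, "error")


def detect_drift(workdir: str) -> tuple:
    """Hypothetical detect_drift() tool: run a non-interactive plan in workdir."""
    proc = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir, capture_output=True, text=True,
    )
    return interpret_plan_exit(proc.returncode), proc.stdout + proc.stderr


# status, output = detect_drift("./infra")  # requires terraform on PATH
```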

Example pipeline:

PR opened with Terraform changes
       │
       ▼
MCP Terraform Agent
  ├── terraform_plan() → generates plan
  ├── scan_security(plan) → checks for open security groups, no encryption
  ├── estimate_cost(plan) → computes monthly cost delta
  ├── LLM summarizes: “This adds an unencrypted S3 bucket costing ~$12/mo”
  └── Posts review comment to PR with findings + recommendations
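The scan_security step can work from Terraform's machine-readable plan (terraform show -json plan.out), whose resource_changes list records each resource's planned actions and after values. A small sketch that counts actions and flags AWS security groups opening ingress to 0.0.0.0/0; the check is illustrative rather than a complete policy set, and the inline plan is a trimmed, hand-written example.

```python
import json
from collections import Counter


def summarize_plan(plan: dict) -> tuple:
    """Count planned actions and flag world-open security group ingress."""
    actions = Counter()
    findings = []
    for rc in plan.get("resource_changes", []):
        for action in rc["change"]["actions"]:
            actions[action] += 1
        after = rc["change"].get("after") or {}  # null for deletes
        if rc["type"] == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    findings.append(f"{rc['address']}: ingress port {rule.get('from_port')} open to 0.0.0.0/0")
    return actions, findings


plan = json.loads("""
{"resource_changes": [
  {"address": "aws_security_group.web", "type": "aws_security_group",
   "change": {"actions": ["create"],
              "after": {"ingress": [{"from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}]}}},
  {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
   "change": {"actions": ["create"], "after": {"bucket": "logs"}}},
  {"address": "aws_instance.old", "type": "aws_instance",
   "change": {"actions": ["delete"], "after": null}}
]}
""")
actions, findings = summarize_plan(plan)
print(dict(actions))  # -> {'create': 2, 'delete': 1}
print(findings)       # -> ['aws_security_group.web: ingress port 22 open to 0.0.0.0/0']
```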


📊 MCP + Infrastructure Observability

What it looks like: Observability tools (Prometheus, Grafana, Loki, Datadog) are wrapped as MCP servers. The agent queries them in natural language and correlates signals across tools autonomously.

MCP tools exposed:

  • query_prometheus(promql, time_range) → fetch metrics
  • search_logs(query, service, time_range) → Loki/Elasticsearch
  • get_traces(service, error_only) → Jaeger/Tempo
  • list_active_alerts() → current firing alerts
  • get_dashboard(name) → Grafana snapshot
  • create_annotation(text, time) → mark events on dashboards

Key use cases:

  • Natural language observability: “Show me error rate for the checkout service in the last 30 mins” — no PromQL needed
  • Automated RCA: agent correlates metrics + logs + traces to pinpoint root cause
  • Alert noise reduction: agent groups related alerts, suppresses duplicates, and writes a single incident summary
  • Capacity planning: agent queries historical metrics, detects trends, forecasts when resources will be exhausted
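Behind a natural-language question like the checkout example, a query_prometheus tool ends up calling Prometheus's HTTP API endpoint /api/v1/query_range with query, start, end and step parameters. A sketch of building that request; the Prometheus URL, the http_requests_total counter and its service/code labels are assumptions about how your services are instrumented.

```python
import time
from typing import Optional
from urllib.parse import urlencode


def error_rate_query(service: str, window: str = "5m") -> str:
    """5xx share of requests, assuming a counter http_requests_total{service=..., code=...}."""
    sel = f'service="{service}"'
    return (f'sum(rate(http_requests_total{{{sel},code=~"5.."}}[{window}]))'
            f' / sum(rate(http_requests_total{{{sel}}}[{window}]))')


def query_range_url(base: str, promql: str, minutes: int, step: str = "30s",
                    now: Optional[float] = None) -> str:
    """URL for GET /api/v1/query_range covering the last `minutes` minutes."""
    end = time.time() if now is None else now
    params = {"query": promql, "start": end - minutes * 60, "end": end, "step": step}
    return f"{base}/api/v1/query_range?{urlencode(params)}"


# "Show me error rate for the checkout service in the last 30 mins"
q = error_rate_query("checkout")
print(q)
print(query_range_url("http://prometheus:9090", q, minutes=30))
```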

🔗 How MCP Ties It All Together

The power of MCP is that a single agent can hold tools from all three domains simultaneously:

┌──────────────────────────────────────────────────────┐
│                      LLM Agent                       │
│                  (Claude / GPT-4o)                   │
└──────────────────────────┬───────────────────────────┘
                           │ MCP
         ┌─────────────────┼──────────────────┐
         ▼                 ▼                  ▼
┌──────────────────┐ ┌────────────┐ ┌──────────────────┐
│    Kubernetes    │ │ Terraform  │ │  Observability   │
│    MCP Server    │ │ MCP Server │ │    MCP Server    │
│ (kubectl, Helm,  │ │ (plan,     │ │(Prometheus, Loki,│
│  ACM)            │ │  apply,    │ │ Grafana, Jaeger) │
│                  │ │  drift)    │ │                  │
└──────────────────┘ └────────────┘ └──────────────────┘

End-to-end scenario:

  1. Observability MCP detects CPU spike on node pool
  2. Agent queries Terraform MCP → finds node group is at max capacity
  3. Agent queries Kubernetes MCP → confirms pods are pending due to insufficient nodes
  4. Agent generates Terraform plan to scale node group from 3→5 nodes
  5. HITL approval → Terraform apply → Kubernetes confirms new nodes joined
  6. Agent posts incident summary to Slack with full audit trail

API – response time

Here are fast, reliable ways to measure client-side API response time (and break it down) — from your laptop or from an EKS pod.

1) One-shot timing (curl)

This prints the status code, remote IP, the DNS, TCP, TLS, TTFB, and total timing checkpoints, and the download size and speed in one go:

curl -s -o /dev/null -w '
{ "http_code":%{http_code},
  "remote_ip":"%{remote_ip}",
  "dns":%{time_namelookup},
  "tcp":%{time_connect},
  "tls":%{time_appconnect},
  "ttfb":%{time_starttransfer},
  "total":%{time_total},
  "size":%{size_download},
  "speed":%{speed_download}
}
' https://api.example.com/path

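Fields (each timing is cumulative: seconds from the start of the request until that point)

  • dns: DNS lookup finished (time_namelookup)
  • tcp: TCP connection established (time_connect)
  • tls: TLS handshake finished (time_appconnect; 0 for plain HTTP)
  • ttfb: first response byte received (time_starttransfer)
  • total: full response downloaded (time_total)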
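curl's time_* values are cumulative, measured from the start of the request, so per-phase durations are the differences between consecutive checkpoints. A sketch that turns one -w result (keys as in the JSON template above; the sample numbers are invented) into milliseconds per phase. "server_wait" here also includes the time to send the request.

```python
def phase_breakdown(t: dict) -> dict:
    """Convert curl's cumulative timings (seconds from start) into per-phase durations (ms)."""
    dns, tcp, tls = t["dns"], t["tcp"], t["tls"]
    handshake_end = tls if tls > 0 else tcp  # time_appconnect is 0 for plain HTTP
    phases = {
        "dns": dns,
        "tcp_connect": tcp - dns,
        "tls_handshake": (tls - tcp) if tls > 0 else 0.0,
        "server_wait": t["ttfb"] - handshake_end,  # request sent -> first byte back
        "download": t["total"] - t["ttfb"],
    }
    return {k: round(v * 1000, 1) for k, v in phases.items()}


sample = {"dns": 0.012, "tcp": 0.030, "tls": 0.081, "ttfb": 0.320, "total": 0.345}
print(phase_breakdown(sample))
# -> {'dns': 12.0, 'tcp_connect': 18.0, 'tls_handshake': 51.0, 'server_wait': 239.0, 'download': 25.0}
```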

2) From EKS (ephemeral pod)

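Run N samples and capture a CSV (-i without -t keeps terminal control characters out of the file, --quiet suppresses kubectl's own messages, and the first echo writes a header row):

kubectl run curl --rm -i --quiet --restart=Never --image=curlimages/curl:8.8.0 --command -- \
sh -c 'echo "dns,tcp,tls,ttfb,total"; for i in $(seq 1 50); do \
  curl -s -o /dev/null -w "%{time_namelookup},%{time_connect},%{time_appconnect},%{time_starttransfer},%{time_total}\n" \
  https://api.example.com/health; \
done' > timings.csv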

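Open timings.csv and compare the columns dns,tcp,tls,ttfb,total. Because each value is cumulative, subtract to isolate a phase: a large ttfb - tls (or ttfb - tcp over plain HTTP) means a slow upstream/app; a large tls - tcp means handshake issues; a large total - ttfb means payload/download time.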
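To summarize the samples, percentiles say more than the mean. A sketch that reads the CSV (assuming the dns,tcp,tls,ttfb,total column order; header rows and other non-numeric lines are skipped) and reports nearest-rank p50/p95 per column in seconds. Dropping the first few rows as warm-up is a tunable assumption.

```python
import csv
import io
import math

COLUMNS = ["dns", "tcp", "tls", "ttfb", "total"]  # column order written by the loop


def percentile(values, p):
    """Nearest-rank percentile (p between 0 and 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(csv_text, warmup=3):
    """(p50, p95) per column; skips non-numeric lines and the first `warmup` samples."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        try:
            values = [float(x) for x in row]
        except ValueError:
            continue  # header line or a stray message
        if len(values) == len(COLUMNS):
            rows.append(values)
    rows = rows[warmup:]
    if not rows:
        raise ValueError("no samples left after warm-up")
    return {name: (percentile([r[i] for r in rows], 50), percentile([r[i] for r in rows], 95))
            for i, name in enumerate(COLUMNS)}


# Real use: print(summarize(open("timings.csv").read()))
demo = "dns,tcp,tls,ttfb,total\n" + "\n".join(
    f"0.01,0.02,0.05,{0.10 + i / 100:.2f},{0.12 + i / 100:.2f}" for i in range(10))
print(summarize(demo, warmup=0)["ttfb"])  # -> (0.14, 0.19)
```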

3) Separate proxy vs upstream (Kong in the path)

With its default headers setting, Kong adds latency headers you can read on the client:

curl -s -o /dev/null -D - https://api.example.com/path | grep -i '^x-kong-.*latency'
# X-Kong-Proxy-Latency: <ms>     (time Kong spent on the request, routing and plugins, before proxying it)
# X-Kong-Upstream-Latency: <ms>  (time waiting for the upstream's response: network + service processing)

These help you see whether the delay is at the gateway or in the service.

4) Quick load/percentiles (pick one)

  • hey: hey -z 30s -c 20 https://api.example.com/path
  • vegeta: echo "GET https://api.example.com/path" | vegeta attack -rate=20 -duration=30s | vegeta report
  • k6 (scriptable): save the script below as test.js, then run k6 run test.js

JavaScript

// test.js: 20 virtual users for 30s; the run fails if p95 latency exceeds 300 ms
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 20,
  duration: '30s',
  thresholds: { http_req_duration: ['p(95)<300'] },
};

export default () => {
  const r = http.get('https://api.example.com/path');
  check(r, { '200': (res) => res.status === 200 });
};

5) App-level timers (optional)

Add a Server-Timing header from the API to expose your own phase timings (DB, cache, etc.). Then the client can read those headers to correlate.
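The header format comes from the W3C Server Timing specification: comma-separated metric names, each with optional dur (milliseconds) and desc parameters, for example Server-Timing: db;dur=53, cache;desc="Cache Read";dur=23.2. A simplified client-side parser (it does not handle commas or semicolons inside quoted desc values):

```python
def parse_server_timing(header: str) -> dict:
    """Map metric name -> dur in ms (0.0 when no dur is given). Simplified parser."""
    timings = {}
    for entry in header.split(","):
        parts = [p.strip() for p in entry.split(";")]
        name = parts[0]
        if not name:
            continue
        dur = 0.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "dur":
                dur = float(value.strip().strip('"'))
        timings[name] = dur
    return timings


print(parse_server_timing('miss, db;dur=53, app;dur=47.2, cache;desc="Cache Read";dur=23.2'))
# -> {'miss': 0.0, 'db': 53.0, 'app': 47.2, 'cache': 23.2}
```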

6) Common gotchas

  • Proxies can add latency; test both with and without proxy (NO_PROXY / --proxy).
  • Auth: measure with real headers/tokens; 401/403 will skew.
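  • SNI/Host: to hit a specific IP, keep the hostname in the URL and add --resolve host:443:IP, so SNI, certificate validation, and the Host header all stay correct (a -H "Host: host" header against an IP URL doesn't change SNI).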
  • Warmup: discard first few samples (JIT, caches, TLS session reuse).
