While Kubernetes Networking handles the “plumbing” (getting a packet from A to B), a Service Mesh handles the “logic” of that communication.
As you move from 5 microservices to 500, managing things like retries, security, and tracking where a request went becomes impossible to code into every app. A Service Mesh pulls that logic out of your code and moves it into the infrastructure.
1. The Core Concept: The Sidecar Pattern
A Service Mesh doesn’t change your application code. Instead, it “injects” a tiny, high-performance proxy (usually Envoy) into every Pod. This is called a Sidecar.
Before Service Mesh: App A talks directly to App B. If the connection fails, App A’s code must decide whether to retry.
With Service Mesh: App A talks to its local Sidecar. The Sidecar talks to App B’s Sidecar. The Sidecars handle the encryption, retries, and logging automatically.
2. Architecture: Data Plane vs. Control Plane
A Service Mesh is split into two functional parts:
The Data Plane: The collection of all the Sidecar proxies. They do the “heavy lifting” of intercepting every single packet moving between services.
The Control Plane: The “brain” (e.g., Istiod in Istio). It doesn’t touch the packets. Instead, it provides the UI and API for the administrator to say, “Ensure all traffic is encrypted,” and then it pushes those rules out to all the Sidecars.
3. What does it actually do? (The “Big Three”)
| Feature | What it solves |
| --- | --- |
| Observability | Provides a "Service Graph" showing exactly which apps are talking, their error rates, and latency without adding a single line of logging code. |
| Security (mTLS) | Automatically encrypts all internal traffic. Even if a hacker gets into the network, they can't "sniff" the data because every pod has its own unique certificate. |
| Traffic Control | Allows for Canary Deployments (e.g., "Send 5% of users to version 2.0, and 95% to version 1.0") and Circuit Breaking (stopping traffic to a failing pod so it doesn't crash the whole system). |
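Circuit Breaking, for example, is configured declaratively in Istio. Here is a sketch of a DestinationRule that ejects failing pods from the load-balancing pool (the service name is hypothetical, and exact field support varies by Istio version):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-circuit-breaker
spec:
  host: backend-service          # hypothetical Service name
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5    # eject a pod after 5 consecutive 5xx errors
      interval: 30s              # how often to scan for failing hosts
      baseEjectionTime: 60s      # how long an ejected pod stays out of rotation
      maxEjectionPercent: 50     # never eject more than half the pool
```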
4. Service Mesh vs. Ingress
This is a very common point of confusion.
Ingress (North-South): Manages traffic coming into the cluster from the outside world (the Internet).
Service Mesh (East-West): Manages traffic moving between services already inside the cluster.
Note: Some modern meshes (like Istio) can actually do both, replacing your traditional Ingress Controller with an “Ingress Gateway.”
5. Popular Service Mesh Tools
Istio: The most powerful and feature-rich, but also the most complex to manage. (Used by OpenShift as “OpenShift Service Mesh”).
Linkerd: Focused on being incredibly fast, light, and “just working” out of the box.
Consul: Great if you have a mix of Kubernetes and traditional Virtual Machines (VMs) that need to talk to each other.
When do you actually need one?
You probably don’t need a Service Mesh if you only have 2 or 3 services. The “Service Mesh Tax” is the extra CPU and Memory each sidecar consumes. You should consider a mesh when:
You have a strict security requirement for encryption (mTLS) everywhere.
You are struggling to debug “where the latency is” in a long chain of microservices.
You need advanced deployment strategies like A/B testing or Canary releases.
Since Mutual TLS (mTLS) is the “killer feature” of most service meshes, let’s look at how it turns your cluster into a fortress without you changing a single line of code.
1. How mTLS Works (The “Identity Card” Method)
In a standard cluster, Pod A talks to Pod B over plain text. If a hacker intercepts the traffic, they see everything. With a Service Mesh, the Control Plane acts as a Certificate Authority (CA).
Identity: Every Pod is issued a unique SVID (SPIFFE Verifiable Identity Document) in the form of a certificate.
The Handshake: When Pod A tries to talk to Pod B, their Sidecars (Envoy) step in.
Mutual Trust: Pod A presents its “ID card” to Pod B, and Pod B presents its “ID card” to Pod A. They both verify that the certificates are valid and signed by the cluster’s CA.
Encryption: Once they trust each other, they create an encrypted tunnel. The actual applications (the “App Containers”) still think they are talking plain HTTP, but the “wire” between them is fully encrypted.
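In Istio, turning this enforcement on mesh-wide is a single resource. A minimal sketch, assuming Istio's PeerAuthentication API and the default istio-system root namespace:

```yaml
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system   # applying it in the root namespace makes it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject any plain-text traffic between sidecars
```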
2. The Service Graph (Visualizing the Mesh)
One of the coolest parts of using a service mesh (like Istio in OpenShift) is Kiali. Kiali is a management console that draws a real-time map of your microservices.
Green Lines: Traffic is flowing perfectly (200 OK).
Red/Yellow Lines: High error rates (500 errors) or high latency.
Circles: These represent your versions. You can see, for example, that v1 is getting 90% of traffic and v2 is getting 10%.
3. Traffic Shifting (The “Canary” Deployment)
Imagine you have a new version of your Backend (v2) and you’re nervous about a full rollout. Instead of a “Big Bang” update, you can use a VirtualService to shift traffic slowly.
The Logic:
Send 90% of traffic to v1.
Send 10% of traffic to v2.
If the error rate for v2 stays low, move to 50/50, then 100%.
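In Istio, that logic might look like the following VirtualService (a sketch; it assumes the v1 and v2 subsets are defined in a companion DestinationRule, and the service name is from this example):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-canary
spec:
  hosts:
  - backend-service            # hypothetical Service name
  http:
  - route:
    - destination:
        host: backend-service
        subset: v1             # subsets are defined in a DestinationRule
      weight: 90
    - destination:
        host: backend-service
        subset: v2
      weight: 10
```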
4. Summary: Why do developers love (and hate) Service Mesh?
| Pros | Cons |
| --- | --- |
| Security: Instant encryption for compliance (HIPAA/PCI). | Complexity: It's "another thing" to manage and debug. |
| Retries: The mesh handles timeouts and retries automatically. | Resource Heavy: Each sidecar takes ~50MB RAM and some CPU. |
| Traceability: You can see exactly where a request slowed down. | Latency: Every packet has to hop through two proxies (Sidecars). |
Is it right for you?
If you are running OpenShift, the “OpenShift Service Mesh” (based on Istio) is built-in and supported. It’s the easiest way to try it out because the installation is handled by an Operator.
In OpenShift (and standard Istio), “Sidecar Injection” is the magic moment where your Pod stops being a single container and becomes part of the Mesh. Instead of manually adding the proxy to your YAML, the Control Plane watches for new Pods and “mutates” them on the fly.
Here is how to enable it and verify it’s working.
1. Label the Namespace
The Service Mesh is polite; it won’t touch your Pods unless you ask it to. You must label your namespace to tell the Sidecar Injector to pay attention.
Bash
# Replace 'my-app-project' with your namespace
oc label namespace my-app-project istio-injection=enabled
Note for OpenShift Users: If you are using the official OpenShift Service Mesh (OSSM), you instead add your namespace to a resource called a ServiceMeshMemberRoll (SMMR) in the istio-system namespace.
2. Trigger the Injection
The Mesh cannot inject a sidecar into a Pod that is already running. You must restart your pods so the “Mutating Admission Webhook” can intercept the creation process.
Once the pods restart, look at the READY column in your pod list. This is the clearest sign of a Service Mesh in action.
Bash
oc get pods -n my-app-project
What you should see:
Plaintext
NAME                     READY   STATUS    RESTARTS   AGE
frontend-7f8d9b6d-x4z2   2/2     Running   0          30s
1/1: Standard Kubernetes (Just your app).
2/2: Service Mesh Active (Your app + the Envoy Proxy sidecar).
3. Inspecting the Sidecar
If you want to see what is actually inside that second container, you can describe the pod:
Bash
oc describe pod <pod-name>
You will see a new container named istio-proxy. This container is the one handling all the mTLS encryption, telemetry, and routing rules we discussed.
4. Troubleshooting Injection
If your pod still says 1/1 after a restart, check these three things:
Labels: Ensure the namespace label istio-injection=enabled is exactly right.
Resource Requests: If your cluster is very low on memory, the sidecar might fail to start (it usually needs about 50Mi to 128Mi of RAM).
Privileged Pods: In OpenShift, pods running as root or with high privileges sometimes have security policies (SCCs) that conflict with the sidecar’s network interception.
Kubernetes networking is often considered the most complex part of the system because it operates on a “flat network” model. In a traditional setup, you might worry about port conflicts or how to reach a specific VM. In Kubernetes, every Pod gets its own unique IP address, and every Pod can talk to every other Pod without NAT (Network Address Translation).
To understand how traffic flows, we break it down into four distinct “layers” of communication.
1. Pod-to-Pod Communication
Every Pod in a cluster has its own internal IP. Kubernetes mandates that Pods on one node must be able to reach Pods on another node without any special configuration.
The Container Network Interface (CNI): This is the plugin (like Calico, Cilium, or OpenShift SDN) that actually builds the “pipes” between nodes.
The Experience: From the perspective of a container, it feels like it’s on a standard Ethernet network. It doesn’t care if the target Pod is on the same physical server or one across the data center.
2. Pod-to-Service Communication
Pods are “ephemeral”—they die and get replaced constantly, and their IP addresses change every time. You can’t hardcode a Pod IP into your app.
The Service: A Service is a stable “virtual IP” (ClusterIP) that sits in front of a group of Pods.
Kube-Proxy: This is a process running on every node that watches the API server. When you try to hit a Service IP, kube-proxy intercepts that traffic and redirects it to one of the healthy backend Pods.
CoreDNS: Kubernetes includes a built-in DNS service. Instead of an IP, your app just connects to http://my-database-service.
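Putting those pieces together, here is a minimal Service sketch (names, labels, and the Postgres port are hypothetical for this example):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-database-service   # becomes the DNS name via CoreDNS
spec:
  type: ClusterIP             # stable virtual IP inside the cluster
  selector:
    app: database             # traffic goes to healthy Pods carrying this label
  ports:
  - port: 5432                # port the Service exposes
    targetPort: 5432          # port the Pod actually listens on
```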
3. External-to-Service (Ingress & Egress)
How do people outside the cluster reach your apps?
NodePort: Opens a specific port (30000–32767) on every single node. High maintenance and insecure for production.
LoadBalancer: Tells your cloud provider (AWS/Azure/GCP) to spin up a physical Load Balancer that points to your nodes.
Ingress/Route: As we discussed earlier, this is a Layer 7 “smart router” that lets you use hostnames (like api.example.com) to direct traffic to different services using a single entry point.
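As a concrete contrast, the LoadBalancer option is just a change of Service type (names and ports here are hypothetical):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-web-service
spec:
  type: LoadBalancer   # asks the cloud provider for an external load balancer
  selector:
    app: web
  ports:
  - port: 80           # external port
    targetPort: 8080   # hypothetical container port
```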
4. Network Policies (The Firewall)
By default, Kubernetes is “open”—any Pod can talk to any other Pod. In production, you usually want to restrict this for security.
NetworkPolicy: These are the firewall rules of Kubernetes. You can define rules like: “Only the ‘Frontend’ Pod is allowed to talk to the ‘Database’ Pod on port 5432.”
Implementation: These rules are enforced by the CNI (e.g., Calico or Cilium) at the packet level.
Summary of Components
| Component | Level | Purpose |
| --- | --- | --- |
| CNI | Layer 3 (IP) | Connects Pods across different nodes. |
| Service | Layer 4 (TCP/UDP) | Provides a stable IP/DNS for a group of Pods. |
| Ingress | Layer 7 (HTTP) | Handles routing, SSL, and hostnames. |
| NetworkPolicy | Security | Defines which Pods are allowed to talk to each other. |
The Path of a Packet
If a user types example.com into their browser:
DNS resolves the name to the Ingress Controller’s IP.
The Ingress Controller looks at the host header and finds the correct Service.
The Service (via kube-proxy) picks a healthy Pod and sends the traffic there.
The CNI routes that packet across the internal network to the node where the Pod is living.
To understand how data actually moves between nodes, we have to look at the CNI (Container Network Interface).
Since every Pod has a unique IP but lives on a host (Node) with its own separate IP, the CNI’s job is to “tunnel” the Pod’s traffic across the physical network. The two most common ways it does this are Encapsulation (Overlay) and Direct Routing.
1. Encapsulation (The “Envelope” Method)
This is the most common approach (used by Flannel (VXLAN) and OpenShift SDN). It treats the physical network as a “carrier” for a private, virtual network.
How it works: When Pod A (on Node 1) sends a packet to Pod B (on Node 2), the CNI takes that entire packet and wraps it inside a new UDP packet.
The “Outer” Header: Points from Node 1’s IP to Node 2’s IP.
The “Inner” Header: Points from Pod A’s IP to Pod B’s IP.
Arrival: When the packet hits Node 2, the CNI “unwraps” the outer envelope and delivers the original inner packet to Pod B.
The Downside: This adds a small amount of overhead (usually about 50 bytes per packet) because of the extra headers. This is why you often see the MTU (Maximum Transmission Unit) set slightly lower in Kubernetes (e.g., 1450 instead of 1500).
2. Direct Routing (The “BGP” Method)
Used by Calico (in non-overlay mode) and Cilium, this method avoids the “envelope” entirely for better performance.
How it works: The nodes act like standard network routers. They use BGP (Border Gateway Protocol) to tell each other: “Hey, if you want to reach the 10.244.1.0/24 subnet, send those packets to me (Node 1).”
The Experience: Packets travel “naked” across the wire with no extra headers.
The Requirement: Your physical network routers must be able to handle these extra routes, or the nodes must all be on the same Layer 2 segment (the same VLAN/Switch).
3. The Role of the “veth” Pair
Regardless of how the data moves between nodes, getting data out of a container uses a Virtual Ethernet (veth) pair.
Think of a veth pair as a virtual patch cable:
One end is plugged into the container (usually named eth0).
The other end is plugged into the host’s network namespace (often named something like vethabc123).
The host end is usually connected to a Bridge (like cni0 or br0), which acts like a virtual switch for all Pods on that specific node.
4. Comparing Popular CNIs
| CNI Plugin | Primary Method | Best Use Case |
| --- | --- | --- |
| Flannel | VXLAN (Overlay) | Simple clusters; works almost anywhere. |
| Calico | BGP or IP-in-IP | High performance and advanced Network Policies. |
| Cilium | eBPF | Next-gen performance, deep security, and observability. |
| OpenShift SDN | VXLAN | Default for older OCP; very stable and integrated. |
| OVN-Kubernetes | Geneve (Overlay) | Modern OCP default; supports massive scale and Windows nodes. |
Which one should you choose?
If you are on OpenShift 4.x, you are likely using OVN-Kubernetes. It’s powerful and handles complex routing for you.
If you are building a Vanilla K8s cluster and want the absolute fastest networking, Cilium is the current industry gold standard because it uses eBPF to bypass parts of the Linux kernel entirely.
To understand why eBPF (Extended Berkeley Packet Filter) is the “holy grail” of modern Kubernetes networking, we first have to look at how the “old way” (iptables) works.
1. The Old Way: iptables (The “List of Rules”)
For years, Kubernetes used iptables (a standard Linux kernel feature) to route traffic.
How it works: Imagine a giant printed list of instructions. Every time a packet arrives, the CPU has to read the list from top to bottom: “Is it for Service A? No. Service B? No. Service C? Yes.”
The Problem: As you add more Services and Pods, this list grows to thousands of lines. If a packet is destined for the 5,000th service on the list, the CPU has to perform 5,000 checks for every single packet.
Result: High latency and high CPU “overhead” just to move data.
2. The New Way: eBPF (The “Direct Shortcut”)
eBPF allows you to run small, sandboxed programs directly inside the Linux kernel without changing the kernel code.
How it works: Instead of a long list of rules, eBPF creates a high-speed “lookup table” (a Hash Map) in the kernel’s memory.
The Benefit: When a packet arrives, the eBPF program looks at the destination and instantly knows where it goes. It doesn’t matter if you have 10 services or 10,000—the lookup time is exactly the same (O(1) complexity).
Bypassing the Stack: eBPF can catch a packet the moment it hits the Network Interface Card (NIC) and send it straight to the Pod, bypassing almost the entire Linux networking stack.
3. Why Cilium + eBPF is a Game Changer
Cilium is the most popular CNI that uses eBPF. It provides three massive advantages over traditional networking:
| Feature | iptables / Standard CNI | Cilium (eBPF) |
| --- | --- | --- |
| Performance | Slows down as the cluster grows. | Consistently fast at any scale. |
| Observability | Hard to see "who is talking to whom" without sidecars. | Hubble (Cilium's UI) shows every flow, drop, and latency in real-time. |
| Security | IP-based filtering (hard to manage). | Identity-based filtering. It knows a packet belongs to "Service-Frontend" regardless of its IP. |
4. Why OpenShift is Moving to OVN (Geneve)
While Cilium is the “shiny new toy,” Red Hat chose OVN-Kubernetes (using the Geneve protocol) as the default for OCP 4.
Scale: OVN is built on Open vSwitch, which was designed for massive telco-grade clouds.
Feature Parity: It handles complex things like “Egress IPs” (giving a specific namespace a static IP for exiting the cluster) and Hybrid networking (Linux + Windows nodes) much more maturely than basic eBPF implementations did a few years ago.
Summary: The “Speed” Evolution
iptables: Standard, but slow at scale.
IPVS: A middle ground that uses hash tables but is still tied to the old kernel stack.
eBPF (Cilium): The fastest possible way to move data in Linux today.
In OpenShift, the modern way to see these network flows is through the Network Observability Operator. This tool uses the eBPF technology we discussed to capture traffic data without slowing down your pods.
Here is how you can access and use these views.
1. Using the Web Console (The GUI Way)
Once the operator is installed, a new menu appears in your OpenShift Console.
Navigate to Observe -> Network Traffic in the Administrator perspective.
Overview Tab: This gives you a high-level “Sankey” diagram or graph showing which namespaces are talking to each other. It’s perfect for spotting “top talkers” (apps using the most bandwidth).
Traffic Flows Tab: This is like a “Wireshark-lite” for your cluster. You can see every individual connection, including:
Source/Destination: Which pod is talking to which service.
Byte Rate: How much data is moving.
RTT (Latency): Exactly how many milliseconds a packet takes to travel between pods.
Topology Tab: This provides a visual map of your network. You can group by “Namespace” or “Node” to see how traffic crosses physical boundaries.
2. Using the CLI (The “oc netobserv” plugin)
If you prefer the terminal, there is a specific plugin called oc netobserv. This is incredibly useful for live debugging when you don’t want to leave your shell.
Capture live flows:
Bash
oc netobserv flows --protocol=TCP --port=80
This will stream live traffic data directly to your terminal.
Filter for specific issues:
The plugin also accepts filters, so you can narrow the stream to a single namespace or look specifically for dropped packets (great for debugging firewall/NetworkPolicy issues).
3. Going Back in Time
Behind the scenes, the Network Observability Operator stores these flows in Loki (a log aggregation system). This allows you to "go back in time."
If an application crashed at 2:00 AM, you can go to the Network Traffic page, set the time filter to 2:00 AM, and see if there was a sudden spike in traffic or if a connection was being blocked by a security policy at that exact moment.
4. Advanced Debugging: Packet Drops
One of the best features of the eBPF-based observer is Packet Drop tracking. Traditional tools often can’t tell you why a packet disappeared. With this tool, the kernel can report the exact reason:
Filter Drop: A NetworkPolicy blocked it.
TCP Timeout: The other side didn’t respond.
Congestion: The network interface was overloaded.
Summary: What can you find?
Security: Is my database receiving traffic from an unauthorized namespace?
Performance: Which pods have the highest latency (RTT)?
Cost: Which services are sending the most data to external (Internet) IPs?
In Kubernetes, a NetworkPolicy is your cluster’s internal firewall. By default, Kubernetes has a “non-isolated” policy—meaning every pod can talk to every other pod.
To secure your app, you should follow the “Principle of Least Privilege”: block everything, then specifically allow only what is necessary.
1. The “Default Deny” (The Foundation)
Before you write specific rules, it is a best practice to create a “Default Deny” policy for your namespace. This locks all doors so that nothing can enter or leave unless you explicitly say so.
YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-secure-app
spec:
  podSelector: {} # Matches all pods in this namespace
  policyTypes:
  - Ingress
  - Egress
2. Allowing Specific Traffic (The “Rule”)
Now that everything is blocked, let’s say you have a Database pod and you only want your Frontend pod to talk to it on port 5432.
YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: my-secure-app
spec:
  podSelector:
    matchLabels:
      app: database # This policy applies to pods labeled 'app: database'
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend # Only allow pods labeled 'role: frontend'
    ports:
    - protocol: TCP
      port: 5432
3. Three Ways to Target Traffic
You can control traffic based on three different criteria:
podSelector: Target pods within the same namespace (e.g., “Frontend to Backend”).
namespaceSelector: Target entire namespaces (e.g., “Allow everything from the ‘Monitoring’ namespace”).
ipBlock: Target specific IP ranges outside the cluster (e.g., “Allow traffic from our corporate VPN range 10.0.0.0/24”).
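For example, allowing traffic from a hypothetical "monitoring" namespace with a namespaceSelector might look like this sketch (since Kubernetes 1.21, every namespace automatically carries the kubernetes.io/metadata.name label):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring
  namespace: my-secure-app
spec:
  podSelector: {}              # applies to all pods in this namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: monitoring  # automatic namespace label
```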
4. Troubleshooting NetworkPolicies
If you apply a policy and your app stops working, here is how to debug:
Check Labels: NetworkPolicies rely 100% on labels. If your Frontend pod is labeled app: front-end but your policy looks for role: frontend, it will fail silently.
The “Blind” Policy: Standard Kubernetes doesn’t “log” when a policy blocks a packet. This is why we use the Network Observability Operator (as we discussed) to see the “Drop” events.
CNI Support: Remember, the CNI (Calico, OVN, etc.) is what actually enforces these rules. If your CNI doesn’t support NetworkPolicies (like basic Flannel), the YAML will be accepted but it won’t actually block anything!
Summary: Ingress vs. Egress
Ingress: Controls traffic coming into the pod (Who can talk to me?).
Egress: Controls traffic leaving the pod (Who can I talk to?).
A Zero Trust architecture in Kubernetes means that no pod is trusted by default. Even if a pod is inside your cluster, it shouldn’t be allowed to talk to anything else unless you specifically permit it.
In this scenario, we have a 3-tier app: Frontend, Backend, and Database.
1. The “Lockdown” (Default Deny)
First, we apply this to the entire namespace. This ensures that any new pod you deploy in the future is “secure by default” and cannot communicate until you add a rule for it.
YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app-stack
spec:
  podSelector: {} # Matches ALL pods
  policyTypes:
  - Ingress
  - Egress
2. Tier 1: The Frontend
The Frontend needs to receive traffic from the Internet (via the Ingress Controller) and send traffic only to the Backend.
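A sketch of the Frontend tier's policy. The router namespace label below is an assumption (on OpenShift the default ingress controller runs in openshift-ingress; adjust for your cluster):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-ingress  # the router's namespace
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend   # ONLY allowed to talk to the Backend
```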
3. Tier 2: The Backend
The Backend should only accept traffic from the Frontend and is only allowed to talk to the Database.
YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend # ONLY accepts Frontend traffic
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: database # ONLY allowed to talk to DB
4. Tier 3: The Database
The Database is the most sensitive. It should never initiate a connection (no Egress) and only accept traffic from the Backend.
YAML
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress # We include Egress to ensure it's blocked by default
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432 # Postgres port
Important: Don’t Forget DNS!
When you apply a “Default Deny” Egress policy, your pods can no longer talk to CoreDNS, which means they can’t resolve service names like http://backend-service.
To fix this, you must add one more policy to allow UDP Port 53 to the openshift-dns namespace:
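Here is a sketch of that DNS exception. The namespace label is an assumption, and note that OpenShift's CoreDNS pods actually listen on port 5353 (the dns-default Service maps 53 to it), so both ports are included to be safe:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: my-app-stack
spec:
  podSelector: {}          # every pod in the namespace may resolve DNS
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-dns
    ports:
    - protocol: UDP
      port: 53             # standard DNS
    - protocol: UDP
      port: 5353           # port the OpenShift CoreDNS pods listen on
    - protocol: TCP
      port: 5353           # large responses fall back to TCP
```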
Key Takeaways
Labels are everything: If you typo tier: backend as tier: back-end, the wall stays up and the app breaks.
Layered Security: Even if a hacker compromises your Frontend pod, they cannot “scan” your network or reach your Database directly; they are stuck only being able to talk to the Backend.
To test your Zero Trust setup, we will perform a “Penetration Test” from inside the cluster. We’ll use a temporary debug pod to see if our firewall rules actually block unauthorized access.
1. The “Attacker” Pod
We will spin up a temporary pod with basic networking tools (the Alpine image ships BusyBox utilities like wget and nslookup) that has no labels. Since our "Default Deny" policy targets all pods, this pod should be completely isolated the moment it starts.
Bash
# Run a temporary alpine pod
oc run network-tester --image=alpine --restart=Never -- /bin/sh -c "sleep 3600"
2. Test 1: Can an unknown pod talk to the Database?
Even if this pod is in the same namespace, it should not be able to reach the database because it doesn’t have the tier: backend label.
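A sketch of the check, using BusyBox wget from inside the tester pod (the service name and port are from this example; a timeout indicates the policy is doing its job):

```shell
# From inside the attacker pod, try to reach the database Service.
# With "Default Deny" in place, the connection attempt should time out.
oc exec network-tester -- wget -T 5 -qO- http://database-service:5432 \
  && echo "REACHABLE (policy failed!)" \
  || echo "BLOCKED (Zero Trust is working)"
```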
In Kubernetes, Ingress is an API object that acts as a “smart router” for your cluster. While a standard Service (like a LoadBalancer) simply opens a hole in the firewall for one specific app, Ingress allows you to consolidate many services behind a single entry point and route traffic based on the URL or path.
Think of it as the receptionist of an office building: instead of every employee having their own front door, everyone uses one main entrance, and the receptionist directs visitors to the correct room based on who they are looking for.
1. How Ingress Works
There are two distinct parts required to make this work:
Ingress Resource: A YAML file where you define your “rules” (e.g., “Send all traffic for myapp.com/api to the api-service“).
Ingress Controller: The actual software (like NGINX, HAProxy, or Traefik) that sits at the edge of your cluster, reads those rules, and physically moves the traffic. Kubernetes does not come with a controller by default; you must install one.
2. Key Capabilities
Ingress is much more powerful than a simple Port or LoadBalancer because it operates at Layer 7 (HTTP/HTTPS).
Host-based Routing: Route blue.example.com to the Blue Service and green.example.com to the Green Service using a single IP.
Path-based Routing: Route example.com/login to the Auth service and example.com/search to the Search service.
SSL/TLS Termination: You can handle your SSL certificates at the Ingress level so your individual application pods don’t have to deal with encryption/decryption.
Name-based Virtual Hosting: Supporting multiple domain names on the same IP address.
3. Ingress vs. LoadBalancer vs. NodePort
Choosing how to expose your app is a common point of confusion. Here is the breakdown:
| Method | Best For | Pros/Cons |
| --- | --- | --- |
| NodePort | Testing/Dev | Opens a high-range port (30000+) on every node. Very insecure for production. |
| LoadBalancer | Single Apps | Standard way to get an external IP. Expensive if you have 50 apps, as you pay for 50 cloud load balancers. |
| Ingress | Production | Multiple apps share one Load Balancer. Extremely cost-effective and feature-rich. |
4. What about OpenShift “Routes”?
Since you were asking about OCP earlier, it’s important to note that OpenShift has its own version called a Route.
Routes were created by Red Hat before the Kubernetes Ingress standard existed.
In modern OpenShift, an Ingress is automatically converted into a Route behind the scenes.
The main difference: Routes natively support extra router integrations (such as F5 BIG-IP) and more complex HAProxy features, while Ingress is the industry-standard "portable" way to do it.
5. Basic Ingress Example
This YAML routes traffic to two different services based on the URL path:
YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: main-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /billing
        pathType: Prefix
        backend:
          service:
            name: billing-service
            port:
              number: 80
      - path: /inventory
        pathType: Prefix
        backend:
          service:
            name: inventory-service
            port:
              number: 80
To set up HTTPS for an Ingress, you need a TLS Secret. This secret acts as a container for your SSL certificate and private key, which the Ingress Controller (like NGINX or HAProxy) uses to encrypt the traffic before it reaches your pods.
1. Create the TLS Secret
First, you need your certificate files (tls.crt and tls.key). You can create a Kubernetes Secret from these files using the CLI:
Bash
kubectl create secret tls my-app-tls \
--cert=path/to/tls.crt \
--key=path/to/tls.key
Note: If you are using OpenShift, you can often skip this manual step by using Service Serving Certificates, which allow OpenShift to generate and manage the certificates for you automatically.
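If you do not have a certificate yet and just want to test, a throwaway self-signed pair can be generated with openssl (the CN below is a hypothetical domain; browsers will warn about self-signed certificates):

```shell
# Generate a self-signed certificate and key, valid for 365 days
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt \
  -subj "/CN=myapp.example.com"

# Inspect the subject to confirm the cert was created correctly
openssl x509 -in tls.crt -noout -subject
```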
2. Update the Ingress YAML
Now, you tell the Ingress resource to use that secret for a specific hostname.
YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    # This annotation tells the controller to redirect HTTP to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: my-app-tls # This must match the secret name created in Step 1
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-web-service
            port:
              number: 80
3. Automation with Cert-Manager
Manually updating certificates before they expire is a headache. Most production clusters use Cert-Manager.
Cert-Manager is an operator that talks to certificate authorities like Let's Encrypt. You simply add an annotation to your Ingress, and Cert-Manager handles the rest. It will:
Reach out to Let’s Encrypt to verify you own the domain.
Generate the tls.crt and tls.key.
Create the Secret for you and renew it every 90 days automatically.
Summary Checklist for HTTPS
| Step | Action |
| --- | --- |
| 1. Certificate | Obtain a CA-signed cert or use Let's Encrypt. |
| 2. Secret | Store the cert/key in a kind: Secret (type kubernetes.io/tls). |
| 3. Ingress Spec | Add the tls: section to your Ingress YAML. |
| 4. DNS | Ensure your domain points to the Ingress Controller's IP. |
To automate SSL certificates with Cert-Manager, you need a ClusterIssuer. This is a cluster-wide resource that tells Cert-Manager how to talk to a Certificate Authority (CA) like Let’s Encrypt.
Before you start, ensure the Cert-Manager Operator is installed in your cluster (in OpenShift, you can find this in the OperatorHub).
1. Create a ClusterIssuer (The “Account”)
This YAML defines your identity with Let’s Encrypt. It uses the ACME (Automated Certificate Management Environment) protocol.
YAML
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server address for Let's Encrypt production
    server: https://acme-v02.api.letsencrypt.org/directory
    # Contact email for expiry notices (replace with your own)
    email: you@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
          class: nginx # Or 'openshift-default' depending on your ingress controller
2. Update your Ingress to “Request” the Cert
Once the ClusterIssuer is created, you don’t need to manually create secrets anymore. You just “tag” your Ingress with an annotation. Cert-Manager will see this, perform the challenge, and create the secret for you.
YAML
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-secure-app
  annotations:
    # THIS IS THE TRIGGER: It links the Ingress to your ClusterIssuer
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - app.yourdomain.com
    secretName: app-tls-cert # Cert-Manager will create this secret automatically
  rules:
  - host: app.yourdomain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-service
            port:
              number: 80
3. How to verify it’s working
After you apply the Ingress, Cert-Manager creates a Certificate object and a Challenge object. You can track the progress:
Check the certificate status: kubectl get certificate (look for READY: True)
Check the order status (if it's stuck): kubectl get challenges
Check the secret: kubectl get secret app-tls-cert (if this exists, your site is now HTTPS!)
Why use Let’s Encrypt?
Cost: It is 100% free.
Trust: It is recognized by all major browsers (unlike self-signed certs).
No Maintenance: Cert-Manager automatically renews the cert 30 days before it expires.
A Small Warning:
Let’s Encrypt has rate limits. If you are just testing, use the “Staging” URL (https://acme-staging-v02.api.letsencrypt.org/directory) first. Browsers will show a warning for staging certs, but you won’t get blocked for hitting limit thresholds while debugging.
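A staging variant of the production ClusterIssuer might look like this (a sketch; the resource name and email are placeholders):

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # Staging endpoint: browsers distrust its certs, but rate limits are far higher
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: admin@yourdomain.com # replace with your email
    privateKeySecretRef:
      name: letsencrypt-staging-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
```

Once your Ingress issues a staging cert successfully, point the annotation back at the production issuer.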
When Cert-Manager fails to issue a certificate, it usually gets stuck in the Challenge phase. Let’s look at how to diagnose and fix the most common “Let’s Encrypt” roadblocks.
1. The Troubleshooting Command Chain
If your certificate isn’t appearing, follow this hierarchy to find where the “handshake” broke:
Check the Certificate status: oc get certificate. If READY is False, move to the next step.
Check the Order: oc get order. The Order is the request sent to Let's Encrypt; look at the STATE column.
Check the Challenge (the most important step): oc get challenges. If a challenge exists, it means Let's Encrypt is trying to verify your domain but can't.
Describe the Challenge for the error message: oc describe challenge <challenge-name>
2. Common Failure Reasons
A. The “I Can’t See You” (Firewall/Network)
Let’s Encrypt uses the HTTP-01 challenge. It tries to reach http://yourdomain.com/.well-known/acme-challenge/<TOKEN>.
The Problem: Your firewall, Security Group (AWS/Azure), or OpenShift Ingress Controller is blocking Port 80.
The Fix: Ensure Port 80 is open to the public internet. Let’s Encrypt cannot verify your domain over Port 443 (HTTPS) because the certificate doesn’t exist yet!
B. DNS Record Mismatch
The Problem: Your DNS A record or CNAME for app.yourdomain.com hasn’t propagated yet or is pointing to the wrong Load Balancer IP.
The Fix: Use dig app.yourdomain.com or nslookup to ensure the domain points exactly to your Ingress Controller’s external IP.
C. Rate Limiting
The Problem: You’ve tried to issue the same certificate too many times in one week (Let’s Encrypt has a limit of 5 duplicate certs per week).
The Fix: Switch your ClusterIssuer to use the Staging URL (mentioned in the previous step) until your configuration is 100% correct, then switch back to Production.
3. Dealing with Internal/Private Clusters
If your OpenShift cluster is behind a VPN and not accessible from the public internet, the HTTP-01 challenge will always fail because Let’s Encrypt can’t “see” your pods.
The Solution: DNS-01 Challenge
Instead of a web check, Cert-Manager proves ownership by adding a temporary TXT record to your DNS provider (Route53, Cloudflare, Azure DNS).
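A DNS-01 solver replaces the http01 block in your ClusterIssuer. Here is a hedged sketch using Cloudflare as the provider (the Secret name and key are assumptions; Route53 and Azure DNS use their own provider blocks):

```yaml
solvers:
- dns01:
    cloudflare:
      apiTokenSecretRef:
        name: cloudflare-api-token # A Secret you create containing a DNS API token
        key: api-token
```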
In Kubernetes, storage is handled separately from your application’s logic. To understand Persistent Volumes (PV) and Persistent Volume Claims (PVC), it helps to use the “Electricity” analogy:
PV (The Infrastructure): This is like the power plant and the grid. It’s the actual physical storage (a disk, a cloud drive, or a network share).
PVC (The Request): This is like the power outlet in your wall. Your application “plugs in” to the PVC to get what it needs without needing to know where the power plant is.
1. Persistent Volume (PV)
A PV is a piece of storage in the cluster that has been provisioned by an administrator or by a storage class. It is a cluster-level resource (like a Node) and exists independently of any individual Pod.
Capacity: How much space is available (e.g., 5Gi, 100Gi).
Access Modes:
ReadWriteOnce (RWO): Can be mounted read-write by one node at a time.
ReadOnlyMany (ROX): Many nodes can read it simultaneously.
ReadWriteMany (RWX): Many nodes can read and write at the same time (requires a storage backend like NFS or ODF).
Reclaim Policy: What happens to the data when you delete the PVC? (Retain it for manual cleanup or Delete it immediately).
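The three properties above come together in a minimal static PV manifest (a sketch; the NFS server address and export path are hypothetical):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-example
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteMany # NFS supports RWX
  persistentVolumeReclaimPolicy: Retain # Keep the data for manual cleanup
  nfs:
    server: 10.0.0.5 # hypothetical NFS server
    path: /exports/data
```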
2. Persistent Volume Claim (PVC)
A PVC is a request for storage by a user. If a Pod needs a “hard drive,” it doesn’t look for a specific disk; it creates a PVC asking for “10Gi of storage with ReadWriteOnce access.”
The “Binding” Process: Kubernetes looks at all available PVs. If it finds a PV that matches the PVC’s request, it “binds” them together.
Namespace Scoped: Unlike PVs, PVCs live inside a specific Namespace.
3. Dynamic Provisioning (StorageClasses)
In modern clusters (like OpenShift), admins don’t manually create 100 different PVs. Instead, they use a StorageClass.
The user creates a PVC.
The StorageClass notices the request.
It automatically talks to the cloud provider (AWS/Azure/GCP) to create a new disk.
It automatically creates the PV and binds it to the PVC.
4. How a Pod uses it
Once the PVC is bound to a PV, you tell your Pod to use that “outlet.”
YAML
spec:
  containers:
  - name: my-db
    image: postgres
    volumeMounts:
    - mountPath: "/var/lib/postgresql/data"
      name: my-storage
  volumes:
  - name: my-storage
    persistentVolumeClaim:
      claimName: task-pv-claim # This matches the name of your PVC
Summary Comparison
| Feature | Persistent Volume (PV) | Persistent Volume Claim (PVC) |
| --- | --- | --- |
| Who creates it? | Administrator or Storage System | Developer / Application |
| Scope | Cluster-wide | Namespace-specific |
| Analogy | The actual Hard Drive | The request for a Hard Drive |
| Lifecycle | Exists even if no one uses it | Tied to the application's needs |
Here is a standard YAML example for a Persistent Volume Claim (PVC).
In this scenario, we aren’t manually creating a disk. Instead, we are telling OpenShift/Kubernetes: “I need 10Gi of fast storage. Please go talk to the cloud provider or storage backend and create it for me.”
1. The PVC Definition
This is the “request” for storage.
YAML
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dynamic-pvc-example
  namespace: my-app-project
spec:
  storageClassName: gp3-csi # Or 'thin', 'ocs-storagecluster-ceph-rbd', etc.
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
2. How the Binding Works
When you apply this YAML, the following chain reaction happens:
The Claim: You submit the PVC.
The Provisioner: The StorageClass (e.g., AWS EBS, Azure Disk, or OpenShift Data Foundation) sees the request.
The Asset: The storage backend creates a physical 10Gi volume.
The Volume: Kubernetes automatically creates a PersistentVolume (PV) object to represent that physical disk.
The Binding: The PVC status changes from Pending to Bound.
3. Attaching the PVC to a Pod
A PVC is useless until a Pod “claims” it. Here is how you mount that 10Gi disk into a container:
YAML
apiVersion: v1
kind: Pod
metadata:
  name: storage-test-pod
spec:
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: my-data-volume
      mountPath: /usr/share/nginx/html # Where the disk appears inside the container
  volumes:
  - name: my-data-volume
    persistentVolumeClaim:
      claimName: dynamic-pvc-example # Must match the name in the PVC YAML
Important “Gotchas” with PVCs
Access Modes:
ReadWriteOnce (RWO): Most common. If Pod A is using the disk on Node 1, Pod B cannot use it if Pod B is on Node 2.
ReadWriteMany (RWX): Required if you want multiple Pods across different nodes to share the same files (common for web servers sharing an uploads folder).
Expansion: Many modern StorageClasses allow you to increase the storage size in the PVC YAML after it’s created, and Kubernetes will expand the disk on the fly (provided the underlying storage supports it).
Sticky Nodes: If you use a cloud-based RWO disk (like AWS EBS), your Pod becomes “stuck” to the availability zone where that disk was created.
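For the expansion case, assuming the StorageClass sets allowVolumeExpansion: true, the change is a one-field edit to the bound PVC:

```yaml
# Bump the request on the existing PVC; the CSI driver expands the
# volume in place if the StorageClass allows expansion.
spec:
  resources:
    requests:
      storage: 20Gi # was 10Gi
```

Note that you can only grow a PVC this way; shrinking is not supported.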
Checking for available StorageClasses is one of the most common tasks for an OpenShift administrator or developer. It tells you exactly what “flavors” of storage are available for your apps.
1. Using the CLI (Recommended)
Run the following command to see a list of all storage providers configured in your cluster:
Bash
oc get storageclass
(Or use the shorthand: oc get sc)
Example Output:
Plaintext
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
2. Understanding the Output
NAME: This is what you put in the storageClassName field of your PVC YAML.
(default): If you see this next to a name, it means any PVC that doesn’t specify a class will automatically get this one.
PROVISIONER: This tells you the underlying technology (e.g., AWS EBS, VMware vSphere, or Ceph/ODF).
RECLAIMPOLICY:
Delete: When you delete the PVC, the physical disk is also deleted.
Retain: When you delete the PVC, the physical disk stays (so you can recover data manually).
VOLUMEBINDINGMODE:
Immediate: The disk is created the second you create the PVC.
WaitForFirstConsumer: The disk isn’t created until a Pod actually tries to use it. This is smarter because it ensures the disk is created in the same Availability Zone as the Pod.
3. Inspecting a Specific StorageClass
If you want to see the “fine print” (like encryption settings or IOPS), you can look at the YAML of a specific class:
Bash
oc get sc gp3-csi -o yaml
4. Using the Web Console
If you prefer the GUI:
Log in to the OpenShift Web Console.
Ensure you are in the Administrator perspective.
Navigate to Storage -> StorageClasses.
Here, you can see all classes, which one is the default, and even create new ones using a wizard.
Pro-Tip: If your cluster has no default storage class, your PVCs will stay in a Pending state forever unless you explicitly name one in your YAML.
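To fix that, you can mark one class as the cluster default by adding the standard Kubernetes annotation to it; PVCs with no storageClassName will then bind to it automatically:

```yaml
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
```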
Autoscaling in Kubernetes is the process of automatically adjusting your resources to match the current demand. Instead of guessing how many servers or how much memory you need, Kubernetes monitors your traffic and “flexes” the infrastructure in real-time.
There are three main “layers” of autoscaling. Think of them as a chain: if one layer can’t handle the load, the next one kicks in.
1. Horizontal Pod Autoscaler (HPA)
The Concept: Adding more “lanes” to the highway.
HPA is the most common form of scaling. It increases or decreases the number of pod replicas based on metrics like CPU usage, memory, or custom traffic data.
How it works: It checks your pods every 15 seconds. If the average CPU across all pods is above your target (e.g., 70%), it tells the Deployment to spin up more pods.
Best for: Stateless services like web APIs or microservices that can handle traffic by simply having more copies running.
2. Vertical Pod Autoscaler (VPA)
The Concept: Making the “cars” bigger.
VPA doesn’t add more pods; instead, it looks at a single pod and decides if it needs more CPU or Memory. It “right-sizes” your containers.
How it works: It observes your app’s actual usage over time. If a pod is constantly hitting its memory limit, VPA will recommend (or automatically apply) a higher limit.
The Catch: Currently, in most versions of Kubernetes, changing a pod’s size requires restarting the pod.
Best for: Stateful apps (like databases) that can’t easily be “split” into multiple copies, or apps where you aren’t sure what the resource limits should be.
3. Cluster Autoscaler (CA)
The Concept: Adding more “pavement” to the highway.
HPA and VPA scale Pods, but eventually, you will run out of physical space on your worker nodes (VMs). This is where the Cluster Autoscaler comes in.
How it works: It watches for “Pending” pods—pods that want to run but can’t because no node has enough free CPU/RAM. When it sees this, it calls your cloud provider (AWS, Azure, GCP) and asks for a new VM to be added to the cluster.
Downscaling: It also watches for underutilized nodes. If a node is mostly empty, it will move those pods elsewhere and delete the node to save money.
The “Scaling Chain” in Action
Imagine a sudden surge of users hits your website:
HPA sees high CPU usage and creates 10 new Pods.
The cluster is full, so those 10 Pods stay in Pending status.
Cluster Autoscaler sees the Pending pods and provisions 2 new Worker Nodes.
The Pods finally land on the new nodes, and your website stays online.
Comparison Summary
| Feature | HPA | VPA | Cluster Autoscaler |
| --- | --- | --- | --- |
| What it scales | Number of Pods | Size of Pods (CPU/RAM) | Number of Nodes (VMs) |
| Primary Goal | Handle traffic spikes | Optimize resource efficiency | Provide hardware capacity |
| Impact | Fast, no downtime | Usually requires pod restart | Slower (minutes to boot VM) |
Pro-Tip: Never run HPA and VPA on the same metric (like CPU) for the same app. They will “fight” each other—HPA will try to add pods while VPA tries to make them bigger, leading to a “flapping” state where your app is constantly restarting.
To set up a Horizontal Pod Autoscaler (HPA), you need two things: a Deployment (your app) and an HPA resource that watches it.
Here is a breakdown of how to configure this in a way that actually works.
1. The Deployment
First, your pods must have resources.requests defined. If the HPA doesn’t know how much CPU a pod should use, it can’t calculate the percentage.
YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: registry.k8s.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m # HPA uses this as the baseline
2. The HPA Resource
This YAML tells Kubernetes: “Keep the average CPU usage of these pods at 50%. If it goes higher, spin up more pods (up to 10). If it goes lower, scale back down to 1.”
YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
3. How to Apply and Test
You can apply these using oc apply -f <filename>.yaml (in OpenShift) or kubectl apply.
Once applied, you can watch the autoscaler in real-time:
View status: oc get hpa
Watch it live: oc get hpa php-apache-hpa --watch
The Calculation Logic:
The HPA uses a specific formula to decide how many replicas to run:
desiredReplicas = ceil[ currentReplicas × ( currentMetricValue / desiredMetricValue ) ]
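The controller's formula is desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). A quick sketch of that arithmetic, using hypothetical utilization numbers against the 50% target above:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float) -> int:
    """The core HPA scaling formula:
    desired = ceil(current * (currentMetricValue / desiredMetricValue))"""
    return math.ceil(current_replicas * (current_metric / target_metric))

# 5 pods averaging 90% CPU against a 50% target -> scale out to 9
print(desired_replicas(5, 90, 50))  # 9
# Load drops to 20% average -> scale in to 2
print(desired_replicas(5, 20, 50))  # 2
```

Because of the ceiling, the HPA rounds up aggressively on the way out and only scales in when the math cleanly supports fewer pods.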
To make your autoscaling more robust, you can combine CPU and Memory metrics in a single HPA. Kubernetes will look at both and scale based on whichever one hits the limit first.
Here is the updated YAML including both resource types and a “Scale Down” stabilization period to prevent your cluster from “flapping” (rapidly adding and removing pods).
1. Advanced HPA YAML (CPU + Memory)
YAML
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: advanced-app
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Wait 5 mins before scaling down to ensure traffic is actually gone
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
2. Scaling on Custom Metrics (e.g., HTTP Requests)
Sometimes CPU doesn’t tell the whole story. If your app is waiting on a database, CPU might stay low while users experience lag. In these cases, you can scale based on Requests Per Second (RPS).
To use this, you must have the Prometheus Adapter installed (which comes standard in OpenShift’s monitoring stack).
YAML
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "500" # Scale up if pods average more than 500 requests/sec
Pro-Tips for Memory Scaling
Memory is “Sticky”: Unlike CPU, which drops the moment a process finishes, many runtimes (like Java/JVM or Node.js) do not immediately release memory back to the OS.
The Danger: If your app doesn’t have a good Garbage Collector configuration, the HPA might see high memory usage, spin up 10 pods, and never scale back down because the memory stays “reserved” by the app.
The Fix: Always ensure your memory.requests in the Deployment are set to what the app actually needs to start, not its peak limit.
Summary Table: Which metric to use?
| Scenario | Recommended Metric | Why? |
| --- | --- | --- |
| Calculation heavy | CPU | Directly maps to processing power. |
| Caching/Large Data | Memory | Prevents OOM (Out of Memory) kills. |
| Web APIs | Requests Per Second | Scales based on actual user load. |
| Message Queue | Queue Depth | Scales based on "work to be done." |
When an HPA isn’t behaving as expected—maybe it’s not scaling up during a spike, or it’s “stuck” at the minimum replicas—you need to look at the Controller Manager’s internal logic.
Here is how you can perform a “health check” on your HPA’s decision-making process.
1. The “Describe” Command (Most Useful)
The describe command provides a chronological log of every scaling action and, more importantly, why a request failed.
Bash
oc describe hpa advanced-app-hpa
What to look for in the “Events” section:
SuccessfulRescale: The HPA successfully changed the replica count.
FailedComputeMetricsReplicas: Usually means the HPA can’t talk to the Metrics Server (check if your pods have resources.requests defined!).
FailedGetResourceMetric: The pods might be crashing or “Unready,” so the HPA can’t pull their CPU/Memory usage.
2. Checking the “Conditions”
In the output of the describe command, look for the Conditions section. It tells you the current “brain state” of the autoscaler:
| Condition | Status | Meaning |
| --- | --- | --- |
| AbleToScale | True | The HPA is healthy and can talk to the Deployment. |
| ScalingActive | True | Metrics are being received and scaling logic is running. |
| ScalingLimited | True | Warning: you've hit maxReplicas or minReplicas. It wants to scale further but you've capped it. |
3. Real-time Metric Monitoring
If you want to see exactly what numbers the HPA is seeing right now compared to your target, use:
Bash
oc get hpa advanced-app-hpa -w
Example Output:
Plaintext
NAME               REFERENCE                 TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
advanced-app-hpa   Deployment/advanced-app   75%/60%   2         15        5          10m
In this example, CPU is at 75% (above the 60% target), so it has already scaled to 5 replicas.
4. Debugging Common “Stuck” Scenarios
Scenario A: Target shows "<unknown>"
If the TARGETS column shows <unknown>, it almost always means:
Missing Requests: You forgot to set resources.requests in your Deployment YAML.
Metrics Server Down: The cluster-wide metrics service is having issues.
Labels Mismatch: The HPA selector doesn’t match the Deployment labels.
Scenario B: High CPU but No Scaling
Check if the pods are in a Ready state. HPA ignores “Unready” pods to prevent scaling up based on the high CPU usage often seen during a container’s startup/boot phase.
Pro-Tip: The “Cooldown” Period
If you just stopped a load test and the pods are still running, don’t panic! By default, Kubernetes has a 5-minute stabilization window for scaling down. This prevents the “Flapping” effect where pods are deleted and then immediately recreated because of a small traffic blip.
Auditing a Model Context Protocol (MCP) server in 2026 requires a shift from traditional web auditing to Agentic Security Auditing. Since an LLM acts as the user of your server, you must audit not just the code, but the “instructions” and “boundaries” presented to the AI.
Here is the professional workflow for conducting a comprehensive MCP server audit.
1. Discovery & Tool Inspection
The first step is to see exactly what the AI sees. A malicious agent or a prompt injection can only exploit what is exposed in the tool definitions.
Use the MCP Inspector: Run npx @modelcontextprotocol/inspector to launch a local GUI. Connect your server and inspect the Tools tab.
Audit Tool Descriptions: Check if the descriptions are too “helpful.”
Bad: “This tool runs any bash command.”
Good: “This tool lists files in the /public directory only.”
Schema Strictness: Ensure every tool uses strict JSON Schema. AI agents are prone to “hallucinating” extra arguments; your server should reject any input that doesn’t perfectly match the schema.
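The "reject anything extra" rule can be sketched with a plain-Python validator (the helper and schema names are hypothetical; this mirrors JSON Schema's additionalProperties: false behavior):

```python
def validate_args(args: dict, schema: dict) -> dict:
    """Reject any argument not declared in the tool's schema,
    and require every declared-required argument to be present."""
    allowed = set(schema["properties"])
    extras = set(args) - allowed
    if extras:
        raise ValueError(f"unexpected arguments: {sorted(extras)}")
    missing = set(schema.get("required", [])) - set(args)
    if missing:
        raise ValueError(f"missing required arguments: {sorted(missing)}")
    return args

schema = {"properties": {"path": {"type": "string"}}, "required": ["path"]}
validate_args({"path": "a.txt"}, schema)                     # accepted
# validate_args({"path": "a.txt", "mode": "sudo"}, schema)   # raises ValueError
```

In production you would lean on a real validator (JSON Schema, Pydantic, Zod), but the principle is the same: hallucinated arguments fail loudly instead of silently passing through.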
2. Static Analysis (The “Code” Audit)
Since most MCP servers are written in TypeScript or Python, use standard security scanners with MCP-specific rules.
Dependency Check: Use npm audit or pip-audit. MCP is a new ecosystem; many early community servers use outdated, vulnerable libraries.
Path Traversal Check: This is the #1 vulnerability in MCP (found in 80% of filesystem-based servers).
Audit Task: Search your code for fs.readFile or open(). Ensure user-provided paths are sanitized using path.resolve and checked against a “Root” directory.
Command Injection: If your tool executes shell commands (e.g., a Git or Docker tool), ensure inputs are passed as arrays, never as strings.
Vulnerable:exec("git log " + user_input)
Secure:spawn("git", ["log", user_input])
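The same array-vs-string distinction can be sketched with Python's subprocess module (an illustration; echo stands in for git so the snippet runs anywhere):

```python
import subprocess

# Malicious input that would chain a destructive command under a shell
user_input = "README.md; rm -rf /"

# Vulnerable pattern (shell=True hands one string to the shell to re-parse):
#   subprocess.run("echo " + user_input, shell=True)  # ';' splits commands!

# Secure pattern: arguments are passed as an array, so the whole string
# reaches the program as a single literal argument, never re-parsed.
result = subprocess.run(["echo", user_input], capture_output=True, text=True)
print(result.stdout.strip())  # prints "README.md; rm -rf /" literally
```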
3. Runtime & Behavioral Auditing
In 2026, we use eBPF-based monitoring or MCP Gateways to watch what the server actually does during a session.
Sandbox Verification: Run the server in a restricted Docker container. Audit the Dockerfile to ensure it runs as a non-root user (USER node or USER python).
Network Egress Audit: Does your server need to talk to the internet? If it’s a “Local File” tool, use firewall rules (or Docker network flags) to block all outgoing traffic. This prevents “Data Exfiltration” where an AI is tricked into sending your files to a remote server.
AIVSS Scoring: Use the AI Vulnerability Scoring System (AIVSS) to rank findings. A “Prompt Injection” that leads to a file read is a High; a “Prompt Injection” that leads to a shell execution is Critical.
4. The 2026 Audit Checklist
If you are performing a formal audit, ensure you can check “Yes” to all of the following:
| Category | Audit Check |
| --- | --- |
| Authentication | Does the server require a token for every request (especially for HTTP transports)? |
| Sanitization | Are all LLM-generated arguments validated against a regex or allowlist? |
| Least Privilege | Does the server only have access to the specific folders/APIs it needs? |
| Human-in-Loop | Are "Write" or "Delete" actions flagged to require manual user approval in the client? |
| Logging | Does the server log the User ID, Tool Name, and Arguments for every call? |
5. Automated Auditing Tools
To speed up the process, you can use these 2026-standard tools:
mcpserver-audit: A GitHub-hosted tool that scans MCP source code for common dangerous patterns (like unparameterized SQL or open shell calls).
Trivy / Docker Scout: For scanning the container image where your MCP server lives.
Semgrep (MCP Ruleset): Use specialized Semgrep rules designed to find “AI Injection” points in Model Context Protocol implementations.
To perform a professional audit of an MCP server in 2026, you should follow a Multi-Layered Test Plan. Since MCP servers act as “Resource Servers” in an agentic ecosystem, your audit must verify that a compromised or malicious AI cannot “break out” of its intended scope.
Here is a 5-step Security Test Plan for an MCP server.
1. Static Analysis: “The Code Review”
Before running the server, scan the source code for common “agent-trap” patterns.
Check for shell=True (Python) or exec() (Node.js): These are the most common entry points for Remote Code Execution (RCE).
Test: Ensure all CLI tools use argument arrays instead of string concatenation.
Path Traversal Audit: Look for any tool that takes a path or filename as an argument.
Test: Verify that the code uses path.resolve() and checks if the resulting path starts with an allowed root directory.
Common Fail: Using simple string .startsWith() without resolving symlinks first (CVE-2025-53109).
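The resolve-then-check pattern can be sketched in Python (a hedged illustration; the function name and root directory are hypothetical). os.path.realpath follows symlinks, which is exactly what a naive startswith check misses:

```python
import os

def is_path_allowed(root: str, requested: str) -> bool:
    """Resolve the requested path (following symlinks) and verify
    it stays inside the allowed root directory."""
    resolved = os.path.realpath(os.path.join(root, requested))
    root_resolved = os.path.realpath(root)
    # Compare against root plus a trailing separator so that a sibling
    # like '/srv/mcp-root-evil' cannot masquerade as '/srv/mcp-root'.
    return resolved == root_resolved or resolved.startswith(root_resolved + os.sep)

print(is_path_allowed("/srv/mcp-root", "notes/todo.txt"))    # True
print(is_path_allowed("/srv/mcp-root", "../../etc/passwd"))  # False
```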
2. Manifest & Metadata Audit
The LLM “sees” your server through its JSON-RPC manifest. If your tool descriptions are vague, the LLM might misuse them.
Tool Naming: Ensure tool names use snake_case (e.g., get_user_data) for optimal tokenization and clarity.
Prompt Injection Resilience: Check if tool descriptions include “Safety instructions.”
Example: "This tool reads files. Safety: Never read files ending in .env or .pem."
Annotations: Verify that “destructive” tools (delete, update, send) are marked with destructiveHint: true. This triggers a mandatory confirmation popup in modern MCP clients like Cursor or Claude Desktop.
3. Dynamic “Fuzzing” (The AI Stress Test)
In 2026, we use tools like mcp-sec-audit to “fuzz” the server. This involves sending nonsensical or malicious JSON-RPC payloads to see how the server reacts.
| Test Scenario | Payload Example | Expected Result |
| --- | --- | --- |
| Path Traversal | {"path": "../../../etc/passwd"} | 403 Forbidden or Error: Invalid Path |
| Command Injection | {"cmd": "ls; rm -rf /"} | The server treats ; as a literal string, not a command separator. |
| Resource Exhaustion | Calling read_file 100 times in 1 second. | Server should trigger Rate Limiting. |
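The rate-limiting expectation in the last scenario can be prototyped with a small sliding-window counter (a sketch, not tied to any MCP framework; the limits are illustrative):

```python
import time
from collections import deque

class RateLimiter:
    """Sliding-window limiter: allow at most max_calls per window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent accepted calls

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have left the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False  # over the limit: reject this call
        self.calls.append(now)
        return True

limiter = RateLimiter(max_calls=5, window_seconds=1.0)
results = [limiter.allow(now=0.1 * i) for i in range(6)]
print(results)  # first five allowed, sixth rejected
```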
4. Sandbox & Infrastructure Audit
An MCP server should never “run naked” on your host machine.
Docker Isolation: Audit the Dockerfile. It should use a distroless or minimal image (like alpine) and a non-root user.
Network Egress: Use iptables or Docker network policies to block the MCP server from reaching the internet unless its specific function requires it (e.g., a “Web Search” tool).
Memory/CPU Limits: Ensure the container has cpus: 0.5 and memory: 512mb limits to prevent a “Looping AI” from crashing your host.
5. OAuth 2.1 & Identity Verification
If your MCP server is shared over a network (HTTP transport), it must follow the June 2025 MCP Auth Spec.
PKCE Implementation: Verify that the server requires Proof Key for Code Exchange (PKCE) for all client connections. This prevents “Authorization Code Interception.”
Scope Enforcement: If a user only authorized the read_only scope, ensure the server rejects calls to delete_record even if the token is valid.
Audit Logging: Every tool call must be logged with:
The user_id who initiated it.
The agent_id that generated the call.
The exact arguments used.
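Scope enforcement is ultimately a one-line gate in front of every tool handler. A minimal sketch (the function and scope names are hypothetical):

```python
def require_scope(token_scopes: set, needed: str) -> None:
    """Reject a tool call whose token lacks the needed scope,
    even if the token itself is otherwise valid."""
    if needed not in token_scopes:
        raise PermissionError(f"token missing scope: {needed}")

token_scopes = {"read_only"}
require_scope(token_scopes, "read_only")  # allowed
try:
    require_scope(token_scopes, "delete_record")
except PermissionError as e:
    print(e)  # token missing scope: delete_record
```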
Pro-Tooling for 2026
MCP Inspector: Use npx @modelcontextprotocol/inspector for a manual “sanity check” of your tools.
Snyk / Trivy: Run these against your MCP server’s repository to catch vulnerable 3rd-party dependencies.
The Model Context Protocol (MCP) is a powerful “USB-C for AI,” but because it allows LLMs to execute code and access private data, it introduces unique security risks.
In 2026, security for MCP has moved beyond simple API keys to a Zero Trust architecture. Here are the best practices for securing your MCP implementation.
1. The “Human-in-the-Loop” (HITL) Requirement
The most critical defense is ensuring an AI never executes “side-effect” actions (writing, deleting, or sending data) without manual approval.
Tiered Permissions: Classify tools into read-only (safe) and sensitive (requires approval).
Explicit Confirmation: The MCP client must display the full command and all arguments to the user before execution. Never allow the AI to “hide” parameters.
“Don’t Ask Again” Risks: Avoid persistent “allowlists” for bash commands or file writes; instead, scope approvals to a single session or specific directory.
2. Secure Architecture & Isolation
Running an MCP server directly on your host machine is a major risk. If the AI is tricked into running a malicious command, it has the same permissions as you.
Containerization: Always run MCP servers in a Docker container or a WebAssembly (Wasm) runtime. This prevents “Path Traversal” attacks where an AI might try to read your ~/.ssh/ folder.
Least Privilege: Use a dedicated, unprivileged service account to run the server. If the tool only needs to read one folder, do not give it access to the entire drive.
Network Egress: Block the MCP server from accessing the public internet unless it’s strictly necessary for that tool’s function.
3. Defense Against Injection Attacks
MCP is vulnerable to Indirect Prompt Injection, where a malicious instruction is hidden inside data the AI reads (like a poisoned webpage or email).
Tool Description Sanitization: Attackers can “poison” tool descriptions to trick the AI into exfiltrating data. Regularly audit the descriptions of third-party MCP servers.
Input Validation: Treat all inputs from the LLM as untrusted. Use strict typing (Pydantic/Zod) and regex patterns to ensure the AI isn’t passing malicious flags to a bash command.
Semantic Rate Limiting: Use an MCP Gateway to kill connections if an agent attempts to call a “Read File” tool hundreds of times in a few seconds—a classic sign of data exfiltration.
4. Identity & Authentication (2026 Standards)
For remote or enterprise MCP setups, static API keys are no longer sufficient.
OAuth 2.1 + PKCE: This is the mandated standard for HTTP-based MCP. It ensures that tokens are bound to specific users and cannot be easily intercepted.
Token Scoping: Never use a single “Master Key.” Issue short-lived tokens that are scoped only to the specific MCP tools the user needs.
Separation of Roles: Keep your Authorization Server (which identifies the user) separate from your Resource Server (the MCP server). This makes auditing easier and prevents a breach of one from compromising the other.
5. Supply Chain Security
The “Rug Pull” is a common 2026 threat where a popular open-source MCP server is updated with malicious code (e.g., a BCC field added to an email tool).
Pin Versions: Never pull the latest version of an MCP server in production. Pin to a specific, audited version or hash.
Vetted Registries: Only use servers from trusted sources like the Official MCP Catalog or internally vetted company registries.
Audit Logs: Log every tool invocation, including who requested it, what the arguments were, and what the output was.
Summary Checklist for Developers
| Risk | Mitigation |
| --- | --- |
| Data Exfiltration | Disable network access for local tools; use PII redaction. |
| Command Injection | Use argument arrays (parameterized) instead of shell strings. |
| Unauthorized Access | Implement OAuth 2.1 with scope-based tool control. |
| Lateral Movement | Sandbox servers in Docker/Wasm; limit filesystem access. |
What it looks like: A Terraform MCP server exposes infrastructure operations. The agent can plan, review, and apply infrastructure changes conversationally.
MCP tools exposed:
terraform_plan(module, vars) → generate and review a plan
terraform_apply(plan_id) → apply approved changes
terraform_state_show(resource) → inspect current state
terraform_output(name) → read output values
detect_drift() → compare actual vs declared state
Key use cases:
Drift detection agent: continuously checks for infrastructure drift and auto-raises PRs to correct it
It then posts a review comment to the PR with its findings and recommendations.
📊 MCP + Infrastructure Observability
What it looks like: Observability tools (Prometheus, Grafana, Loki, Datadog) are wrapped as MCP servers. The agent queries them in natural language and correlates signals across tools autonomously.
Here are comprehensive Docker interview questions organized by level:
🟢 Beginner Level
Concepts
Q1: What is Docker and why is it used?
Docker is an open-source containerization platform that packages applications and their dependencies into lightweight, portable containers — ensuring they run consistently across any environment (dev, staging, production).
Q2: What is the difference between a container and a virtual machine?
| | Container | Virtual Machine |
| --- | --- | --- |
| OS | Shares host OS kernel | Has its own OS |
| Size | Lightweight (MBs) | Heavy (GBs) |
| Startup | Seconds | Minutes |
| Isolation | Process-level | Full hardware-level |
| Performance | Near-native | Overhead |
Q3: What is a Docker image vs a Docker container?
Image — A read-only blueprint/template used to create containers
Container — A running instance of an image
Q4: What is a Dockerfile?
A text file containing step-by-step instructions to build a Docker image automatically.
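A minimal example for a Python app (the file names are illustrative):

```dockerfile
# Start from a small official base image
FROM python:3.12-slim
WORKDIR /app
# Install dependencies first so this layer caches between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Default command when the container starts
CMD ["python", "app.py"]
```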
Q5: What is Docker Hub?
A public cloud-based registry where Docker images are stored, shared, and distributed.
Basic Commands
Q6: What are the most common Docker commands?
docker build -t myapp . # Build image
docker run -d -p 8080:80 myapp # Run container
docker ps # List running containers
docker ps -a # List all containers
docker stop <container_id> # Stop container
docker rm <container_id> # Remove container
docker images # List images
docker rmi <image_id> # Remove image
docker logs <container_id> # View logs
docker exec -it <id> /bin/bash # Enter container shell
Q7: What is the difference between CMD and ENTRYPOINT?
| | CMD | ENTRYPOINT |
|---|---|---|
| Purpose | Default command, easily overridden | Fixed command, always executes |
| Override | Yes, at runtime | Only with the --entrypoint flag |
| Use case | Flexible defaults | Enforced commands |

ENTRYPOINT ["python"]   # always runs python
CMD ["app.py"]          # default arg, can be overridden
Q8: What is the difference between COPY and ADD?
COPY — Simply copies files from host to container (preferred)
ADD — Same as COPY but also supports URLs and auto-extracts tar files
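The difference in a Dockerfile sketch (paths and URLs are illustrative):

```dockerfile
COPY config/ /app/config/            # plain copy from build context (preferred)
ADD app.tar.gz /app/                 # auto-extracts the tar archive into /app/
ADD https://example.com/f.txt /tmp/  # can fetch remote URLs (generally discouraged)
```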
🟡 Intermediate Level
Networking
Q9: What are Docker network types?
| Network | Description | Use Case |
|---|---|---|
| bridge | Default, isolated network | Single-host containers |
| host | Shares host network stack | High-performance needs |
| none | No networking | Fully isolated containers |
| overlay | Multi-host networking | Docker Swarm / distributed apps |

docker network create my-network
docker run --network my-network myapp
Q10: How do containers communicate with each other?
Containers on the same custom bridge network can communicate using their container name as hostname.
# Both containers on same network can reach each other by name
docker run --network my-net --name db postgres
docker run --network my-net --name app myapp # app can reach "db"
Volumes & Storage
Q11: What is the difference between volumes, bind mounts, and tmpfs?
| Type | Description | Use Case |
|---|---|---|
| Volume | Managed by Docker | Persistent data (databases) |
| Bind Mount | Maps a host directory into the container | Development, live code reload |
| tmpfs | Stored in memory only | Sensitive/temporary data |
docker run -v myvolume:/data myapp # volume
docker run -v /host/path:/container myapp # bind mount
Q12: How do you persist data in Docker?
Use named volumes — data persists even after the container is removed.
docker volume create mydata
docker run -v mydata:/app/data myapp
Docker Compose
Q13: What is Docker Compose and when do you use it?
Docker Compose defines and runs multi-container applications using a single docker-compose.yml file.
version: "3.8"
services:
  app:
    build: .
    ports:
      - "8080:80"
    depends_on:
      - db
    environment:
      - DB_HOST=db
  db:
    image: postgres:15
    volumes:
      - pgdata:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=secret
volumes:
  pgdata:
docker-compose up -d # Start all services
docker-compose down # Stop and remove
docker-compose logs -f # Follow logs
Q14: What is the difference between docker-compose up and docker-compose start?
up — Creates and starts containers (building images and creating networks/volumes if needed)
start — Restarts existing stopped containers; it never creates, builds, or recreates anything
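The lifecycle difference, as a command sketch:

```shell
docker-compose up -d     # create + start everything (builds images if needed)
docker-compose stop      # stop the containers but keep them around
docker-compose start     # restart those existing containers; creates nothing new
```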
Q: Do I need to include intermediate certificates in the chain, or is the leaf certificate enough?
Short answer: Yes — use the full chain (leaf + intermediates), not just the leaf. Don’t include the root CA in the chain you send.
Here’s how it applies in the three common Kong TLS cases:
Clients → Kong (mTLS client-auth)
The client must present its leaf cert + intermediate(s) during the handshake.
Kong must trust the issuing CA (configure trusted CA(s) for client verification).
If you only send the leaf, you’ll hit errors like “unable to get local issuer certificate.”
Example (client side):
# build a fullchain for the client cert (no root)
cat client.crt intermediate.crt > client-fullchain.crt
# test against Kong (mTLS)
curl --cert client-fullchain.crt --key client.key https://kong.example.com/secure
Kong → Upstream (mTLS to your backend)
In Kong, create a Certificate object whose cert field contains the full chain (leaf + intermediates) and whose key field is the private key.
Attach it to the service via client_certificate.
Ensure the upstream trusts the issuing CA.
Kong (DB mode, gist):
# upload cert+key (cert must be full chain)
POST /certificates
{ "cert": "<PEM fullchain>", "key": "<PEM key>" }
# bind to service
PATCH /services/{id}
{ "client_certificate": "<certificate_id>" }
Kong’s server cert (TLS termination at Kong)
Serve a full chain so browsers/clients validate without needing to have the intermediate locally.
If using Kong Ingress, put the full chain in tls.crt of the Kubernetes secret.
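For example, a sketch of building the full chain and creating that secret (file names, secret name, and namespace are illustrative):

```shell
# leaf first, then intermediates; no root
cat server.crt intermediate.crt > fullchain.pem
kubectl create secret tls kong-server-tls \
  --cert=fullchain.pem --key=server.key -n kong
```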
Quick checks & common pitfalls
Do not include the root CA in the chain you send.
Order matters: leaf first, then each intermediate in order up to (but excluding) the root.
If you see “No required SSL certificate was sent” → the client didn’t present a cert at all.
If you see “certificate verify failed” / “unable to get local issuer certificate” → chain or trust store problem (usually missing intermediate).
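The missing-intermediate failure is easy to reproduce locally with openssl, using a throwaway test PKI (every name below is disposable test material, not your real certs):

```shell
# Throwaway root CA
openssl req -x509 -newkey rsa:2048 -nodes -keyout root.key -out root.crt \
  -subj "/CN=Test Root" -days 1
# Intermediate CA signed by the root (needs CA:TRUE so it can sign the leaf)
openssl req -newkey rsa:2048 -nodes -keyout int.key -out int.csr \
  -subj "/CN=Test Intermediate"
printf "basicConstraints=CA:TRUE" > int.ext
openssl x509 -req -in int.csr -CA root.crt -CAkey root.key -CAcreateserial \
  -out int.crt -days 1 -extfile int.ext
# Leaf cert signed by the intermediate
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr \
  -subj "/CN=client.example"
openssl x509 -req -in leaf.csr -CA int.crt -CAkey int.key -CAcreateserial \
  -out leaf.crt -days 1
# Full chain to present to a server: leaf first, then intermediate, no root
cat leaf.crt int.crt > fullchain.crt
# Leaf alone fails: "unable to get local issuer certificate"
openssl verify -CAfile root.crt leaf.crt || true
# With the intermediate supplied, verification succeeds
openssl verify -CAfile root.crt -untrusted int.crt leaf.crt
```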
If you tell me which direction you’re doing mTLS (client→Kong or Kong→upstream) and how you deploy (Docker / KIC / bare-metal), I’ll give you the exact Kong config block to drop in.