Kubernetes networking

Kubernetes networking is often considered the most complex part of the system because it operates on a “flat network” model. In a traditional setup, you might worry about port conflicts or how to reach a specific VM. In Kubernetes, every Pod gets its own unique IP address, and every Pod can talk to every other Pod without NAT (Network Address Translation).

To understand how traffic flows, we break it down into four distinct “layers” of communication.


1. Pod-to-Pod Communication

Every Pod in a cluster has its own internal IP. Kubernetes mandates that Pods on one node must be able to reach Pods on another node without any special configuration.

  • The Container Network Interface (CNI): This is the plugin (like Calico, Cilium, or OpenShift SDN) that actually builds the “pipes” between nodes.
  • The Experience: From the perspective of a container, it feels like it’s on a standard Ethernet network. It doesn’t care if the target Pod is on the same physical server or one across the data center.

2. Pod-to-Service Communication

Pods are “ephemeral”—they die and get replaced constantly, and their IP addresses change every time. You can’t hardcode a Pod IP into your app.

  • The Service: A Service is a stable “virtual IP” (ClusterIP) that sits in front of a group of Pods.
  • Kube-Proxy: This is a process running on every node that watches the API server. When you try to hit a Service IP, kube-proxy intercepts that traffic and redirects it to one of the healthy backend Pods.
  • CoreDNS: Kubernetes includes a built-in DNS service. Instead of an IP, your app simply connects to a stable name like my-database-service.
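Tying these pieces together, a minimal Service definition might look like this (the name, labels, and port here are illustrative, not from a specific app):

```yaml
# A stable virtual IP (ClusterIP) in front of all pods labeled 'app: database'.
# The names my-database-service and app: database are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: my-database-service
spec:
  selector:
    app: database      # Traffic is load-balanced across pods carrying this label
  ports:
  - protocol: TCP
    port: 5432         # The port clients connect to on the Service IP
    targetPort: 5432   # The port the backend pods actually listen on
```

Inside the cluster, CoreDNS resolves my-database-service (or the fully qualified my-database-service.&lt;namespace&gt;.svc.cluster.local) to the Service’s ClusterIP, so no Pod IPs ever need to be hardcoded.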

3. External-to-Service (Ingress & Egress)

How do people outside the cluster reach your apps?

  • NodePort: Opens a specific port (30000–32767) on every single node. Simple, but high-maintenance and rarely appropriate for production.
  • LoadBalancer: Tells your cloud provider (AWS/Azure/GCP) to spin up a physical Load Balancer that points to your nodes.
  • Ingress/Route: As we discussed earlier, this is a Layer 7 “smart router” that lets you use hostnames (like api.example.com) to direct traffic to different services using a single entry point.
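As a sketch, a host-based Ingress rule looks like this (the hostname, service name, and port are illustrative):

```yaml
# Routes HTTP traffic for api.example.com to a backend Service.
# Hostname, service name, and port number are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
spec:
  rules:
  - host: api.example.com          # Layer 7 routing based on the Host header
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service      # The ClusterIP Service behind this hostname
            port:
              number: 8080
```

Multiple hostnames can share the same Ingress Controller IP; the controller inspects each request’s Host header to pick the right backend.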

4. Network Policies (The Firewall)

By default, Kubernetes is “open”—any Pod can talk to any other Pod. In production, you usually want to restrict this for security.

  • NetworkPolicy: These are the firewall rules of Kubernetes. You can define rules like: “Only the ‘Frontend’ Pod is allowed to talk to the ‘Database’ Pod on port 5432.”
  • Implementation: These rules are enforced by the CNI (e.g., Calico or Cilium) at the packet level.

Summary of Components

| Component | Level | Purpose |
|---|---|---|
| CNI | Layer 3 (IP) | Connects Pods across different nodes. |
| Service | Layer 4 (TCP/UDP) | Provides a stable IP/DNS for a group of Pods. |
| Ingress | Layer 7 (HTTP) | Handles routing, SSL, and hostnames. |
| NetworkPolicy | Security | Defines which Pods are allowed to talk to each other. |

The Path of a Packet

If a user types example.com into their browser:

  1. DNS resolves the name to the Ingress Controller’s IP.
  2. The Ingress Controller looks at the host header and finds the correct Service.
  3. The Service (via kube-proxy) picks a healthy Pod and sends the traffic there.
  4. The CNI routes that packet across the internal network to the node where the Pod is living.

To understand how data actually moves between nodes, we have to look at the CNI (Container Network Interface).

Since every Pod has a unique IP but lives on a host (Node) with its own separate IP, the CNI’s job is to “tunnel” the Pod’s traffic across the physical network. The two most common ways it does this are Encapsulation (Overlay) and Direct Routing.


1. Encapsulation (The “Envelope” Method)

This is the most common approach, used by Flannel’s VXLAN backend and by OpenShift SDN. It treats the physical network as a “carrier” for a private, virtual network.

  • How it works: When Pod A (on Node 1) sends a packet to Pod B (on Node 2), the CNI takes that entire packet and wraps it inside a new UDP packet.
  • The “Outer” Header: Points from Node 1’s IP to Node 2’s IP.
  • The “Inner” Header: Points from Pod A’s IP to Pod B’s IP.
  • Arrival: When the packet hits Node 2, the CNI “unwraps” the outer envelope and delivers the original inner packet to Pod B.

The Downside: This adds a small amount of overhead (usually about 50 bytes per packet) because of the extra headers. This is why you often see the MTU (Maximum Transmission Unit) set slightly lower in Kubernetes (e.g., 1450 instead of 1500).
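The math behind that lower MTU is simple enough to sketch (assuming a standard 1500-byte physical MTU and roughly 50 bytes of VXLAN overhead):

```shell
# VXLAN wraps each packet in outer Ethernet/IP/UDP/VXLAN headers,
# costing roughly 50 bytes. The Pod-facing MTU is the physical MTU
# minus that overhead, which is where the common 1450 value comes from.
PHYS_MTU=1500
VXLAN_OVERHEAD=50
echo "Pod MTU: $((PHYS_MTU - VXLAN_OVERHEAD))"
# → Pod MTU: 1450
```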


2. Direct Routing (The “BGP” Method)

Used by Calico (in non-overlay mode) and Cilium, this method avoids the “envelope” entirely for better performance.

  • How it works: The nodes act like standard network routers. They use BGP (Border Gateway Protocol) to tell each other: “Hey, if you want to reach the 10.244.1.0/24 subnet, send those packets to me (Node 1).”
  • The Experience: Packets travel “naked” across the wire with no extra headers.
  • The Requirement: Your physical network routers must be able to handle these extra routes, or the nodes must all be on the same Layer 2 segment (the same VLAN/Switch).
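For illustration, Calico exposes this behavior through its BGPConfiguration resource; a minimal sketch (the AS number is an arbitrary private ASN chosen for the example):

```yaml
# Enables the full node-to-node BGP mesh so every node advertises
# its Pod subnet to its peers. asNumber 64512 is an arbitrary
# private ASN used here for illustration.
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: true
  asNumber: 64512
```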

3. The Role of the “veth” Pair

Regardless of how the data moves between nodes, getting data out of a container uses a Virtual Ethernet (veth) pair.

Think of a veth pair as a virtual patch cable:

  1. One end is plugged into the container (usually named eth0).
  2. The other end is plugged into the host’s network namespace (often named something like vethabc123).
  3. The host end is usually connected to a Bridge (like cni0 or br0), which acts like a virtual switch for all Pods on that specific node.

4. Comparing Popular CNIs

| CNI Plugin | Primary Method | Best Use Case |
|---|---|---|
| Flannel | VXLAN (Overlay) | Simple clusters; works almost anywhere. |
| Calico | BGP or IP-in-IP | High performance and advanced Network Policies. |
| Cilium | eBPF | Next-gen performance, deep security, and observability. |
| OpenShift SDN | VXLAN | Default for older OCP; very stable and integrated. |
| OVN-Kubernetes | Geneve (Overlay) | Modern OCP default; supports massive scale and Windows nodes. |

Which one should you choose?

  • If you are on OpenShift 4.x, you are likely using OVN-Kubernetes. It’s powerful and handles complex routing for you.
  • If you are building a Vanilla K8s cluster and want the absolute fastest networking, Cilium is the current industry gold standard because it uses eBPF to bypass parts of the Linux kernel entirely.

To understand why eBPF (Extended Berkeley Packet Filter) is the “holy grail” of modern Kubernetes networking, we first have to look at how the “old way” (iptables) works.

1. The Old Way: iptables (The “List of Rules”)

For years, Kubernetes used iptables (a standard Linux kernel feature) to route traffic.

  • How it works: Imagine a giant printed list of instructions. Every time a packet arrives, the CPU has to read the list from top to bottom: “Is it for Service A? No. Service B? No. Service C? Yes.”
  • The Problem: As you add more Services and Pods, this list grows to thousands of lines. If a packet is destined for the 5,000th service on the list, the CPU has to perform 5,000 checks for every single packet.
  • Result: High latency and high CPU “overhead” just to move data.

2. The New Way: eBPF (The “Direct Shortcut”)

eBPF allows you to run small, sandboxed programs directly inside the Linux kernel without changing the kernel code.

  • How it works: Instead of a long list of rules, eBPF creates a high-speed “lookup table” (a Hash Map) in the kernel’s memory.
  • The Benefit: When a packet arrives, the eBPF program looks at the destination and instantly knows where it goes. It doesn’t matter if you have 10 services or 10,000—the lookup time is exactly the same (O(1) complexity).
  • Bypassing the Stack: eBPF can catch a packet the moment it hits the Network Interface Card (NIC) and send it straight to the Pod, bypassing almost the entire Linux networking stack.

3. Why Cilium + eBPF is a Game Changer

Cilium is the most popular CNI that uses eBPF. It provides three massive advantages over traditional networking:

| Feature | iptables / Standard CNI | Cilium (eBPF) |
|---|---|---|
| Performance | Slows down as the cluster grows. | Consistently fast at any scale. |
| Observability | Hard to see “who is talking to whom” without sidecars. | Hubble (Cilium’s UI) shows every flow, drop, and latency in real time. |
| Security | IP-based filtering (hard to manage). | Identity-based filtering: it knows a packet belongs to “Service-Frontend” regardless of its IP. |

4. Why OpenShift is Moving to OVN (Geneve)

While Cilium is the “shiny new toy,” Red Hat chose OVN-Kubernetes (using the Geneve protocol) as the default for OCP 4.

  • Scale: OVN is built on Open vSwitch, which was designed for massive telco-grade clouds.
  • Feature Parity: It handles complex things like “Egress IPs” (giving a specific namespace a static IP for exiting the cluster) and Hybrid networking (Linux + Windows nodes) much more maturely than basic eBPF implementations did a few years ago.
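For example, the OVN-Kubernetes EgressIP resource pins outbound traffic from selected namespaces to a static source IP; a sketch (the IP address and label are illustrative):

```yaml
# Gives pods in namespaces labeled 'env: production' a fixed source IP
# when their traffic leaves the cluster. The IP and label are illustrative.
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egress-production
spec:
  egressIPs:
  - 192.168.10.50
  namespaceSelector:
    matchLabels:
      env: production
```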

Summary: The “Speed” Evolution

  1. iptables: Standard, but slow at scale.
  2. IPVS: A middle ground that uses hash tables but is still tied to the old kernel stack.
  3. eBPF (Cilium): The fastest possible way to move data in Linux today.

In OpenShift, the modern way to see these network flows is through the Network Observability Operator. This tool uses the eBPF technology we discussed to capture traffic data without slowing down your pods.

Here is how you can access and use these views.


1. Using the Web Console (The GUI Way)

Once the operator is installed, a new menu appears in your OpenShift Console.

  1. Navigate to Observe -> Network Traffic in the Administrator perspective.
  2. Overview Tab: This gives you a high-level “Sankey” diagram or graph showing which namespaces are talking to each other. It’s perfect for spotting “top talkers” (apps using the most bandwidth).
  3. Traffic Flows Tab: This is like a “Wireshark-lite” for your cluster. You can see every individual connection, including:
    • Source/Destination: Which pod is talking to which service.
    • Byte Rate: How much data is moving.
    • RTT (Latency): Exactly how many milliseconds a packet takes to travel between pods.
  4. Topology Tab: This provides a visual map of your network. You can group by “Namespace” or “Node” to see how traffic crosses physical boundaries.

2. Using the CLI (The “oc netobserv” plugin)

If you prefer the terminal, there is a specific plugin called oc netobserv. This is incredibly useful for live debugging when you don’t want to leave your shell.

Capture live flows:

Bash

oc netobserv flows --protocol=TCP --port=80

This will stream live traffic data directly to your terminal.

Filter for specific issues:

You can filter by namespace or even look for dropped packets (great for debugging firewall/NetworkPolicy issues):

Bash

oc netobserv flows --namespace=my-app --action=Drop

3. The “Loki” Backend

Behind the scenes, the Network Observability Operator stores these flows in Loki (a log aggregation system). This allows you to “go back in time.”

If an application crashed at 2:00 AM, you can go to the Network Traffic page, set the time filter to 2:00 AM, and see if there was a sudden spike in traffic or if a connection was being blocked by a security policy at that exact moment.


4. Advanced Debugging: Packet Drops

One of the best features of the eBPF-based observer is Packet Drop tracking. Traditional tools often can’t tell you why a packet disappeared. With this tool, the kernel can report the exact reason:

  • Filter Drop: A NetworkPolicy blocked it.
  • TCP Timeout: The other side didn’t respond.
  • Congestion: The network interface was overloaded.

Summary: What can you find?

  • Security: Is my database receiving traffic from an unauthorized namespace?
  • Performance: Which pods have the highest latency (RTT)?
  • Cost: Which services are sending the most data to external (Internet) IPs?

In Kubernetes, a NetworkPolicy is your cluster’s internal firewall. By default, pods are “non-isolated”—every pod can talk to every other pod.

To secure your app, you should follow the “Principle of Least Privilege”: block everything, then specifically allow only what is necessary.


1. The “Default Deny” (The Foundation)

Before you write specific rules, it is a best practice to create a “Default Deny” policy for your namespace. This locks all doors so that nothing can enter or leave unless you explicitly say so.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-secure-app
spec:
  podSelector: {} # Matches all pods in this namespace
  policyTypes:
  - Ingress
  - Egress

2. Allowing Specific Traffic (The “Rule”)

Now that everything is blocked, let’s say you have a Database pod and you only want your Frontend pod to talk to it on port 5432.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-db
  namespace: my-secure-app
spec:
  podSelector:
    matchLabels:
      app: database # This policy applies to pods labeled 'app: database'
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: frontend # Only allow pods labeled 'role: frontend'
    ports:
    - protocol: TCP
      port: 5432

3. Three Ways to Target Traffic

You can control traffic based on three different criteria:

  1. podSelector: Target pods within the same namespace (e.g., “Frontend to Backend”).
  2. namespaceSelector: Target entire namespaces (e.g., “Allow everything from the ‘Monitoring’ namespace”).
  3. ipBlock: Target specific IP ranges outside the cluster (e.g., “Allow traffic from our corporate VPN range 10.0.0.0/24”).
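A single rule can combine these selectors; here is a sketch that allows traffic from a monitoring namespace and from a corporate VPN range (the label and CIDR are illustrative):

```yaml
# Each entry under 'from' is OR'd: traffic matching either selector is allowed.
# The label 'purpose: monitoring' and CIDR 10.0.0.0/24 are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-monitoring-and-vpn
spec:
  podSelector: {}            # Applies to every pod in this namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          purpose: monitoring
    - ipBlock:
        cidr: 10.0.0.0/24
```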

4. Troubleshooting NetworkPolicies

If you apply a policy and your app stops working, here is how to debug:

  • Check Labels: NetworkPolicies rely 100% on labels. If your Frontend pod is labeled app: front-end but your policy looks for role: frontend, it will fail silently.
  • The “Blind” Policy: Standard Kubernetes doesn’t “log” when a policy blocks a packet. This is why we use the Network Observability Operator (as we discussed) to see the “Drop” events.
  • CNI Support: Remember, the CNI (Calico, OVN, etc.) is what actually enforces these rules. If your CNI doesn’t support NetworkPolicies (like basic Flannel), the YAML will be accepted but it won’t actually block anything!

Summary: Ingress vs. Egress

  • Ingress: Controls traffic coming into the pod (Who can talk to me?).
  • Egress: Controls traffic leaving the pod (Who can I talk to?).

A Zero Trust architecture in Kubernetes means that no pod is trusted by default. Even if a pod is inside your cluster, it shouldn’t be allowed to talk to anything else unless you specifically permit it.

In this scenario, we have a 3-tier app: Frontend, Backend, and Database.


1. The “Lockdown” (Default Deny)

First, we apply this to the entire namespace. This ensures that any new pod you deploy in the future is “secure by default” and cannot communicate until you add a rule for it.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: my-app-stack
spec:
  podSelector: {} # Matches ALL pods
  policyTypes:
  - Ingress
  - Egress

2. Tier 1: The Frontend

The Frontend needs to receive traffic from the Internet (via the Ingress Controller) and send traffic only to the Backend.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: frontend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress # Allows the OpenShift Ingress Controller
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: backend # ONLY allowed to talk to the Backend

3. Tier 2: The Backend

The Backend should only accept traffic from the Frontend and is only allowed to talk to the Database.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: frontend # ONLY accepts Frontend traffic
  egress:
  - to:
    - podSelector:
        matchLabels:
          tier: database # ONLY allowed to talk to the DB

4. Tier 3: The Database

The Database is the most sensitive. It should never initiate a connection (no Egress) and only accept traffic from the Backend.

YAML

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: database-policy
  namespace: my-app-stack
spec:
  podSelector:
    matchLabels:
      tier: database
  policyTypes:
  - Ingress
  - Egress # We include Egress so outbound traffic stays blocked by default
  ingress:
  - from:
    - podSelector:
        matchLabels:
          tier: backend
    ports:
    - protocol: TCP
      port: 5432 # Postgres port

Important: Don’t Forget DNS!

When you apply a “Default Deny” Egress policy, your pods can no longer reach CoreDNS, which means they can’t resolve service names like backend-service.

To fix this, you must add one more policy to allow UDP Port 53 to the openshift-dns namespace:

YAML

  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: openshift-dns
    ports:
    - protocol: UDP
      port: 53


Summary of the Strategy

  • Labels are everything: If you typo tier: backend as tier: back-end, the wall stays up and the app breaks.
  • Layered Security: Even if a hacker compromises your Frontend pod, they cannot “scan” your network or reach your Database directly; they are stuck only being able to talk to the Backend.

To test your Zero Trust setup, we will perform a “Penetration Test” from inside the cluster. We’ll use a temporary debug pod to see if our firewall rules actually block unauthorized access.

1. The “Attacker” Pod

We will spin up a temporary pod with basic networking tools (BusyBox utilities like nc and wget, which ship with Alpine) that has no labels. Since our “Default Deny” policy targets all pods, this pod should be completely isolated the moment it starts.

Bash

# Run a temporary alpine pod
oc run network-tester --image=alpine --restart=Never -- /bin/sh -c "sleep 3600"

2. Test 1: Can an unknown pod talk to the Database?

Even if this pod is in the same namespace, it should not be able to reach the database because it doesn’t have the tier: backend label.

Bash

# Try to connect to the database on port 5432
oc exec network-tester -- nc -zv database-service 5432

Expected Result: Connection timed out (The packet is dropped by the NetworkPolicy).


3. Test 2: Can the Frontend “Scan” the Database?

Now, let’s pretend a hacker compromised your Frontend pod. Can they bypass the Backend and talk directly to the Database?

Bash

# Exec into your existing Frontend pod
oc exec deployment/frontend -- nc -zv database-service 5432

Expected Result: Connection timed out.

Even though the Frontend is “trusted” to talk to the Backend, it is not trusted to talk to the Database.


4. Test 3: Can the Database reach the Internet? (Egress Test)

A common hacker tactic is to steal data and send it to an external server (Exfiltration). Let’s see if the Database can “phone home.”

Bash

# Try to reach google.com from the Database pod
oc exec deployment/database -- curl -I google.com

Expected Result: Could not resolve host or Timeout.

Since we didn’t add an Egress rule for the Database, it cannot send any traffic out of its pod at all.


5. How to see the “Deny” in real-time

If you want to prove the policy is working without just guessing based on timeouts, use the Network Observability tool we set up earlier:

  1. Go to Observe -> Network Traffic.
  2. Filter by Action: Deny or Action: Drop.
  3. You will see a red entry showing:
    • Source: network-tester
    • Destination: database
    • Reason: NetworkPolicy

6. Cleaning Up

When you’re done testing, don’t forget to remove the tester pod:

Bash

oc delete pod network-tester

Summary of the Lab

| Test | Source | Target | Status | Why? |
|---|---|---|---|---|
| Unauthorized | Debug Pod | Database | BLOCKED | Missing tier: backend label. |
| Lateral Movement | Frontend | Database | BLOCKED | Policy only allows Frontend -> Backend. |
| Exfiltration | Database | Internet | BLOCKED | No Egress rules defined for DB. |
