Great—let’s go through another very common real-world issue and include a simple visual to make it click.
Scenario
Service works via pod IP, but fails via ClusterIP (service name/IP)
Environment:
frontend pod calling backend pod
- Direct call works:
  curl http://10.128.2.15:8080 ✅
- Service call fails:
  curl http://backend-service ❌
What this means (important insight)
If pod IP works but service fails, then:
Pod networking (OVN routing) is working
Problem is in service load-balancing layer inside OVN-Kubernetes
Mental model (diagram)

Pod (frontend) ──────────────────────────────→ Pod (backend)   ✅ direct routing
Pod (frontend) → ClusterIP → OVN load balancer → Pod (backend)   ❌ broken here

Interpretation:
- Pod → Pod = direct routing (works)
- Pod → Service = goes through the OVN load balancer (broken here)
Step-by-step debugging
Step 1: Confirm endpoints exist
oc get endpoints backend-service
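A healthy service lists pod addresses under ENDPOINTS; the values below are illustrative, matching this scenario:

```
NAME              ENDPOINTS          AGE
backend-service   10.128.2.15:8080   12m
```

An empty or `<none>` ENDPOINTS column is the red flag.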
If EMPTY:
Root cause = wrong labels
Example:
# Service selector
selector:
  app: backend

But the pod has:

labels:
  app: api   # ❌ mismatch
Fix labels → service starts working instantly
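A minimal matching pair might look like this (names and image are illustrative); the key point is that the Service's selector and the Pod's labels use the exact same key/value:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
spec:
  selector:
    app: backend        # must match the pod's labels exactly
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: backend
  labels:
    app: backend        # matches the selector above
spec:
  containers:
    - name: backend
      image: example/backend:latest   # illustrative image
      ports:
        - containerPort: 8080
```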
Step 2: Verify service definition
oc get svc backend-service -o yaml
Check:
- correct port
- correct targetPort

Common mistake: targetPort not matching the port the container actually listens on. If the container listens on 8080, the Service needs:

port: 80
targetPort: 8080   # must match the container port
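Sketching the three values that must line up (an illustrative fragment, not a full manifest): the Service's `targetPort` must equal the container's `containerPort`; the Service's `port` is only what clients dial.

```yaml
# Service side
ports:
  - port: 80          # what clients connect to (http://backend-service:80)
    targetPort: 8080  # must equal the containerPort below
# Pod side
containers:
  - name: backend
    ports:
      - containerPort: 8080   # the port the app actually listens on
```

If `targetPort` points anywhere the container isn't listening, the symptom is "connection refused" even though endpoints exist.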
Step 3: Test ClusterIP directly
curl <ClusterIP>:<port>
Results:
- ❌ fails → OVN load balancer issue
- ✅ works → DNS issue instead
Step 4: Check DNS (don’t skip this)
From pod:
nslookup backend-service
If it fails:
→ the problem is DNS, not OVN
→ Check the DNS pods:
oc get pods -n openshift-dns
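If the short name fails but the fully qualified name (`backend-service.<namespace>.svc.cluster.local`) resolves, the pod's resolv.conf search path is the likely culprit. A typical pod resolv.conf looks roughly like this (namespace and nameserver IP are illustrative):

```
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 172.30.0.2
options ndots:5
```

The search suffixes are what let the bare name `backend-service` resolve at all, so a missing or wrong namespace suffix produces exactly this "name fails, IP works" symptom.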
Step 5: Inspect OVN load balancer
On a node:
oc debug node/<node>
chroot /host
Then:
ovn-nbctl lb-list
You should see something like:
VIP: 172.30.0.10:80 → 10.128.2.15:8080
If missing:
OVN didn’t program the service
Step 6: Check OVN logs
oc logs -n openshift-ovn-kubernetes <ovnkube-master>
Look for:
- load balancer sync errors
- endpoint update failures
Step 7: Check kube-proxy replacement
In OpenShift Container Platform, OVN-Kubernetes replaces kube-proxy.
So if service routing is broken:
It’s handled by OVN, not iptables
Real root causes (from production)
1. Label mismatch (MOST COMMON)
- Service selector doesn’t match pod
→ no endpoints → service dead
2. Wrong port/targetPort
- Service pointing to wrong container port
→ connection refused
3. OVN load balancer not programmed
- OVN DB out of sync
→ ClusterIP has no backend mapping
4. NetworkPolicy blocking service traffic
- Pod allows direct IP but blocks service path
(less common but happens)
5. DNS issue (misdiagnosed often)
- Service name fails, ClusterIP works
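For root cause 4, it can help to compare against a known-good policy. A minimal sketch (names and ports are illustrative) that allows the frontend to reach the backend pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # illustrative name
spec:
  podSelector:
    matchLabels:
      app: backend
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080   # the container port, not the service port
```

Note the port: policy is evaluated after the service VIP has been translated to a pod IP, so the rule must name the container port (8080), not the Service port (80).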
Fast debugging logic (this is gold)
When pod IP works but service fails:
- Endpoints exist?
  - ❌ → labels problem
- ClusterIP works?
  - ❌ → OVN load balancing
- DNS works?
  - ❌ → DNS issue
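The decision tree above can be sketched as a tiny shell helper; the function name and the yes/no convention are made up for illustration:

```shell
#!/bin/sh
# Encode the decision tree: feed in what you observed (yes/no),
# get back where to look next.
diagnose() {
  endpoints=$1   # do endpoints exist?
  clusterip=$2   # does curl <ClusterIP>:<port> work?
  dns=$3         # does nslookup of the service name work?
  if [ "$endpoints" = "no" ]; then
    echo "labels problem"
  elif [ "$clusterip" = "no" ]; then
    echo "OVN load balancing"
  elif [ "$dns" = "no" ]; then
    echo "DNS issue"
  else
    echo "service path looks healthy"
  fi
}

diagnose no yes yes    # → labels problem
diagnose yes no yes    # → OVN load balancing
diagnose yes yes no    # → DNS issue
```

The ordering matters: endpoints are checked first because an empty endpoints list makes every downstream check fail too.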
Pro tip (what experts do fast)
From a debug pod:
oc run debug --image=busybox -it --rm -- sh
Run:
nslookup backend-service
wget -qO- http://<ClusterIP>:<port>
wget -qO- http://<pod-IP>:8080
(busybox ships wget rather than curl; pick an image with curl if you prefer it.)
This instantly isolates:
- DNS
- service
- networking
Key takeaway
- Pod IP = routing layer (OVN switching)
- Service IP = OVN load balancer layer
- If one works and the other doesn’t → you know exactly where to look