Here’s a compact OCP OVN troubleshooting cheat sheet you can keep beside your terminal. It focuses on the fastest way to separate OVN datapath issues from service, DNS, and namespace policy problems. In modern OpenShift, OVN-Kubernetes is the default network plugin: it enforces Kubernetes NetworkPolicy, OpenShift adds namespace-scoped EgressFirewall on top of it, and the DNS Operator manages cluster DNS. (Red Hat Documentation)
1-page mental model
Pod -> Pod IP = OVN routing / switching path
Pod -> ClusterIP = OVN service load-balancing path
Pod -> DNS name = DNS first, then service/pod path
Pod -> Internet = egress policy / EgressFirewall / EgressIP / routing
Only one node broken = usually node-local OVN/OVS or host network
Only one namespace = usually NetworkPolicy / EgressFirewall / EgressIP
This split is useful because Kubernetes treats Services, DNS, and NetworkPolicies as distinct layers, while OpenShift adds OVN-specific egress controls on top. (Kubernetes)
Core first checks
oc get co network
oc get pods -n openshift-ovn-kubernetes
oc get pods -n openshift-dns
oc get pods -A -o wide
If the network operator or OVN pods are unhealthy, start there. Red Hat’s OVN troubleshooting docs specifically call out checking OVN-Kubernetes health, logs, and pod connectivity, and OpenShift DNS health depends on the DNS Operator / dns-default pods being healthy. (Red Hat Documentation)
Diagram
                 +----------------------+
                 |      Source Pod      |
                 +----------+-----------+
                            |
           +----------------+----------------+
           |                |                |
           v                v                v
     Pod IP path      ClusterIP path    DNS name path
    (OVN routing)    (OVN LB/service)  (DNS -> Service/Pod)
           |                |                |
           v                v                v
    Node/OVS/OVN      Endpoints/LB   resolv.conf / DNS pods
           \________________|________________/
                            |
                            v
                  External destination
          (NetworkPolicy / EgressFirewall /
               EgressIP / host routing)
Kubernetes service discovery relies on DNS records for Services and Pods, while policy enforcement and namespace-scoped egress controls are separate concerns. (Kubernetes)
A. Pod cannot reach another pod
Use this when direct pod IP traffic fails.
oc exec -it <src-pod> -- curl http://<dst-pod-ip>:<port>
oc get networkpolicy -A
oc get pods -n openshift-ovn-kubernetes -o wide
oc logs -n openshift-ovn-kubernetes <ovnkube-node-pod>
oc debug node/<node>
chroot /host
ovs-vsctl show
Most likely causes:
- NetworkPolicy is blocking pod-to-pod traffic
- ovnkube-node is unhealthy on one node
- OVS state on a node is broken
- node route / MTU issue
Kubernetes NetworkPolicy can restrict pod-to-pod and pod-to-external traffic, and Red Hat’s OVN docs point to checking OVN logs and node-level connectivity when debugging packet path issues. (Kubernetes)
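To make the first cause concrete, here is a minimal sketch of an allow rule; all names and labels are hypothetical. Remember that once any NetworkPolicy selects a pod, traffic not explicitly allowed to that pod is dropped, which is the usual way pod-to-pod curl suddenly fails in one namespace.

```yaml
# Hedged sketch: a hypothetical policy allowing ingress to pods labeled
# app=backend only from pods labeled app=frontend on TCP 8080. Any other
# ingress to the selected pods is denied once this policy exists.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend   # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```

When debugging, compare the policy's podSelector against the labels of the pods that cannot be reached; a selector that silently matches the destination pod is the most common surprise.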
B. Pod IP works, Service/ClusterIP fails
This usually means OVN service load-balancing or Service wiring, not basic routing.
oc exec -it <src-pod> -- curl http://<dst-pod-ip>:<port>
oc exec -it <src-pod> -- curl http://<service-name>:<port>
oc get svc <service> -o yaml
oc get endpoints <service>
Interpretation:
- pod IP works, ClusterIP fails → check Service, endpoints, OVN LB programming
- endpoints empty → wrong selector / no backing pods
- ClusterIP works but name fails → DNS issue
Kubernetes Services depend on selectors and endpoints, and DNS gives the name-to-Service mapping. (Kubernetes)
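The selector/endpoints relationship can be sketched with a minimal Service manifest; the names are hypothetical. Two fields cause most of the failures in this section: a `selector` that matches no pod labels (empty endpoints) and a `targetPort` that does not match the container's listening port (endpoints exist, connections refused).

```yaml
# Hedged sketch: for endpoints to be populated, spec.selector must match
# the backing pods' labels exactly, and targetPort must be the port the
# containers actually listen on.
apiVersion: v1
kind: Service
metadata:
  name: my-app          # hypothetical name
spec:
  selector:
    app: my-app         # must match pod labels, or endpoints stay empty
  ports:
    - protocol: TCP
      port: 80          # ClusterIP port clients connect to
      targetPort: 8080  # container port the pods listen on
```

A classic symptom: curl to the pod IP on 8080 works, but curl to the ClusterIP on 80 fails because targetPort points at the wrong port.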
C. DNS works for some pods, not others
Start here:
oc exec -it <bad-pod> -- cat /etc/resolv.conf
oc exec -it <bad-pod> -- nslookup kubernetes.default
oc exec -it <bad-pod> -- nslookup <service>.<namespace>
oc get pods -n openshift-dns -o wide
oc logs -n openshift-dns <dns-default-pod>
Most likely causes:
- wrong namespace assumption on short names
- bad pod DNS config
- dns-default unhealthy
- only some nodes cannot reach DNS pods
Kubernetes documents that pod DNS behavior depends on namespace search paths and pod DNS configuration, and OpenShift documents that DNS is managed by the DNS Operator with CoreDNS pods in openshift-dns. (Kubernetes)
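The "wrong namespace assumption" cause comes down to search paths: a short name like `myservice` only resolves if the pod's resolv.conf search list expands it to `myservice.<this-namespace>.svc.cluster.local`. A minimal sketch of the relevant pod fields, with hypothetical names:

```yaml
# Hedged sketch: pod DNS behavior is controlled by dnsPolicy/dnsConfig.
# With ClusterFirst, short names go through the namespace search path,
# so a pod in namespace "demo" resolves "myservice" as
# myservice.demo.svc.cluster.local; cross-namespace lookups need
# "myservice.<other-ns>" at minimum.
apiVersion: v1
kind: Pod
metadata:
  name: dns-example                 # hypothetical name
  namespace: demo                   # hypothetical namespace
spec:
  containers:
    - name: app
      image: registry.example.com/app:latest   # hypothetical image
  dnsPolicy: ClusterFirst           # cluster DNS first (the default)
  dnsConfig:
    options:
      - name: ndots
        value: "5"                  # names with fewer dots try search paths first
```

If `nslookup <service>.<namespace>` works but the short name does not, the problem is the lookup, not the DNS pods.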
D. Traffic works on one node but not another
That usually points to a node-local issue.
oc get pods -A -o wide
oc get pods -n openshift-ovn-kubernetes -o wide
oc logs -n openshift-ovn-kubernetes <ovnkube-node-on-bad-node>
oc debug node/<bad-node>
chroot /host
ovs-vsctl show
ip route
ip link
Most likely causes:
- ovnkube-node degraded on one worker
- stale/broken OVS on that node
- host NIC, route, or MTU mismatch
Red Hat’s OVN troubleshooting guidance focuses on readiness, logs, and connectivity checks, and node-scoped breakage usually means the problem is below the namespace/app layer. (Red Hat Documentation)
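For the MTU cause, a quick sanity check: OVN-Kubernetes encapsulates pod traffic in Geneve, and OpenShift reserves 100 bytes of overhead for IPv4, so the cluster network MTU must be at least that much smaller than the host NIC MTU. A minimal sketch of the arithmetic, assuming a 1500-byte host MTU taken from `ip link` on the node:

```shell
# Hedged sketch: expected cluster MTU given OVN-Kubernetes Geneve overhead.
HOST_MTU=1500    # assumption: value reported by `ip link` on the node
OVERHEAD=100     # Geneve overhead OpenShift reserves for IPv4
CLUSTER_MTU=$((HOST_MTU - OVERHEAD))
echo "expected cluster MTU: ${CLUSTER_MTU}"
```

If one node's NIC MTU differs from its peers (common after a bonding or jumbo-frame change), large packets are silently dropped on that node while small pings still succeed.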
E. Egress works in some namespaces but not others
Think namespace policy first.
oc exec -n <ns> deploy/<app> -- nslookup example.com
oc exec -n <ns> deploy/<app> -- curl -I https://93.184.216.34
oc get networkpolicy -n <ns> -o yaml
oc get egressfirewall -n <ns> -o yaml
oc get egressip
Most likely causes:
- egress NetworkPolicy
- namespace EgressFirewall
- broken/missing EgressIP
- DNS issue being mistaken for egress failure
Kubernetes NetworkPolicy supports egress restrictions, OpenShift EgressFirewall is namespace-scoped for OVN-Kubernetes, and OpenShift supports assigning egress IPs to a namespace or specific pods. (Kubernetes)
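A minimal EgressFirewall sketch shows why one namespace can lose egress while its neighbors keep it; the namespace and CIDR below are hypothetical. Note that the object must be named `default` and rules are evaluated in order:

```yaml
# Hedged sketch: EgressFirewall is namespace-scoped for OVN-Kubernetes.
# Rules match top to bottom; this hypothetical policy allows one external
# range and denies all other egress from the namespace.
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default         # required name for an EgressFirewall object
  namespace: demo       # hypothetical namespace
spec:
  egress:
    - type: Allow
      to:
        cidrSelector: 203.0.113.0/24   # hypothetical allowed range
    - type: Deny
      to:
        cidrSelector: 0.0.0.0/0        # everything else is blocked
```

This is also why DNS failures get mistaken for egress failures: if the firewall blocks the DNS resolver's path, name resolution dies first and the "egress" test by hostname never even sends a packet outward.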
Fast decision tree
1. Does direct pod IP work?
   No -> OVN routing / policy / node / OVS
   Yes -> continue
2. Does ClusterIP work?
   No -> Service / endpoints / OVN LB
   Yes -> continue
3. Does DNS name work?
   No -> DNS path / resolv.conf / DNS pods / namespace lookup
   Yes -> continue
4. Does external IP work?
   No -> egress policy / EgressFirewall / EgressIP / routing
   Yes -> app-layer issue likely
This sequence mirrors how Kubernetes separates pod routing, Services, DNS, and policy, and it lines up well with OpenShift’s OVN troubleshooting flow. (Kubernetes)
Command pack
# Health
oc get co network
oc get pods -n openshift-ovn-kubernetes
oc get pods -n openshift-dns
# Placement
oc get pods -A -o wide
# Policies
oc get networkpolicy -A
oc get egressfirewall -A
oc get egressip
# Service wiring
oc get svc,endpoints -A
# Node debug
oc debug node/<node>
chroot /host
ovs-vsctl show
ip route
ip link
These are the highest-yield commands for narrowing the issue to the correct layer in OpenShift and Kubernetes networking. (Red Hat Documentation)
Rule of thumb
One node broken -> node-local OVN/OVS/host network
One namespace broken -> policy / EgressFirewall / EgressIP
Pod IP broken -> routing / policy
Service only broken -> endpoints / service LB
Name only broken -> DNS
External only broken -> egress controls / routing
That’s the shortest reliable way to avoid chasing the wrong subsystem. (Kubernetes)