Quick OVN Networking Troubleshooting Tips

Here’s a compact OCP OVN troubleshooting cheat sheet you can keep beside your terminal. It focuses on the fastest way to separate OVN datapath issues from service, DNS, and namespace policy problems. In modern OpenShift, OVN-Kubernetes is the default network plugin and enforces Kubernetes NetworkPolicy, OpenShift adds namespace-scoped EgressFirewall on top of it, and the DNS Operator manages cluster DNS. (Red Hat Documentation)

1-page mental model

Pod -> Pod IP = OVN routing / switching path
Pod -> ClusterIP = OVN service load-balancing path
Pod -> DNS name = DNS first, then service/pod path
Pod -> Internet = egress policy / EgressFirewall / EgressIP / routing
Only one node broken = usually node-local OVN/OVS or host network
Only one namespace = usually NetworkPolicy / EgressFirewall / EgressIP

This split is useful because Kubernetes treats Services, DNS, and NetworkPolicies as distinct layers, while OpenShift adds OVN-specific egress controls on top. (Kubernetes)

Core first checks

oc get co network
oc get pods -n openshift-ovn-kubernetes
oc get pods -n openshift-dns
oc get pods -A -o wide

If the network operator or OVN pods are unhealthy, start there. Red Hat’s OVN troubleshooting docs specifically call out checking OVN-Kubernetes health, logs, and pod connectivity, and OpenShift DNS health depends on the DNS Operator / dns-default pods being healthy. (Red Hat Documentation)

Diagram

                    +----------------------+
                    |      Source Pod      |
                    +----------+-----------+
                               |
             +-----------------+------------------+
             |                 |                  |
             v                 v                  v
        Pod IP path      ClusterIP path      DNS name path
  (OVN routing)      (OVN LB/service)   (DNS -> Service/Pod)
             |                 |                  |
             v                 v                  v
  Node/OVS/OVN          Endpoints/LB   resolv.conf / DNS pods

             \____________________  _____________________/
                                  \/
                           External destination
                    (NetworkPolicy / EgressFirewall /
                         EgressIP / host routing)

Kubernetes service discovery relies on DNS records for Services and Pods, while policy enforcement and namespace-scoped egress controls are separate concerns. (Kubernetes)

A. Pod cannot reach another pod

Use this when direct pod IP traffic fails.

oc exec -it <src-pod> -- curl http://<dst-pod-ip>:<port>
oc get networkpolicy -A
oc get pods -n openshift-ovn-kubernetes -o wide
oc logs -n openshift-ovn-kubernetes <ovnkube-node-pod>
oc debug node/<node>
chroot /host
ovs-vsctl show

Most likely causes:

  • NetworkPolicy is blocking pod-to-pod traffic
  • ovnkube-node is unhealthy on one node
  • OVS state on a node is broken
  • node route / MTU issue

Kubernetes NetworkPolicy can restrict pod-to-pod and pod-to-external traffic, and Red Hat’s OVN docs point to checking OVN logs and node-level connectivity when debugging packet path issues. (Kubernetes)
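When NetworkPolicy is the suspect, remember that a single default-deny policy in the destination namespace is enough to break direct pod-IP traffic. This is the kind of object to look for in the `oc get networkpolicy -A` output above (the namespace name is illustrative):

```yaml
# A default-deny ingress policy: selects every pod in the namespace
# and allows no ingress, so pod-to-pod traffic into "myapp" fails.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: myapp
spec:
  podSelector: {}        # empty selector = all pods in the namespace
  policyTypes:
  - Ingress              # no ingress rules listed, so all ingress is denied
```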

B. Pod IP works, Service/ClusterIP fails

This usually means OVN service load-balancing or Service wiring, not basic routing.

oc exec -it <src-pod> -- curl http://<dst-pod-ip>:<port>
oc exec -it <src-pod> -- curl http://<service-name>:<port>
oc get svc <service> -o yaml
oc get endpoints <service>

Interpretation:

  • pod IP works, ClusterIP fails → check Service, endpoints, OVN LB programming
  • endpoints empty → wrong selector / no backing pods
  • ClusterIP works but name fails → DNS issue

Kubernetes Services depend on selectors and endpoints, and DNS gives the name-to-Service mapping. (Kubernetes)
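The "endpoints empty → wrong selector" case comes down to whether the Service selector is a subset of the pod's labels. A minimal sketch of that matching rule in plain shell (function name and labels are illustrative, not an oc feature):

```shell
# Return success if every key=value pair in the selector appears
# among the pod's labels (passed as separate key=value arguments).
selector_matches() {
  selector="$1"; shift
  labels=" $* "
  for kv in $(echo "$selector" | tr ',' ' '); do
    case "$labels" in
      *" $kv "*) ;;          # this selector term is satisfied
      *) return 1 ;;         # missing term => pod not selected
    esac
  done
  return 0
}

selector_matches "app=web,tier=frontend" app=web tier=frontend && echo "selected"
selector_matches "app=api" app=web || echo "not selected -> empty endpoints"
```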

C. DNS works for some pods, not others

Start here:

oc exec -it <bad-pod> -- cat /etc/resolv.conf
oc exec -it <bad-pod> -- nslookup kubernetes.default
oc exec -it <bad-pod> -- nslookup <service>.<namespace>
oc get pods -n openshift-dns -o wide
oc logs -n openshift-dns <dns-default-pod>

Most likely causes:

  • wrong namespace assumption on short names
  • bad pod DNS config
  • dns-default unhealthy
  • only some nodes cannot reach DNS pods

Kubernetes documents that pod DNS behavior depends on namespace search paths and pod DNS configuration, and OpenShift documents that DNS is managed by the DNS Operator with CoreDNS pods in openshift-dns. (Kubernetes)
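The "wrong namespace assumption on short names" failure follows directly from how the resolver expands names: Kubernetes pods default to `ndots:5`, so any name with fewer than five dots is tried against each `search` domain from /etc/resolv.conf before being tried as-is. A sketch of that expansion (function and inputs are illustrative):

```shell
# Expand a name the way the resolver would: if it has fewer dots than
# ndots (5, the Kubernetes default), try each search domain first,
# then the name as written.
expand_name() {
  name="$1"; shift
  dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
  if [ "$dots" -lt 5 ]; then
    for d in "$@"; do echo "$name.$d"; done
  fi
  echo "$name"
}

# Short name in a pod in namespace "myns":
expand_name mysvc myns.svc.cluster.local svc.cluster.local cluster.local
# -> mysvc.myns.svc.cluster.local
#    mysvc.svc.cluster.local
#    mysvc.cluster.local
#    mysvc
```

This is why `<service>` resolves only from its own namespace while `<service>.<namespace>` works from anywhere.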

D. Traffic works on one node but not another

That usually points to a node-local issue.

oc get pods -A -o wide
oc get pods -n openshift-ovn-kubernetes -o wide
oc logs -n openshift-ovn-kubernetes <ovnkube-node-on-bad-node>
oc debug node/<bad-node>
chroot /host
ovs-vsctl show
ip route
ip link

Most likely causes:

  • ovnkube-node degraded on one worker
  • stale/broken OVS on that node
  • host NIC, route, or MTU mismatch

Red Hat’s OVN troubleshooting guidance focuses on readiness, logs, and connectivity checks, and node-scoped breakage usually means the problem is below the namespace/app layer. (Red Hat Documentation)
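For the MTU-mismatch case, Red Hat documents a 100-byte encapsulation overhead for OVN-Kubernetes (Geneve), so the pod interface MTU should be the host NIC MTU minus 100. A tiny sanity check for the values you read from `ip link` on each node (function name is illustrative):

```shell
# Expected pod MTU for OVN-Kubernetes, assuming the documented
# 100-byte Geneve encapsulation overhead.
expected_pod_mtu() {
  echo $(( $1 - 100 ))
}

expected_pod_mtu 1500   # typical Ethernet NIC -> 1400
expected_pod_mtu 9000   # jumbo frames -> 8900
```

If one node's pods show a different MTU than the rest of the cluster, that node is the place to dig.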

E. Egress works in some namespaces but not others

Think namespace policy first.

oc exec -n <ns> deploy/<app> -- nslookup example.com
oc exec -n <ns> deploy/<app> -- curl -kI https://93.184.216.34   # -k: cert won't match a bare IP
oc get networkpolicy -n <ns> -o yaml
oc get egressfirewall -n <ns> -o yaml
oc get egressip

Most likely causes:

  • egress NetworkPolicy
  • namespace EgressFirewall
  • broken/missing EgressIP
  • DNS issue being mistaken for egress failure

Kubernetes NetworkPolicy supports egress restrictions, OpenShift EgressFirewall is namespace-scoped for OVN-Kubernetes, and OpenShift supports assigning egress IPs to a namespace or specific pods. (Kubernetes)
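For reference, this is the general shape of an EgressFirewall that would explain "this namespace can only reach one host": the CR must be named `default` and there is one per namespace; the namespace and dnsName below are illustrative.

```yaml
# Namespace-scoped egress rules for OVN-Kubernetes; rules are
# evaluated in order, so this allows one host and denies the rest.
apiVersion: k8s.ovn.org/v1
kind: EgressFirewall
metadata:
  name: default          # required name; one EgressFirewall per namespace
  namespace: myapp
spec:
  egress:
  - type: Allow
    to:
      dnsName: example.com
  - type: Deny
    to:
      cidrSelector: 0.0.0.0/0
```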

Fast decision tree

1. Does direct pod IP work?
No -> OVN routing / policy / node / OVS
Yes -> continue
2. Does ClusterIP work?
No -> Service / endpoints / OVN LB
Yes -> continue
3. Does DNS name work?
No -> DNS path / resolv.conf / DNS pods / namespace lookup
Yes -> continue
4. Does external IP work?
No -> egress policy / EgressFirewall / EgressIP / routing
Yes -> app-layer issue likely

This sequence mirrors how Kubernetes separates pod routing, Services, DNS, and policy, and it lines up well with OpenShift’s OVN troubleshooting flow. (Kubernetes)
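Probed in that order, the first failure names the layer. The tree can be encoded as a one-liner-per-step function (pass "ok" or "fail" for each probe; purely illustrative):

```shell
# Decision tree: probes in order are pod IP, ClusterIP, DNS name,
# external IP. The first failing probe determines the diagnosis.
triage() {
  [ "$1" = ok ] || { echo "OVN routing / policy / node / OVS"; return; }
  [ "$2" = ok ] || { echo "Service / endpoints / OVN LB"; return; }
  [ "$3" = ok ] || { echo "DNS path / resolv.conf / DNS pods"; return; }
  [ "$4" = ok ] || { echo "egress policy / EgressFirewall / EgressIP / routing"; return; }
  echo "app-layer issue likely"
}

triage ok ok fail ok   # -> DNS path / resolv.conf / DNS pods
```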

Command pack

# Health
oc get co network
oc get pods -n openshift-ovn-kubernetes
oc get pods -n openshift-dns
# Placement
oc get pods -A -o wide
# Policies
oc get networkpolicy -A
oc get egressfirewall -A
oc get egressip
# Service wiring
oc get svc,endpoints -A
# Node debug
oc debug node/<node>
chroot /host
ovs-vsctl show
ip route
ip link

These are the highest-yield commands for narrowing the issue to the correct layer in OpenShift and Kubernetes networking. (Red Hat Documentation)

Rule of thumb

One node broken -> node-local OVN/OVS/host network
One namespace broken -> policy / EgressFirewall / EgressIP
Pod IP broken -> routing / policy
Service only broken -> endpoints / service LB
Name only broken -> DNS
External only broken -> egress controls / routing

That’s the shortest reliable way to avoid chasing the wrong subsystem. (Kubernetes)
