OC Commands for OpenShift (OCP) Troubleshooting
Here’s a comprehensive reference organized by troubleshooting category:
Cluster & Node Health
# Cluster version and status
oc get clusterversion
oc describe clusterversion

# Node status
oc get nodes
oc get nodes -o wide
oc describe node <node-name>

# Node resource usage
oc adm top nodes

# Check cluster operators
oc get clusteroperators
oc get co   # shorthand
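The operator listing above is easier to scan when healthy rows are filtered out. The following is a minimal sketch, not an official tool: `unhealthy_operators` is a hypothetical helper name, and it assumes the default table layout of `oc get co` (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED as the first five columns). If the VERSION column is empty for an operator, the fields shift and the filter may misreport that row.

```shell
# Hypothetical helper: print the names of cluster operators that are
# not Available or that report Degraded.
# Reads `oc get co` table output on stdin.
# Assumption: columns are NAME VERSION AVAILABLE PROGRESSING DEGRADED ...
unhealthy_operators() {
  awk 'NR > 1 && ($3 != "True" || $5 != "False") { print $1 }'
}

# Intended use against a live cluster (not executed here):
#   oc get co | unhealthy_operators
```

An empty result suggests every operator is Available and not Degraded; any name printed is a candidate for `oc describe co <name>`.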
Pod Troubleshooting
# List pods (all namespaces)
oc get pods -A
oc get pods -n <namespace> -o wide

# Pod details & events
oc describe pod <pod-name> -n <namespace>

# Logs
oc logs <pod-name> -n <namespace>
oc logs <pod-name> -n <namespace> --previous        # crashed container
oc logs <pod-name> -n <namespace> -c <container>    # specific container
oc logs <pod-name> -n <namespace> --tail=100 -f     # follow last 100 lines

# Exec into pod
oc exec -it <pod-name> -n <namespace> -- /bin/bash

# Pod resource usage
oc adm top pods -n <namespace>
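On a busy cluster, `oc get pods -A` returns far more rows than are useful for triage. One possible way to narrow it down is sketched below. `failing_pods` is a hypothetical helper name, and the filter assumes the default `-A` table layout, where NAMESPACE, NAME, READY, and STATUS are the first four columns.

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor
# Completed, as "<namespace>/<pod> <status>".
# Reads `oc get pods -A` table output on stdin.
# Assumption: columns are NAMESPACE NAME READY STATUS RESTARTS AGE.
failing_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Intended use against a live cluster (not executed here):
#   oc get pods -A | failing_pods
```

This only inspects the STATUS column, so a pod that is Running but not Ready (for example READY 0/1) is not reported; `oc describe pod` remains the next step for those.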
Deployment / DeploymentConfig
# Status
oc get deployments -n <namespace>
oc get dc -n <namespace>   # DeploymentConfig
oc rollout status deployment/<name> -n <namespace>

# Rollout history & rollback
oc rollout history deployment/<name> -n <namespace>
oc rollout undo deployment/<name> -n <namespace>

# Scale
oc scale deployment/<name> --replicas=3 -n <namespace>

# Restart (rolling)
oc rollout restart deployment/<name> -n <namespace>
Networking & Routes
# Services
oc get svc -n <namespace>
oc describe svc <service-name> -n <namespace>

# Routes
oc get routes -n <namespace>
oc describe route <route-name> -n <namespace>

# Endpoints (checks pod–service binding)
oc get endpoints -n <namespace>

# Network policies
oc get networkpolicy -n <namespace>
RBAC & Permissions
# Check what a user can do
oc auth can-i <verb> <resource> --as=<user> -n <namespace>
oc auth can-i --list --as=<user> -n <namespace>

# Role bindings (-o wide adds the subject columns, so grep can match the user)
oc get rolebindings -n <namespace> -o wide
oc get clusterrolebindings -o wide | grep <user>

# Service account permissions
oc get sa -n <namespace>
oc describe sa <sa-name> -n <namespace>

# Add role to user
oc adm policy add-role-to-user <role> <user> -n <namespace>
oc adm policy add-cluster-role-to-user <role> <user>
Events & Alerts
# Namespace events (sorted by time)
oc get events -n <namespace> --sort-by='.lastTimestamp'

# Warning events only
oc get events -n <namespace> --field-selector type=Warning

# Cluster-wide events
oc get events -A --sort-by='.lastTimestamp'
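When a namespace produces many events, a per-reason tally can show which failure dominates. The sketch below is one way to do that. `warning_reasons` is a hypothetical helper name, and it assumes single-namespace output (no `-A`), where each data row starts with the last-seen age, then TYPE, then REASON.

```shell
# Hypothetical helper: count Warning events per REASON, most frequent first.
# Reads `oc get events -n <namespace>` table output on stdin.
# Assumption: data rows are "<last-seen> <type> <reason> <object> <message...>".
warning_reasons() {
  awk 'NR > 1 && $2 == "Warning" { count[$3]++ }
       END { for (r in count) print count[r], r }' | sort -rn
}

# Intended use against a live cluster (not executed here):
#   oc get events -n <namespace> | warning_reasons
```

With `-A` the NAMESPACE column comes first and shifts every field by one, so the field numbers would need adjusting for cluster-wide output.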
Storage / PVCs
# PVCs and PVs
oc get pvc -n <namespace>
oc get pv
oc describe pvc <pvc-name> -n <namespace>

# Storage classes
oc get storageclass
etcd & Control Plane
# etcd pod health
oc get pods -n openshift-etcd
oc logs -n openshift-etcd <etcd-pod-name> -c etcd

# API server
oc get pods -n openshift-kube-apiserver
oc get pods -n openshift-apiserver

# Scheduler & controller manager
oc get pods -n openshift-kube-scheduler
oc get pods -n openshift-kube-controller-manager
MachineConfig & Node Config
# MachineConfig pools
oc get mcp
oc describe mcp worker

# MachineConfigs
oc get mc
oc describe mc <mc-name>

# Machine health check
oc get machinehealthcheck -n openshift-machine-api
oc get machines -n openshift-machine-api
Resource Inspection (General)
# Get the common resource types in a namespace ("all" omits ConfigMaps, Secrets, PVCs, and others)
oc get all -n <namespace>

# Output as YAML/JSON
oc get pod <pod-name> -n <namespace> -o yaml
oc get pod <pod-name> -n <namespace> -o json

# jsonpath query
oc get pods -n <namespace> -o jsonpath='{.items[*].metadata.name}'

# Watch resources live
oc get pods -n <namespace> -w

# Diff live vs local config
oc diff -f <manifest.yaml>
Quick Fixes
# Force delete stuck pod
oc delete pod <pod-name> -n <namespace> --grace-period=0 --force

# Cordon / uncordon node
oc adm cordon <node-name>
oc adm uncordon <node-name>

# Drain node
oc adm drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Mark node schedulable (legacy 3.x syntax; on 4.x use oc adm uncordon instead)
oc adm manage-node <node-name> --schedulable=true
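Cordon, drain, and uncordon are usually run as a sequence during node maintenance, and draining is disruptive enough that it can help to review the exact commands first. The sketch below only prints that sequence for a given node and runs nothing. `node_maintenance_plan` is a hypothetical helper name, and the flags are the ones listed above.

```shell
# Hypothetical helper: print (do not run) the cordon / drain / uncordon
# sequence for one node so it can be reviewed before execution.
node_maintenance_plan() {
  node="$1"
  if [ -z "$node" ]; then
    echo "usage: node_maintenance_plan <node-name>" >&2
    return 1
  fi
  printf 'oc adm cordon %s\n' "$node"
  printf 'oc adm drain %s --ignore-daemonsets --delete-emptydir-data\n' "$node"
  printf 'oc adm uncordon %s\n' "$node"
}

# Example (prints three lines, executes nothing):
#   node_maintenance_plan <node-name>
```

The uncordon step is meant to be run only after the maintenance work is finished, so the printed lines are best executed one at a time.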
Pro tip: Add -o wide to most get commands for extra detail, and --sort-by='.lastTimestamp' to events for chronological ordering. Use oc explain <resource> to explore any resource’s field schema inline.