How To Run Network Tools on OpenShift 4.x Nodes

Red Hat Enterprise Linux CoreOS (RHCOS) is the immutable operating system running on all OpenShift 4.x nodes. Unlike traditional RHEL, RHCOS does not include package managers like yum or dnf – the OS uses rpm-ostree for transactional upgrades delivered through the OpenShift update process. This means you cannot directly install network troubleshooting tools like telnet, tcpdump, or netstat on the nodes.

This guide covers multiple methods to run network diagnostic tools on OpenShift 4.x nodes – from oc debug sessions and the built-in toolbox container, to dedicated debug pods using nicolaka/netshoot. Every method works on RHCOS nodes without modifying the underlying OS.

Prerequisites

Before you start, confirm the following are in place:

  • A running OpenShift 4.x cluster (4.12+)
  • The oc CLI installed and authenticated with cluster-admin privileges
  • Access to the cluster API endpoint from your workstation
  • Wireshark installed locally if you plan to analyze .pcap captures

Verify your cluster nodes are ready before troubleshooting:

oc get nodes -o wide

The output lists all cluster nodes with their roles, OS image, and internal IP addresses:

NAME                              STATUS   ROLES           AGE    VERSION   INTERNAL-IP    OS-IMAGE
master01.ocp.example.com          Ready    control-plane   120d   v1.28.6   10.10.30.10    Red Hat Enterprise Linux CoreOS 415.92
master02.ocp.example.com          Ready    control-plane   120d   v1.28.6   10.10.30.11    Red Hat Enterprise Linux CoreOS 415.92
worker01.ocp.example.com          Ready    worker          120d   v1.28.6   10.10.30.20    Red Hat Enterprise Linux CoreOS 415.92
worker02.ocp.example.com          Ready    worker          120d   v1.28.6   10.10.30.21    Red Hat Enterprise Linux CoreOS 415.92
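If you script your pre-checks, a small filter over that output surfaces any node that is not Ready. This is a minimal sketch; the helper name not_ready is ours:

```shell
# Hypothetical helper: print the names of nodes whose STATUS column is
# anything other than "Ready" (this also catches Ready,SchedulingDisabled).
not_ready() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}
# Usage: oc get nodes | not_ready   (no output means all nodes are Ready)
```

Nodes reported here should be investigated before you start capturing traffic, since a NotReady node can skew connectivity results.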

Step 1: Open a Debug Shell on an OpenShift Node

The oc debug node command starts a privileged pod on the target node and drops you into a shell. This is the primary way to access RHCOS nodes for troubleshooting since direct SSH is not configured by default. For a deeper walkthrough, see how to open a shell prompt on an OpenShift node.

Start a debug session on any node:

oc debug node/worker01.ocp.example.com

The debug pod starts and mounts the node’s root filesystem at /host:

Starting pod/worker01ocpexamplecom-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.10.30.20
If you don't see a command prompt, try pressing enter.
sh-4.4#

Switch to the node’s root filesystem with chroot to access host binaries like crictl, ip, and ss:

chroot /host

You now have a shell running directly on the RHCOS node. Any tools already bundled with RHCOS (like ip, ss, crictl) are available immediately.
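When you only need the output of a single command from several nodes, you can skip the interactive shell and pass the command straight to oc debug. A sketch of that pattern follows; the helper name each_node is ours, and the OC variable is an override hook so the loop can be dry-run against a stub binary:

```shell
# Sketch: run one read-only command on every node via oc debug.
# Assumes you are logged in with cluster-admin; OC defaults to the real oc.
each_node() {
  for node in $("${OC:-oc}" get nodes -o name); do
    echo "== ${node} =="
    "${OC:-oc}" debug "${node}" -- chroot /host "$@" 2>/dev/null
  done
}
# Usage: each_node uptime
```

Each iteration starts and tears down a debug pod, so keep this to quick read-only commands like uptime, ip addr, or ss -tlnp.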

Step 2: Run tcpdump on a Node

RHCOS includes tcpdump in its base image, so you can capture traffic directly from the debug shell without installing anything. After running chroot /host, identify the network interfaces on the node:

ip link show

The output shows all network interfaces on the node – look for the primary interface (usually ens3, ens192, or eth0 depending on your infrastructure):

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:1a:4a:16:01:73 brd ff:ff:ff:ff:ff:ff
8: br0: <BROADCAST,MULTICAST> mtu 1450 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 72:d6:df:e8:13:48 brd ff:ff:ff:ff:ff:ff
9: vxlan_sys_4789: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65000 qdisc noqueue master ovs-system state UNKNOWN
    link/ether 4a:c4:7f:c1:85:f7 brd ff:ff:ff:ff:ff:ff

Capture all traffic on port 443 across all interfaces and write the output to a pcap file on the node’s filesystem:

tcpdump -i any port 443 -s 0 -vv -w /tmp/https_capture.pcap

To capture traffic on a specific interface and filter by destination host:

tcpdump -i ens3 host 10.10.30.21 -s 0 -w /tmp/node_traffic.pcap

Press Ctrl+C to stop the capture. The pcap file is saved at /tmp on the node – we cover how to copy it off in Step 7.
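When you capture on several nodes at once, it helps to stamp each file with the node name and time so the copies never collide. A minimal sketch; the helper name pcap_path is ours:

```shell
# Hypothetical helper: build a per-node, timestamped pcap path so
# captures from different nodes do not overwrite each other in /tmp.
pcap_path() {
  printf '/tmp/%s_%s.pcap\n' "$1" "$(date +%Y%m%d_%H%M%S)"
}
# Usage: tcpdump -i ens3 port 443 -s 0 -w "$(pcap_path worker01)"
```

For long-running captures, tcpdump's own -G <seconds> and -W <count> flags rotate the output file on a timer and cap the number of files kept, which prevents /tmp on the node from filling up.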

Step 3: Run Telnet and Network Tools with Toolbox

Tools not included in the RHCOS base image (like telnet, nmap, strace) require the toolbox container. This is a privileged container that shares the node’s network namespace and lets you install packages with yum.

From the debug shell (after chroot /host), start the toolbox container:

toolbox

The first run pulls the support-tools image from the Red Hat registry:

Trying to pull registry.redhat.io/rhel9/support-tools...
Getting image source signatures
Copying blob ec1681b6a383 done
Copying blob c4d668e229cd done
Copying config 50b63c2aff done
Writing manifest to image destination
Spawning a container 'toolbox-root' with image 'registry.redhat.io/rhel9/support-tools'

Inside the toolbox container, install the network tools you need:

yum -y install telnet net-tools tcpdump nmap-ncat bind-utils

Now test connectivity to a service using telnet – for example, checking if port 6443 (Kubernetes API) is reachable from the node:

telnet 10.10.30.10 6443

A successful connection looks like this:

Trying 10.10.30.10...
Connected to 10.10.30.10.
Escape character is '^]'.

Press Ctrl+] then type quit to exit the telnet session. For DNS troubleshooting inside the toolbox, use dig or nslookup (from bind-utils):

dig api.ocp.example.com +short

Type exit to leave the toolbox container and return to the debug shell.
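If you want to check several ports without waiting for the yum install, bash's built-in /dev/tcp pseudo-device can stand in for telnet. A sketch, with an example helper name of ours:

```shell
# Sketch: TCP reachability check using bash's /dev/tcp, which works
# even before telnet is installed in the toolbox container.
check_port() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
# Usage: check_port 10.10.30.10 6443
```

Unlike telnet, this exits on its own (no Ctrl+] needed), which makes it easy to loop over a list of host/port pairs.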

Step 4: Run curl and wget from a Debug Pod

Sometimes you need to test HTTP endpoints from inside the cluster network without logging into a node. The oc debug command can also launch a debug copy of an existing pod or a fresh pod with the tools you need.

Run a one-off curl command from a new debug pod in the cluster:

oc run debug-curl --image=registry.access.redhat.com/ubi9/ubi-minimal --restart=Never --rm -it -- curl -sv https://kubernetes.default.svc:443/healthz

This creates a temporary pod that runs the curl command and deletes itself when done. You can also test internal service DNS resolution:

oc run debug-dns --image=registry.access.redhat.com/ubi9/ubi-minimal --restart=Never --rm -it -- curl -s http://my-service.my-namespace.svc.cluster.local:8080/health

For persistent debugging where you need to run multiple commands, start an interactive shell:

oc run debug-shell --image=registry.access.redhat.com/ubi9/ubi-minimal --restart=Never --rm -it -- /bin/bash

From inside the pod, install wget or other tools with microdnf:

microdnf install wget curl bind-utils -y

The pod runs in the cluster network, so it can reach ClusterIP services and internal DNS – making it ideal for testing service-to-service connectivity.
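For scripted health checks from such a pod, curl's -w format string can reduce the output to just the HTTP status code. A minimal sketch (the helper name is ours; the URL is the example service from above):

```shell
# Hypothetical helper: print only the HTTP status code of an endpoint.
# "000" means the connection itself failed (refused, timed out, no DNS).
http_code() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}
# Usage: http_code http://my-service.my-namespace.svc.cluster.local:8080/health
```

A loop over your services comparing the result against 200 gives a quick in-cluster reachability report without parsing response bodies.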

Step 5: Deploy a Network Debug Pod with netshoot

The nicolaka/netshoot container image comes preloaded with every network tool you might need – tcpdump, traceroute, nmap, iperf3, mtr, netstat, and dozens more. It is the fastest way to get a fully equipped network debugging environment inside your cluster.

Launch a netshoot pod in the current namespace:

oc run netshoot --image=nicolaka/netshoot --restart=Never --rm -it -- /bin/bash

Inside the pod, all tools are available immediately without installing anything:

bash-5.2# tcpdump -i eth0 -c 10
bash-5.2# traceroute 10.10.30.10
bash-5.2# nmap -sT -p 6443,8443,443 10.10.30.10
bash-5.2# iperf3 -c iperf-server.my-namespace.svc.cluster.local

To run netshoot with host networking (useful for testing node-level connectivity), add a security context override:

oc run netshoot-host --image=nicolaka/netshoot --restart=Never --rm -it --overrides='{"apiVersion":"v1","spec":{"hostNetwork":true,"nodeName":"worker01.ocp.example.com"}}' -- /bin/bash

This gives the pod access to the node’s network interfaces directly, which is useful for capturing traffic at the node level without using oc debug node.
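The same setup can be kept as a manifest and applied with oc apply -f when you want a longer-lived debug pod instead of a one-shot session. This is an illustrative sketch (pod name, node name, and sleep command are examples); note that hostNetwork pods require a permissive SCC, which cluster-admin satisfies:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: netshoot-host
spec:
  nodeName: worker01.ocp.example.com
  hostNetwork: true
  restartPolicy: Never
  containers:
  - name: netshoot
    image: nicolaka/netshoot
    command: ["sleep", "infinity"]
```

Attach with oc rsh netshoot-host, and delete the pod with oc delete pod netshoot-host when you are done.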

Step 6: Capture Traffic Between Specific Pods

When troubleshooting communication between two specific pods, you need to capture traffic inside a pod’s network namespace on the node. This approach is more targeted than capturing all node traffic.

First, identify which node the target pod is running on:

oc get pod my-app-pod -n my-namespace -o wide

The output shows the node name and pod IP:

NAME         READY   STATUS    RESTARTS   AGE   IP            NODE
my-app-pod   1/1     Running   0          3d    10.128.2.45   worker01.ocp.example.com

Open a debug shell on that node and chroot into the host:

oc debug node/worker01.ocp.example.com
chroot /host

Find the container ID for the target pod using crictl:

crictl ps --name my-app-pod

The output includes the container ID in the first column:

CONTAINER           IMAGE               CREATED             STATE     NAME           POD ID
a1b2c3d4e5f67       abcdef123456        3 days ago          Running   my-app-pod     f9e8d7c6b5a4

Get the container’s process ID (PID) from the container runtime. The RHCOS base image does not ship python3 or jq, so use crictl’s built-in Go template output:

container_pid=$(crictl inspect --output go-template --template '{{.info.pid}}' a1b2c3d4e5f67)

Verify the PID was captured correctly:

echo $container_pid

Now run tcpdump inside the container’s network namespace using nsenter:

nsenter -n -t $container_pid -- tcpdump -i any -s 0 -w /tmp/pod_capture.pcap

This captures only the traffic flowing through the target pod’s network stack – much cleaner than capturing all node traffic. Press Ctrl+C to stop the capture when you have enough data.
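The steps above can be tied together in one helper to run on the node after chroot /host. This is a sketch, not a supported tool: the name capture_pod is ours, and CRICTL/NSENTER are override hooks so the plumbing can be dry-run tested off-cluster:

```shell
# Sketch: look up a pod's container, resolve its PID, and capture
# inside its network namespace. Run on the node after `chroot /host`.
capture_pod() {
  name=$1; out=$2
  cid=$("${CRICTL:-crictl}" ps --name "${name}" -q | head -n 1)
  [ -n "${cid}" ] || { echo "no running container named ${name}" >&2; return 1; }
  pid=$("${CRICTL:-crictl}" inspect --output go-template --template '{{.info.pid}}' "${cid}")
  "${NSENTER:-nsenter}" -n -t "${pid}" -- tcpdump -i any -s 0 -w "${out}"
}
# Usage: capture_pod my-app-pod /tmp/pod_capture.pcap
```

Because -n is the only namespace entered, tcpdump itself still runs from the host image; only the network view belongs to the pod.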

Step 7: Copy Packet Captures Off the Node

After collecting pcap files on a node, you need to copy them to your workstation for analysis in Wireshark or similar tools. There are two approaches depending on how you ran the capture.

Copy from a debug pod using oc cp

If the debug pod from your tcpdump session is still running, open a second terminal and use oc cp to copy the file directly (add -n <namespace> if the debug pod was created in a different project than your current one):

oc cp worker01ocpexamplecom-debug:/host/tmp/https_capture.pcap ./https_capture.pcap

The debug pod mounts the node filesystem at /host, so the path includes the /host prefix for files written to the node’s /tmp.

Copy using oc rsync for larger captures

For large capture files or multiple files, oc rsync is more reliable. While your debug pod is running, open a second terminal and sync the files:

oc rsync worker01ocpexamplecom-debug:/host/tmp/ ./captures/ --include="*.pcap"

If you need to check node logs while analyzing captures, use oc adm node-logs in a separate terminal to correlate events with your packet data.
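Before deleting anything from the node, it is worth confirming the copy arrived intact. A small sketch (the helper name verify_capture is ours); first note the checksum on the node with sha256sum from the debug shell, then compare locally:

```shell
# Hypothetical helper: compare a copied file against the checksum
# printed on the node, to catch truncated transfers before analysis.
verify_capture() {
  actual=$(sha256sum "$1" | awk '{print $1}')
  if [ "${actual}" = "$2" ]; then
    echo "OK: $1"
  else
    echo "MISMATCH: $1" >&2
    return 1
  fi
}
# Usage: verify_capture ./https_capture.pcap <checksum-from-node>
```

A mismatch usually means the transfer was interrupted; rerun oc cp or oc rsync before opening the file in Wireshark.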

Clean up the capture files on the node when you are done:

oc debug node/worker01.ocp.example.com -- chroot /host rm -f /tmp/https_capture.pcap /tmp/node_traffic.pcap /tmp/pod_capture.pcap

Step 8: Use must-gather for Network Diagnostics

For systematic network troubleshooting across the entire cluster – especially when dealing with node network configuration issues – the oc adm must-gather command collects comprehensive diagnostic data. This is the recommended approach for Red Hat support cases and cluster-wide issues.

Run the network-specific must-gather to collect OVN/SDN logs, network policies, and routing information:

oc adm must-gather --image=registry.redhat.io/openshift4/network-tools-rhel9 --dest-dir=./network-must-gather

This collects OVN-Kubernetes logs, Open vSwitch flows, network policies, and pod networking details across all nodes. The process takes a few minutes depending on cluster size.

For general cluster diagnostics that include networking alongside other subsystems:

oc adm must-gather --dest-dir=./cluster-must-gather

The output is saved to a timestamped directory. Review the network-related files:

ls ./network-must-gather/*/network_logs/

Key files to examine in the must-gather output include OVN northbound/southbound database dumps, network operator logs, and per-node OVS flow tables. Refer to the OpenShift must-gather documentation for the complete reference on interpreting the collected data.
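When attaching the output to a support case, compress the directory first; must-gather trees contain thousands of small files. A minimal sketch (the helper name bundle is ours; the directory matches the --dest-dir used above):

```shell
# Sketch: compress a must-gather directory into a single archive
# suitable for uploading to a Red Hat support case.
bundle() {
  tar czf "$1.tar.gz" "$1" && echo "wrote $1.tar.gz"
}
# Usage: bundle network-must-gather
```

The resulting .tar.gz keeps the directory layout intact, so support engineers can extract it and navigate the same per-node structure you see locally.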

Conclusion

Running network diagnostics on OpenShift 4.x nodes requires working around the immutable RHCOS filesystem, but the available methods – oc debug node, toolbox containers, netshoot pods, and must-gather – cover every troubleshooting scenario. For quick checks, oc debug node with the built-in tcpdump is the fastest path. For targeted investigations of traffic between specific pods, combine crictl with nsenter to capture inside a single pod’s network namespace.

Always clean up debug pods and pcap files after troubleshooting sessions – leftover privileged pods and large capture files on node filesystems can consume resources and pose security risks in production clusters.
