# Troubleshooting
Detailed scenarios from real-world debugging sessions with solutions.
## Scenario 1: Handshake Never Completes
**Symptoms:**

```
$ wg show wire_kube
peer: <pubkey>
  endpoint: 1.2.3.4:51820
  latest handshake: (none)
  transfer: 0 B received, 1.2 KiB sent
```
**Root Cause:** Both peers are behind symmetric NAT. STUN-discovered endpoints are unreliable because each new destination gets a different NAT mapping.
**Diagnosis:**

```shell
# Check agent logs for NAT type
kubectl logs -n wirekube-system <agent-pod> | grep -i "symmetric\|nat type"

# Manual STUN check (run on the node)
stun stun.cloudflare.com 3478
stun stun.l.google.com 19302
# If mapped ports differ → symmetric NAT
```
**Fix:**

- Ensure relay is configured: `kubectl get wirekubemesh default -o yaml | grep relay`
- Set `relay.mode: auto` and verify `handshakeTimeoutSeconds`
- Check relay connectivity: `nc -zv <relay-endpoint> 3478`
- Restart the agent: `kubectl rollout restart ds/wirekube-agent -n wirekube-system`
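The relay settings referenced above might look like the following in the mesh CRD. This is a sketch: `relay.mode` and `handshakeTimeoutSeconds` are the field names mentioned in the fix, but the exact nesting under `spec` and the timeout value shown are assumptions — check your CRD schema.

```yaml
# Sketch of WireKubeMesh relay settings (field nesting and timeout value assumed)
spec:
  relay:
    mode: auto                   # fall back to the relay when direct handshake fails
    handshakeTimeoutSeconds: 10  # how long to wait for a direct handshake first
```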
## Scenario 2: EPERM on UDP Write (Cilium)
**Symptoms:** WireGuard traffic stops on nodes running Cilium; UDP writes on the tunnel socket fail with `EPERM` (operation not permitted).

**Root Cause:** Cilium's cgroup BPF program (`cil_sock4_sendmsg`) intercepts `sendto()` calls and returns `EPERM`.
**Fix:**

The adaptive proxy auto-switches to `syscall.Write` mode; check the agent logs to confirm the switch.

Alternative: set Cilium's `socketLB.hostNamespaceOnly: true` to prevent BPF hooks on `hostNetwork: true` pods.
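If Cilium is managed with Helm, the `socketLB.hostNamespaceOnly` toggle can be applied like this (a sketch assuming the usual `cilium` release name and `cilium/cilium` chart; adjust to your install):

```shell
helm upgrade cilium cilium/cilium -n kube-system \
  --reuse-values \
  --set socketLB.hostNamespaceOnly=true

# Restart the Cilium agents so the new BPF configuration takes effect
kubectl rollout restart ds/cilium -n kube-system
```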
See CNI Compatibility for details.
## Scenario 3: Relay Connection Timeout
**Symptoms:** The agent cannot reach the relay server; connection attempts time out.

**Root Cause:** Relay server unreachable (firewall, service down, failing NLB health check).
**Diagnosis:**

```shell
# Test TCP connectivity from the node
nc -zv <relay-endpoint> 3478

# Check relay pods
kubectl get pods -n wirekube-system -l app=wirekube-relay

# Check Service external address (for managed relay)
kubectl get svc wirekube-relay -n wirekube-system -o wide
```
**Fix:**

The relay client auto-reconnects with exponential backoff (1s–30s). If the relay is persistently unreachable:

- Check firewall / security group rules (TCP 3478 inbound)
- For managed relay, verify the Service has an ExternalIP or LoadBalancer IP
- After fixing, the agent reconnects automatically; no restart needed
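One way to confirm the managed relay Service actually has a usable external address (a sketch; adjust the namespace and Service name to your install):

```shell
# LoadBalancer: an IP or hostname should appear under status.loadBalancer.ingress
kubectl get svc wirekube-relay -n wirekube-system \
  -o jsonpath='{.status.loadBalancer.ingress}{"\n"}'

# Then verify TCP reachability from a node
nc -zv <relay-endpoint> 3478
```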
## Scenario 4: IPSec xfrm Conflict
**Symptoms:**

- `ping` between nodes shows 100% packet loss
- `tcpdump -i wire_kube` shows both outgoing requests and incoming replies
- But `ping` still reports no packets received
**Root Cause:** Existing IPSec xfrm policies intercept traffic on the `wire_kube` interface. The kernel applies xfrm policies to inbound packets and drops them because they weren't encrypted with IPSec.
**Diagnosis:**

```shell
# Check if xfrm bypass is enabled; both should be 1
cat /proc/sys/net/ipv4/conf/wire_kube/disable_xfrm
cat /proc/sys/net/ipv4/conf/wire_kube/disable_policy

# Check for IPSec xfrm policies
ip xfrm policy show
```
**Fix:**

The agent sets `disable_xfrm=1` and `disable_policy=1` on the WireGuard interface automatically. If these values are 0:

- Check that the DaemonSet mounts `/proc/sys/net` from the host
- The volume `host-proc-sys-net` should mount hostPath `/proc/sys/net` to `/host/proc/sys/net`
- Check agent logs for `xfrm bypass enabled` or sysctl warnings
- Manual override: `echo 1 > /proc/sys/net/ipv4/conf/wire_kube/disable_xfrm`
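To confirm the mount described above, you can query the DaemonSet directly. The volume name comes from the fix steps; the DaemonSet name is assumed to be `wirekube-agent`:

```shell
# Should print /proc/sys/net if the hostPath volume is wired up
kubectl get ds wirekube-agent -n wirekube-system \
  -o jsonpath='{.spec.template.spec.volumes[?(@.name=="host-proc-sys-net")].hostPath.path}{"\n"}'
```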
## Scenario 5: Relay Proxy Address in CRD
**Symptoms:**

```shell
kubectl get wirekubepeer <name> -o yaml
# spec.endpoint: "127.0.0.1:54321" ← relay proxy address, not real endpoint
```
**Root Cause:** The NAT endpoint reflection feature wrote the relay proxy's local loopback address back into the CRD.
**Fix:**

The agent filters `127.0.0.1:*` from NAT reflection. To manually fix:

```shell
kubectl patch wirekubepeer <name> --type merge \
  -p '{"spec":{"endpoint":"<correct-public-ip>:51820"}}'
```
## Scenario 6: Same-VPC Nodes Cannot Communicate
**Symptoms:**

- Nodes in the same VPC/subnet fail to establish a WireGuard handshake
- `wg show` shows packets sent but 0 received
**Root Cause:** Missing fwmark routing rule causes a WireGuard packet loop.
**Diagnosis:**
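A plausible check, using the fwmark (0x574B) and routing table (22347) that WireKube uses elsewhere in this guide:

```shell
# The agent should install a policy rule steering fwmark'd WireGuard packets
ip rule show | grep -i 574b

# The WireKube routing table should exist and contain routes
ip route show table 22347
```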
**Fix:**

The agent creates this rule automatically on startup. The initContainer also removes stale rules from previous runs.
## Scenario 7: Stale Interface After Crash
**Symptoms:** The agent fails to configure networking on startup; a `wire_kube` interface from an earlier run is still present.

**Root Cause:** A previous agent instance crashed and left a stale `wire_kube` interface.
**Fix:**

The DaemonSet's initContainer cleans up stale interfaces on startup. For a manual fix:

```shell
ip link del wire_kube 2>/dev/null
ip rule del fwmark 0x574B 2>/dev/null
ip route flush table 22347 2>/dev/null
```
## Scenario 8: AllowedIPs Empty → No Traffic
**Symptoms:** Peer CRD exists but no routes are added and no traffic flows.

**Root Cause:** AllowedIPs are intentionally user-managed. When they are empty, the agent enters passive mode and adds no routes for any peer.
**Fix:**

Populate AllowedIPs on each peer CRD that should receive routes; once set, the agent leaves passive mode and installs them.
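A minimal way to populate AllowedIPs on a peer, assuming the field is `spec.allowedIPs` (the field name and CIDR here are illustrative; use the networks you actually want routed to that peer):

```shell
# Example CIDR only — substitute the peer's real networks
kubectl patch wirekubepeer <name> --type merge \
  -p '{"spec":{"allowedIPs":["10.42.0.0/24"]}}'
```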
## Scenario 9: Public Endpoint Overwritten with Private IP
**Symptoms:** A node's CRD endpoint shows a private IP (e.g., `10.0.0.5:51820`) instead of the STUN-discovered public IP.
**Root Cause:** Another agent's `reflectNATEndpoints` wrote a private IP from its WireGuard kernel cache back to the CRD, overwriting the public endpoint.
**Fix:**

The agent's `reflectNATEndpoints` function prevents downgrading a public IP to a private IP. If the CRD is already corrupted:

```shell
kubectl patch wirekubepeer <name> --type merge \
  -p '{"spec":{"endpoint":"<correct-public-ip>:51820"}}'
```
## Scenario 10: Managed Relay Unreachable
**Symptoms:** Agent logs show `managed relay: no externally reachable address found` on the `wirekube-relay` Service.
**Root Cause:** The managed relay Service does not have an ExternalIP, LoadBalancer ingress, or NodePort with a reachable node IP. The agent does not use ClusterIP/CoreDNS for relay discovery because it may be unreachable on hybrid/NAT'd nodes before the mesh tunnel is established.
**Fix:**

Ensure the relay Service has an externally reachable address:

- If using `serviceType: LoadBalancer`, wait for the external IP to be assigned.
- For NodePort, ensure at least one cluster node has a public IP (ExternalIP or a public InternalIP).

The agent retries relay initialization with exponential backoff, so it will connect once the external address becomes available.
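To see which address the agent could discover, check the Service type and ingress, then the node addresses for the NodePort case (names assumed from the defaults above):

```shell
# Service type and any LoadBalancer ingress that has been assigned
kubectl get svc wirekube-relay -n wirekube-system \
  -o jsonpath='{.spec.type}{" "}{.status.loadBalancer.ingress}{"\n"}'

# For NodePort: look for an ExternalIP (or public InternalIP) on at least one node
kubectl get nodes -o wide
```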
## Diagnostic Commands Reference
| Command | Purpose |
|---|---|
| `wg show wire_kube` | WireGuard interface status |
| `wg show wire_kube dump` | Machine-parseable peer dump |
| `ip route show dev wire_kube` | Routes through WireGuard |
| `ip route show table 22347` | WireKube routing table |
| `ip rule show` | Routing policy (check fwmark 0x574B) |
| `ss -tnp \| grep 3478` | Relay TCP connection status |
| `kubectl get wirekubepeers -o wide` | All peer CRDs |
| `kubectl logs -n wirekube-system -l app=wirekube-agent` | Agent logs |
| `tcpdump -i wire_kube -n` | WireGuard decrypted traffic |
| `tcpdump -i eth0 udp port 51820` | WireGuard encrypted packets |
| `cat /proc/sys/net/ipv4/conf/wire_kube/disable_xfrm` | xfrm bypass status |
| `cat /proc/sys/net/ipv4/conf/wire_kube/disable_policy` | xfrm policy bypass status |