EKS Hybrid Nodes with WireKube
This guide covers deploying WireKube on an Amazon EKS cluster with Hybrid Nodes — external worker nodes (on-premises or other clouds) managed by an EKS control plane.
WireKube establishes a WireGuard mesh VPN across all nodes, enabling:
- Encrypted node-to-node connectivity over WireGuard tunnels
- Cross-node pod networking via Cilium VXLAN over WireGuard
- kubectl exec/logs/port-forward on hybrid nodes (via Virtual Gateway)
- Automatic NAT traversal with STUN + relay fallback
Architecture
```mermaid
graph TB
    subgraph VPC["AWS VPC"]
        API["EKS Control Plane<br/>API Server"]
        LB["NLB :3478"]
        subgraph EC2["EC2 Managed Node"]
            Agent0["wirekube-agent<br/>(VGW · IP fwd + SNAT)"]
            Relay["wirekube-relay"]
            VPCCNI["VPC CNI (aws-node)"]
        end
    end
    subgraph OnPrem["On-Premises / External Cloud"]
        subgraph HybridA["Hybrid Node A"]
            CiliumA["Cilium CNI"]
            AgentA["wirekube-agent"]
        end
        subgraph HybridB["Hybrid Node B"]
            CiliumB["Cilium CNI"]
            AgentB["wirekube-agent"]
        end
    end
    API -->|"kubelet :10250<br/>(via VGW)"| Agent0
    Relay -. "exposed via" .-> LB
    Agent0 <-->|"WireGuard P2P<br/>(UDP direct)"| HybridA
    Agent0 <-->|"WireGuard P2P<br/>(UDP direct)"| HybridB
    HybridA <-->|"WireGuard P2P<br/>(UDP direct)"| HybridB
    HybridA ---|"Relay TCP fallback"| LB
    HybridB ---|"Relay TCP fallback"| LB
    HybridA <-->|"Cilium VXLAN<br/>over WireGuard"| HybridB
```
Four Network Planes
| Plane | Technology | Scope |
|---|---|---|
| Node-to-node | WireGuard (wire_kube) | All nodes ↔ all nodes |
| Pod-to-pod (hybrid ↔ hybrid) | Cilium VXLAN over WireGuard | Hybrid nodes only |
| Pod-to-pod (EC2 ↔ hybrid) | VPC CNI SNAT + WireGuard AllowedIPs | EC2 ↔ hybrid |
| Control plane ↔ kubelet | VGW (WireKubeGateway) | kube-apiserver → hybrid |
Connection Strategy
- Direct P2P: STUN discovers NAT-mapped endpoints; nodes connect directly when the NAT combination allows it (Cone ↔ Cone, Cone ↔ Symmetric)
- Relay fallback: If no handshake completes within 30 seconds, traffic flows through the TCP relay (WireGuard encryption is preserved end to end)
- Birthday attack: Optional port prediction for Symmetric ↔ Symmetric NAT pairs behind CGNAT
- VGW gateway: The EC2 node forwards VPC traffic to hybrid nodes through WireGuard, enabling kube-apiserver ↔ kubelet and EC2 ↔ hybrid pod routing
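The order of the first three strategies can be sketched as a small decision function. This is a hypothetical illustration, not WireKube's code: choose_path and its inputs (NAT types given as "cone" or "symmetric", seconds elapsed without a completed handshake, and a birthday on/off flag) are assumptions; only the 30-second timeout and the NAT pairings come from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the path-selection order described above.
# NOT WireKube's implementation; the function name and inputs are illustrative.

HANDSHAKE_TIMEOUT=30  # seconds without a handshake before falling back to the relay

# choose_path LOCAL_NAT REMOTE_NAT ELAPSED_SECONDS [BIRTHDAY]
#   LOCAL_NAT / REMOTE_NAT: "cone" or "symmetric" (as discovered via STUN)
#   ELAPSED_SECONDS: time spent trying without a completed handshake
#   BIRTHDAY: "on" enables port prediction for symmetric <-> symmetric pairs
# Prints "direct", "birthday", or "relay".
choose_path() {
  local local_nat=$1 remote_nat=$2 elapsed=$3 birthday=${4:-off}

  # Any attempt that has not handshaken within the timeout falls back to the relay.
  if (( elapsed >= HANDSHAKE_TIMEOUT )); then
    echo relay
    return
  fi

  if [[ $local_nat == symmetric && $remote_nat == symmetric ]]; then
    # Plain hole punching cannot work here; try port prediction only if enabled.
    if [[ $birthday == on ]]; then echo birthday; else echo relay; fi
  else
    # Cone <-> Cone and Cone <-> Symmetric: direct P2P via STUN-mapped endpoints.
    echo direct
  fi
}

choose_path cone symmetric 5          # direct
choose_path symmetric symmetric 5     # relay
choose_path symmetric symmetric 5 on  # birthday
choose_path cone cone 31              # relay
```

Checking the timeout first keeps the relay as the single fallback for every pairing, which matches the description that encryption stays end to end regardless of path.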
Network Design
CIDR Planning
Careful CIDR allocation prevents routing conflicts:
| Network | CIDR | Purpose |
|---|---|---|
| AWS VPC | 10.100.0.0/16 | EC2 nodes and VPC CNI pods |
| EKS Service | 172.16.0.0/16 | Kubernetes Service ClusterIPs |
| Hybrid node subnet A | 172.20.0.0/16 | On-prem/external cloud network A |
| Hybrid node subnet B | 10.20.0.0/16 | On-prem/external cloud network B |
| Cilium Pod CIDR | 10.200.0.0/16 | Pods on hybrid nodes (RFC 1918) |
Cilium Pod CIDR must NOT overlap with VPC CIDR or cloud internal ranges
By default, Cilium's cluster-pool IPAM range can overlap the VPC CIDR (e.g., 10.100.0.0/16). This causes routing conflicts: WireGuard routes for hybrid pod CIDRs would capture VPC-local traffic. Use a completely separate RFC 1918 range such as 10.200.0.0/16. EKS requires remotePodNetworks CIDRs to be RFC 1918 or CGNAT, but avoid CGNAT (100.64.0.0/10), because some cloud providers use it internally.
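The planning rules above can be checked before installing Cilium. The following is a minimal pure-bash sketch (IPv4 only, no input validation); the helper names are hypothetical, and the CIDRs in the example are the ones from the planning table. Two ranges overlap exactly when their network addresses agree under the shorter of the two prefix masks.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking a CIDR plan (IPv4 only, no validation).

# ip_to_int A.B.C.D -> 32-bit integer
ip_to_int() {
  local IFS=.
  local -a o=($1)
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# mask_for LEN -> netmask as an integer (bash arithmetic is 64-bit, so trim to 32 bits)
mask_for() {
  echo $(( (0xFFFFFFFF << (32 - $1)) & 0xFFFFFFFF ))
}

# cidrs_overlap A/len B/len -> exit 0 if the ranges share any address
cidrs_overlap() {
  local a_len=${1#*/} b_len=${2#*/}
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask a b
  mask=$(mask_for "$len")
  a=$(ip_to_int "${1%/*}"); b=$(ip_to_int "${2%/*}")
  (( (a & mask) == (b & mask) ))
}

# cidr_within INNER OUTER -> exit 0 if INNER lies entirely inside OUTER
cidr_within() {
  local in_len=${1#*/} out_len=${2#*/}
  (( in_len >= out_len )) && cidrs_overlap "$1" "$2"
}

# remote_pod_cidr_ok CIDR -> exit 0 if CIDR is inside RFC 1918 or CGNAT space
remote_pod_cidr_ok() {
  local r
  for r in 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 100.64.0.0/10; do
    cidr_within "$1" "$r" && return 0
  done
  return 1
}

POD_CIDR=10.200.0.0/16
for other in 10.100.0.0/16 172.16.0.0/16 172.20.0.0/16 10.20.0.0/16; do
  if cidrs_overlap "$POD_CIDR" "$other"; then
    echo "conflict: $POD_CIDR overlaps $other"
  fi
done
remote_pod_cidr_ok "$POD_CIDR" && echo "$POD_CIDR is RFC 1918/CGNAT"
```

With the example plan, no conflicts are reported. Replace POD_CIDR and the list of other ranges with your own VPC, Service, and on-prem CIDRs.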
Pod Networking Data Flow
Hybrid ↔ Hybrid Pods (Cilium VXLAN over WireGuard)
No additional WireKube configuration is required. Cilium handles pod routing natively through VXLAN tunneling, using node IPs as tunnel endpoints. Since WireGuard routes node IPs, the VXLAN packets are automatically encrypted and tunneled:
```text
Pod A (198.18.0.149)
  → Cilium BPF (veth)
  → VXLAN encap (outer: nodeA:8472 → nodeB:8472)
  → kernel routing table 22347
  → wire_kube (WireGuard encrypts)
  → relay or direct
  → remote wire_kube (WireGuard decrypts)
  → VXLAN decap
  → Cilium BPF → Pod B (198.18.1.17)
```
EC2 → Hybrid Pods (WireGuard AllowedIPs)
Each hybrid node's WireKubePeer must include its Cilium pod CIDR in AllowedIPs. The EC2 node then routes pod traffic through WireGuard to the correct hybrid node:
```text
EC2 Pod (10.100.0.200)
  → VPC CNI SNAT (src → 10.100.0.187)
  → kernel routing table 22347 (198.18.0.0/25 → wire_kube)
  → WireGuard → hybrid node
  → Cilium decap → Hybrid Pod (198.18.0.50)
```
VPC CNI automatically SNATs outbound traffic to non-VPC destinations
(AWS_VPC_K8S_CNI_EXTERNALSNAT=false by default). This ensures the
hybrid node sees the EC2 node IP as the source, which it can route back
through WireGuard.
Hybrid → EC2 Pods (WireKubeGateway routes)
With the VPC subnets added to the WireKubeGateway routes (Step 5c), hybrid nodes route VPC-destined traffic through WireGuard to the EC2 gateway peer:
```text
Hybrid Pod (198.18.0.50)
  → Cilium BPF (masquerade: src → nodeIP)
  → kernel routing table 22347 (10.100.0.0/24 → wire_kube)
  → WireGuard → EC2 node
  → VPC forwarding → EC2 Pod (10.100.0.200)
```
Prerequisites
AWS
- EKS cluster with Hybrid Nodes feature enabled
- At least one managed EC2 nodegroup (hosts relay + gateway)
- VPC with subnets and internet gateway
- IAM Roles Anywhere or SSM hybrid activations configured
Hybrid Nodes
- Linux servers joined to the EKS cluster via nodeadm
- Outbound internet access (STUN servers and relay LB)
- UDP port 51822 (WireGuard) — open for direct P2P if possible
- TCP port 3478 outbound (relay connection)
- WireGuard kernel module loaded
Tools
- kubectl configured for the EKS cluster
- helm v3 (for Cilium)
- aws CLI (for VPC route and EC2 configuration)
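A quick preflight check can confirm the tools above are installed before you start. This is a minimal sketch with a hypothetical helper name; it only checks that each binary is on PATH and does not verify the helm version or that kubectl targets the right cluster.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper: report required binaries that are missing from PATH.

# missing_tools NAME... -> prints each NAME not found; returns 1 if any is missing
missing_tools() {
  local missing=0 t
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "$t"
      missing=1
    fi
  done
  return "$missing"
}

if missing_tools kubectl helm aws; then
  echo "all required tools found"
else
  echo "install the missing tools before continuing" >&2
fi
```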
Step 1: Install CRDs
This installs three CRDs:
- WireKubeMesh — cluster-wide mesh configuration
- WireKubePeer — per-node state (auto-created by agent)
- WireKubeGateway — virtual gateway for cross-network routing
Step 2: Install Cilium on Hybrid Nodes
Hybrid nodes require a CNI. AWS recommends Cilium for EKS Hybrid Nodes.
Install from AWS Public ECR
```bash
helm upgrade --install cilium \
  oci://public.ecr.aws/eks/cilium/cilium \
  --version <VERSION> \
  --namespace kube-system \
  -f config/examples/eks-hybrid/cilium-values.yaml
```
Before installing, update cilium-values.yaml:
- Set k8sServiceHost to your EKS API endpoint
- Set ipam.operator.clusterPoolIPv4PodCIDRList to a non-overlapping CIDR
Exclude kube-proxy from Hybrid Nodes
Cilium runs with kubeProxyReplacement: true, so exclude kube-proxy from hybrid nodes:
```bash
kubectl patch ds kube-proxy -n kube-system --type merge -p '{
  "spec":{"template":{"spec":{"affinity":{"nodeAffinity":{
    "requiredDuringSchedulingIgnoredDuringExecution":{"nodeSelectorTerms":[{
      "matchExpressions":[{
        "key":"eks.amazonaws.com/compute-type",
        "operator":"NotIn",
        "values":["hybrid"]
      }]
    }]}
  }}}}}
}'
```
Critical Cilium Settings
| Setting | Value | Reason |
|---|---|---|
| kubeProxyReplacement | "true" | Replace kube-proxy with BPF service routing |
| socketLB.enabled | true | Safe to enable; the agent automatically falls back to connected sockets on EPERM |
| k8sServiceHost | EKS API endpoint | Hybrid nodes cannot reach the Service ClusterIP before CNI initialization |
| ipam.operator.clusterPoolIPv4PodCIDRList | 10.200.0.0/16 | RFC 1918; must NOT overlap with the VPC CIDR or cloud internal ranges |
| affinity | eks.amazonaws.com/compute-type: hybrid | Only schedule Cilium on hybrid nodes |
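The settings in the table above can be expressed as Helm values. This is a hypothetical sketch, not the contents of config/examples/eks-hybrid/cilium-values.yaml: render_cilium_values is an illustrative helper, and the exact key layout plus the extra k8sServicePort: 443 and ipam.mode: cluster-pool entries are assumptions based on the upstream Cilium Helm chart.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print Helm values encoding the Critical Cilium Settings table.
# Key names assume the upstream Cilium chart; the shipped cilium-values.yaml is authoritative.

# render_cilium_values EKS_API_HOST POD_CIDR
render_cilium_values() {
  local api_host=$1 pod_cidr=$2
  cat <<EOF
kubeProxyReplacement: "true"
k8sServiceHost: ${api_host}
k8sServicePort: 443
socketLB:
  enabled: true
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList:
      - ${pod_cidr}
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: In
              values:
                - hybrid
EOF
}

# Placeholder endpoint: use the hostname from aws eks describe-cluster, without https://
render_cilium_values "<EKS_API_HOST>" 10.200.0.0/16
```

Diff the output against the shipped cilium-values.yaml rather than replacing that file outright.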
socketLB and WireKube compatibility
Cilium's socket-level LB attaches BPF sendmsg hooks to the root
cgroup, which can intercept sendto(2) syscalls from hostNetwork
pods. In some Cilium versions this caused EPERM errors for the
WireKube relay proxy's loopback UDP traffic. The agent handles this
automatically by falling back to connected sockets (write(2) syscall),
which bypasses the BPF hook. The fallback is transparent with at most
one recoverable packet drop per peer. See
CNI Compatibility for details.
Step 3: Deploy WireKube
Namespace and RBAC
```bash
kubectl apply -f config/examples/eks-hybrid/namespace.yaml
kubectl apply -f config/examples/eks-hybrid/rbac.yaml
```
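Relay Server
The relay runs on the EC2 managed nodegroup and is exposed through a LoadBalancer Service. Apply config/examples/eks-hybrid/relay.yaml, then wait until the wirekube-relay Service in wirekube-system reports an external hostname before continuing.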
Agent DaemonSet
Set KUBERNETES_SERVICE_HOST in config/examples/eks-hybrid/daemonset.yaml to your EKS API endpoint hostname (without the https:// prefix), then deploy:
```bash
EKS_ENDPOINT=$(aws eks describe-cluster --name <CLUSTER> \
  --query 'cluster.endpoint' --output text | sed 's|https://||')
echo "Set KUBERNETES_SERVICE_HOST to: ${EKS_ENDPOINT}"
kubectl apply -f config/examples/eks-hybrid/daemonset.yaml
```
Why KUBERNETES_SERVICE_HOST?
Hybrid nodes cannot reach the Kubernetes Service ClusterIP before the CNI is ready, but the agent needs API access at startup to create its WireKubePeer resource. Pointing the agent at the EKS public endpoint removes this dependency.
WireKubeMesh
```bash
RELAY_EP=$(kubectl get svc wirekube-relay -n wirekube-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'):3478
sed "s|REPLACE_WITH_RELAY_LB:3478|${RELAY_EP}|" \
  config/examples/eks-hybrid/wirekubemesh.yaml | kubectl apply -f -
```
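If the NLB hostname is not populated yet, the RELAY_EP assignment above silently evaluates to ":3478". A minimal guard is to poll until the hostname appears. wait_for_output and relay_hostname are hypothetical helpers; the kubectl query is the same one used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command until it prints non-empty output.

# wait_for_output ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_for_output() {
  local attempts=$1 delay=$2 out i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    out=$("$@" 2>/dev/null)
    if [[ -n $out ]]; then
      printf '%s\n' "$out"
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi  # no pointless sleep after the last try
  done
  return 1
}

# Same query as the RELAY_EP assignment above.
relay_hostname() {
  kubectl get svc wirekube-relay -n wirekube-system \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Usage (requires cluster access):
#   RELAY_HOST=$(wait_for_output 30 10 relay_hostname) || { echo "relay LB not ready" >&2; exit 1; }
#   RELAY_EP="${RELAY_HOST}:3478"
```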
Verify Node Connectivity
```bash
kubectl get wirekubepeers \
  -o custom-columns='NAME:.metadata.name,CONNECTED:.status.connected,NAT:.status.natType'
```
All peers should show CONNECTED=true.
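To script this check, the custom-columns output can be filtered down to the peers that are not connected. A minimal sketch with a hypothetical helper name; rows where the status field is missing (kubectl prints "&lt;none&gt;") are treated as not connected.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list peers whose CONNECTED column is not "true".

# not_connected < custom-columns table (NAME CONNECTED NAT) -> prints "NAME NAT"
not_connected() {
  awk 'NR > 1 && $2 != "true" { print $1, $3 }'
}

# Usage (requires cluster access):
#   kubectl get wirekubepeers \
#     -o custom-columns='NAME:.metadata.name,CONNECTED:.status.connected,NAT:.status.natType' \
#     | not_connected
```

Empty output means every peer reports CONNECTED=true.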
Step 4: Enable VGW Gateway
EKS has no built-in Konnectivity service. The kube-apiserver reaches kubelet
via VPC ENIs, but hybrid nodes are outside the VPC. Without routing,
kubectl exec, kubectl logs, and kubectl port-forward fail on hybrid nodes.
The WireKubeGateway makes the EC2 node forward VPC traffic to hybrid nodes through the WireGuard tunnel.
AWS Setup
1. Disable Source/Dest Check
```bash
EC2_INSTANCE_ID=<your-ec2-instance-id>
ENI_IDS=$(aws ec2 describe-network-interfaces \
  --filters "Name=attachment.instance-id,Values=${EC2_INSTANCE_ID}" \
  --query 'NetworkInterfaces[].NetworkInterfaceId' --output text)
for eni in $ENI_IDS; do
  aws ec2 modify-network-interface-attribute \
    --network-interface-id "$eni" --no-source-dest-check
done
```
2. Add VPC Route Table Entries
Route hybrid node CIDRs to the EC2 instance's primary ENI in every route table associated with VPC subnets:
```bash
EC2_ENI=<ec2-primary-eni-id>
VPC_ID=$(aws ec2 describe-instances --instance-ids "${EC2_INSTANCE_ID}" \
  --query 'Reservations[0].Instances[0].VpcId' --output text)
RTB_IDS=$(aws ec2 describe-route-tables \
  --filters "Name=vpc-id,Values=${VPC_ID}" \
  --query 'RouteTables[].RouteTableId' --output text)
for rtb in $RTB_IDS; do
  for cidr in "<HYBRID_CIDR_A>" "<HYBRID_CIDR_B>"; do
    # create-route fails if the route already exists; fall back to replace-route
    aws ec2 create-route --route-table-id "$rtb" \
      --destination-cidr-block "$cidr" \
      --network-interface-id "$EC2_ENI" 2>/dev/null \
    || aws ec2 replace-route --route-table-id "$rtb" \
      --destination-cidr-block "$cidr" \
      --network-interface-id "$EC2_ENI"
  done
done
```
Deploy WireKubeGateway
Edit config/examples/eks-hybrid/gateway.yaml with your node name and CIDRs, then apply it with kubectl apply -f config/examples/eks-hybrid/gateway.yaml.
Verify
```bash
kubectl get wirekubegateway hybrid-gateway -o jsonpath='{.status}'
kubectl exec <pod-on-hybrid-node> -- hostname
kubectl logs <pod-on-hybrid-node> --tail=5
```
Step 5: Enable Cross-Cloud Pod Networking
After Steps 1–4, node-to-node connectivity and kubectl exec/logs work.
To enable full pod-to-pod communication across EC2 and hybrid nodes:
5a. Hybrid ↔ Hybrid Pods
This works automatically. Cilium's VXLAN tunneling uses node IPs as endpoints, which are routed through WireGuard, so no additional configuration is needed. Verify it with the hybrid pod → hybrid pod check under Verify Full Pod Connectivity.
5b. EC2 → Hybrid Pods
Add each hybrid node's Cilium pod CIDR to its WireKubePeer AllowedIPs:
```bash
# Get Cilium pod CIDRs
kubectl get ciliumnodes -o custom-columns='NAME:.metadata.name,POD_CIDR:.spec.ipam.podCIDRs'
# Patch each hybrid peer
kubectl patch wirekubepeer <hybrid-node-A> --type=json \
  -p='[{"op":"add","path":"/spec/allowedIPs/-","value":"<CILIUM_POD_CIDR_A>"}]'
kubectl patch wirekubepeer <hybrid-node-B> --type=json \
  -p='[{"op":"add","path":"/spec/allowedIPs/-","value":"<CILIUM_POD_CIDR_B>"}]'
```
This creates WireGuard routes on the EC2 node so pod traffic for each hybrid pod CIDR is sent through the correct WireGuard tunnel. VPC CNI automatically SNATs the source to the EC2 node IP.
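With more than a couple of hybrid nodes, the patch commands can be generated from the ciliumnodes listing. This is a hypothetical sketch: peer_patch_cmds only prints the commands for review, and it assumes each CiliumNode and its WireKubePeer share the Kubernetes node name. Because Cilium is scheduled only on hybrid nodes, every CiliumNode row corresponds to a hybrid peer. The parser tolerates POD_CIDR rendered as "[a]", "[a b]", or "a,b".

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (not run) one WireKubePeer patch per Cilium pod CIDR.
# Assumes CiliumNode and WireKubePeer are both named after the Kubernetes node.

# peer_patch_cmds < "NAME POD_CIDR" table from kubectl get ciliumnodes -o custom-columns=...
peer_patch_cmds() {
  local _header name cidrs cidr
  read -r _header                      # skip the NAME/POD_CIDR header row
  while read -r name cidrs; do
    [[ -z $name || -z $cidrs || $cidrs == "<none>" ]] && continue
    cidrs=${cidrs//\[/}                # strip the brackets around the list
    cidrs=${cidrs//\]/}
    cidrs=${cidrs//,/ }                # accept comma- or space-separated CIDRs
    for cidr in $cidrs; do
      printf "kubectl patch wirekubepeer %s --type=json -p='[{\"op\":\"add\",\"path\":\"/spec/allowedIPs/-\",\"value\":\"%s\"}]'\n" \
        "$name" "$cidr"
    done
  done
}

# Example with placeholder input; review the printed commands before running them.
printf 'NAME POD_CIDR\nhybrid-node-a [10.200.0.0/25]\n' | peer_patch_cmds
```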
5c. Hybrid → EC2 Pods
Add the VPC subnets to the WireKubeGateway routes:
```bash
kubectl patch wirekubegateway hybrid-gateway --type=json -p='[
  {"op":"add","path":"/spec/routes/-","value":{"cidr":"<VPC_SUBNET_A>","description":"VPC subnet A"}},
  {"op":"add","path":"/spec/routes/-","value":{"cidr":"<VPC_SUBNET_B>","description":"VPC subnet B"}}
]'
```
The gateway injects these CIDRs into the EC2 peer's AllowedIPs. Hybrid nodes then route VPC-destined pod traffic through WireGuard to the EC2 node, which forwards it locally via VPC networking.
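For VPCs with many subnets, the JSON Patch body can be built from a list of CIDRs instead of writing one "add" operation by hand per subnet. gateway_routes_patch is a hypothetical helper; the operation shape (path /spec/routes/-, fields cidr and description) mirrors the command above, and the description text it generates is illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the Step 5c JSON Patch array from any number of CIDRs.

# gateway_routes_patch CIDR... -> prints a one-line JSON Patch array
gateway_routes_patch() {
  local ops=() cidr
  for cidr in "$@"; do
    ops+=("{\"op\":\"add\",\"path\":\"/spec/routes/-\",\"value\":{\"cidr\":\"${cidr}\",\"description\":\"VPC subnet ${cidr}\"}}")
  done
  local IFS=,                          # join operations with commas
  printf '[%s]\n' "${ops[*]}"
}

# Usage (requires cluster access; subnet values are placeholders):
#   kubectl patch wirekubegateway hybrid-gateway --type=json \
#     -p="$(gateway_routes_patch <VPC_SUBNET_A> <VPC_SUBNET_B>)"
```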
Verify Full Pod Connectivity
```bash
# EC2 pod → hybrid pod
kubectl exec <ec2-pod> -- wget -qO- --timeout=5 http://<hybrid-pod-IP>
# Hybrid pod → EC2 pod
kubectl exec <hybrid-pod> -- wget -qO- --timeout=5 http://<ec2-pod-IP>
# Hybrid pod → hybrid pod (different node)
kubectl exec <hybrid-pod-A> -- wget -qO- --timeout=5 http://<hybrid-pod-B-IP>
```
Routing Table Reference
IP Rules (on hybrid nodes)
| Priority | Rule | Purpose |
|---|---|---|
| 9 | fwmark 0x200/0xf00 → table 2004 | Cilium BPF socket redirect |
| 100 | lookup local | Loopback / local addresses |
| 100 | fwmark 0x574b → lookup main | WireGuard socket bypass (prevents route loop) |
| 200 | lookup 22347 | WireGuard mesh routes |
| 32766 | lookup main | Default kernel routes |
WireGuard Routing Table (22347)
| Route | Example | Source |
|---|---|---|
| Remote node /32 | 172.20.1.6 dev wire_kube | autoAllowedIPs: node-internal-ip |
| Pod CIDR /25 | 198.18.0.0/25 dev wire_kube | Manual AllowedIPs patch (Step 5b) |
| Gateway CIDR | 10.100.0.0/24 dev wire_kube | WireKubeGateway routes |
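Within table 22347 the kernel applies longest-prefix matching: among the routes whose prefix contains the destination, the most specific one wins, and if none match, the lookup falls through to the next ip rule. The following minimal sketch models that selection in bash; lookup_route is a hypothetical helper and the route list is the example table above. On a real node, ip route get &lt;destination&gt; shows the kernel's actual choice.

```bash
#!/usr/bin/env bash
# Hypothetical model of longest-prefix matching over a list of routes (IPv4 only).

ip_to_int() {
  local IFS=.
  local -a o=($1)
  echo $(( (o[0] << 24) | (o[1] << 16) | (o[2] << 8) | o[3] ))
}

# lookup_route DST ROUTE... -> prints the longest matching ROUTE, or "no match"
lookup_route() {
  local dst best="no match" best_len=-1 route net len mask
  dst=$(ip_to_int "$1"); shift
  for route in "$@"; do
    net=$(ip_to_int "${route%/*}"); len=${route#*/}
    mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    # Keep this route only if it contains DST and is more specific than the best so far.
    if (( (dst & mask) == (net & mask) && len > best_len )); then
      best=$route; best_len=$len
    fi
  done
  echo "$best"
}

# Example entries from the table above.
TABLE_22347=(172.20.1.6/32 198.18.0.0/25 10.100.0.0/24)
lookup_route 198.18.0.50 "${TABLE_22347[@]}"    # 198.18.0.0/25
lookup_route 10.100.0.200 "${TABLE_22347[@]}"   # 10.100.0.0/24
lookup_route 8.8.8.8 "${TABLE_22347[@]}"        # no match
```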
fwmark Design
WireGuard on the wire_kube interface is configured with fwmark 0x574b, and the kernel module applies this mark to its own encrypted UDP packets. The ip rule at priority 100 sends these marked packets to the main routing table, preventing them from re-entering the wire_kube interface (which would create an infinite encryption loop).
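The effect of the rule order can be modeled in a few lines. This hypothetical sketch walks the IP Rules table for an outgoing packet; pick_table and its inputs are illustrative, and the "lookup local" rule is omitted because it only matches locally owned destination addresses. A rule with fwmark and no mask matches the exact mark value.

```bash
#!/usr/bin/env bash
# Hypothetical model of the ip rule walk on a hybrid node (priorities from the IP Rules table).

# pick_table FWMARK IN_MESH -> routing table consulted for an outgoing packet
#   FWMARK:  packet mark (0 if unmarked), decimal or hex
#   IN_MESH: "yes" if table 22347 holds a route for the destination
pick_table() {
  local mark=$(( $1 )) in_mesh=$2
  if (( (mark & 0xf00) == 0x200 )); then echo 2004; return; fi  # prio 9: Cilium
  if (( mark == 0x574b )); then echo main; return; fi            # prio 100: WireGuard's own UDP
  if [[ $in_mesh == yes ]]; then echo 22347; return; fi           # prio 200: mesh routes
  echo main                                                       # prio 32766: default
}

pick_table 0 yes        # 22347: plain traffic to a mesh destination enters wire_kube
pick_table 0x574b yes   # main:  encrypted WireGuard UDP skips the mesh table
pick_table 0x200 no     # 2004:  Cilium-marked traffic
```

Without the priority-100 rule, an encrypted packet to a peer's node IP would match the /32 in table 22347 and be sent back into wire_kube. Note that 0x574b is 22347 in decimal, so the fwmark and the mesh table share the same number.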
Troubleshooting
Recommended Initial Setup Sequence
When deploying on freshly joined hybrid nodes, follow this exact order:
- Install Cilium with kubeProxyReplacement: true
- Exclude kube-proxy from hybrid nodes (nodeAffinity patch)
- ⚠️ Reboot all hybrid nodes (mandatory; see below)
- Deploy WireKube (CRDs, RBAC, relay, agent, mesh)
- If peers remain connected: false, delete the WireKubePeer and restart the agent
Important: Hybrid nodes MUST be rebooted once after the initial Cilium installation and before deploying WireKube. Nodes that join the cluster for the first time after this setup is complete (Cilium installed, kube-proxy already excluded) do NOT need a reboot. The reboot is only required when Cilium and kube-proxy state was established before WireKube's first deployment.
Why reboot? Three types of stale kernel state interfere with WireKube:
- KUBE-FIREWALL iptables chain: Created by kube-proxy before it was excluded. It contains a DROP rule for packets from non-loopback sources to loopback destinations (!127.0.0.0/8 → 127.0.0.0/8). This blocks the WireKube relay proxy's loopback traffic even after the kube-proxy pods are removed, because iptables rules persist in the kernel until they are flushed or the node reboots.
- Stale conntrack entries: kube-proxy creates conntrack state for service routing. These entries survive pod removal and may cause connection-tracking conflicts with new WireGuard/relay connections.
- Cilium BPF sendmsg hook: The EKS Cilium build attaches cil_sock4_sendmsg to the root cgroup. When the WireKube agent pod starts on a node where Cilium's BPF endpoint state is stale (from a previous agent pod), UDP sockets created during the registration gap receive persistent EPERM errors on sendto(). A reboot clears all BPF maps and cgroup attachments, allowing clean endpoint registration.
A reboot cleanly resets all three: iptables chains are rebuilt from scratch, conntrack is emptied, and BPF programs are detached.
After the initial reboot, subsequent agent updates (DaemonSet rolling updates) work without rebooting — the agent handles transient EPERM by recreating the affected UDP socket after the BPF state settles.
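To check whether the stale KUBE-FIREWALL rule is still present on a node, iptables-save output can be scanned for it. This is a heuristic sketch with a hypothetical helper name: it looks for a KUBE-FIREWALL rule containing -d 127.0.0.0/8, ! -s 127.0.0.0/8, and -j DROP, in any order, since the exact rule text (comments, conntrack matches) varies by kube-proxy version.

```bash
#!/usr/bin/env bash
# Hypothetical heuristic: detect kube-proxy's loopback DROP rule in iptables-save output.

# has_stale_kube_firewall < iptables-save output -> exit 0 if the rule is found
has_stale_kube_firewall() {
  grep -E -- '^-A KUBE-FIREWALL .*-d 127\.0\.0\.0/8' \
    | grep -F -- '! -s 127.0.0.0/8' \
    | grep -qF -- '-j DROP'
}

# Usage on a hybrid node (as root):
#   iptables-save -t filter | has_stale_kube_firewall && echo "stale KUBE-FIREWALL rule present: reboot the node"
```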
EPERM on relay proxy
Symptom: relay-proxy: EPERM on port 51822, scheduling socket recreation
Cause: Cilium's BPF sendmsg hook returns EPERM on sendto(2) when
the agent pod's UDP socket was created before Cilium finished BPF endpoint
registration. This typically happens on the first WireKube deployment before
the required reboot (see setup sequence above).
Action: The agent automatically detects EPERM and recreates the UDP socket after a 3-second delay, binding to the same port. If EPERM persists after recreation (indicating the node needs a reboot), reboot the hybrid node to clear all BPF state.
Cilium VXLAN pod-to-pod fails between hybrid nodes
Symptom: Pods on different hybrid nodes cannot reach each other.
Diagnosis:
```bash
# Check Cilium health (should show all nodes reachable)
kubectl exec -n kube-system <cilium-pod> -- cilium-health status
# Check WireGuard handshakes
kubectl exec -n wirekube-system <agent-pod> -- wg show wire_kube
```
Possible causes:
- WireGuard handshake not complete: Cilium VXLAN requires working WireGuard tunnels as its underlay. Restart the agent or delete and recreate the peer.
- CIDR overlap: If the Cilium cluster-pool overlaps with the VPC CIDR, routing conflicts prevent VXLAN packets from reaching WireGuard. Change clusterPoolIPv4PodCIDRList to a non-overlapping range and restart Cilium.
- Stale CiliumNode: After changing the cluster-pool CIDR, delete all CiliumNode resources and restart the Cilium DaemonSet.
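For the first cause, the handshake state can be checked per peer with wg show wire_kube latest-handshakes, which prints each peer's public key, a tab, and the Unix time of its last handshake (0 if none). The sketch below flags peers that never handshook or whose last handshake is older than a threshold. stale_peers is a hypothetical helper, and the 180-second default is an assumption: WireGuard normally re-handshakes about every two minutes while traffic flows, but an idle peer can legitimately show an older timestamp.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flag WireGuard peers with no or stale handshakes.

# stale_peers NOW_EPOCH [MAX_AGE_SECONDS] < "wg show <iface> latest-handshakes" output
stale_peers() {
  local now=$1 max_age=${2:-180}
  awk -v now="$now" -v max="$max_age" -F '\t' '
    $2 == 0        { print $1, "never"; next }
    now - $2 > max { print $1, (now - $2) "s ago" }
  '
}

# Usage (requires cluster access):
#   kubectl exec -n wirekube-system <agent-pod> -- wg show wire_kube latest-handshakes \
#     | stale_peers "$(date +%s)"
```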
kubectl exec/logs timeout on hybrid nodes
Symptom: dial tcp <hybrid-IP>:10250: i/o timeout
Fix: Deploy the WireKubeGateway (Step 4). Verify:
- EC2 Source/Dest Check is disabled
- VPC route tables have entries for the hybrid CIDRs → EC2 ENI
- kubectl get wirekubegateway shows ready: true
- EC2 agent logs show [gateway] MASQUERADE added
EC2 → hybrid pod traffic dropped
Symptom: EC2 pods cannot reach hybrid pods by IP.
Fix:
- Ensure the Cilium pod CIDRs are added to the hybrid WireKubePeer AllowedIPs (Step 5b)
- Verify the routes exist on the EC2 node: ip route show table 22347 should list the hybrid pod CIDRs (e.g., 198.18.x.x/25)
- Check that VPC CNI SNAT is active: AWS_VPC_K8S_CNI_EXTERNALSNAT should be false
WireKubePeers show connected: false
Possible causes:
- Relay unreachable: verify the relay LB endpoint in WireKubeMesh
- WireGuard module not loaded: check with lsmod | grep wireguard on the node (kernels with WireGuard built in do not list it as a module)
- Firewall: UDP 51822 and TCP 3478 outbound must be open
- Stale state: delete the WireKubePeer and restart the agent
Reference Files
All example manifests are in config/examples/eks-hybrid/:
| File | Purpose |
|---|---|
| namespace.yaml | WireKube namespace |
| rbac.yaml | ServiceAccount, ClusterRole, ClusterRoleBinding |
| daemonset.yaml | Agent DaemonSet (hostNetwork, init cleanup) |
| relay.yaml | Relay Deployment + LoadBalancer Service |
| wirekubemesh.yaml | WireKubeMesh CR (auto AllowedIPs, external relay) |
| gateway.yaml | WireKubeGateway CR (VGW for kubectl + pod routing) |
| cilium-values.yaml | Helm values for Cilium on hybrid nodes |