WSL2 GPU Node: Windows PC to Kubernetes GPU Worker¶
This guide turns a freshly formatted Windows PC with an NVIDIA GPU into a Kubernetes GPU worker node, connected to your existing cluster via WireKube.
```mermaid
flowchart LR
    subgraph cloud["Existing K8s Cluster"]
        CP["control-plane"]
        W1["worker-1"]
    end
    subgraph win["Windows PC (WSL2)"]
        GPU["NVIDIA GPU"]
        WSL["Ubuntu<br/>kubelet + wirekube-agent"]
    end
    CP <-->|"WireGuard tunnel<br/>via WireKube"| WSL
    W1 <-->|"WireGuard tunnel<br/>via WireKube"| WSL
    GPU -.->|"GPU passthrough"| WSL
```
Prerequisites¶
| Requirement | Detail |
|---|---|
| Windows | 11 22H2+ or 10 22H2+ |
| NVIDIA GPU | Any CUDA-capable GPU |
| NVIDIA Driver | 535+ (Windows side — NOT in WSL2) |
| Existing K8s cluster | kubeadm, k3s, EKS, etc. with kubectl access |
| Network | Outbound internet from the Windows PC |
Phase 1: Windows Setup¶
1.1 Install NVIDIA Driver (Windows Side)¶
Download and install the latest NVIDIA Game Ready or Studio driver from nvidia.com/drivers. Do NOT install a driver inside WSL2 — the Windows driver provides GPU access to WSL2 automatically.
After installation, verify from PowerShell:
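The Windows driver bundles `nvidia-smi` and puts it on `PATH`, so a quick check looks like:

```shell
# Run in PowerShell on the Windows side
nvidia-smi
```

You should see your GPU model and a driver version of 535 or later.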
1.2 Configure .wslconfig¶
Create C:\Users\<USERNAME>\.wslconfig in Notepad before installing WSL2:
[wsl2]
memory=16GB
swap=0
networkingMode=mirrored
kernelCommandLine=systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
[experimental]
autoMemoryReclaim=gradual
kernelCommandLine is essential
cgroup_no_v1=all forces pure cgroup v2. Without this, Cilium's
Socket LB (kube-proxy replacement) cannot intercept pod traffic,
causing dial tcp 10.96.0.1:443: i/o timeout on all ClusterIP services.
Why mirrored networking?
networkingMode=mirrored gives WSL2 the same IP as the Windows host,
making NAT traversal simpler — WireKube's STUN sees the real router-mapped
endpoint instead of a double-NAT (Windows NAT + router NAT).
1.3 Install WSL2¶
Open PowerShell as Administrator:
# Check available distributions
wsl --list --online
# Install Ubuntu (pick the version shown in the list)
wsl --install -d Ubuntu-24.04
Distribution name varies by system
Run wsl --list --online first and use the name shown (e.g. Ubuntu,
Ubuntu-24.04). Ubuntu installs the latest available LTS.
Reboot when prompted. After reboot, Ubuntu will launch and ask you to create a username and password.
Verify:
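From PowerShell, a check along these lines confirms the distribution is running under WSL2 (substitute your distribution name if it differs):

```shell
# VERSION column should show 2 for the Ubuntu entry
wsl -l -v

# The kernel name should end in -microsoft-standard-WSL2
wsl -d Ubuntu-24.04 -- uname -r
```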
Phase 2: WSL2 Environment Setup¶
All commands from here run inside WSL2 (Ubuntu).
2.1 Verify cgroup v2 and GPU¶
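A minimal check for both, run inside Ubuntu:

```shell
# cgroup v2: this file only exists on a unified (v2) hierarchy
stat /sys/fs/cgroup/cgroup.controllers

# GPU: the Windows host driver exposes nvidia-smi to WSL2 automatically
nvidia-smi
```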
If /sys/fs/cgroup/cgroup.controllers does not exist, .wslconfig
kernelCommandLine was not applied. Double check the file path and
content, then wsl --shutdown from PowerShell and retry.
2.2 System Update¶
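A standard package refresh before installing anything else:

```shell
sudo apt update && sudo apt upgrade -y
```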
2.3 Configure /etc/wsl.conf¶
sudo tee /etc/wsl.conf <<'EOF'
[boot]
systemd=true
command="mount --make-shared / && mkdir -p /var/run/netns && mount --bind /var/run/netns /var/run/netns && mount --make-shared /var/run/netns && ip link set eth0 mtu 1500"
EOF
Apply immediately (without restart):
sudo mount --make-shared /
sudo mkdir -p /var/run/netns
sudo mount --bind /var/run/netns /var/run/netns
sudo mount --make-shared /var/run/netns
sudo ip link set eth0 mtu 1500
Without shared mount propagation
CNI pods (Cilium, Flannel, etc.) will fail with:
path "/var/run/netns" is mounted on "/" but it is not a shared or slave mount
2.4 Install containerd¶
sudo apt install -y containerd
# Generate default config
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
# Enable SystemdCgroup
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
2.5 Install NVIDIA Container Toolkit¶
WSL2 uses the Windows host GPU driver, but the container runtime still needs nvidia-container-toolkit to expose the GPU inside containers.
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update
sudo apt install -y nvidia-container-toolkit
# Configure containerd to use nvidia runtime
sudo nvidia-ctk runtime configure --runtime=containerd
# Generate CDI spec (required for WSL2 — GPU is exposed via /dev/dxg, not /dev/nvidia*)
sudo mkdir -p /etc/cdi
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
sudo systemctl restart containerd
2.6 Install WireGuard¶
sudo apt install -y wireguard-tools
# Verify kernel module (built-in on most WSL2 kernels)
sudo modprobe wireguard && echo "OK" || echo "FAIL"
WSL2 Kernel WireGuard Support
The default WSL2 kernel (5.15+) includes WireGuard. If modprobe
fails, update WSL2 with wsl --update and retry.
2.7 Install Kubernetes Components¶
sudo apt install -y apt-transport-https ca-certificates curl gpg
# IMPORTANT: Match the cluster's Kubernetes version
# Check your cluster version: kubectl version
K8S_VERSION=v1.34 # <-- change to match your cluster
sudo mkdir -p /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/${K8S_VERSION}/deb/Release.key" \
| sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] \
https://pkgs.k8s.io/core:/stable:/${K8S_VERSION}/deb/ /" \
| sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
2.8 Disable Swap¶
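`swap=0` in `.wslconfig` (Phase 1.2) already gives the VM no swap; inside Ubuntu you can confirm, with `swapoff` as a harmless belt-and-braces step:

```shell
# The Swap row should read 0B across the board
free -h

# No-op when swap is already off; kubelet requires it disabled either way
sudo swapoff -a
```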
2.9 Enable Required Kernel Modules and sysctl¶
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
Phase 3: Join the Kubernetes Cluster¶
Option A: kubeadm join (kubeadm-based clusters)¶
On your existing cluster's control plane, create a join token:
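On a kubeadm cluster this one-liner produces everything the join needs:

```shell
# Prints a complete kubeadm join command, including a fresh token
# and the discovery CA cert hash
kubeadm token create --print-join-command
```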
On the WSL2 node:
sudo systemctl enable --now kubelet
sudo kubeadm join <API_SERVER>:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH> \
--ignore-preflight-errors=all
kubeadm config version mismatch
If you get cannot unmarshal object into Go struct field ... extraArgs,
the cluster's kubeadm-config ConfigMap uses v1beta3 map-style
extraArgs but your kubeadm expects v1beta4 array-style. Either
update the ConfigMap or use a JoinConfiguration file:
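A sketch of the file-based route, assuming the v1beta4 API (`<API_SERVER>`, `<TOKEN>`, and `<HASH>` are the same placeholders as in the flag-based join above):

```shell
# Write a JoinConfiguration equivalent to the flag-based join
cat <<'EOF' > join.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: <API_SERVER>:6443
    token: <TOKEN>
    caCertHashes:
      - sha256:<HASH>
EOF

sudo kubeadm join --config join.yaml
```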
Option B: k3s agent (k3s clusters)¶
K3S_URL="https://<K3S_SERVER>:6443"
K3S_TOKEN="<your-k3s-token>"
curl -sfL https://get.k3s.io | \
INSTALL_K3S_EXEC="agent" \
K3S_URL="${K3S_URL}" \
K3S_TOKEN="${K3S_TOKEN}" \
sh -
Verify Node Joined¶
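From any machine with `kubectl` access to the cluster:

```shell
kubectl get nodes -o wide
```

The WSL2 node should appear in the list; it may report `NotReady` until the CNI is running on it.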
Phase 4: Deploy WireKube¶
WireKube must be deployed before the GPU Operator. The WSL2 node is
behind NAT and not directly reachable from the cluster — without WireKube
tunnels, the control plane cannot reach kubelet (port 10250), so
kubectl exec/logs and DaemonSet pod scheduling will fail.
4.1 Install WireKube on the Cluster¶
If WireKube is not already deployed on your cluster:
kubectl apply -f config/crd/
kubectl apply -f config/agent/rbac.yaml
kubectl create namespace wirekube-system --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -f config/agent/daemonset.yaml
4.2 Create or Update WireKubeMesh¶
kubectl apply -f - <<'EOF'
apiVersion: wirekube.io/v1alpha1
kind: WireKubeMesh
metadata:
name: default
spec:
listenPort: 51822
interfaceName: wire_kube
mtu: 1420
stunServers:
- stun.cloudflare.com:3478
- stun.l.google.com:19302
autoAllowedIPs:
strategy: node-internal-ip
relay:
mode: auto
provider: managed
handshakeTimeoutSeconds: 30
directRetryIntervalSeconds: 120
EOF
MTU
WSL2's default eth0 MTU is 1420. The boot command in Phase 2.3
raises it to 1500 so WireGuard MTU 1420 works without fragmentation.
If you skip that step, set this to 1360.
4.3 Verify WireKube Connectivity¶
# Check all peers
kubectl get wirekubepeers -o wide
# Check WireGuard on WSL2 node
wg show wire_kube
# Ping a cluster node through the tunnel
ping -c 3 -I wire_kube <CLUSTER_NODE_IP>
# Verify kubectl exec works
kubectl exec <any-pod-on-wsl2-node> -- hostname
Phase 5: GPU Operator¶
The NVIDIA GPU Operator automates device plugin, GPU feature discovery, and DCGM metrics exporter. WSL2 requires special handling for driver and toolkit components.
5.1 Install GPU Operator via Helm¶
WSL2 requires two flags that differ from standard deployments:
- `driver.enabled=false` — WSL2 uses the Windows host GPU driver, not an in-cluster driver pod.
- `toolkit.enabled=false` — The operator's toolkit DaemonSet tries to create `/dev/nvidia*` device nodes, which don't exist on WSL2 (GPU is exposed via `/dev/dxg`). We already installed nvidia-container-toolkit manually in Phase 2.5.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
--namespace gpu-operator \
--create-namespace \
--set driver.enabled=false \
--set toolkit.enabled=false
5.2 Label the WSL2 Node for GPU Discovery¶
WSL2 has no PCI bus, so NFD (Node Feature Discovery) cannot auto-detect the GPU. Add the NVIDIA PCI vendor label manually:
NODE_NAME=<wsl2-node-name>
# NFD PCI label (triggers GPU Operator to deploy on this node)
kubectl label node ${NODE_NAME} feature.node.kubernetes.io/pci-10de.present=true
5.3 Verify GPU Operator¶
# All pods on WSL2 node should be Running
kubectl get pods -n gpu-operator -o wide --field-selector spec.nodeName=<wsl2-node-name>
# Check GPU is allocatable
kubectl describe node <wsl2-node-name> | grep -A5 "Allocatable" | grep nvidia
# Should show: nvidia.com/gpu: 1
Phase 6: Test GPU Workload¶
GPU pods on WSL2 must use runtimeClassName: nvidia. Without it,
containers cannot find nvidia-smi or access the GPU — WSL2 exposes the
GPU via /dev/dxg and CDI mounts, not /dev/nvidia*.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
name: gpu-test
spec:
runtimeClassName: nvidia
restartPolicy: Never
containers:
- name: cuda-test
image: nvidia/cuda:12.6.2-base-ubuntu24.04
command: ["nvidia-smi"]
resources:
limits:
nvidia.com/gpu: 1
EOF
# Wait and check result
kubectl logs -f gpu-test
kubectl delete pod gpu-test
You should see nvidia-smi output showing your GPU from inside the
Kubernetes pod running on the WSL2 node.
LLM Serving Test (Optional)¶
Deploy an LLM using vLLM to verify end-to-end GPU inference. For 8GB VRAM GPUs, use an AWQ-quantized model to fit within memory:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
name: llm-test
labels:
app: llm-test
spec:
runtimeClassName: nvidia
restartPolicy: Never
containers:
- name: vllm
image: vllm/vllm-openai:latest
args:
- "--model"
- "Qwen/Qwen3-4B-AWQ"
- "--quantization"
- "awq"
- "--max-model-len"
- "4096"
- "--gpu-memory-utilization"
- "0.85"
- "--enforce-eager"
ports:
- containerPort: 8000
resources:
limits:
nvidia.com/gpu: 1
env:
- name: HF_HUB_CACHE
value: /tmp/hf-cache
EOF
Wait for the model to load (~3-5 min for first pull):
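One way to watch progress, assuming the pod name from the manifest above:

```shell
# Model download and load are logged; with vLLM's bundled server the
# final startup line looks something like
# "Uvicorn running on http://0.0.0.0:8000"
kubectl logs -f llm-test
```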
Test inference:
kubectl exec llm-test -- curl -s http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen/Qwen3-4B-AWQ",
"messages": [{"role": "user", "content": "Hello, what can you do?"}],
"max_tokens": 100
}'
Clean up:
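```shell
kubectl delete pod llm-test
```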
WSL2 Auto-Start (Optional)¶
WSL2 does not start services automatically on Windows boot. Create a scheduled task to start kubelet and containerd on login.
From PowerShell as Admin:
$Action = New-ScheduledTaskAction -Execute "wsl" `
-Argument "-d Ubuntu -u root -- bash -c 'systemctl start containerd && systemctl start kubelet'"
$Trigger = New-ScheduledTaskTrigger -AtLogOn
Register-ScheduledTask -TaskName "WSL2-K8s-Start" -Action $Action -Trigger $Trigger `
-Description "Start K8s services in WSL2" -RunLevel Highest
Troubleshooting¶
Pods cannot reach ClusterIP services (i/o timeout)¶
Symptom: dial tcp 10.96.0.1:443: i/o timeout from pods on WSL2 node.
hostNetwork pods work fine.
Cause: cgroup v1. Cilium's Socket LB needs cgroup v2 to attach BPF programs to pod cgroups.
Fix: Ensure .wslconfig has:
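```ini
[wsl2]
kernelCommandLine=systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
```

(the same `kernelCommandLine` set in Phase 1.2)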
Then wsl --shutdown and verify stat /sys/fs/cgroup/cgroup.controllers.
CNI pods fail: "not a shared or slave mount"¶
Cause: WSL2 defaults to private mount propagation.
Fix: See Phase 2.3 — /etc/wsl.conf command= with mount --make-shared.
Cilium iptables error: unknown option "--transparent"¶
Symptom: iptables: unknown option "--transparent" in cilium-agent logs.
Cause: WSL2 kernel lacks xt_socket module. This only affects Cilium's
L7 transparent proxy — basic networking and service routing still work via
BPF if cgroup v2 is enabled.
MTU issues: large packets dropped¶
Symptom: Small pings work but kubectl exec, TLS, or large transfers fail.
Cause: WSL2's default eth0 MTU is 1420. WireGuard adds ~60 bytes
overhead, so WireGuard MTU 1420 on a 1420 underlay causes fragmentation.
Fix: Ensure Phase 2.3 boot command includes ip link set eth0 mtu 1500.
Verify with ip link show eth0.
nvidia-smi works in WSL2 but not in pods¶
Cause: Missing runtimeClassName: nvidia in pod spec. WSL2 exposes GPU
via /dev/dxg and CDI — without the nvidia RuntimeClass, containerd does not
mount the GPU devices into the container.
Fix: Add runtimeClassName: nvidia to the pod spec (see Phase 6).
Cilium not ready after WSL2 restart¶
Symptom: Cilium agent shows 0/1 ready after wsl --shutdown and restart.
Cause: KUBE-FIREWALL iptables chain may contain a DROP rule that blocks
loopback traffic needed by Cilium's health checks.
Fix:
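A sketch of one cleanup path — rule numbering varies per node, so inspect before deleting anything:

```shell
# Show the chain with rule numbers
sudo iptables -L KUBE-FIREWALL --line-numbers -n

# Delete the loopback DROP rule by its number from the listing above
sudo iptables -D KUBE-FIREWALL <RULE_NUMBER>
```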
WireGuard module not found¶
Cause: WSL2 kernel too old or missing module.
Fix: Update WSL2 from PowerShell: wsl --update, then wsl --shutdown
and retry. The default WSL2 kernel 5.15+ includes WireGuard.
NAT traversal: double NAT¶
If WireKube detects Symmetric NAT and cannot establish direct P2P:
- Ensure `.wslconfig` has `networkingMode=mirrored`
- Consider port-forwarding UDP 51822 on your router to the Windows PC
- Deploy a WireKube relay if not already running
- Check NAT type: `kubectl get wirekubepeer <node> -o jsonpath='{.status.natType}'`
Next Steps¶
- Configuration — Relay modes, STUN servers, and mesh options
- NAT Traversal — How WireKube handles different NAT types
- Monitoring — Prometheus metrics and Grafana dashboards
- EKS Hybrid Nodes — Production deployment with EKS