When a Deployment is sized for one machine family (say n2d) but the scheduler places its pods on a more powerful one (n4, ~25% more CPU performance per core), the original requests over-provision capacity. workload-resizer watches scheduled pods, reads the assigned node’s cloud.google.com/machine-family label, and patches each pod’s requests in place using a configurable performance-unit matrix, without restarting the container.
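The scaling idea can be sketched as a ratio of performance units. This is a minimal illustration, not the controller's actual code: the table name, the integer unit values, and the round-up choice are all assumptions. The n4 value of 125 against an n2d baseline of 100 mirrors the "~25% more" example above.

```python
# Hypothetical performance-unit table (relative per-core throughput).
# Integer units keep the arithmetic exact; the values are illustrative only.
PERFORMANCE_UNITS = {"n2d": 100, "n4": 125}


def scale_cpu_millicores(original_m: int, baseline: str, actual: str) -> int:
    """Scale a CPU request (in millicores) sized for `baseline` to `actual`."""
    base_units = PERFORMANCE_UNITS[baseline]
    actual_units = PERFORMANCE_UNITS[actual]
    # Ceiling division: round up so the pod never receives less than the
    # equivalent capacity it was sized for (an assumed policy).
    return (original_m * base_units + actual_units - 1) // actual_units


if __name__ == "__main__":
    # A pod sized at 1000m for n2d needs fewer millicores on the faster n4.
    print(scale_cpu_millicores(1000, "n2d", "n4"))
```

Under these assumed values, a request written for n2d shrinks when the pod lands on n4, and a request written for n4 grows when the pod lands on n2d.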
No-restart resize
Adjustments land on running pods via the in-place resize subresource (GA in K8s 1.35). Containers keep their identity, sockets, and in-memory state; the kubelet adjusts cgroup limits live.
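As a rough illustration of what such a resize request carries, the sketch below builds a patch body that changes one container's CPU request. The helper name and the container name "app" are hypothetical. A body of this shape can be sent to a pod's resize subresource, for example with kubectl patch and its --subresource resize flag on recent kubectl versions. How the controller itself builds and sends the patch is not shown here.

```python
import json


def build_resize_patch(container: str, cpu_request: str) -> dict:
    """Build a patch body that updates one container's CPU request.

    Hypothetical helper: only the fields being changed are included,
    and the container is identified by name.
    """
    return {
        "spec": {
            "containers": [
                {
                    "name": container,
                    "resources": {"requests": {"cpu": cpu_request}},
                }
            ]
        }
    }


# "app" is a placeholder container name for illustration.
patch = build_resize_patch("app", "800m")

if __name__ == "__main__":
    print(json.dumps(patch, indent=2))
```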
Heterogeneous-pool aware
One Deployment, many machine families. The controller normalizes effective capacity at scheduling time using per-family performance units: calibrate once, then run on any machine family your cluster has nodes for.
QoS-class preserving
When a pod is Guaranteed (requests == limits), the resize mirrors the request change into the limit so the API server doesn’t reject the patch. Original template requests are captured as annotations before the first resize, so controller restarts don’t compound or undo changes.
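The two safeguards described above can be sketched together. This is a simplified illustration under stated assumptions: the annotation key is hypothetical, only a single container is considered, and "Guaranteed" is approximated as that container's requests being equal to its limits.

```python
# Hypothetical annotation key; the real controller's key may differ.
ORIGINAL_CPU_ANNOTATION = "example.com/original-cpu-request"


def plan_resize(annotations: dict, resources: dict, new_cpu: str):
    """Return (annotations, resources) to apply for a CPU resize."""
    requests = dict(resources.get("requests", {}))
    limits = dict(resources.get("limits", {}))

    # Record the original request only once, so a restarted controller
    # always scales from the template value rather than a resized one.
    new_annotations = dict(annotations)
    new_annotations.setdefault(ORIGINAL_CPU_ANNOTATION, requests["cpu"])

    # Simplified Guaranteed check for a single container.
    guaranteed = bool(requests) and requests == limits

    requests["cpu"] = new_cpu
    new_resources = {"requests": requests}
    if limits:
        if guaranteed:
            # Mirror the change so requests still equal limits afterwards.
            limits["cpu"] = new_cpu
        new_resources["limits"] = limits
    return new_annotations, new_resources
```

Calling plan_resize a second time with the annotations it returned leaves the recorded original untouched, which is what keeps repeated resizes from compounding in this sketch.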
Install
Apply the ConfigMap first, then the controller. The reverse order
also works, but the controller pod crash-loops briefly until the
ConfigMap lands; applying in this order avoids that. For non-GKE
clusters, edit config.yaml first so it matches your nodes’
machine-family label values.
URL=https://github.com/gke-demos/workload-resizer/releases/latest/download
kubectl apply -f $URL/config.yaml
kubectl apply -f $URL/install.yaml
The Install guide covers prerequisites, how to inventory your cluster’s machine families, picking performance units, and verifying with a sample workload.