Published April 26, 2026 · Original data

    Why your dev cluster wastes 75% of compute

    A typical 10-namespace, 200-deployment dev cluster costing roughly $200 per namespace per month wastes about $1,520 per month on idle nights and weekends - about 76% of its total compute spend. That's $18,240 per year of compute paying rent on workloads no one is touching. The math is simple, the common fixes fail in consistent ways, and the approach that actually works is durable.

    The math

    A dev cluster is billed for every hour it exists - 8,760 hours per year. The workloads in those clusters are actively used roughly 20–25% of the time: weekday business hours minus meetings, focus time, and breaks. Nights, weekends, and holidays are pure waste.

    Using a conservative idle-hours model:

    168 hours per week (24 × 7)
     -40 hours actively used (5 weekdays × 8 hours)
     = 128 hours idle per week (76% of the week)
    
     80 weeknight hours (5 days × 16 non-work hours)
     +48 weekend hours (2 days × 24)
     = 128 hours sleepable
    
     × 0.65 vacation factor (holidays, PTO, sick days)
     = ~83 effective sleepable hours per week
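    The same arithmetic as a short script, in case you want to plug in your own team's hours. This is a minimal sketch of the model above, not the KuberNap pricing engine; the function and constant names are made up for illustration.

```python
HOURS_PER_WEEK = 24 * 7      # 168
ACTIVE_HOURS = 5 * 8         # 5 weekdays x 8 hours
VACATION_FACTOR = 0.65       # factor used in the model above


def sleepable_hours(total=HOURS_PER_WEEK, active=ACTIVE_HOURS,
                    factor=VACATION_FACTOR):
    """Return (raw idle hours, effective sleepable hours) per week."""
    idle = total - active
    return idle, idle * factor


idle, effective = sleepable_hours()
print(f"idle: {idle} h/week ({idle / HOURS_PER_WEEK:.0%} of the week)")
print(f"effective sleepable: {effective:.1f} h/week "
      f"({effective / HOURS_PER_WEEK:.1%} of the week)")
```

    Changing ACTIVE_HOURS to match a team that works longer days, or the factor to match your own assumptions, moves the result directly.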

    For a cluster with 200 deployments averaging 0.5 vCPU and 1 GiB of memory each, on-demand pricing on AWS comes to $4,365/month in raw compute. The effective sleepable share (about 83 of 168 hours) is roughly $2,162/month - about 50% of total spend, recoverable if you can sleep workloads when they're not in use. The 76% figure is the raw idle share of the week, before the 0.65 factor is applied.
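    The dollar figure follows from the hours model: multiply monthly compute by the fraction of the week that is effectively sleepable. A rough sketch, assuming spend is spread evenly across all 168 hours; recoverable_spend is a hypothetical helper, not a function from the pricing engine.

```python
HOURS_PER_WEEK = 168
EFFECTIVE_SLEEPABLE_HOURS = 128 * 0.65   # 83.2, from the idle-hours model


def recoverable_spend(monthly_compute: float) -> float:
    """Share of monthly compute that falls in effective sleepable hours."""
    return monthly_compute * EFFECTIVE_SLEEPABLE_HOURS / HOURS_PER_WEEK


aws_monthly = 4365  # AWS on-demand figure quoted in the text
aws_recoverable = recoverable_spend(aws_monthly)

print(f"recoverable: ${aws_recoverable:,.0f}/month")
print(f"share of spend: {aws_recoverable / aws_monthly:.1%}")
```

    For the AWS figure this lands on the $2,162/month quoted above, a little under half of the monthly bill.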

    Where the waste shows up

    The waste is not concentrated in one obvious place - it's distributed across the entire dev-cluster footprint:

    • Empty pods running idle. Web services, internal dashboards, and admin tools that haven't received a request in 18 hours but still hold their replica count and consume their declared CPU/memory budget.
    • HPA min replicas. Workloads with minReplicas: 2 for "high availability" in dev. HPA never scales below that floor - even at 3am with zero requests.
    • Weekend cron and scheduled jobs. Backup jobs, sync jobs, and metric scrapers that run nightly in dev for no reason other than the schedule was copied from production.
    • Long-lived preview environments. PR-per-environment setups where the env outlives the PR by weeks because nobody owns the tear-down.
    • Resource-hungry sidecars. Service mesh proxies, log shippers, and tracing agents attached to every pod, idling alongside their host.

    What teams typically try first

    The first attempt is almost always a manual cron job:

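    # crontab on a jump box, 2021
    # 18:00 Mon-Fri: scale every deployment in staging to zero
    0 18 * * 1-5  kubectl scale deploy --all -n staging --replicas=0
    # 09:00 Mon-Fri: bring every deployment back at one replica
    0 9  * * 1-5  kubectl scale deploy --all -n staging --replicas=1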

    It works until it doesn't. The script doesn't know about new namespaces or exceptions - anything outside the one namespace it was written for silently misses the schedule. The schedule doesn't account for a developer working late or a 2am on-call deploy. There's no audit log; if a deployment comes back wrong on Monday morning, nobody knows what the cron did. There's no idempotency: once you add logic to save the original replica count, running the scale-down a second time against an already-zero deployment records zero over the real count, and the restore has nothing correct to return to. And the person who wrote it leaves the company within 18 months.
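    The idempotency problem is easy to reproduce without a cluster. The sketch below models a deployment as a plain dict and the "saved state" as another dict; both are stand-ins for illustration, and none of these function names come from a real tool.

```python
def naive_sleep(deploy: dict, saved: dict) -> None:
    """Save the current count, then scale to zero. Not safe to run twice."""
    saved[deploy["name"]] = deploy["replicas"]  # a second run stores 0
    deploy["replicas"] = 0


def idempotent_sleep(deploy: dict, saved: dict) -> None:
    """Only record a count while the deployment is still running."""
    if deploy["replicas"] > 0:
        saved[deploy["name"]] = deploy["replicas"]
    deploy["replicas"] = 0


def wake(deploy: dict, saved: dict) -> None:
    """Restore the saved count; leave the deployment alone if none exists."""
    deploy["replicas"] = saved.pop(deploy["name"], deploy["replicas"])


api = {"name": "api", "replicas": 3}
saved_naive: dict = {}
naive_sleep(api, saved_naive)
naive_sleep(api, saved_naive)      # e.g. cron fires after a manual run
wake(api, saved_naive)
naive_result = api["replicas"]     # the original count was overwritten

api = {"name": "api", "replicas": 3}
saved_safe: dict = {}
idempotent_sleep(api, saved_safe)
idempotent_sleep(api, saved_safe)
wake(api, saved_safe)
safe_result = api["replicas"]      # the original count survived

print(naive_result, safe_result)
```

    The only difference is the guard before the save, which is exactly the kind of detail a three-line crontab never grows.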

    The second attempt is usually a schedule-based controller like kube-downscaler or GoKubeDownscaler. Both are well-engineered open-source tools that beat the cron job on every dimension: they handle new deployments automatically, save original replica counts, and emit Kubernetes events. They still require the team's idle hours to be expressible as a fixed time window - which works for predictable office-hours workflows and breaks for irregular hours, multiple timezones, and on-call deploys.
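    To see why a fixed window struggles with multiple timezones, here is a minimal sketch of a window check. It is not how kube-downscaler or GoKubeDownscaler are implemented or configured; in_uptime_window is a hypothetical helper, and the two zones use fixed UTC offsets (no daylight saving) to keep the example self-contained.

```python
from datetime import datetime, time, timedelta, timezone


def in_uptime_window(now_utc: datetime, tz: timezone,
                     start: time = time(9, 0),
                     end: time = time(18, 0)) -> bool:
    """True if now falls on a weekday between start and end in tz."""
    local = now_utc.astimezone(tz)
    return local.weekday() < 5 and start <= local.time() < end


BERLIN = timezone(timedelta(hours=1))          # fixed offset, ignores DST
SAN_FRANCISCO = timezone(timedelta(hours=-8))  # fixed offset, ignores DST

# Wednesday 14 January 2026, 18:30 UTC
now = datetime(2026, 1, 14, 18, 30, tzinfo=timezone.utc)

cluster_awake = in_uptime_window(now, BERLIN)            # schedule's zone
sf_dev_working = in_uptime_window(now, SAN_FRANCISCO)    # developer's zone

print(f"cluster awake on Berlin schedule: {cluster_awake}")
print(f"within SF office hours: {sf_dev_working}")
```

    At that moment it is 19:30 in Berlin and 10:30 in San Francisco, so a cluster on the Berlin schedule is asleep while the San Francisco developer is mid-morning.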

    The third attempt is usually a homegrown operator that watches ingress traffic and scales workloads up and down on observed activity. These projects start ambitiously and tend to stall around the second edge case - what happens when the activity signal is wrong, what happens when wake takes 90 seconds because the image isn't cached on the node, what happens when a developer hits the namespace via port-forward instead of ingress. The reasoning is always sound; the operational surface is always larger than expected.

    What works

    The combination that holds up in real teams:

    1. Auto-detect idleness from activity. Use CPU usage, recent traffic, and pod age signals - not a clock. A deployment with 0% CPU and zero pod restarts in 4 hours is idle regardless of what time it is.
    2. Scale the candidates to zero. Save the original replica count in an annotation, set replicas to zero. The Deployment object stays in the cluster; only the pods go away.
    3. Wake on demand via a single API call. When a developer needs the namespace, a one-second HTTP POST restores every workload to its original state. No schedule override, no waiting.
    4. Hard-block production namespaces. The path of least resistance for a tool like this is to creep into production. Make that impossible at the operator level - refuse to sleep production, prod, kube-system, and the other system namespaces with no override flag.
    5. Record an audit trail. Every sleep and wake action needs a record: actor, target, timestamp, prior state. When something breaks Monday morning, the first question is always "what scaled what when?"
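    Steps 2, 4, and 5 fit together in a few lines. The sketch below works on in-memory dicts rather than a real cluster, and it is an illustration of the pattern, not KuberNap's code: the annotation key, the exception class, and the audit record shape are all assumptions.

```python
from datetime import datetime, timezone

# Namespaces named in the text; a real list would add other system namespaces.
PROTECTED = {"production", "prod", "kube-system"}
ANNOTATION = "example.com/original-replicas"  # hypothetical annotation key


class ProtectedNamespaceError(Exception):
    """Raised instead of offering an override flag."""


def _record(audit, action, actor, deploy, prior):
    audit.append({
        "action": action,
        "actor": actor,
        "target": f'{deploy["namespace"]}/{deploy["name"]}',
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prior_replicas": prior,
    })


def sleep(deploy, actor, audit):
    if deploy["namespace"] in PROTECTED:
        raise ProtectedNamespaceError(deploy["namespace"])
    prior = deploy["replicas"]
    if prior > 0:  # never overwrite a saved count with zero
        deploy["annotations"][ANNOTATION] = str(prior)
    deploy["replicas"] = 0
    _record(audit, "sleep", actor, deploy, prior)


def wake(deploy, actor, audit):
    prior = deploy["replicas"]
    saved = deploy["annotations"].pop(ANNOTATION, None)
    if saved is not None:
        deploy["replicas"] = int(saved)
    _record(audit, "wake", actor, deploy, prior)


audit_log = []
web = {"namespace": "dev", "name": "web", "replicas": 3, "annotations": {}}
sleep(web, "idle-detector", audit_log)
wake(web, "alice", audit_log)
print(web["replicas"], [e["action"] for e in audit_log])
```

    Raising an error for protected namespaces, rather than checking a flag, is what makes the block hard: there is no code path that sleeps them.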

    This is the design KuberNap implements. Activity-based detection backed by a 0–100 idle score (40% CPU, 40% traffic, 20% age), per-deployment scale-to-zero with state stored in kubernap.com/* annotations, wake-on-demand HTTP API, hard-coded protected namespaces, and an event log at GET /api/v1/events.
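    A weighted score like this is simple to sketch. The weights and the 75/100 default threshold are the ones quoted in this article; everything else is an assumption, in particular that each signal has already been normalized to a 0-100 sub-score where higher means "more idle". The article does not describe how KuberNap derives those sub-scores.

```python
WEIGHTS = {"cpu": 0.40, "traffic": 0.40, "age": 0.20}
DEFAULT_THRESHOLD = 75  # default quoted in the FAQ


def idle_score(cpu: float, traffic: float, age: float) -> float:
    """Combine three 0-100 sub-scores (higher = more idle) into one score."""
    subs = {"cpu": cpu, "traffic": traffic, "age": age}
    for name, value in subs.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} sub-score out of range: {value}")
    return sum(WEIGHTS[name] * value for name, value in subs.items())


def is_sleep_candidate(score: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """At or above the threshold, the deployment becomes a sleep candidate."""
    return score >= threshold


quiet = idle_score(cpu=100, traffic=100, age=0)    # no CPU, no traffic, new pods
mixed = idle_score(cpu=100, traffic=50, age=50)    # idle CPU, some traffic

for label, score in (("quiet", quiet), ("mixed", mixed)):
    print(f"{label}: {score:.0f} -> sleep candidate: {is_sleep_candidate(score)}")
```

    With these weights, CPU and traffic together can reach 80 on their own, so a fully quiet deployment crosses the default threshold even when its pods are new.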

    Original data: per-cluster savings across providers

    Numbers below are calculated from the KuberNap pricing engine (web/src/lib/pricing.ts) using a sample 200-deployment cluster (10 namespaces × 20 deployments) at 0.5 vCPU and 1 GiB memory per deployment. "On-demand" is the recoverable monthly waste with no commitment discount; "with 50% commitment" applies a 40% discount to half the workload to reflect typical Reserved Instance / CUD enrollment.

    Provider          Monthly compute   Recoverable on-demand   With 50% commitment
    AWS (EKS)         $4,365            $2,162/mo               $1,730/mo
    GCP (GKE)         $3,103            $1,536/mo               $1,229/mo
    Azure (AKS)       $3,927            $1,945/mo               $1,556/mo
    Blended default   $3,212            $1,591/mo               $1,273/mo

    Methodology: idle-hours model with 0.65 vacation factor. The pricing engine uses PROVIDER_RATES from public general-purpose instance rates per provider. Numbers round to the nearest dollar; the engine's full output includes a ±28% range for sensitivity. See the 2026 cost-cutting guide for additional levers (spot/preemptible, right-sizing).
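    The commitment column can be reproduced from the on-demand column: a 40% discount on half the workload is a 20% reduction overall. The snippet below is a sketch of that one step using the table's rounded figures as inputs; it is not the code in pricing.ts, and with_commitment is a hypothetical name.

```python
def with_commitment(recoverable: float,
                    committed_share: float = 0.5,
                    discount: float = 0.4) -> float:
    """Recoverable spend after discounting the committed share."""
    return recoverable * (1 - committed_share * discount)


# Recoverable on-demand figures from the table, in dollars per month.
RECOVERABLE_ON_DEMAND = {
    "AWS (EKS)": 2162,
    "GCP (GKE)": 1536,
    "Azure (AKS)": 1945,
    "Blended default": 1591,
}

committed = {
    provider: round(with_commitment(amount))
    for provider, amount in RECOVERABLE_ON_DEMAND.items()
}

for provider, amount in committed.items():
    print(f"{provider:<16} ${amount:,}/mo")
```

    A higher commitment share shrinks the recoverable number further, because committed capacity is paid for whether or not the pods are asleep.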

    FAQ

    How much do typical dev clusters waste?
    A team running 10 namespaces with 20 deployments each at roughly $200 per namespace per month wastes about $1,520 per month - around 76% of total compute spend - on idle nights and weekends. That is roughly $18,240 per cluster per year of compute paying rent on workloads no one is using.
    Why doesn't HPA solve this?
    HPA tunes replica count of running workloads based on CPU or custom metrics. It does not decide whether a workload should exist. A deployment with replicas: 1 and zero traffic still costs the same as a deployment with replicas: 1 and active traffic. HPA scales workloads; it does not toggle them on or off.
    Why doesn't a cron job that scales staging at 6pm work?
    It works until it doesn't. The script doesn't know about new deployments. It can't account for a developer working late or a 2am incident deploy. There is no audit log. There is no idempotency, so running it twice corrupts the saved state. The person who wrote it leaves the company. This is the most common starting point and the most common cause of broken Friday deploys.
    Does scaling pods to zero actually save money?
    On per-pod billing platforms (GKE Autopilot, EKS Fargate) yes, immediately - the per-pod charge stops. On standard node-based clusters (GKE, EKS, AKS, on-prem), scaling pods to zero only reduces pod count; the bill drops when your node autoscaler (Karpenter or Cluster Autoscaler) removes the now-empty nodes. The actual savings come from removed nodes.
    What's the typical idle threshold?
    KuberNap defaults to 75/100 on its composite idle score (40% CPU, 40% traffic, 20% pod age). Below that the deployment is treated as active; at or above, it becomes a sleep candidate. The threshold is configurable via the IDLE_THRESHOLD environment variable.

    Built by KuberNap - Kubby naps so your cluster doesn't have to. kubernap.com