Published April 26, 2026 · Guide

    How to scale idle Kubernetes namespaces to zero

    Scaling idle Kubernetes namespaces to zero is the single highest-leverage cost cut for dev and staging clusters - most teams recover roughly 70% of compute spend within a week. This guide covers when scale-to-zero is the right tool, the three main approaches, and how to set up KuberNap end to end with copy-pasteable commands.

    When scale-to-zero is the right tool

    Scale-to-zero applies when a workload is idle by default and active by exception. That description fits most non-production Kubernetes workloads: dev clusters, staging, preview environments, sandboxes, and demo environments. It does not fit:

    • Production HTTP services that need to respond in tens of milliseconds. Cold-start latency on a from-zero scale-up is real, even with optimized images.
    • Stateful databases where a restart loses the leader election or rebuilds a replica set. Scale-to-zero is for stateless and idempotent workloads.
    • System namespaces and infrastructure (kube-system, ingress, service mesh control planes). These need to run continuously.
    • Workloads with strict SLAs on first-byte response time. The wake step adds image-pull and readiness-probe latency.

    For everything else - the vast majority of dev-cluster workloads - scale-to-zero is appropriate and safe.

    Three approaches

    There are three patterns in widespread use. They differ in how they decide when to sleep and how they wake.

    1. Manual cron job

    A scheduled cron that runs kubectl scale against the namespace at a fixed time:

    # crontab
    0 18 * * 1-5  kubectl scale deploy --all -n staging --replicas=0
    0 9  * * 1-5  kubectl scale deploy --all -n staging --replicas=1

    Cheap to start, painful at scale. New deployments aren't covered automatically. Original replica counts aren't preserved (everything comes back as 1). No audit log. No idempotency guarantees. Common starting point, common cause of broken Monday-morning environments.
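    The lost replica counts are the easiest of these gaps to close while staying on cron. The sketch below is one possible approach, not part of any tool named in this guide: it records each deployment's replica count in an annotation before scaling to zero and reads it back on wake. The annotation key example.com/original-replicas and both function names are assumptions made up for this illustration.

```shell
#!/bin/sh
# Hypothetical cron helper: remember replica counts across sleep and wake.
# The annotation key example.com/original-replicas is invented for this sketch.

# sleep_namespace NAMESPACE
# Save each deployment's replica count in an annotation, then scale to zero.
sleep_namespace() {
  ns="$1"
  kubectl get deploy -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.replicas}{"\n"}{end}' |
  while read -r name replicas; do
    [ -n "$name" ] || continue
    # Skip deployments already at zero so a second run does not
    # overwrite the saved count with 0.
    [ "$replicas" != "0" ] || continue
    # </dev/null keeps the inner commands from consuming the loop's input.
    kubectl annotate deploy "$name" -n "$ns" \
      "example.com/original-replicas=$replicas" --overwrite </dev/null
    kubectl scale deploy "$name" -n "$ns" --replicas=0 </dev/null
  done
}

# wake_namespace NAMESPACE
# Restore every deployment that carries a saved replica count.
wake_namespace() {
  ns="$1"
  kubectl get deploy -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.example\.com/original-replicas}{"\n"}{end}' |
  while read -r name saved; do
    [ -n "$name" ] || continue
    # No saved value means this helper never put the deployment to sleep.
    [ -n "$saved" ] || continue
    kubectl scale deploy "$name" -n "$ns" --replicas="$saved" </dev/null
  done
}
```

    The two crontab entries would then call sleep_namespace staging and wake_namespace staging instead of the raw kubectl scale commands. The audit-log gap remains.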

    2. Schedule-based controller (kube-downscaler / GoKubeDownscaler)

    An open-source controller reads annotations like downscaler/uptime: "Mon-Fri 09:00-18:00 UTC" and scales workloads accordingly. Handles new deployments. Saves original replica counts. Emits Kubernetes events on errors. Broader workload type coverage than KuberNap (DaemonSets, PDBs, Argo Rollouts).

    Right answer when "idle" can be expressed as a fixed time window. Wrong answer when your team works late, ships on weekends sometimes, or supports multiple timezones - the schedule scales workloads down regardless. See the kube-downscaler comparison for the full tradeoff.
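    To make the limitation concrete, here is a minimal sketch of a fixed-window check for the Mon-Fri 09:00-18:00 UTC example. It is an illustration only and does not reproduce how kube-downscaler parses or evaluates its annotations; the function name in_uptime_window is an assumption.

```shell
#!/bin/sh
# in_uptime_window DOW HHMM
#   DOW  - ISO day of week, 1 (Monday) to 7 (Sunday), as printed by `date -u +%u`
#   HHMM - 24-hour UTC time as printed by `date -u +%H%M`
# Succeeds inside Mon-Fri 09:00-18:00 UTC, fails otherwise.
in_uptime_window() {
  dow="$1"
  # Strip leading zeros so 0900 is compared as the plain number 900.
  hhmm=$(echo "$2" | sed 's/^0*//')
  hhmm=${hhmm:-0}
  [ "$dow" -ge 1 ] && [ "$dow" -le 5 ] &&
    [ "$hhmm" -ge 900 ] && [ "$hhmm" -lt 1800 ]
}

# Example call against the current time:
#   in_uptime_window "$(date -u +%u)" "$(date -u +%H%M)" && echo up || echo down
```

    A Wednesday at 19:30 UTC or a Saturday at 11:00 UTC falls outside the window whether or not someone is still using the environment, which is the failure mode described above.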

    3. Activity-based auto-sleep (KuberNap)

    An operator detects per-deployment idleness from CPU usage, recent traffic, and pod age, and scales sleep candidates to zero. Wake is a single HTTP POST. State lives in kubernap.com/* annotations on the workloads themselves. No CRDs, no webhooks, no external database.

    The right answer when your team's idle hours are irregular and you need wake-on-demand. The rest of this guide covers the KuberNap setup.
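    As a rough mental model of activity-based detection, the sketch below combines the three signals named above into one check. The thresholds, the variable names, and the is_sleep_candidate function are assumptions for illustration; this guide does not document KuberNap's actual decision rules or defaults.

```shell
#!/bin/sh
# Hypothetical thresholds, chosen for this example only.
CPU_IDLE_MILLICORES=10     # below this, CPU usage counts as idle
MIN_POD_AGE_MINUTES=60     # younger pods are assumed to be freshly deployed

# is_sleep_candidate CPU_MILLICORES RECENT_REQUESTS POD_AGE_MINUTES
# Succeeds only when all three signals agree that the workload is idle.
is_sleep_candidate() {
  cpu="$1"
  requests="$2"
  age="$3"
  [ "$cpu" -lt "$CPU_IDLE_MILLICORES" ] &&
    [ "$requests" -eq 0 ] &&
    [ "$age" -ge "$MIN_POD_AGE_MINUTES" ]
}

# Example:
#   is_sleep_candidate 2 0 240 && echo "candidate" || echo "keep running"
```

    Requiring all three signals means a freshly deployed pod with no traffic yet is not put to sleep just because its CPU is quiet.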

    Setting up KuberNap (5 minutes)

    Two install paths. Both produce a KuberNap installation in the kubernap-system namespace.

    Option A: Helm

    helm install kubernap oci://ghcr.io/milando12/kubernap-scanner \
      --namespace kubernap-system \
      --create-namespace

    Option B: Raw YAML

    kubectl apply -f https://kubernap.com/install.yaml

    First sleep

    Port-forward the API and sleep one deployment to confirm the install works end to end:

    kubectl port-forward -n kubernap-system svc/kubernap 8080:8080 &
    
    # Sleep one deployment
    curl -X POST \
      http://localhost:8080/api/v1/namespaces/staging/deployments/api-server/sleep \
      | jq
    
    # Or sleep the entire namespace
    curl -X POST \
      http://localhost:8080/api/v1/namespaces/staging/sleep \
      | jq

    Verifying it worked

    After a sleep call, the deployment's replica count is zero and the annotations are populated:

    kubectl get deploy -n staging api-server
    # READY UP-TO-DATE AVAILABLE
    # 0/0   0          0
    
    kubectl get deploy -n staging api-server -o jsonpath='{.metadata.annotations}' | jq
    # {
    #   "kubernap.com/state": "sleeping",
    #   "kubernap.com/slept-at": "2026-04-26T15:30:12Z",
    #   "kubernap.com/original-replicas": "3"
    # }
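    The same check can be scripted, for example as a smoke test after install. The sketch below reads the kubernap.com/state annotation and the replica count shown above; the function name is_sleeping is an assumption.

```shell
#!/bin/sh
# is_sleeping NAMESPACE DEPLOYMENT
# Succeeds when the deployment is annotated as sleeping and has zero replicas.
is_sleeping() {
  ns="$1"
  deploy="$2"
  # Dots inside an annotation key must be escaped in kubectl JSONPath.
  state=$(kubectl get deploy "$deploy" -n "$ns" \
    -o jsonpath='{.metadata.annotations.kubernap\.com/state}')
  replicas=$(kubectl get deploy "$deploy" -n "$ns" \
    -o jsonpath='{.spec.replicas}')
  [ "$state" = "sleeping" ] && [ "$replicas" = "0" ]
}

# Example:
#   is_sleeping staging api-server && echo "asleep" || echo "not asleep"
```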

    The audit trail is exposed at GET /api/v1/events and includes every sleep and wake action with actor, target, timestamp, prior state, and new state:

    curl -s http://localhost:8080/api/v1/events | jq '.events[0]'
    # {
    #   "timestamp": "2026-04-26T15:30:12Z",
    #   "action": "sleep",
    #   "namespace": "staging",
    #   "deployment": "api-server",
    #   "previous_replicas": 3,
    #   "new_replicas": 0
    # }

    Waking on demand

    Wake is the symmetric operation. POST to the wake endpoint and KuberNap reads the saved annotations and restores the original replica count:

    # Wake one deployment
    curl -X POST \
      http://localhost:8080/api/v1/namespaces/staging/deployments/api-server/wake \
      | jq
    
    # Wake the whole namespace
    curl -X POST \
      http://localhost:8080/api/v1/namespaces/staging/wake \
      | jq

    Both calls are idempotent - calling wake on an already-running deployment returns success without making changes. The same is true for sleep on an already-sleeping deployment.
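    Because wake is idempotent, a script can call it unconditionally and then wait for the pods to become ready. The sketch below is one way to do that, assuming the API is reachable through the port-forward from earlier; the KUBERNAP_URL variable and the wake_and_wait function are names invented for this example.

```shell
#!/bin/sh
# Base URL of the KuberNap API; defaults to the port-forward address used above.
KUBERNAP_URL="${KUBERNAP_URL:-http://localhost:8080}"

# wake_and_wait NAMESPACE DEPLOYMENT [TIMEOUT]
# POST to the wake endpoint, then block until the rollout is ready.
wake_and_wait() {
  ns="$1"
  deploy="$2"
  timeout="${3:-120s}"
  # -f turns HTTP errors into a non-zero exit; -sS hides progress but keeps errors.
  curl -fsS -X POST \
    "$KUBERNAP_URL/api/v1/namespaces/$ns/deployments/$deploy/wake" \
    >/dev/null || return 1
  # Readiness depends on image pull and probes, so wait on the rollout itself.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout="$timeout"
}

# Example:
#   wake_and_wait staging api-server 180s
```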

    Common patterns for wake automation:

    • Slack bot - a slash command like /wake staging POSTs to the wake endpoint and replies with the namespace state after wake. Most teams ship this in an afternoon and it becomes the most-used surface within a week.
    • CI hook - when a PR opens for an environment, the CI job wakes the corresponding namespace. When the PR closes or merges, a follow-up job sleeps it. Combined with a TTL controller, this approximates ephemeral environments without requiring full per-PR provisioning.
    • Ingress middleware - an ingress filter detects the first incoming request, fires a wake POST, and queues the request until the workload is ready. This is the closest thing to "transparent wake on first traffic" and works well for HTTP services with tolerant clients.
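    The CI hook pattern reduces to a small mapping from pull request events to the namespace sleep and wake endpoints. The sketch below assumes event names modeled on common PR webhooks (opened, reopened, synchronize, closed, merged) and a hypothetical per-PR namespace; ci_action and ci_url are illustrative names, not part of KuberNap.

```shell
#!/bin/sh
# ci_action EVENT
# Prints "wake", "sleep", or "none" for a pull request event name.
ci_action() {
  case "$1" in
    opened|reopened|synchronize) echo wake ;;
    closed|merged) echo sleep ;;
    *) echo none ;;
  esac
}

# ci_url BASE_URL NAMESPACE EVENT
# Prints the endpoint to POST to, or nothing when the event needs no action.
ci_url() {
  action=$(ci_action "$3")
  [ "$action" = none ] || echo "$1/api/v1/namespaces/$2/$action"
}

# Example CI step (namespace name is hypothetical):
#   url=$(ci_url http://localhost:8080 pr-123 "$EVENT")
#   [ -z "$url" ] || curl -fsS -X POST "$url"
```

    Since both endpoints are idempotent, repeated events such as several pushes to the same PR are harmless.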

    RBAC and safety

    KuberNap requests the minimum permissions to do its job. The read-only scanner uses only two verbs, get and list, across four resource types:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kubernap-scanner
    rules:
      - apiGroups: [""]
        resources: ["namespaces", "pods", "nodes"]
        verbs: ["get", "list"]
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["get", "list"]

    No create, update, delete, patch, or watch verbs in the scanner role. No access to secrets or configmaps, ever. You can render the exact YAML before applying anything:

    helm template kubernap-scanner oci://ghcr.io/milando12/kubernap-scanner

    For the operator with sleep/wake, KuberNap adds patch and update verbs on Deployments, StatefulSets, and CronJobs in the namespaces it manages. This can be scoped via RoleBinding to specific namespaces if you want to restrict the blast radius.
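    After scoping, kubectl auth can-i can confirm that the permissions match the intent. The sketch below impersonates the operator's service account and checks three expectations: patch is allowed in a managed namespace, denied in an unmanaged one, and secrets are unreadable. The service account name in KUBERNAP_SA is an assumption (check your install for the real one), as are the function names.

```shell
#!/bin/sh
# Hypothetical service account identity; replace with the one from your install.
KUBERNAP_SA="system:serviceaccount:kubernap-system:kubernap"

# can_kubernap VERB RESOURCE NAMESPACE
# Succeeds when the service account is allowed to perform the action.
can_kubernap() {
  [ "$(kubectl auth can-i "$1" "$2" -n "$3" --as="$KUBERNAP_SA")" = "yes" ]
}

# check_scope MANAGED_NAMESPACE UNMANAGED_NAMESPACE
# Prints "scope ok" when all three expectations hold.
check_scope() {
  can_kubernap patch deployments "$1" ||
    { echo "cannot patch in $1"; return 1; }
  ! can_kubernap patch deployments "$2" ||
    { echo "can patch in $2"; return 1; }
  ! can_kubernap get secrets "$1" ||
    { echo "can read secrets in $1"; return 1; }
  echo "scope ok"
}

# Example:
#   check_scope staging payments
```

    Impersonation with --as requires that your own user is allowed to impersonate service accounts in the cluster.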

    The hard-coded protected namespace list - kube-system, default, production, prod, kube-public, kube-node-lease - is enforced at the operator level. Any sleep call against a protected namespace returns HTTP 403. There is no flag, environment variable, or annotation that overrides this list.
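    Automation that calls the sleep endpoint can mirror this list client-side to fail fast instead of waiting for the 403. The sketch below copies the list from this guide and assumes exact-name matching; the operator remains the source of truth, and the names PROTECTED_NAMESPACES and is_protected are invented for the example.

```shell
#!/bin/sh
# Copied from the guide's protected list; the operator enforces the real one.
PROTECTED_NAMESPACES="kube-system default production prod kube-public kube-node-lease"

# is_protected NAMESPACE
# Succeeds when the namespace name exactly matches an entry in the list.
is_protected() {
  for p in $PROTECTED_NAMESPACES; do
    [ "$1" = "$p" ] && return 0
  done
  return 1
}

# Example guard before a sleep call:
#   is_protected "$NS" && { echo "refusing to sleep $NS" >&2; exit 1; }
```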

    The container itself runs as non-root (UID 65532), on a read-only filesystem, in a distroless base image with all capabilities dropped. It meets the Kubernetes Restricted Pod Security Standard out of the box.

    FAQ

    Is scale-to-zero safe in production?
    Don't do it without explicit, deliberate guardrails. Production workloads need to be reachable on demand. KuberNap hard-blocks the production, prod, kube-system, kube-public, kube-node-lease, and default namespaces at the operator level - there is no flag to override.

    Will my deployments come back the same way?
    Yes. KuberNap saves the original replica count in the kubernap.com/original-replicas annotation before scaling to zero. On wake, it reads that annotation and restores the exact replica count. If your replicas are managed by HPA, HPA resumes normal behavior from the restored count.

    What about StatefulSets and CronJobs?
    StatefulSets are scaled to zero and restored from the same annotation pattern. CronJobs are suspended via spec.suspend: true, and the original suspend value is saved in kubernap.com/original-suspend so wake restores the prior state. Both are handled automatically when you sleep a namespace.

    How long does sleep / wake take?
    Sleep is a single PATCH against the deployment to set replicas to zero - typically under a second. Wake is the same, in reverse. The actual pod startup time after wake depends on the application's image pull and readiness probe latency, not on KuberNap.

    What permissions does KuberNap need?
    For the read-only scanner: get and list on namespaces, deployments, pods, and nodes. For the operator with sleep / wake: add patch and update verbs on deployments, statefulsets, cronjobs, plus get and list on metrics.k8s.io pods. No access to secrets or configmaps, ever.

    Built by KuberNap - Kubby naps so your cluster doesn't have to. kubernap.com