ADR-0005: GKE Standard with config-driven tenant node pools
Status: Accepted
Date: 2026-06-15
Context
Orchestra’s production cluster ran on GKE Autopilot. Autopilot is operationally simple but bills at roughly 2× the per-vCPU/GB rate of self-managed Standard nodes. For light usage this is fine; under heavier workshop load, with many concurrent, stateful RStudio/Jupyter pods spun up per attendee, the premium becomes the dominant cost.
The workload has an unusual shape: pod scheduling is driven natively by the Orchestra operator via the Workshop CRD, so we do not want pod-level Horizontal Pod Autoscaling. What we want instead is responsive infrastructure-level autoscaling that provisions raw VMs on demand and dense-packs them, then scales back to zero when idle.
Two design questions fell out of this:
- Cluster shape — what does the GKE Standard cluster look like?
- How do workshop pods land on the right nodes — and does whatever mechanism we pick lock Orchestra to GKE?
The second question matters because Orchestra is a Bioconductor community
project run at multiple venues (BioC conferences, CSHL, etc.). It must remain
installable on EKS, AKS, bare-metal, and single-node clusters (kind/minikube)
for local development and demos. The operator is cloud-neutral today
(storageClass is parameterised in pvc.py; ingress is controller/annotation
driven), and that property must not regress.
Options considered for tenant targeting:
1. No targeting: let the default scheduler place workshop pods anywhere. Simple, but cannot separate small/large tenants onto right-sized pools or drive scale-to-zero per tier.
2. Hardcoded GKE taints/labels in the operator: bake the tenant-pool taint keys and node labels directly into deployment.py. Works on GKE, but couples the operator to one cluster topology and breaks single-node/other-cloud installs.
3. Config-driven tier → scheduling map: the operator reads a tier-to-(nodeSelector, tolerations) mapping from configuration; the CRD selects a tier by name. Mirrors how storageClass and ingress are already handled.
Decision
Move the production cluster to GKE Standard, and express tenant targeting as a config-driven tier map (Option 3).
Cluster design (implementation spec lives in deploy/tofu/README.md;
production OpenTofu in monode/infrastructure/):
- GKE Standard (not Autopilot), with the optimize-utilization autoscaling profile for aggressive bin-packing and fast scale-down.
- Two tenant node pools that scale to zero when idle, each with a taint and label the operator can target:
  - small-tenant-pool: e2-medium, min 0 / max 50, taint tenant-size=small:NoSchedule, label tenant-tier=small.
  - large-tenant-pool: e2-standard-4, min 0 / max 20, taint tenant-size=large:NoSchedule, label tenant-tier=large.
- GKE Image Streaming and pd-balanced 30 GB nodes on both pools.
- A cluster-warmer balloon Deployment at a negative PriorityClass to hold warm capacity so the first tenant of a tier doesn’t wait on a cold node.
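The balloon idea in the last bullet can be sketched as a pair of manifests built in Python. This is only an illustration under stated assumptions, not the spec from deploy/tofu/README.md: the function name `balloon_manifests`, the PriorityClass name, the priority value of -10, the `app` label, the pause image, and the resource requests are all hypothetical. Only the Deployment name cluster-warmer and the negative priority come from the design above, and the exact priority value would need checking against the autoscaler's cutoff for expendable pods.

```python
def balloon_manifests(replicas, cpu, memory, scheduling, priority=-10):
    """Build a PriorityClass and a placeholder Deployment as plain dicts.

    `scheduling` carries the nodeSelector/tolerations of the pool to keep warm.
    All names and values here are illustrative assumptions.
    """
    if priority >= 0:
        # Balloons must rank below real workloads so workshop pods can preempt them.
        raise ValueError("balloon priority must be negative")

    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "cluster-warmer"},  # assumed name
        "value": priority,
        "globalDefault": False,
        "preemptionPolicy": "Never",  # balloons never evict other pods
        "description": "Placeholder pods that hold warm node capacity.",
    }

    pod_spec = {
        "priorityClassName": priority_class["metadata"]["name"],
        "terminationGracePeriodSeconds": 0,  # give capacity back immediately
        "containers": [
            {
                "name": "pause",
                "image": "registry.k8s.io/pause:3.9",  # does nothing, only reserves
                "resources": {"requests": {"cpu": cpu, "memory": memory}},
            }
        ],
    }
    pod_spec.update(scheduling)  # land the balloon on the tainted tenant pool

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "cluster-warmer"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "cluster-warmer"}},
            "template": {
                "metadata": {"labels": {"app": "cluster-warmer"}},
                "spec": pod_spec,
            },
        },
    }
    return priority_class, deployment


# Scheduling fields matching the small-tenant-pool taint and label listed above.
SMALL_POOL = {
    "nodeSelector": {"tenant-tier": "small"},
    "tolerations": [
        {"key": "tenant-size", "value": "small", "effect": "NoSchedule"}
    ],
}
```

Because the balloon pods carry real resource requests, they keep a node provisioned; when a workshop pod with default (zero) priority arrives, the scheduler can evict a balloon and reuse its capacity at once, and the evicted balloon then triggers a scale-up in the background.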
Tenant targeting design (the operator-side decision):
- The operator emits nodeSelector + tolerations on workshop pods from a configurable tier map, e.g.:

      tiers:
        small:
          nodeSelector: { tenant-tier: small }
          tolerations: [{ key: tenant-size, value: small, effect: NoSchedule }]
        large:
          nodeSelector: { tenant-tier: large }
          tolerations: [{ key: tenant-size, value: large, effect: NoSchedule }]
        default: {}   # empty → schedule anywhere

- The Workshop CRD/template references a tier by name; the operator looks up the mapping. The CRD schema contains no cloud-specific nouns.

- The taint keys, label values, and pool names are arbitrary strings supplied by configuration, not constants in Python. The GKE values above are just the production instance of that config.

- An empty / default tier emits no nodeSelector or tolerations, so single-node clusters (kind, minikube, Docker Desktop) and generic clusters schedule workshop pods normally with no setup.
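A minimal sketch of the operator-side lookup described in this list might look like the following. It is not the existing code in deployment.py (which sets no scheduling fields today); the names `TIERS`, `scheduling_for`, and `apply_tier` are hypothetical, and raising on an unknown tier name is an assumed policy that the ADR does not specify.

```python
from copy import deepcopy

# Hypothetical in-memory form of the tier map after the operator loads its config.
TIERS = {
    "small": {
        "nodeSelector": {"tenant-tier": "small"},
        "tolerations": [
            {"key": "tenant-size", "value": "small", "effect": "NoSchedule"}
        ],
    },
    "large": {
        "nodeSelector": {"tenant-tier": "large"},
        "tolerations": [
            {"key": "tenant-size", "value": "large", "effect": "NoSchedule"}
        ],
    },
    "default": {},  # empty: schedule anywhere
}


def scheduling_for(tier_name, tiers):
    """Return only the pod-spec scheduling fields configured for a tier."""
    if tier_name not in tiers:
        # Assumed policy: fail loudly rather than silently scheduling anywhere.
        raise KeyError(f"unknown tier {tier_name!r}; known tiers: {sorted(tiers)}")
    entry = tiers[tier_name] or {}
    fields = {}
    if entry.get("nodeSelector"):
        fields["nodeSelector"] = dict(entry["nodeSelector"])
    if entry.get("tolerations"):
        fields["tolerations"] = deepcopy(entry["tolerations"])
    return fields


def apply_tier(pod_spec, tier_name="default", tiers=TIERS):
    """Return a copy of pod_spec with the tier's scheduling fields merged in."""
    merged = deepcopy(pod_spec)
    merged.update(scheduling_for(tier_name, tiers))
    return merged


# Example workshop pod spec; the container name and image are placeholders.
workshop_pod = {"containers": [{"name": "rstudio", "image": "example/workshop"}]}
small_pod = apply_tier(workshop_pod, "small")
plain_pod = apply_tier(workshop_pod)  # default tier: no scheduling fields added
```

Nothing in the functions mentions GKE: swapping the contents of `TIERS` for another cluster's labels and taints changes where pods land without touching the code, which is the property the decision relies on.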
Consequences
Positive:
- Substantially lower compute cost under load versus Autopilot, while keeping scale-to-zero so idle tenants cost nothing.
- Right-sized pools: small and large workshops no longer share one machine shape.
- Portability is preserved. Taints/tolerations/labels are vanilla Kubernetes; EKS managed node groups, AKS node pools, and manually labelled bare-metal nodes reach the same outcome with their own values. Cloud-specific autoscaling (Cluster Autoscaler, Karpenter) lives in the infra layer, invisible to the operator.
- The tier map follows the established storageClass/ingress pattern, so it is consistent with how the chart already handles cloud variation.
Negative / trade-offs:
- GKE Standard shifts node upgrades, security patching, and system node-pool sizing onto us — operational work Autopilot absorbed.
- Balloon pods trade a small standing cost for warm-start latency; the replica count needs tuning against real workshop arrival patterns.
- The cluster-shape pieces (scale-to-zero, optimize-utilization, Image Streaming, balloons) are GKE-specific by nature. They are deliberately confined to the infra layer; other environments would need their own equivalents, but the operator and CRD are unaffected.
Follow-up work (not yet implemented):
- The operator currently sets no nodeSelector/tolerations on workshop pods (operator/src/resources/deployment.py), so the tainted pools would stay empty. The config-driven tier map above must be implemented, plus a way for Workshop templates to select a tier, before the GKE Standard pools carry traffic.
- Open infra questions tracked in deploy/tofu/README.md: system node-pool sizing, cutover sequencing from the current cluster, and balloon replica tuning.
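Why the tainted pools would stay empty can be shown with a simplified model of how the Kubernetes scheduler treats taints, tolerations, and nodeSelector. This is a sketch for reasoning only, not scheduler or operator code: the functions `tolerates` and `schedulable` and the example node dictionaries are hypothetical, and the model ignores resources, affinity, and every other scheduling predicate.

```python
def tolerates(toleration, taint):
    """Return True if one toleration matches one taint (simplified rules)."""
    # An empty or missing effect on the toleration matches any taint effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        # Exists with no key tolerates every taint; with a key, that key only.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (
        toleration.get("key") == taint["key"]
        and toleration.get("value", "") == taint.get("value", "")
    )


def schedulable(pod_spec, node):
    """Return True if taints and nodeSelector allow the pod on the node."""
    # Only NoSchedule and NoExecute taints block placement in this model.
    blocking = [
        t for t in node.get("taints", [])
        if t["effect"] in ("NoSchedule", "NoExecute")
    ]
    tolerations = pod_spec.get("tolerations", [])
    for taint in blocking:
        if not any(tolerates(tol, taint) for tol in tolerations):
            return False
    labels = node.get("labels", {})
    selector = pod_spec.get("nodeSelector", {})
    return all(labels.get(key) == value for key, value in selector.items())


# A node from small-tenant-pool as specified above, and a generic untainted node.
small_pool_node = {
    "labels": {"tenant-tier": "small"},
    "taints": [{"key": "tenant-size", "value": "small", "effect": "NoSchedule"}],
}
generic_node = {"labels": {}, "taints": []}

# What the operator emits today (nothing) versus with the small tier applied.
pod_today = {}
pod_small_tier = {
    "nodeSelector": {"tenant-tier": "small"},
    "tolerations": [
        {"key": "tenant-size", "value": "small", "effect": "NoSchedule"}
    ],
}
```

In this model `pod_today` is rejected by `small_pool_node` and accepted by `generic_node`, while `pod_small_tier` behaves the other way round. The toleration admits the pod to the tainted pool, and the nodeSelector keeps it off everything else, so both fields are needed.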
Not chosen:
- No targeting was rejected because it defeats per-tier sizing and scale-to-zero, the cost levers that motivated the migration.
- Hardcoded GKE taints/labels was rejected because it couples the operator to one cluster topology and breaks single-node and non-GKE installs — a regression of Orchestra’s current cloud-neutrality.