GCP Autopilot Deployment
This guide covers a complete Orchestra deployment on GKE Autopilot — the recommended configuration for teams running on Google Cloud.
Architecture overview
    Cloud DNS wildcard *.orchestra.example.edu
            │
            ▼
      Network LB (Traefik)
            │
       ┌────┴──────────────────────────────────┐
       │  GKE Autopilot cluster                │
       │                                       │
       │  Traefik ──► oauth2-proxy             │
       │          ──► orchestra-frontend       │
       │          ──► orchestra-api ──► Cloud SQL Auth Proxy ──► Cloud SQL
       │          ──► <session>.<domain> ──► workshop pods
       └───────────────────────────────────────┘

Key choices:
- Traefik instead of GKE native ingress (per-session Ingress compatibility)
- Cloud SQL for PostgreSQL (managed, no in-cluster DB pod)
- Cloud SQL Auth Proxy sidecar + Workload Identity (no password in Secrets)
- cert-manager with Cloud DNS for wildcard TLS
Prerequisites
- gcloud CLI authenticated with a project
- kubectl and helm installed
- A domain you control, managed in Cloud DNS or another DNS provider that cert-manager supports (see Wildcard TLS)
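A quick preflight check can confirm that the three CLIs are on the PATH before starting. The sketch below is only illustrative; the function name missing_tools is hypothetical and not part of Orchestra.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper name, not part of Orchestra.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}
```

Running `missing_tools gcloud kubectl helm` prints nothing when all three are installed, and otherwise prints one missing tool per line.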
Step 1 — Create the Autopilot cluster
    PROJECT=my-gcp-project
    REGION=us-central1
    CLUSTER=orchestra

    gcloud services enable container.googleapis.com \
      sqladmin.googleapis.com \
      iamcredentials.googleapis.com

    gcloud container clusters create-auto $CLUSTER \
      --project $PROJECT \
      --region $REGION \
      --release-channel regular

    gcloud container clusters get-credentials $CLUSTER \
      --project $PROJECT \
      --region $REGION

Step 2 — Create the Cloud SQL instance
    gcloud sql instances create orchestra-db \
      --project $PROJECT \
      --region $REGION \
      --database-version POSTGRES_18 \
      --tier db-g1-small \
      --storage-auto-increase

    gcloud sql databases create orchestra --instance orchestra-db --project $PROJECT

    # Note the instance connection name (used later for the proxy sidecar)
    gcloud sql instances describe orchestra-db \
      --project $PROJECT \
      --format="value(connectionName)"
    # → my-gcp-project:us-central1:orchestra-db

Step 3 — Workload Identity for Cloud SQL
Workload Identity lets the API pod authenticate to Cloud SQL without a service account key file.
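Workload Identity ties together two identifiers: the GCP service account email and a member string that names a Kubernetes ServiceAccount. As a minimal sketch (the helper names are hypothetical), the functions below build both strings from their parts. Seeing them side by side makes it easier to check that the namespace and ServiceAccount name in the IAM binding match the ServiceAccount that gets annotated.

```shell
# Hypothetical helpers; they only format strings and call no GCP APIs.

# gsa_email NAME PROJECT -> GCP service account email
gsa_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# wi_member PROJECT NAMESPACE KSA -> member string for roles/iam.workloadIdentityUser
wi_member() {
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}
```

For example, `wi_member "$PROJECT" orchestra-system orchestra-operator` yields the member for the ServiceAccount that the annotate command in this step targets.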
    # Create GCP service account
    gcloud iam service-accounts create orchestra-api \
      --project $PROJECT \
      --display-name "Orchestra API"

    # Grant Cloud SQL client role
    gcloud projects add-iam-policy-binding $PROJECT \
      --member "serviceAccount:orchestra-api@$PROJECT.iam.gserviceaccount.com" \
      --role roles/cloudsql.client

    # Create the release namespace first
    kubectl create namespace orchestra-system

    # Bind the GCP SA to the Kubernetes SA that the API pod uses.
    # The [namespace/name] pair must name the same ServiceAccount that is annotated below.
    gcloud iam service-accounts add-iam-policy-binding \
      orchestra-api@$PROJECT.iam.gserviceaccount.com \
      --project $PROJECT \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:$PROJECT.svc.id.goog[orchestra-system/orchestra-operator]"

    # Annotate the Kubernetes SA. The chart creates it, so run this after the
    # install in Step 8, or pre-create the ServiceAccount first:
    kubectl annotate serviceaccount orchestra-operator \
      --namespace orchestra-system \
      iam.gke.io/gcp-service-account=orchestra-api@$PROJECT.iam.gserviceaccount.com

Step 4 — Install Traefik
    helm repo add traefik https://helm.traefik.io/traefik && helm repo update

    helm install traefik traefik/traefik \
      --namespace traefik \
      --create-namespace \
      --set service.type=LoadBalancer \
      --set resources.requests.cpu=250m \
      --set resources.requests.memory=512Mi

    # Wait for external IP (1-2 min on Autopilot)
    kubectl get svc -n traefik traefik -w

Step 5 — Install cert-manager
    helm repo add jetstack https://charts.jetstack.io && helm repo update

    helm install cert-manager jetstack/cert-manager \
      --namespace cert-manager \
      --create-namespace \
      --set crds.enabled=true \
      --set global.leaderElection.namespace=cert-manager

Wildcard TLS
Wildcard certificates require DNS-01 validation. Your DNS provider does not need to be Google: the cluster runs on GCP, but DNS is often managed separately (Cloudflare is a common choice). Follow the instructions below for the provider that matches your setup.
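Whichever provider is used, DNS-01 works by publishing a TXT record at `_acme-challenge.<base domain>`, and a wildcard name is validated against the record for its base domain. The sketch below (hypothetical helper name) derives that record name, which can help when checking propagation by hand.

```shell
# acme_challenge_name DNS_NAME -> TXT record name used for DNS-01 validation.
# A leading "*." is stripped, since wildcards validate against the base domain.
# The function name is hypothetical and only formats a string.
acme_challenge_name() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}
```

While a certificate is pending, something like `dig +short TXT "$(acme_challenge_name '*.orchestra.example.edu')"` should show the challenge token once cert-manager has written it, assuming dig is installed.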
For Cloudflare DNS, create a Cloudflare API token with Zone → DNS → Edit permission for the
relevant zone (Cloudflare dashboard → My Profile → API Tokens).
Store it as a Secret:

    kubectl create secret generic cloudflare-api-token \
      --namespace cert-manager \
      --from-literal=api-token=<YOUR_CLOUDFLARE_API_TOKEN>

Create the ClusterIssuer:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.edu
        privateKeySecretRef:
          name: letsencrypt-prod-key
        solvers:
          - dns01:
              cloudflare:
                apiTokenSecretRef:
                  name: cloudflare-api-token
                  key: api-token

For Google Cloud DNS, create a GCP service account and bind it to cert-manager via Workload Identity:
    gcloud iam service-accounts create cert-manager-dns \
      --project $PROJECT \
      --display-name "cert-manager Cloud DNS"

    gcloud projects add-iam-policy-binding $PROJECT \
      --member "serviceAccount:cert-manager-dns@$PROJECT.iam.gserviceaccount.com" \
      --role roles/dns.admin

    gcloud iam service-accounts add-iam-policy-binding \
      cert-manager-dns@$PROJECT.iam.gserviceaccount.com \
      --project $PROJECT \
      --role roles/iam.workloadIdentityUser \
      --member "serviceAccount:$PROJECT.svc.id.goog[cert-manager/cert-manager]"

    kubectl annotate serviceaccount cert-manager \
      --namespace cert-manager \
      iam.gke.io/gcp-service-account=cert-manager-dns@$PROJECT.iam.gserviceaccount.com

Create the ClusterIssuer:

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: admin@example.edu
        privateKeySecretRef:
          name: letsencrypt-prod-key
        solvers:
          - dns01:
              cloudDNS:
                project: my-gcp-project

For other providers: cert-manager supports Route53 (AWS), Azure DNS, DigitalOcean, and
many others. The ClusterIssuer structure is the same; only the dns01 solver block differs.
Refer to the cert-manager DNS01 docs for your provider's configuration.
Whichever provider you use, save the ClusterIssuer manifest as cluster-issuer.yaml and apply it:

    kubectl apply -f cluster-issuer.yaml

Create the wildcard certificate (same regardless of DNS provider):

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: orchestra-wildcard
      namespace: orchestra-system
    spec:
      secretName: orchestra-wildcard-tls
      dnsNames:
        - "*.orchestra.example.edu"
        - "orchestra.example.edu"
      issuerRef:
        name: letsencrypt-prod
        kind: ClusterIssuer

Save it as wildcard-cert.yaml, apply it, and watch issuance:

    kubectl apply -f wildcard-cert.yaml
    # DNS propagation typically takes 1-2 min with Cloudflare,
    # up to 10 min with slower providers.
    kubectl get certificate -n orchestra-system orchestra-wildcard -w

Step 6 — DNS
    TRAEFIK_IP=$(kubectl get svc -n traefik traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    echo "Traefik IP: $TRAEFIK_IP"

For Cloudflare, create the record in the Cloudflare dashboard (or via the API/Terraform):

    Type:         A
    Name:         *.orchestra (or *.orchestra.example.edu if it's a subdomain zone)
    IPv4:         <TRAEFIK_IP>
    Proxy status: DNS only (grey cloud)  ← required; proxied mode breaks wildcard + TLS
    TTL:          Auto

Optionally add the apex record too:

    Type:   A
    Name:   orchestra
    IPv4:   <TRAEFIK_IP>
    Proxy:  DNS only

For Google Cloud DNS:

    gcloud dns record-sets create "*.orchestra.example.edu." \
      --zone=my-zone \
      --type=A \
      --ttl=300 \
      --rrdatas=$TRAEFIK_IP

    # Apex hostname (optional)
    gcloud dns record-sets create "orchestra.example.edu." \
      --zone=my-zone --type=A --ttl=300 --rrdatas=$TRAEFIK_IP

For any other provider, add an A record:
| Field | Value |
|---|---|
| Name / Host | *.orchestra (or * if the zone is already orchestra.example.edu) |
| Type | A |
| Value | <TRAEFIK_IP> |
| TTL | 300 |
Repeat for the apex (orchestra.example.edu) if needed.
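Once the records exist, it can be worth confirming that a name under the wildcard actually resolves to the Traefik IP before moving on. The function below is a rough sketch that assumes dig is installed; resolves_to is a hypothetical helper name.

```shell
# resolves_to HOST EXPECTED_IP
# Succeeds when one of HOST's A records equals EXPECTED_IP exactly.
# Assumes dig is installed; resolves_to is a hypothetical helper name.
resolves_to() {
  dig +short A "$1" | grep -Fxq "$2"
}
```

Any label under the wildcard should work as a probe, for example `resolves_to test.orchestra.example.edu "$TRAEFIK_IP" && echo "wildcard DNS OK"`.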
Step 7 — Google OAuth credentials
- Go to Google Cloud Console → APIs & Services → Credentials
- Create an OAuth 2.0 Client ID (Web application)
- Add authorized redirect URI: https://app.orchestra.example.edu/oauth2/callback
- Copy the Client ID and Client Secret
Create the OAuth Secret
Store your OAuth credentials in a Kubernetes Secret. This is the recommended way to manage sensitive information and avoids hardcoding secrets in your values file.
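oauth2-proxy expects the cookie secret to be 16, 24, or 32 bytes long. As a small sketch (hypothetical helper name), the function below checks a candidate value's length before it goes into the Secret. It assumes a plain ASCII value such as hex output, so that characters and bytes coincide.

```shell
# valid_cookie_secret VALUE
# Succeeds when VALUE is 16, 24, or 32 characters long.
# Checks the raw length only; base64-encoded secrets are not handled.
# The function name is hypothetical.
valid_cookie_secret() {
  case ${#1} in
    16|24|32) return 0 ;;
    *) return 1 ;;
  esac
}
```

A value produced by `secrets.token_hex(16)` is 32 hex characters, so it passes this check.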
    kubectl create secret generic orchestra-oauth-secrets \
      --namespace orchestra-system \
      --from-literal=client-id="YOUR_CLIENT_ID" \
      --from-literal=client-secret="YOUR_CLIENT_SECRET" \
      --from-literal=cookie-secret="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"

Step 8 — Install Orchestra
Create my-values.yaml (start from values-prod.yaml in the chart):

    global:
      domain: "orchestra.example.edu"
      # All workshop sessions will be launched in this namespace.
      # The UI no longer provides a namespace selection field.
      defaultNamespace: "default"

    api:
      image:
        tag: "v0.1.0"   # pin to a release
      replicas: 2
      resources:
        requests:
          cpu: 250m       # Autopilot minimum
          memory: 512Mi
      adminEmails:
        - "admin@example.edu"
      # No database.existingSecret needed: Cloud SQL proxy handles auth
      extraContainers:
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
          args:
            - "--auto-iam-authn"
            - "my-gcp-project:us-central1:orchestra-db"
          securityContext:
            runAsNonRoot: true
            runAsUser: 65532
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
      extraEnv:
        - name: ORCHESTRA_DATABASE_URL
          value: "postgresql+asyncpg://orchestra@localhost:5432/orchestra"

    operator:
      image:
        tag: "v0.1.0"
      resources:
        requests:
          cpu: 250m
          memory: 512Mi

    frontend:
      image:
        tag: "v0.1.0"
      replicas: 2
      resources:
        requests:
          cpu: 250m
          memory: 512Mi

    persistence:
      storageClass: "standard-rwo"   # GKE default ReadWriteOnce class

    ingress:
      controller: traefik
      className: traefik
      tls:
        enabled: true
        clusterIssuer: letsencrypt-prod

    networkPolicy:
      enabled: true

    # Orchestra-chart-level oauth2-proxy settings: deploy the bundled proxy and use
    # full-proxy mode (oauth2-proxy in front of the frontend; recommended for Traefik).
    oauth2Proxy:
      enabled: true
      fullProxy: true

    # Bundled oauth2-proxy SUBCHART values: MUST be at the root level (not nested
    # under oauth2Proxy) so Helm passes them through to the subchart.
    "oauth2-proxy":
      config:
        # Use the secret created in Step 7 (client-id / client-secret / cookie-secret)
        existingSecret: "orchestra-oauth-secrets"
        configFile: |-
          email_domains = [ "example.edu" ]
          upstreams = [ "http://orchestra-frontend.orchestra-system.svc.cluster.local:80" ]
      extraArgs:
        redirect-url: "https://app.orchestra.example.edu/oauth2/callback"
        cookie-domain: ".orchestra.example.edu"
        whitelist-domain: ".orchestra.example.edu"
        set-xauthrequest: "true"
        skip-provider-button: "true"
      # Optional: restrict to specific email addresses on top of email_domains:
      # authenticatedEmailsFile:
      #   enabled: true
      #   restricted_access: |-
      #     admin@example.edu

Install:
    helm install orchestra deploy/charts/orchestra \
      --namespace orchestra-system \
      --create-namespace \
      -f my-values.yaml

After install, annotate the ServiceAccount for Workload Identity (if not done in Step 3):

    kubectl annotate serviceaccount orchestra-operator \
      --namespace orchestra-system \
      iam.gke.io/gcp-service-account=orchestra-api@$PROJECT.iam.gserviceaccount.com

Restart the API pod to pick up the annotation:

    kubectl rollout restart deployment/orchestra-api -n orchestra-system

Upgrading
Use just ship-gcp to build, push, and deploy in a single atomic step. This
ensures the image tag recorded in Helm always matches what was pushed; running
just build-push and just deploy-gcp separately risks a SHA mismatch if
you commit between the two steps.

    just ship-gcp

This command:
- Builds all three images (api, operator, frontend) for linux/amd64
- Tags each image with the current git SHA and :latest
- Applies the CRD schema (so new phase values or validation rules take effect)
- Runs helm upgrade with --set *.image.tag=<sha> so each Helm revision is traceable back to an exact commit (helm history orchestra -n orchestra-system)

To deploy without rebuilding (e.g. a config-only change):

    just deploy-gcp

This still applies CRDs and pins the Helm release to the current git SHA, but skips the Docker build steps.
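The tag pinning described in this section amounts to computing the git SHA once and reusing it for every component. The sketch below is an assumption about what such a step might look like, not the contents of the project's justfile. The helper name is hypothetical, while the api, operator, and frontend keys are the ones used in my-values.yaml.

```shell
# image_tag_flags SHA -> helm --set flags pinning all three images to SHA.
# Hypothetical helper; it only builds a string and does not call helm.
image_tag_flags() {
  flags=""
  for component in api operator frontend; do
    # Separate flags with a single space, with no leading or trailing space.
    flags="$flags${flags:+ }--set $component.image.tag=$1"
  done
  printf '%s\n' "$flags"
}
```

The output can be passed to helm unquoted so that it splits into separate arguments, for example `helm upgrade orchestra deploy/charts/orchestra -n orchestra-system -f my-values.yaml $(image_tag_flags "$(git rev-parse --short HEAD)")`.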
Verify
    # All pods running?
    kubectl get pods -n orchestra-system

    # TLS certificate issued?
    kubectl get certificate -n orchestra-system orchestra-wildcard

    # API health check
    curl https://api.orchestra.example.edu/health/ready

Autopilot-specific notes
| Issue | Fix |
|---|---|
| Pod stuck in Pending with resource violation | Increase resources.requests.cpu to 250m and memory to 512Mi |
| readOnlyRootFilesystem errors at startup | Add an emptyDir volume mounted at the path that needs writes (e.g. /tmp) |
| hostPath volumes rejected | Use emptyDir instead; Autopilot blocks hostPath |
| Node selector kubernetes.io/os: linux is harmless, since all Autopilot nodes are Linux | No action needed |
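To find pods affected by the first row of the table, a small helper can list everything still in the Pending phase. This is a sketch that assumes kubectl is pointed at the cluster; the function name is hypothetical.

```shell
# pending_pods NAMESPACE -> names of pods whose phase is Pending, one per line.
# Hypothetical helper; assumes kubectl is configured for the target cluster.
pending_pods() {
  kubectl get pods -n "$1" \
    --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}
```

For each name printed by `pending_pods orchestra-system`, `kubectl describe pod <name> -n orchestra-system` shows the events that explain why scheduling failed.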