GCP Autopilot Deployment

This guide covers a complete Orchestra deployment on GKE Autopilot — the recommended configuration for teams running on Google Cloud.

Cloud DNS wildcard *.orchestra.example.edu
                    │
          Network LB (Traefik)
                    │
┌───────────────────┴──────────────────────────────────────────────┐
│ GKE Autopilot cluster                                            │
│                                                                  │
│ Traefik ──► oauth2-proxy                                         │
│         ──► orchestra-frontend                                   │
│         ──► orchestra-api ──► Cloud SQL Auth Proxy ──► Cloud SQL │
│         ──► <session>.<domain> ──► workshop pods                 │
└──────────────────────────────────────────────────────────────────┘

Key choices:

  • Traefik instead of GKE native ingress (per-session Ingress compatibility)
  • Cloud SQL for PostgreSQL (managed, no in-cluster DB pod)
  • Cloud SQL Auth Proxy sidecar + Workload Identity (no password in Secrets)
  • cert-manager with DNS-01 validation (Cloud DNS or another supported provider) for wildcard TLS

Prerequisites:

  • gcloud CLI authenticated with a project
  • kubectl and helm installed
  • A domain you control, with its DNS zone on Cloud DNS or another provider that cert-manager supports for DNS-01 (the examples below use Cloudflare)
Terminal window
PROJECT=my-gcp-project
REGION=us-central1
CLUSTER=orchestra
gcloud services enable container.googleapis.com \
sqladmin.googleapis.com \
iamcredentials.googleapis.com \
--project $PROJECT
gcloud container clusters create-auto $CLUSTER \
--project $PROJECT \
--region $REGION \
--release-channel regular
gcloud container clusters get-credentials $CLUSTER \
--project $PROJECT \
--region $REGION
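Before moving on, you can confirm the cluster really was created in Autopilot mode. The sketch below is a hypothetical helper (`is_autopilot` is not part of Orchestra); it reads the cluster's `autopilot.enabled` field and assumes gcloud's `value()` formatter prints that boolean as `True`.

```shell
# Hypothetical helper: succeeds only if the cluster reports autopilot.enabled.
# Assumes gcloud's value() format prints the boolean as "True"
# (the field is empty on Standard clusters).
is_autopilot() {
  local cluster="$1" project="$2" region="$3" enabled
  enabled=$(gcloud container clusters describe "$cluster" \
    --project "$project" \
    --region "$region" \
    --format="value(autopilot.enabled)") || return 1
  [ "$enabled" = "True" ]
}
```

Usage: `is_autopilot "$CLUSTER" "$PROJECT" "$REGION" && echo "Autopilot OK"`.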
Terminal window
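# db-g1-small is a shared-core tier, which requires the Enterprise edition.
# cloudsql.iam_authentication=on enables the IAM database login that the
# proxy's --auto-iam-authn flag relies on.
gcloud sql instances create orchestra-db \
--project $PROJECT \
--region $REGION \
--database-version POSTGRES_18 \
--edition enterprise \
--tier db-g1-small \
--database-flags cloudsql.iam_authentication=on \
--storage-auto-increase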
gcloud sql databases create orchestra --instance orchestra-db --project $PROJECT
# Note the instance connection name (used later for the proxy sidecar)
gcloud sql instances describe orchestra-db \
--project $PROJECT \
--format="value(connectionName)"
# → my-gcp-project:us-central1:orchestra-db
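The connection name always has the form `PROJECT:REGION:INSTANCE` for a regular (non-domain-scoped) project ID, so it can be derived from the variables set earlier and checked against what Cloud SQL reports. A minimal sketch; both function names are hypothetical:

```shell
# Cloud SQL connection names follow PROJECT:REGION:INSTANCE.
sql_connection_name() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Hypothetical check: compare the expected name with the one Cloud SQL reports,
# so a typo destined for the proxy sidecar argument is caught early.
check_connection_name() {
  local project="$1" region="$2" instance="$3" expected actual
  expected=$(sql_connection_name "$project" "$region" "$instance")
  actual=$(gcloud sql instances describe "$instance" \
    --project "$project" \
    --format="value(connectionName)") || return 1
  if [ "$actual" != "$expected" ]; then
    echo "connection name mismatch: got '$actual', expected '$expected'" >&2
    return 1
  fi
  echo "$actual"
}
```

Usage: `check_connection_name "$PROJECT" "$REGION" orchestra-db` prints the name to paste into the proxy sidecar's `args`.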

Step 3 — Workload Identity for Cloud SQL

Workload Identity lets the API pod authenticate to Cloud SQL without a service account key file.

Terminal window
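# Create GCP service account
gcloud iam service-accounts create orchestra-api \
--project $PROJECT \
--display-name "Orchestra API"
# Grant Cloud SQL client role (connect through the proxy)
gcloud projects add-iam-policy-binding $PROJECT \
--member "serviceAccount:orchestra-api@$PROJECT.iam.gserviceaccount.com" \
--role roles/cloudsql.client
# Grant Cloud SQL instance user role (log in with IAM database authentication)
gcloud projects add-iam-policy-binding $PROJECT \
--member "serviceAccount:orchestra-api@$PROJECT.iam.gserviceaccount.com" \
--role roles/cloudsql.instanceUser
# Create the matching IAM database user (SA email minus .gserviceaccount.com)
gcloud sql users create orchestra-api@$PROJECT.iam \
--instance orchestra-db \
--project $PROJECT \
--type cloud_iam_service_account
# Create the release namespace first
kubectl create namespace orchestra-system
# Bind the GCP SA to the Kubernetes SA that the API pod uses
# (the same SA annotated below)
gcloud iam service-accounts add-iam-policy-binding \
orchestra-api@$PROJECT.iam.gserviceaccount.com \
--project $PROJECT \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:$PROJECT.svc.id.goog[orchestra-system/orchestra-operator]"
# Annotate the Kubernetes SA. The chart creates it, so this fails until the SA
# exists: pre-create it, or run this after the Helm install below.
kubectl annotate serviceaccount orchestra-operator \
--namespace orchestra-system \
iam.gke.io/gcp-service-account=orchestra-api@$PROJECT.iam.gserviceaccount.com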
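Workload Identity only works if the member in the `roles/iam.workloadIdentityUser` binding and the `iam.gke.io/gcp-service-account` annotation refer to the same Kubernetes ServiceAccount (the chart's `orchestra-operator`, per the annotate command above). One way to keep them in sync is to build both strings from the same inputs; a sketch with hypothetical helper names:

```shell
# Hypothetical helpers: derive the GCP SA email and the Workload Identity
# member from the same inputs so the binding and annotation cannot drift.
gsa_email() {
  # $1 = GCP service account name, $2 = project ID
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

wi_member() {
  # $1 = project ID, $2 = Kubernetes namespace, $3 = Kubernetes ServiceAccount
  printf 'serviceAccount:%s.svc.id.goog[%s/%s]\n' "$1" "$2" "$3"
}
```

Usage: pass `--member "$(wi_member "$PROJECT" orchestra-system orchestra-operator)"` to the binding and `iam.gke.io/gcp-service-account="$(gsa_email orchestra-api "$PROJECT")"` to the annotation.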
Terminal window
helm repo add traefik https://helm.traefik.io/traefik && helm repo update
helm install traefik traefik/traefik \
--namespace traefik \
--create-namespace \
--set service.type=LoadBalancer \
--set resources.requests.cpu=250m \
--set resources.requests.memory=512Mi
# Wait for external IP (1-2 min on Autopilot)
kubectl get svc -n traefik traefik -w
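The `-w` watch above is interactive. In a script or CI job you may prefer to poll until the load balancer has an address. A minimal sketch; `wait_for_traefik_ip` is hypothetical and reads the same `status.loadBalancer.ingress[0].ip` field used later to set `TRAEFIK_IP`:

```shell
# Hypothetical helper: poll the Traefik Service until it has an external IP.
# Arguments: max attempts (default 60), delay in seconds (default 5).
wait_for_traefik_ip() {
  local attempts="${1:-60}" delay="${2:-5}" ip i
  for i in $(seq 1 "$attempts"); do
    ip=$(kubectl get svc -n traefik traefik \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    sleep "$delay"
  done
  echo "no external IP after $attempts attempts" >&2
  return 1
}
```

Usage: `TRAEFIK_IP=$(wait_for_traefik_ip) || exit 1`.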
Terminal window
helm repo add jetstack https://charts.jetstack.io && helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true \
--set global.leaderElection.namespace=cert-manager

Wildcard certificates require DNS-01 validation. Your DNS provider does not need to be Google: the cluster runs on GCP, but DNS is often managed separately (Cloudflare is a common choice). The steps below use Cloudflare; if your zone is on Cloud DNS, configure cert-manager's cloudDNS solver in the ClusterIssuer instead.

Create a Cloudflare API token with Zone → DNS → Edit permission for the relevant zone (Cloudflare dashboard → My Profile → API Tokens).

Store it as a Secret:

Terminal window
kubectl create secret generic cloudflare-api-token \
--namespace cert-manager \
--from-literal=api-token=<YOUR_CLOUDFLARE_API_TOKEN>

Create the ClusterIssuer:

cluster-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.edu
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token
              key: api-token
Terminal window
kubectl apply -f cluster-issuer.yaml
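Before requesting a certificate, it helps to confirm the ClusterIssuer registered its ACME account, which cert-manager reports as a `Ready=True` condition. A sketch with a hypothetical helper name, using a kubectl JSONPath filter on the condition type:

```shell
# Hypothetical helper: succeed only if the ClusterIssuer's Ready condition is True.
issuer_ready() {
  local name="$1" status
  status=$(kubectl get clusterissuer "$name" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}') || return 1
  [ "$status" = "True" ]
}
```

Usage: `issuer_ready letsencrypt-prod || kubectl describe clusterissuer letsencrypt-prod`. A failure here usually points at the API token Secret or the email address rather than at DNS.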

Create the wildcard certificate (same regardless of DNS provider):

wildcard-cert.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: orchestra-wildcard
  namespace: orchestra-system
spec:
  secretName: orchestra-wildcard-tls
  dnsNames:
    - "*.orchestra.example.edu"
    - "orchestra.example.edu"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
Terminal window
kubectl apply -f wildcard-cert.yaml
# Watch issuance — DNS propagation typically takes 1-2 min with Cloudflare,
# up to 10 min with slower providers.
kubectl get certificate -n orchestra-system orchestra-wildcard -w
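For unattended runs, `kubectl wait` can block on the Certificate's `Ready` condition instead of watching. A minimal sketch; the wrapper name and the 10-minute default (matching the slower-provider estimate above) are assumptions:

```shell
# Hypothetical helper: wait for the Certificate to become Ready; on timeout,
# print its description (events, failed orders) to stderr for debugging.
wait_for_certificate() {
  local name="$1" namespace="$2" timeout="${3:-10m}"
  if kubectl wait --for=condition=Ready "certificate/$name" \
      --namespace "$namespace" --timeout="$timeout"; then
    return 0
  fi
  kubectl describe certificate "$name" --namespace "$namespace" >&2
  return 1
}
```

Usage: `wait_for_certificate orchestra-wildcard orchestra-system`.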
Terminal window
TRAEFIK_IP=$(kubectl get svc -n traefik traefik -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Traefik IP: $TRAEFIK_IP"

In the Cloudflare dashboard (or via the API/Terraform), create the wildcard record:

Type: A
Name: *.orchestra (or *.orchestra.example.edu if it's a subdomain zone)
IPv4: <TRAEFIK_IP>
Proxy status: DNS only (grey cloud) ← required; proxied mode breaks wildcard + TLS
TTL: Auto

Optionally add the apex record too:

Type: A
Name: orchestra
IPv4: <TRAEFIK_IP>
Proxy status: DNS only
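To confirm the wildcard record works, resolve an arbitrary name under it and compare the answer with the Traefik IP. A sketch assuming `dig` is installed (dnsutils / bind-utils); the helper name is hypothetical, and the unique label sidesteps any cached negative answer for a name you tried before the record existed:

```shell
# Hypothetical check: a fresh label under the wildcard should resolve to Traefik.
check_wildcard_dns() {
  local domain="$1" expected_ip="$2" host resolved
  host="dns-check-$$.$domain"
  # dig +short may list CNAME targets first; the last line is the A record.
  resolved=$(dig +short A "$host" | tail -n 1)
  if [ "$resolved" != "$expected_ip" ]; then
    echo "$host resolved to '${resolved:-nothing}', expected $expected_ip" >&2
    return 1
  fi
  echo "$host -> $resolved"
}
```

Usage: `check_wildcard_dns orchestra.example.edu "$TRAEFIK_IP"`.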
Create the Google OAuth client that oauth2-proxy will use:

  1. Go to Google Cloud Console → APIs & Services → Credentials
  2. Create an OAuth 2.0 Client ID (Web application)
  3. Add the authorized redirect URI: https://app.orchestra.example.edu/oauth2/callback
  4. Copy the Client ID and Client Secret

Store the OAuth credentials in a Kubernetes Secret so they are not hardcoded in your values file:

Terminal window
kubectl create secret generic orchestra-oauth-secrets \
--namespace orchestra-system \
--from-literal=client-id="YOUR_CLIENT_ID" \
--from-literal=client-secret="YOUR_CLIENT_SECRET" \
--from-literal=cookie-secret="$(python3 -c 'import secrets; print(secrets.token_hex(16))')"
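If `python3` is not available, the same kind of cookie secret (16 random bytes printed as 32 hex characters, as `secrets.token_hex(16)` produces) can be generated with `od`. oauth2-proxy requires the cookie secret to be 16, 24, or 32 bytes, and a 32-character string satisfies that. A sketch with a hypothetical function name:

```shell
# Hypothetical alternative to the python3 one-liner: 16 random bytes as hex.
gen_cookie_secret() {
  od -An -N16 -tx1 /dev/urandom | tr -d ' \n'
}
```

Usage: `--from-literal=cookie-secret="$(gen_cookie_secret)"`.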

Create my-values.yaml (start from values-prod.yaml in the chart):

global:
  domain: "orchestra.example.edu"
  # All workshop sessions are launched in this namespace;
  # the UI does not offer a namespace selector.
  defaultNamespace: "default"
api:
  image:
    tag: "v0.1.0" # pin to a release
  replicas: 2
  resources:
    requests:
      cpu: 250m # meets Autopilot's minimum requests
      memory: 512Mi
  adminEmails:
    - "admin@example.edu"
  # No database.existingSecret needed: the Cloud SQL proxy handles auth via IAM
  extraContainers:
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2
      args:
        - "--auto-iam-authn"
        - "my-gcp-project:us-central1:orchestra-db"
      securityContext:
        runAsNonRoot: true
        runAsUser: 65532
      resources:
        requests:
          cpu: 250m
          memory: 512Mi
  extraEnv:
    # With --auto-iam-authn the DB user is the IAM user for the orchestra-api
    # service account (SA email minus ".gserviceaccount.com"), "@" encoded as %40.
    - name: ORCHESTRA_DATABASE_URL
      value: "postgresql+asyncpg://orchestra-api%40my-gcp-project.iam@localhost:5432/orchestra"
operator:
  image:
    tag: "v0.1.0"
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
frontend:
  image:
    tag: "v0.1.0"
  replicas: 2
  resources:
    requests:
      cpu: 250m
      memory: 512Mi
persistence:
  storageClass: "standard-rwo" # GKE default ReadWriteOnce class
ingress:
  controller: traefik
  className: traefik
  tls:
    enabled: true
    clusterIssuer: letsencrypt-prod
networkPolicy:
  enabled: true
# Orchestra-chart-level oauth2-proxy settings: deploy the bundled proxy and use
# full-proxy mode (oauth2-proxy in front of the frontend, recommended for Traefik).
oauth2Proxy:
  enabled: true
  fullProxy: true
# Bundled oauth2-proxy SUBCHART values: these MUST be at the root level (not
# nested under oauth2Proxy) so Helm passes them through to the subchart.
"oauth2-proxy":
  config:
    # Use the secret created in Step 7 (client-id / client-secret / cookie-secret)
    existingSecret: "orchestra-oauth-secrets"
    configFile: |-
      email_domains = [ "example.edu" ]
      upstreams = [ "http://orchestra-frontend.orchestra-system.svc.cluster.local:80" ]
  extraArgs:
    redirect-url: "https://app.orchestra.example.edu/oauth2/callback"
    cookie-domain: ".orchestra.example.edu"
    whitelist-domain: ".orchestra.example.edu"
    set-xauthrequest: "true"
    skip-provider-button: "true"
  # Optional: restrict to specific email addresses on top of email_domains:
  # authenticatedEmailsFile:
  #   enabled: true
  #   restricted_access: |-
  #     admin@example.edu
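With the proxy's `--auto-iam-authn` flag, the database user is an IAM principal rather than a password user: for a service account it is the account's email without the `.gserviceaccount.com` suffix, it must exist on the instance as a `cloud_iam_service_account` user, and the instance needs the `cloudsql.iam_authentication` flag. Because that name contains `@`, it has to be percent-encoded inside the connection URL. A sketch with hypothetical helper names; depending on how Orchestra manages its schema, the IAM user may also need grants on the `orchestra` database:

```shell
# Hypothetical helper: Cloud SQL IAM user name for a service account.
iam_db_user() {
  local email="$1"
  printf '%s\n' "${email%.gserviceaccount.com}"
}

# Hypothetical helper: asyncpg URL via the local proxy, with '@' encoded as %40.
database_url() {
  local user="$1" db="$2"
  printf 'postgresql+asyncpg://%s@localhost:5432/%s\n' \
    "$(printf '%s' "$user" | sed 's/@/%40/g')" "$db"
}
```

Usage: `database_url "$(iam_db_user orchestra-api@$PROJECT.iam.gserviceaccount.com)" orchestra` prints the value for `ORCHESTRA_DATABASE_URL`.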

Install:

Terminal window
helm install orchestra deploy/charts/orchestra \
--namespace orchestra-system \
--create-namespace \
-f my-values.yaml

After install, annotate the ServiceAccount for Workload Identity (if you did not already do so in Step 3):

Terminal window
kubectl annotate serviceaccount orchestra-operator \
--namespace orchestra-system \
iam.gke.io/gcp-service-account=orchestra-api@$PROJECT.iam.gserviceaccount.com

Restart the API Deployment so its pods pick up the annotation:

Terminal window
kubectl rollout restart deployment/orchestra-api -n orchestra-system
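If Cloud SQL connections still fail after the restart, first confirm the annotation actually landed on the ServiceAccount. A minimal sketch (hypothetical helper); the dots in the annotation key are escaped for kubectl's JSONPath:

```shell
# Hypothetical check: compare the SA's Workload Identity annotation with the
# expected GCP service account email.
check_wi_annotation() {
  local ksa="$1" namespace="$2" expected="$3" actual
  actual=$(kubectl get serviceaccount "$ksa" --namespace "$namespace" \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}') || return 1
  if [ "$actual" != "$expected" ]; then
    echo "annotation is '${actual:-unset}', expected '$expected'" >&2
    return 1
  fi
}
```

Usage: `check_wi_annotation orchestra-operator orchestra-system "orchestra-api@$PROJECT.iam.gserviceaccount.com"`.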

Use just ship-gcp to build, push, and deploy in a single command. This keeps the image tag recorded in Helm in sync with what was pushed; running just build-push and just deploy-gcp separately risks a SHA mismatch if you commit between the two steps.

Terminal window
just ship-gcp

This command:

  1. Builds all three images (api, operator, frontend) for linux/amd64
  2. Tags each image with the current git SHA and :latest, and pushes them
  3. Applies the CRD schema (so new phase values or validation rules take effect)
  4. Runs helm upgrade with --set <component>.image.tag=<sha> for each component, so each Helm revision is traceable back to an exact commit (helm history orchestra -n orchestra-system)
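The justfile itself is not shown here, but the tag-pinning in step 4 can be sketched as follows. Helm's `--set` has no wildcard syntax, so each component's `image.tag` is set explicitly; the component keys `api`, `operator`, and `frontend` are taken from `my-values.yaml`, and `helm_tag_args` is a hypothetical name:

```shell
# Hypothetical sketch: emit one "--set <component>.image.tag=<sha>" pair per
# chart component, all pinned to the same commit.
helm_tag_args() {
  local sha="$1" component
  for component in api operator frontend; do
    printf '%s %s\n' "--set" "${component}.image.tag=${sha}"
  done
}
```

Usage (unquoted so the output splits into separate arguments): `helm upgrade orchestra deploy/charts/orchestra -n orchestra-system -f my-values.yaml $(helm_tag_args "$(git rev-parse --short HEAD)")`.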

To deploy without rebuilding (e.g. a config-only change):

Terminal window
just deploy-gcp

This still applies CRDs and pins the Helm release to the current git SHA, but skips the Docker build steps.

Terminal window
# All pods running?
kubectl get pods -n orchestra-system
# TLS certificate issued?
kubectl get certificate -n orchestra-system orchestra-wildcard
# API health check
curl https://api.orchestra.example.edu/health/ready
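Right after a first install, DNS, TLS, and pod readiness may still be settling, so a single `curl` can fail spuriously. A sketch of a retrying smoke test (hypothetical helper; `-f` makes curl fail on HTTP error statuses):

```shell
# Hypothetical smoke test: retry the readiness endpoint a few times.
# Arguments: URL, max attempts (default 10), delay in seconds (default 6).
wait_for_ready_endpoint() {
  local url="$1" attempts="${2:-10}" delay="${3:-6}" i
  for i in $(seq 1 "$attempts"); do
    if curl -fsS --max-time 5 "$url" >/dev/null; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "not ready after $attempts attempts: $url" >&2
  return 1
}
```

Usage: `wait_for_ready_endpoint https://api.orchestra.example.edu/health/ready`.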
Common Autopilot issues (issue → fix):

  • Pod stuck in Pending with a resource violation → increase resources.requests.cpu to 250m and memory to 512Mi
  • readOnlyRootFilesystem errors at startup → add an emptyDir volume mounted at the path that needs writes (e.g. /tmp)
  • hostPath volumes rejected → use emptyDir instead; Autopilot blocks hostPath
  • Warning about node selector kubernetes.io/os: linux → no action needed; it is harmless because all Autopilot nodes run Linux