GKE Best Practices
What is GKE?
Google Kubernetes Engine is Google Cloud’s managed Kubernetes service — Google manages the control plane, you manage the worker nodes (or let Autopilot manage everything).
GKE Modes:

┌─────────────────────────────────────────────────────────────┐
│ Standard Mode                │ Autopilot Mode               │
│ ─────────────                │ ──────────────               │
│ You manage node pools        │ Google manages everything    │
│ You choose machine types     │ Pay per pod, not per node    │
│ Full node customization      │ No node management           │
│ More control                 │ More managed/serverless      │
│ Best for: complex workloads  │ Best for: simplicity         │
└─────────────────────────────────────────────────────────────┘
1. Cluster Architecture Best Practices
Use Regional Clusters (Not Zonal)
# ❌ Zonal — single point of failure
gcloud container clusters create my-cluster \
  --zone us-central1-a

# ✅ Regional — control plane + nodes across 3 zones
gcloud container clusters create my-cluster \
  --region us-central1 \
  --num-nodes 2   # 2 per zone = 6 total nodes
Zonal Cluster:                Regional Cluster:
us-central1-a                 us-central1-a   us-central1-b   us-central1-c
control plane                 control plane   control plane   control plane
node node node                node node       node node       node node

Zone fails = cluster down     Zone fails = cluster healthy
Separate Node Pools by Workload Type
# System node pool — for cluster components
gcloud container node-pools create system-pool \
  --cluster my-cluster \
  --region us-central1 \
  --machine-type n2-standard-2 \
  --num-nodes 1 \
  --node-taints CriticalAddonsOnly=true:NoSchedule \
  --node-labels pool=system

# Application node pool — for your apps
gcloud container node-pools create app-pool \
  --cluster my-cluster \
  --region us-central1 \
  --machine-type n2-standard-4 \
  --num-nodes 2 \
  --enable-autoscaling \
  --min-nodes 1 \
  --max-nodes 10 \
  --node-labels pool=application

# GPU node pool — for ML workloads
gcloud container node-pools create gpu-pool \
  --cluster my-cluster \
  --region us-central1 \
  --machine-type n1-standard-4 \
  --accelerator type=nvidia-tesla-t4,count=1 \
  --num-nodes 0 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 5 \
  --node-taints nvidia.com/gpu=present:NoSchedule

# Spot node pool — for batch / fault-tolerant workloads
gcloud container node-pools create spot-pool \
  --cluster my-cluster \
  --region us-central1 \
  --machine-type n2-standard-4 \
  --spot \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 20
Terraform Cluster Setup
# main.tf
resource "google_container_cluster" "primary" {
  name     = "prod-cluster"
  location = "us-central1"  # regional

  # Remove default node pool — use custom ones
  remove_default_node_pool = true
  initial_node_count       = 1

  # Networking
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  enable_intra_node_visibility = true

  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Private cluster — no public node IPs
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Authorized networks for control plane access
  master_authorized_networks_config {
    cidr_blocks {
      cidr_block   = "10.0.0.0/8"
      display_name = "internal"
    }
    cidr_blocks {
      cidr_block   = var.office_ip
      display_name = "office"
    }
  }

  # Security
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Enable addons
  addons_config {
    horizontal_pod_autoscaling {
      disabled = false
    }
    http_load_balancing {
      disabled = false
    }
    network_policy_config {
      disabled = false
    }
    gce_persistent_disk_csi_driver_config {
      enabled = true
    }
    gcs_fuse_csi_driver_config {
      enabled = true
    }
  }

  # Enable network policy
  network_policy {
    enabled  = true
    provider = "CALICO"
  }

  # Cluster autoscaling
  cluster_autoscaling {
    enabled = true
    resource_limits {
      resource_type = "cpu"
      minimum       = 4
      maximum       = 100
    }
    resource_limits {
      resource_type = "memory"
      minimum       = 16
      maximum       = 400
    }
    auto_provisioning_defaults {
      service_account = google_service_account.nodes.email
      oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]
    }
  }

  # Maintenance window
  maintenance_policy {
    recurring_window {
      start_time = "2024-01-01T02:00:00Z"
      end_time   = "2024-01-01T06:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }

  # Logging and monitoring
  logging_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
  }
  monitoring_config {
    enable_components = ["SYSTEM_COMPONENTS", "WORKLOADS"]
    managed_prometheus {
      enabled = true
    }
  }

  # Release channel — get automatic updates
  release_channel {
    channel = "REGULAR"
  }
}

# System node pool
resource "google_container_node_pool" "system" {
  name       = "system-pool"
  cluster    = google_container_cluster.primary.name
  location   = "us-central1"
  node_count = 1

  node_config {
    machine_type    = "n2-standard-2"
    service_account = google_service_account.nodes.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"  # Workload Identity
    }

    shielded_instance_config {
      enable_secure_boot          = true
      enable_integrity_monitoring = true
    }

    taint {
      key    = "CriticalAddonsOnly"
      value  = "true"
      effect = "NO_SCHEDULE"
    }

    labels = {
      pool = "system"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }
}

# Application node pool with autoscaling
resource "google_container_node_pool" "application" {
  name     = "app-pool"
  cluster  = google_container_cluster.primary.name
  location = "us-central1"

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type = "n2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-ssd"

    service_account = google_service_account.nodes.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    shielded_instance_config {
      enable_secure_boot          = true
      enable_integrity_monitoring = true
    }

    labels = {
      pool = "application"
      env  = "production"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  upgrade_settings {
    max_surge       = 1
    max_unavailable = 0
  }
}
2. Security Best Practices
Workload Identity (No Service Account Keys)
# Create GCP service account
gcloud iam service-accounts create api-sa \
  --display-name="API Service Account"

# Grant permissions to GCP SA
gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:api-sa@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"

# Create Kubernetes service account
kubectl create serviceaccount api-ksa -n production

# Bind K8s SA to GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  api-sa@$PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:$PROJECT_ID.svc.id.goog[production/api-ksa]"

# Annotate K8s SA
kubectl annotate serviceaccount api-ksa \
  -n production \
  iam.gke.io/gcp-service-account=api-sa@$PROJECT_ID.iam.gserviceaccount.com
# Pod uses Workload Identity — no key files needed
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  template:
    spec:
      serviceAccountName: api-ksa  # ← K8s SA with WI annotation
      containers:
      - name: api
        image: gcr.io/myproject/api:latest
        # GCP SDK auto-detects credentials via metadata server
        # No GOOGLE_APPLICATION_CREDENTIALS needed
Pod Security Standards
# Enforce restricted security for namespace
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Pod that meets restricted standards
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api
        image: gcr.io/myproject/api:latest
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: tmp
          mountPath: /tmp  # writable tmp dir
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
Binary Authorization
# Enable Binary Authorization
gcloud services enable binaryauthorization.googleapis.com

# Create attestor — only signed images can deploy
gcloud container binauthz attestors create production-attestor \
  --attestation-authority-note=projects/$PROJECT_ID/notes/production-note \
  --attestation-authority-note-project=$PROJECT_ID

# Set policy — require attestation
cat > /tmp/policy.yaml << EOF
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  requireAttestationsBy:
  - projects/$PROJECT_ID/attestors/production-attestor
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
EOF

gcloud container binauthz policy import /tmp/policy.yaml
Network Policies
# Default deny all
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
# Allow api to reach database only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: api
    ports:
    - port: 5432
---
# Allow egress to Google APIs
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-google-apis
  namespace: production
spec:
  podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 199.36.153.8/30  # restricted.googleapis.com
    ports:
    - port: 443
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53
      protocol: UDP  # DNS
Secret Management with Secret Manager
# Use External Secrets Operator to sync GCP secrets → K8s secrets
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: gcp-secret-store
  namespace: production
spec:
  provider:
    gcpsm:
      projectID: my-project-id
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: gcp-secret-store
    kind: SecretStore
  target:
    name: api-secrets  # creates K8s secret
    creationPolicy: Owner
  data:
  - secretKey: db-password
    remoteRef:
      key: prod/api/db-password
  - secretKey: api-key
    remoteRef:
      key: prod/api/external-api-key
3. Resource Management Best Practices
Always Set Resource Requests and Limits
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      containers:
      - name: api
        image: gcr.io/myproject/api:latest
        resources:
          requests:
            cpu: "250m"       # guaranteed CPU
            memory: "256Mi"   # guaranteed memory
          limits:
            cpu: "500m"       # max CPU (throttled if exceeded)
            memory: "512Mi"   # max memory (OOM killed if exceeded)
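The `250m` and `256Mi` strings are Kubernetes resource quantities: `m` means millicores, and `Mi`/`Gi` are binary units. As an illustrative sketch only (these helpers are hypothetical, not part of any Kubernetes client library), here is how they map to cores and bytes:

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity to cores: '250m' -> 0.25, '2' -> 2.0."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a binary-suffixed memory quantity to bytes: '256Mi' -> 268435456."""
    units = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}
    for suffix, multiplier in units.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * multiplier
    return int(quantity)  # plain bytes


print(parse_cpu("250m"))      # 0.25
print(parse_memory("256Mi"))  # 268435456
```

Remember that the scheduler bin-packs pods by summing *requests*, not limits — the 250m request is what actually reserves node capacity.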
LimitRange — Default Limits per Namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:           # default limit if not set
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:    # default request if not set
      cpu: "100m"
      memory: "128Mi"
    max:               # hard max per container
      cpu: "4"
      memory: "8Gi"
    min:               # minimum per container
      cpu: "50m"
      memory: "64Mi"
  - type: Pod
    max:
      cpu: "8"
      memory: "16Gi"
  - type: PersistentVolumeClaim
    max:
      storage: "100Gi"
ResourceQuota per Namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    # Compute
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    # Objects
    pods: "100"
    services: "20"
    persistentvolumeclaims: "20"
    secrets: "50"
    configmaps: "50"
    # Service types
    services.loadbalancers: "3"
    services.nodeports: "0"
Priority Classes
# Define priority classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Critical production services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high
value: 100000
description: "Important production services"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low
value: 1000
description: "Batch and background jobs"
---
# Use in deployment
spec:
  template:
    spec:
      priorityClassName: critical  # ← won't be evicted for lower priority
4. Autoscaling Best Practices
Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # scale at 60% CPU
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
  - type: External
    external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: my-subscription
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100        # double pods in one step
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # wait 5min before scale down
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
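Under the hood, the HPA uses the standard scaling formula `desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)`, then clamps the result to `minReplicas`/`maxReplicas`. A minimal sketch of that formula, with clamp values mirroring the manifest above:

```python
import math


def desired_replicas(current, current_util, target_util,
                     min_replicas=3, max_replicas=50):
    # Standard HPA formula: desired = ceil(current * currentMetric / targetMetric)
    desired = math.ceil(current * current_util / target_util)
    # Clamp to the configured replica bounds
    return max(min_replicas, min(max_replicas, desired))


# With the 60% CPU target above, 5 pods running at 90% CPU scale out to 8
print(desired_replicas(5, 90, 60))  # 8
# At low utilization the HPA scales in, but never below minReplicas
print(desired_replicas(5, 10, 60))  # 3
```

When multiple metrics are configured, the HPA computes a desired count per metric and takes the largest, which is why adding the Pub/Sub metric can only make the deployment scale out more aggressively.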
Vertical Pod Autoscaler
# On GKE, VPA is built in — enable it with:
# gcloud container clusters update prod-cluster --enable-vertical-pod-autoscaling
# (on non-GKE clusters, install the open-source VPA from kubernetes/autoscaler)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  updatePolicy:
    updateMode: "Off"  # Recommend only — don't auto-update
    # Options: Off | Initial | Recreate | Auto
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 2Gi
      controlledResources:
      - cpu
      - memory
---
# Check VPA recommendations
# kubectl describe vpa api-vpa -n production
# Look for: Status.Recommendation.ContainerRecommendations
Cluster Autoscaler Best Practices
# Pod Disruption Budget — prevent CA from evicting too many pods
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
  namespace: production
spec:
  minAvailable: 2  # keep at least 2 pods running
  # OR
  # maxUnavailable: 1  # allow at most 1 pod down
  selector:
    matchLabels:
      app: api
# Configure cluster autoscaler behavior — the profile controls how
# aggressively nodes are scaled down (there is no direct scale-down-delay flag)
gcloud container clusters update my-cluster \
  --region us-central1 \
  --autoscaling-profile optimize-utilization   # or balanced

# Adjust node pool autoscaling bounds
gcloud container node-pools update app-pool \
  --cluster my-cluster \
  --region us-central1 \
  --enable-autoscaling \
  --min-nodes 0 \
  --max-nodes 10
5. Networking Best Practices
Use Private Cluster with VPC-Native Networking
# Create VPC and subnets
gcloud compute networks create prod-vpc \
  --subnet-mode custom

gcloud compute networks subnets create prod-subnet \
  --network prod-vpc \
  --region us-central1 \
  --range 10.0.0.0/20 \
  --secondary-range pods=10.4.0.0/14,services=10.0.16.0/20

# Create private cluster
gcloud container clusters create prod-cluster \
  --region us-central1 \
  --network prod-vpc \
  --subnetwork prod-subnet \
  --cluster-secondary-range-name pods \
  --services-secondary-range-name services \
  --enable-private-nodes \
  --master-ipv4-cidr 172.16.0.0/28 \
  --enable-ip-alias
Cloud Armor WAF for Ingress
# BackendConfig — attach Cloud Armor policy
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: api-backend-config
  namespace: production
spec:
  securityPolicy:
    name: prod-waf-policy  # Cloud Armor policy name
  connectionDraining:
    drainingTimeoutSec: 60
  healthCheck:
    checkIntervalSec: 15
    timeoutSec: 15
    healthyThreshold: 1
    unhealthyThreshold: 2
    type: HTTP
    requestPath: /health
    port: 8080
---
# Service references BackendConfig
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  annotations:
    cloud.google.com/backend-config: '{"default":"api-backend-config"}'
    cloud.google.com/neg: '{"ingress": true}'  # Container-native LB
spec:
  selector:
    app: api
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP
---
# GKE Ingress with HTTPS and managed cert
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: gce
    kubernetes.io/ingress.global-static-ip-name: prod-ip
    networking.gke.io/managed-certificates: api-cert
    kubernetes.io/ingress.allow-http: "false"
spec:
  rules:
  - host: api.acme.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
# Create Cloud Armor WAF policy
gcloud compute security-policies create prod-waf-policy \
  --description "Production WAF policy"

# Enable OWASP rules
gcloud compute security-policies rules create 1000 \
  --security-policy prod-waf-policy \
  --expression "evaluatePreconfiguredExpr('xss-v33-stable')" \
  --action deny-403

# Rate limiting
gcloud compute security-policies rules create 2000 \
  --security-policy prod-waf-policy \
  --expression "true" \
  --action throttle \
  --rate-limit-threshold-count 1000 \
  --rate-limit-threshold-interval-sec 60 \
  --conform-action allow \
  --exceed-action deny-429
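The throttle rule above allows up to 1000 requests per 60-second interval per client and returns 429 beyond that. As a rough approximation of those semantics (Cloud Armor's internal algorithm is not documented as fixed-window, so this is a sketch, not its implementation), a fixed-window counter looks like:

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Illustrative fixed-window rate limiter: conform = allow, exceed = 429."""

    def __init__(self, limit=1000, window_sec=60):
        self.limit = limit
        self.window_sec = window_sec
        self.counts = defaultdict(int)  # (client, window) -> request count

    def allow(self, client_ip, now):
        # Bucket requests into 60-second windows keyed by client
        window = int(now // self.window_sec)
        self.counts[(client_ip, window)] += 1
        return self.counts[(client_ip, window)] <= self.limit


limiter = FixedWindowLimiter(limit=3, window_sec=60)
print([limiter.allow("203.0.113.7", now=0) for _ in range(4)])
# [True, True, True, False] — fourth request in the window exceeds the limit
```

The `--expression "true"` in the gcloud command means the rule matches all traffic; in practice you would scope it to specific paths or source ranges.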
6. Reliability Best Practices
Pod Anti-Affinity — Spread Across Zones
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: production
spec:
  replicas: 6
  template:
    spec:
      # Spread across zones
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
      # Don't put two api pods on same node
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: api
            topologyKey: kubernetes.io/hostname
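`maxSkew: 1` bounds the difference between the most- and least-loaded zones. The check the scheduler applies can be sketched as (a hypothetical helper, not actual scheduler code):

```python
def zone_skew(pods_per_zone):
    """Skew = max pods in any zone minus min pods in any zone."""
    counts = list(pods_per_zone.values())
    return max(counts) - min(counts)


# 6 replicas spread evenly across 3 zones: skew 0, satisfies maxSkew: 1
print(zone_skew({"us-central1-a": 2, "us-central1-b": 2, "us-central1-c": 2}))  # 0

# Uneven placement like this would be rejected under whenUnsatisfiable: DoNotSchedule
print(zone_skew({"us-central1-a": 4, "us-central1-b": 1, "us-central1-c": 1}))  # 3
```

Combined with the node-level anti-affinity, 6 replicas end up as 2 per zone, each on its own node, so neither a zone outage nor a single node failure takes out more than a third of capacity.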
Readiness and Liveness Probes
containers:
- name: api
  image: gcr.io/myproject/api:latest
  ports:
  - containerPort: 8080

  # Liveness — restart pod if unhealthy
  livenessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 30  # wait before first check
    periodSeconds: 10
    failureThreshold: 3      # fail 3 times = restart
    timeoutSeconds: 5

  # Readiness — remove from LB if not ready
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 5
    failureThreshold: 3
    successThreshold: 1

  # Startup — for slow-starting apps
  startupProbe:
    httpGet:
      path: /healthz
      port: 8080
    failureThreshold: 30  # allow 5 min to start
    periodSeconds: 10
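The two endpoints should have different semantics: liveness fails only when the process itself is broken (restarting won't fix a down database), while readiness also fails while dependencies are unavailable. A sketch of that decision logic (the function and endpoint wiring are illustrative, not a specific framework's API):

```python
def probe_response(path, alive=True, deps_ok=True):
    """Return the HTTP status a probe endpoint should emit."""
    if path == "/healthz":   # liveness: process health only
        return 200 if alive else 503
    if path == "/ready":     # readiness: process AND dependencies
        return 200 if alive and deps_ok else 503
    return 404


# Database down: the pod stays alive but is pulled out of load balancing
print(probe_response("/healthz", alive=True, deps_ok=False))  # 200
print(probe_response("/ready", alive=True, deps_ok=False))    # 503
```

Pointing liveness at a dependency check is a common mistake: a database outage then triggers a restart storm across every pod instead of a quiet readiness drain.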
Graceful Shutdown
spec:
  # Pod-level field: allow time for preStop + app shutdown
  terminationGracePeriodSeconds: 60
  containers:
  - name: api
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - sleep 15  # allow LB to drain connections
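The `preStop` sleep only delays the SIGTERM; the application should also handle the signal itself by refusing new work, failing its readiness probe, and draining in-flight requests before exiting. A minimal Python sketch of the handler side:

```python
import signal

shutting_down = False


def handle_sigterm(signum, frame):
    # Flip a flag the readiness endpoint checks, so the pod is removed
    # from load balancing while in-flight requests finish.
    global shutting_down
    shutting_down = True


signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate the SIGTERM the kubelet sends at pod deletion
signal.raise_signal(signal.SIGTERM)
print(shutting_down)  # True — the app now drains and exits cleanly
```

If the process is still running when `terminationGracePeriodSeconds` expires, the kubelet sends SIGKILL, so budget the grace period to cover preStop plus your slowest request.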
7. Cost Optimization Best Practices
Use Spot VMs for Non-Critical Workloads
# Schedule batch jobs on spot nodes
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      # Target spot node pool
      nodeSelector:
        cloud.google.com/gke-spot: "true"
      tolerations:
      - key: "cloud.google.com/gke-spot"
        operator: Equal
        value: "true"
        effect: NoSchedule
      # Handle spot preemption gracefully
      terminationGracePeriodSeconds: 25  # spot gives 30s warning
      restartPolicy: OnFailure
      containers:
      - name: processor
        image: gcr.io/myproject/processor:latest
Committed Use Discounts
# Purchase committed use for baseline workloads
gcloud compute commitments create prod-commitment \
  --plan 1-year \
  --region us-central1 \
  --resources vcpu=20,memory=80GB

# Savings: ~37% for 1-year, ~55% for 3-year
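Using the quoted discount rates, the effective monthly cost is simple arithmetic (the $1,000 on-demand baseline here is a placeholder, not a real price):

```python
def committed_cost(on_demand_monthly, discount):
    """Effective monthly cost after a committed use discount."""
    return on_demand_monthly * (1 - discount)


on_demand = 1000.0  # placeholder $/month for the 20 vCPU / 80 GB baseline
print(round(committed_cost(on_demand, 0.37), 2))  # 630.0  (1-year commitment)
print(round(committed_cost(on_demand, 0.55), 2))  # 450.0  (3-year commitment)
```

Only commit to the floor of usage you will sustain for the full term — capacity above the commitment still bills at on-demand (or spot) rates, while unused commitment is paid for regardless.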
Node Auto-Provisioning with Resource Limits
# Set cluster-level resource limits for NAP
gcloud container clusters update prod-cluster \
  --region us-central1 \
  --enable-autoprovisioning \
  --max-cpu 100 \
  --max-memory 400 \
  --min-cpu 4 \
  --min-memory 16 \
  --autoprovisioning-scopes=https://www.googleapis.com/auth/cloud-platform
8. Observability Best Practices
Google Cloud Managed Prometheus
# Enable managed Prometheus (built into GKE)
gcloud container clusters update prod-cluster \
  --region us-central1 \
  --enable-managed-prometheus

# Deploy PodMonitoring to scrape your apps
apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: api-monitoring
  namespace: production
spec:
  selector:
    matchLabels:
      app: api
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
Structured Logging
# Always log in JSON format for Cloud Logging
import json
import logging

class JsonFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "severity": record.levelname,   # Cloud Logging maps this to log severity
            "message": record.getMessage(),
            "timestamp": self.formatTime(record),
            "component": record.name,
            "httpRequest": getattr(record, "httpRequest", None),
            "labels": {
                "service": "api",
                "version": "v2",
                "env": "production",
            },
        })

# Attach the formatter so stdout logs are ingested as structured entries
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
Cloud Trace Integration
# Auto-instrument with OpenTelemetry
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(CloudTraceSpanExporter())
)
trace.set_tracer_provider(provider)
9. CI/CD Best Practices
Cloud Build + Artifact Registry
# cloudbuild.yaml
steps:
# Build image
- name: gcr.io/cloud-builders/docker
  args:
  - build
  - -t
  - us-central1-docker.pkg.dev/$PROJECT_ID/prod/api:$SHORT_SHA
  - -t
  - us-central1-docker.pkg.dev/$PROJECT_ID/prod/api:latest
  - .

# Scan for vulnerabilities
- name: gcr.io/cloud-builders/gcloud
  args:
  - artifacts
  - docker
  - images
  - scan
  - us-central1-docker.pkg.dev/$PROJECT_ID/prod/api:$SHORT_SHA
  - --format=json

# Push to Artifact Registry
- name: gcr.io/cloud-builders/docker
  args:
  - push
  - --all-tags
  - us-central1-docker.pkg.dev/$PROJECT_ID/prod/api

# Deploy to GKE
- name: gcr.io/cloud-builders/kubectl
  args:
  - set
  - image
  - deployment/api
  - api=us-central1-docker.pkg.dev/$PROJECT_ID/prod/api:$SHORT_SHA
  - -n
  - production
  env:
  - CLOUDSDK_COMPUTE_REGION=us-central1
  - CLOUDSDK_CONTAINER_CLUSTER=prod-cluster

options:
  machineType: E2_HIGHCPU_8
  logging: CLOUD_LOGGING_ONLY
10. GKE Best Practices Checklist
Cluster Setup
  ✅ Regional cluster (not zonal)
  ✅ Private cluster (no public node IPs)
  ✅ Separate node pools by workload type
  ✅ Release channel enabled (auto-updates)
  ✅ Maintenance window set
  ✅ VPC-native networking

Security
  ✅ Workload Identity (no SA keys)
  ✅ Binary Authorization
  ✅ Pod Security Standards (restricted)
  ✅ Network Policies (default deny)
  ✅ Secrets in Secret Manager
  ✅ Shielded nodes enabled
  ✅ Container image scanning
  ✅ Cloud Armor WAF on ingress

Resource Management
  ✅ Requests and limits on every container
  ✅ LimitRange per namespace
  ✅ ResourceQuota per namespace
  ✅ Priority classes defined
  ✅ PodDisruptionBudgets set

Reliability
  ✅ Minimum 3 replicas for prod services
  ✅ Pod anti-affinity across zones
  ✅ HPA configured
  ✅ Liveness + readiness + startup probes
  ✅ Graceful shutdown (preStop + terminationGrace)
  ✅ PodDisruptionBudget (minAvailable ≥ 1)

Cost
  ✅ Spot VMs for batch/non-critical
  ✅ Committed use discounts for baseline
  ✅ Cluster autoscaler enabled
  ✅ VPA recommendations reviewed
  ✅ Node auto-provisioning for mixed workloads

Observability
  ✅ Managed Prometheus enabled
  ✅ Cloud Logging with structured JSON
  ✅ Cloud Trace instrumented
  ✅ Dashboards for golden signals
  ✅ Alerts on SLO breaches
GKE best practices come down to three pillars — security by default (private cluster, Workload Identity, least privilege), reliability by design (regional cluster, anti-affinity, autoscaling, probes), and cost efficiency (spot VMs, committed use, right-sizing with VPA). Get these right from day one and you avoid the most painful production incidents.