Understanding ARO’s Kubernetes API Operations

May 1, 2026 techhadoop AKS, azure, OCP ai, cloud, devops, kubernetes, technology

Kubernetes API Operations Through the ARO Private Endpoint

Every interaction with an ARO cluster, whether from a human, a tool, or an automated controller, arrives as a TLS connection to port 6443 on the API server private endpoint. Each client opens its own connection, but they all terminate at the same place: the API server is the centre of gravity for all cluster operations.

Every Operation Is a REST Call

The Kubernetes API server exposes a RESTful HTTP/2 API over TLS. Every tool — kubectl, oc, operators, kubelet — translates its work into one of five HTTP verbs against a resource path:

			
GET     /api/v1/namespaces/payments/pods          list pods
GET     /api/v1/namespaces/payments/pods/web-1    get single pod
POST    /api/v1/namespaces/payments/pods          create pod
PUT     /api/v1/namespaces/payments/pods/web-1    replace pod
PATCH   /api/v1/namespaces/payments/pods/web-1    partial update
DELETE  /api/v1/namespaces/payments/pods/web-1    delete pod
GET     /api/v1/namespaces/payments/pods?watch=1  watch stream

		

Every one of these travels as TLS-encrypted HTTP/2 to 10.1.0.8:6443.
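The mapping from an operation to a verb and path is mechanical, and a small sketch makes the rule visible: core resources live under /api, every named API group lives under /apis. The helper names below (`resource_path`, `request_line`, `VERBS`) are made up for illustration and are not part of kubectl or any client library.

```python
def resource_path(resource, namespace=None, name=None, group="", version="v1"):
    """Build the URL path for a Kubernetes resource.

    The core group ("") is served under /api; named groups under /apis.
    """
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [prefix]
    if namespace is not None:
        parts += ["namespaces", namespace]
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


# Hypothetical action names mapped to the HTTP verb each one uses.
VERBS = {"get": "GET", "list": "GET", "create": "POST",
         "replace": "PUT", "patch": "PATCH", "delete": "DELETE"}


def request_line(action, resource, **kwargs):
    """Return 'VERB path' for an action on a resource."""
    return f"{VERBS[action]} {resource_path(resource, **kwargs)}"


print(request_line("list", "pods", namespace="payments"))
print(request_line("patch", "pods", namespace="payments", name="web-1"))
print(request_line("create", "deployments", namespace="payments", group="apps"))
```

A watch is the same GET as a list with `?watch=1` appended, which is why it does not need a verb of its own.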

Category 1 — Human CLI Operations (kubectl + oc)

kubectl — standard Kubernetes operations

			
# Every one of these becomes a REST call through the private endpoint
# LIST pods → GET /api/v1/namespaces/payments/pods
kubectl get pods -n payments
# APPLY deployment → GET, then POST (create) or PATCH (update) on
#   /apis/apps/v1/namespaces/payments/deployments
kubectl apply -f deployment.yaml
# EXEC into pod → POST .../pods/web-1/exec, then UPGRADE to SPDY/WebSocket
kubectl exec -it web-1 -- /bin/bash
# PORT-FORWARD → POST + WebSocket tunnel
kubectl port-forward svc/my-app 8080:80
# LOGS → GET /api/v1/namespaces/payments/pods/web-1/log
kubectl logs web-1 --follow
# WATCH resources → GET with ?watch=1 (long-lived streaming connection)
kubectl get pods --watch

		

oc CLI — OpenShift-specific additions

oc wraps kubectl completely and adds calls to OpenShift-specific API groups:

			
# OpenShift Route → POST /apis/route.openshift.io/v1/namespaces/.../routes
oc expose svc/my-app
# Project (OpenShift namespace wrapper)
# → POST /apis/project.openshift.io/v1/projectrequests
oc new-project my-team
# ImageStream → GET /apis/image.openshift.io/v1/namespaces/.../imagestreams
oc get imagestreams
# BuildConfig → POST /apis/build.openshift.io/v1/namespaces/.../buildconfigs/my-app/instantiate
oc start-build my-app
# DeploymentConfig (legacy OpenShift resource)
# → POST /apis/apps.openshift.io/v1/namespaces/.../deploymentconfigs/my-app/instantiate
oc rollout latest dc/my-app
# SCC inspection → GET /apis/security.openshift.io/v1/securitycontextconstraints
oc get scc

		

Category 2 — Operators and Controllers

Operators are long-running processes inside the cluster that maintain perpetual watch connections to the API server — the busiest category of API consumers by connection count.

The watch loop — how operators work

			
// Every operator runs this pattern against the API server
// Connection: persistent HTTP/2 stream to 10.1.0.8:6443
// 1. LIST — get current state (one-time at startup)
GET /apis/apps/v1/namespaces/payments/deployments
→ Returns: all deployments + resourceVersion: 48291
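// 2. WATCH: subscribe to changes (long-lived streaming response)
GET /apis/apps/v1/namespaces/payments/deployments?watch=1&resourceVersion=48291
→ Server holds the stream open until its watch timeout (minutes, not forever); the client then re-watches from the last resourceVersion it saw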
→ Pushes events as they occur:
   {"type":"MODIFIED","object":{"metadata":{"name":"web"},...}}
   {"type":"ADDED","object":{"metadata":{"name":"worker"},...}}
   {"type":"DELETED","object":{"metadata":{"name":"old"},...}}
// 3. RECONCILE — when event received, fix actual → desired state
PATCH /apis/apps/v1/namespaces/payments/replicasets/web-abc
→ Creates/deletes pods to match desired replicas
// 4. STATUS UPDATE — write observed state back
PATCH /apis/apps/v1/namespaces/payments/deployments/web/status
→ {"observedGeneration": 5, "availableReplicas": 3}

		

Built-in OpenShift operators that run this loop continuously

Operator	What it watches	What it does
`openshift-apiserver-operator`	`apiservers.config.openshift.io`	Manages API server config and certs
`cluster-version-operator`	`clusterversions.config.openshift.io`	Drives cluster upgrades
`machine-config-operator`	`machineconfigs`, `machineconfigpools`	Applies RHCOS config to nodes
`ingress-operator`	`ingresses.config.openshift.io`	Manages router deployments
`dns-operator`	`dnses.config.openshift.io`	Manages CoreDNS config
`network-operator`	`networks.config.openshift.io`	Manages OVN-Kubernetes
`image-registry-operator`	`configs.imageregistry.operator.openshift.io`	Manages internal registry
`authentication-operator`	`authentications.config.openshift.io`	Manages OAuth server

Every one of these has persistent watch connections open to the API server at all times — a healthy ARO cluster typically has 40–80 active watch streams running 24/7.
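The list-then-watch pattern these operators share can be sketched without a cluster. The function below is a simplified stand-in for what a client-side informer cache does: apply each event to a local cache and remember the last resourceVersion so the watch can resume after a reconnect. `apply_events` is a hypothetical name, and comparing resourceVersion as an integer is an assumption made for the demo; real clients treat it as an opaque string.

```python
def apply_events(cache, resource_version, events):
    """Apply watch events newer than resource_version to a local cache.

    cache maps object name -> object. Events are shaped like the watch
    stream: {"type": ..., "object": {"metadata": {...}}}.
    Returns the resourceVersion to resume the watch from.
    """
    for event in events:
        meta = event["object"]["metadata"]
        rv = int(meta["resourceVersion"])  # simplification, see lead-in
        if rv <= resource_version:
            continue  # already applied, e.g. replayed after a reconnect
        if event["type"] == "DELETED":
            cache.pop(meta["name"], None)
        else:  # ADDED or MODIFIED
            cache[meta["name"]] = event["object"]
        resource_version = rv
    return resource_version


# State returned by the initial LIST
cache = {"web": {"metadata": {"name": "web", "resourceVersion": "48291"}}}
events = [
    {"type": "MODIFIED", "object": {"metadata": {"name": "web", "resourceVersion": "48292"}}},
    {"type": "ADDED", "object": {"metadata": {"name": "worker", "resourceVersion": "48293"}}},
    {"type": "DELETED", "object": {"metadata": {"name": "web", "resourceVersion": "48294"}}},
]
last_seen = apply_events(cache, 48291, events)
print(last_seen, sorted(cache))
```

Resuming from `last_seen` is what lets a controller reconnect without re-listing everything.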

Category 3 — Kubelet (Node Agent)

Every worker node runs a kubelet process that maintains its own connection to the API server — reporting node health and receiving pod assignments:

			
Worker node kubelet → 10.1.0.8:6443
Outbound (kubelet → API server):
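  PUT  /apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/worker-1   every 10 seconds (node heartbeat via Lease renewal)
  PATCH /api/v1/nodes/worker-1/status         when node status changes, and periodically otherwise
  PATCH /api/v1/namespaces/app/pods/web-1/status  when pod state changes
  POST /api/v1/namespaces/app/events          kubelet events (OOM, image pull)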
Inbound (API server → kubelet port 10250):
  GET  https://worker-1:10250/exec/...        kubectl exec forwarding
  GET  https://worker-1:10250/log/...         kubectl logs forwarding
  GET  https://worker-1:10250/metrics         Prometheus scraping

		

If the kubelet's heartbeats stop reaching the API server for longer than the node-monitor-grace-period (default 40 seconds), the node is marked NotReady. Pods on that node are evicted once their not-ready toleration expires (300 seconds by default).
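The grace-period check itself is a simple time comparison. The sketch below is an illustration only: `node_is_ready` and `GRACE_PERIOD` are hypothetical names, and the 40-second value is the default quoted in the text.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(seconds=40)  # default quoted in the text


def node_is_ready(last_heartbeat, now, grace_period=GRACE_PERIOD):
    """A node counts as Ready while its last heartbeat is within the grace period."""
    return now - last_heartbeat <= grace_period


now = datetime(2026, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
print(node_is_ready(now - timedelta(seconds=10), now))  # one missed beat at most
print(node_is_ready(now - timedelta(seconds=41), now))  # past the grace period
```

With a 10-second heartbeat, a node has to miss roughly four in a row before it is flagged, which is why a brief network blip does not trigger rescheduling.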

Category 4 — CI/CD Pipelines

Self-hosted CI/CD runners inside the VNet authenticate to the API server using a service account token:

			
# Service account for CI/CD — scoped to specific namespace
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cicd-deployer
  namespace: payments
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer
  namespace: payments
rules:
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps"]
    verbs: ["get", "list", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-deployer-binding
  namespace: payments
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployer
subjects:
  - kind: ServiceAccount
    name: cicd-deployer
    namespace: payments

		

GitHub Actions pipeline using this service account:

			
- name: Deploy to ARO
  run: |
    # Authenticate with service account token — all traffic to 10.1.0.8:6443
    oc login ${{ secrets.ARO_API_URL }} \
      --token ${{ secrets.CICD_SA_TOKEN }}
    # Each command = REST call through private endpoint
    oc set image deployment/web \
      web=acrprod.azurecr.io/my-app:${{ github.sha }} \
      -n payments
    oc rollout status deployment/web -n payments

		

Category 5 — Admission Webhooks

Admission webhooks add an external hop during the API server request pipeline — the API server calls out to your webhook service before persisting any object:

			
kubectl apply -f pod.yaml
        ↓
API server receives POST /api/v1/namespaces/payments/pods
        ↓
Authn + RBAC pass
        ↓
Mutating admission webhook:
  API server → POST https://gatekeeper-webhook.gatekeeper-system.svc:443/mutate
  Webhook adds labels, sets resource limits, injects sidecars
  → Returns mutated pod spec
        ↓
Validating admission webhook:
  API server → POST https://gatekeeper-webhook.gatekeeper-system.svc:443/validate
  Checks policy: must have resource limits, no root, valid image registry
  → Returns: allowed: true (or denied with reason)
        ↓
Persist to etcd → notify watchers → return 201 Created

		

Common admission webhooks in ARO:

Webhook	Purpose
OPA Gatekeeper	Policy enforcement — block non-compliant resources
Kyverno	Policy as code — mutate, validate, generate
Istio / OpenShift Service Mesh	Inject Envoy sidecar into pods automatically
Red Hat ACM	Multi-cluster governance policies
Cert-manager	Inject TLS certificates into resources
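The validating step in the flow above boils down to a function that takes an AdmissionReview request and returns an AdmissionReview response. Here is a minimal sketch of the "must have resource limits" check. It is not Gatekeeper's or Kyverno's implementation; `validate_pod` is a hypothetical handler, and a real webhook would wrap it in an HTTPS server.

```python
def validate_pod(admission_review):
    """Reject pods whose containers do not declare resource limits."""
    request = admission_review["request"]
    pod = request["object"]
    missing = [c["name"] for c in pod["spec"]["containers"]
               if not c.get("resources", {}).get("limits")]
    # The response must echo the request uid so the API server can match it.
    response = {"uid": request["uid"], "allowed": not missing}
    if missing:
        response["status"] = {
            "code": 403,
            "message": "containers without resource limits: " + ", ".join(missing),
        }
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview",
            "response": response}


review = {"request": {"uid": "abc-123", "object": {"spec": {"containers": [
    {"name": "app", "resources": {"limits": {"cpu": "500m"}}},
    {"name": "sidecar"},
]}}}}
print(validate_pod(review)["response"])
```

Keeping the decision logic in a pure function like this makes the policy testable without a cluster.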

Category 6 — Monitoring and Observability

			
# Prometheus scrapes API server metrics via the API endpoint
GET https://10.1.0.8:6443/metrics
# Returns: apiserver_request_total, apiserver_request_duration_seconds,
#          etcd_request_duration_seconds, workqueue_depth, ...
# Health endpoints checked by Azure ARO service monitor
GET https://10.1.0.8:6443/healthz    → "ok"
GET https://10.1.0.8:6443/readyz     → "ok"
GET https://10.1.0.8:6443/livez      → "ok"
# OpenShift console reads cluster state continuously
GET /apis/config.openshift.io/v1/clusterversions/version
GET /api/v1/namespaces?limit=500
GET /apis/project.openshift.io/v1/projects

		

The Request Pipeline — What Happens Inside

Every write request through the private endpoint traverses this pipeline inside kube-apiserver; read requests stop after authorization and are answered from the API server's cache or etcd:

			
TLS handshake on 10.1.0.8:6443
        ↓
1. AUTHENTICATION — who are you?
   • OIDC token (Entra ID) → extract user + groups
   • x509 client cert → extract CN as username
   • Bearer token → look up service account
   • Failure → 401 Unauthorized
2. AUTHORIZATION (RBAC) — are you allowed?
   • Check: user + groups + verb + resource + namespace
   • ClusterRoleBinding / RoleBinding lookup
   • Any matching allow rule grants access; RBAC has no deny rules
   • Failure → 403 Forbidden
3. ADMISSION CONTROL — is this allowed by policy?
   • Mutating webhooks (modify the object)
   • Built-in admission plugins (ResourceQuota, LimitRanger, OpenShift SCC for pods)
   • Validating webhooks (accept or reject; these run after the schema validation in step 4)
   • Failure → 400/403 with reason
4. VALIDATION — is the object schema correct?
   • OpenAPI schema validation
   • CRD schema validation
   • Field immutability checks
   • Failure → 422 Unprocessable Entity
5. PERSIST TO etcd
   • Serialise to protobuf
   • Encrypt at rest when etcd encryption is enabled on the cluster (AES-CBC or AES-GCM)
   • Write to etcd with optimistic concurrency (resourceVersion)
   • Failure → 409 Conflict (resourceVersion mismatch)
6. NOTIFY WATCHERS
   • Push event to all active watch streams matching the resource
   • Controllers, operators, scheduler, kubelet all receive notification
7. RETURN RESPONSE
   • 200 OK (GET)
   • 201 Created (POST)
   • 200 OK with updated object (PATCH/PUT)
   • 404 Not Found
   • Streaming response for watch/exec/logs/port-forward

		

API Groups — Kubernetes vs OpenShift

The API server serves two parallel API surfaces — Kubernetes core APIs and OpenShift extension APIs — all through the same 10.1.0.8:6443 endpoint:

			
Kubernetes core APIs:
  /api/v1/                          pods, services, configmaps, secrets, nodes
  /apis/apps/v1/                    deployments, replicasets, statefulsets, daemonsets
  /apis/batch/v1/                   jobs, cronjobs
  /apis/rbac.authorization.k8s.io/  clusterroles, rolebindings
  /apis/storage.k8s.io/             storageclasses, volumeattachments (persistentvolumes live under /api/v1/)
  /apis/networking.k8s.io/          ingresses, networkpolicies
OpenShift extension APIs:
  /apis/route.openshift.io/         routes (OpenShift ingress primitive)
  /apis/project.openshift.io/       projects (namespace + RBAC wrapper)
  /apis/build.openshift.io/         buildconfigs, builds
  /apis/image.openshift.io/         imagestreams, imagestreamtags
  /apis/apps.openshift.io/          deploymentconfigs (legacy)
  /apis/security.openshift.io/      securitycontextconstraints
  /apis/config.openshift.io/        cluster-wide config (DNS, network, auth)
  /apis/operator.openshift.io/      operator configuration resources
  /apis/machine.openshift.io/       machines, machinesets (MachineAPI)

		

Key Takeaway

The ARO API server private endpoint at 10.1.0.8:6443 is not just the entry point for human CLI commands; it is the nervous system of the entire cluster. Every automated process depends on the API server behind it: the 40+ built-in OpenShift operators maintaining cluster state, every kubelet heartbeating from every worker node every 10 seconds, every CI/CD deployment, every admission webhook decision, every Prometheus health check. Making the endpoint private takes the API server off the public internet, while the seven-stage request pipeline inside the API server ensures every write is authenticated, authorised, policy-checked, validated, and durably persisted before a response is returned.

Best Practices for OpenShift on Azure: ARO Guide

April 30, 2026 techhadoop Uncategorized azure, cloud, devops, kubernetes, technology

OpenShift Container Platform on Azure — ARO Best Practices

Azure Red Hat OpenShift (ARO) is a fully managed OpenShift 4 service jointly operated by Microsoft and Red Hat — both companies share responsibility for the control plane, infrastructure, and SLA (99.95%).

1. Networking Best Practices

Always deploy a private cluster

A private ARO cluster hides the Kubernetes API server behind a private endpoint — no public IP, unreachable from the internet:

# --apiserver-visibility Private keeps the API server off the internet
# --ingress-visibility Private does the same for the default ingress
az aro create \
  --resource-group rg-aro \
  --name aro-prod \
  --vnet aro-spoke-vnet \
  --master-subnet master-subnet \
  --worker-subnet worker-subnet \
  --apiserver-visibility Private \
  --ingress-visibility Private \
  --pull-secret @pull-secret.txt

Access to the private API server is then through Azure Bastion → jump host, or over ExpressRoute/VPN from on-premises.

Subnet sizing — get this right before deployment (cannot resize after)

ARO nodes take their IP addresses from the master and worker subnets. Pods do not: they get addresses from the cluster's overlay pod CIDR (10.128.0.0/14 by default), so the subnets have to be sized for node count, load balancer front ends, and growth:

Subnet	Minimum	Recommended	Notes
Master subnet	/27	/24	Fixed 3 masters, plus room for Azure infra IPs
Worker subnet	/27	/23 or /22	One IP per worker node, so size for the peak autoscaled node count
Ingress subnet	/28	/27	For LB / App Gateway front-end IPs
Private endpoints	/28	/27	One IP per private endpoint

Worker subnet sizing example:
  /23 = 512 addresses
  Azure reserves 5
  Available: 507
  Max pods per node: 250 (OpenShift default; pod IPs come from the pod CIDR, not this subnet)
  Nodes supportable: ~500 (one subnet IP per node, minus load balancer front ends)
  Plan for: 3× current need for growth headroom
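The arithmetic in the sizing example can be reproduced for any prefix length with Python's standard `ipaddress` module. This is a small sketch; the 10.1.0.0 address range and the `usable_addresses` helper are assumptions for illustration, and the five reserved addresses are the figure used in the example above.

```python
import ipaddress

AZURE_RESERVED = 5  # addresses Azure reserves in every subnet


def usable_addresses(cidr):
    """Addresses left in a subnet after Azure's reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED


for cidr in ("10.1.0.0/27", "10.1.0.0/24", "10.1.0.0/23", "10.1.0.0/22"):
    print(f"{cidr}: {usable_addresses(cidr)} usable")
```

Running it for the candidate prefixes before deployment is cheap, and the subnet cannot be resized afterwards.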

Egress lockdown via Azure Firewall

ARO requires outbound internet access for Red Hat update servers, telemetry, and registry.redhat.io. Lock this down with Azure Firewall application rules rather than allowing all outbound:

			
Azure Firewall Application Rules for ARO egress:
  ┌─────────────────────────────────────────────────────────┐
  │ Name                    Target FQDN                     │
  ├─────────────────────────────────────────────────────────┤
  │ aro-rh-registry         registry.redhat.io              │
  │                         registry.access.redhat.com      │
  │                         quay.io                         │
  │                         cdn.quay.io                     │
  ├─────────────────────────────────────────────────────────┤
  │ aro-azure-services      *.blob.core.windows.net         │
  │                         *.servicebus.windows.net        │
  │                         *.table.core.windows.net        │
  ├─────────────────────────────────────────────────────────┤
  │ aro-monitoring          *.ods.opinsights.azure.com      │
  │                         *.oms.opinsights.azure.com      │
  ├─────────────────────────────────────────────────────────┤
  │ aro-rh-telemetry        cert-api.access.redhat.com      │
  │                         api.access.redhat.com           │
  └─────────────────────────────────────────────────────────┘

		

Apply a UDR on the master and worker subnets pointing 0.0.0.0/0 to the Azure Firewall private IP — same hub and spoke pattern as any spoke workload.

Use a custom DNS server

Point the ARO VNet DNS to your hub DNS Private Resolver so cluster nodes can resolve private endpoints and internal domains:

az network vnet update \
  --resource-group rg-aro-network \
  --name aro-spoke-vnet \
  --dns-servers 10.0.5.4    # DNS Private Resolver inbound endpoint IP

2. Availability and Resilience Best Practices

Spread across all three Availability Zones

ARO deploys 3 master nodes, one per AZ, automatically. Worker nodes are spread through MachineSets, one per zone, and any worker pool you add yourself needs the same per-zone layout:

# MachineSet for AZ1 — replicate for AZ2, AZ3
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  name: aro-prod-worker-eastus-1
  namespace: openshift-machine-api
spec:
  replicas: 3
  template:
    spec:
      providerSpec:
        value:
          zone: "1"                         # AZ1
          vmSize: Standard_D16s_v3
          osDisk:
            diskSizeGB: 128
            managedDisk:
              storageAccountType: Premium_LRS

Create three MachineSets — one per zone — with equal replica counts. This ensures workloads survive a full AZ failure.
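To see why equal replica counts matter, it helps to compute what is left when one zone disappears. The sketch below is a planning aid under simple assumptions (identical node sizes, three zones named "1" to "3"); `replicas_per_zone` and `capacity_after_zone_loss` are hypothetical helper names.

```python
def replicas_per_zone(total_workers, zones=("1", "2", "3")):
    """Spread workers across zones as evenly as possible."""
    base, extra = divmod(total_workers, len(zones))
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def capacity_after_zone_loss(layout):
    """Workers still running if the largest zone fails (the worst case)."""
    return sum(layout.values()) - max(layout.values())


layout = replicas_per_zone(9)
print(layout, capacity_after_zone_loss(layout))

skewed = {"1": 7, "2": 1, "3": 1}
print(skewed, capacity_after_zone_loss(skewed))
```

With nine workers spread evenly, six survive a zone outage; with the skewed layout only two do, even though the total is the same.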

Enable cluster autoscaler

apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  resourceLimits:
    maxNodesTotal: 24
  scaleDown:
    enabled: true
    delayAfterAdd: 10m
    delayAfterDelete: 5m
    delayAfterFailure: 30s
---
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: aro-prod-worker-eastus-1
  namespace: openshift-machine-api
spec:
  minReplicas: 3
  maxReplicas: 8
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: aro-prod-worker-eastus-1

Use zone-redundant storage for persistent volumes

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS       # Zone-redundant storage — survives AZ failure
  cachingMode: ReadOnly
reclaimPolicy: Retain        # Retain on PVC delete — prevents data loss
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true

Use Premium_ZRS instead of Premium_LRS for stateful workloads — ZRS replicates the disk synchronously across three AZs so a pod can reschedule to another zone without losing its data.

3. Security Best Practices

Use Workload Identity (pod-level Azure RBAC)

Never put Azure credentials in pods. Use Workload Identity to give individual pods an Azure AD identity with scoped RBAC permissions:

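# Enable managed identity on the ARO cluster (prerequisite for workload identity;
# confirm the flag with `az aro update --help` for your CLI version)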
az aro update \
  --resource-group rg-aro \
  --name aro-prod \
  --enable-managed-identity

# Create a managed identity for a specific workload
az identity create \
  --resource-group rg-aro-workloads \
  --name id-payment-service

# Grant it only what it needs
az role assignment create \
  --assignee <identity-client-id> \
  --role "Key Vault Secrets User" \
  --scope /subscriptions/.../vaults/kv-prod

# Annotate the service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service-sa
  namespace: payments
  annotations:
    azure.workload.identity/client-id: "<managed-identity-client-id>"

Integrate Azure Key Vault for secrets via CSI driver

Never store secrets in OpenShift Secrets (base64 is not encryption). Use the Secrets Store CSI driver to mount Key Vault secrets directly into pods:

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-secrets
  namespace: payments
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    clientID: "<managed-identity-client-id>"
    keyvaultName: kv-prod
    tenantID: "<tenant-id>"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
        - |
          objectName: api-key
          objectType: secret

Integrate with Azure Container Registry via private endpoint

# Create ACR with private endpoint — no public access
az acr create \
  --resource-group rg-aro \
  --name acrprodaro \
  --sku Premium \
  --public-network-enabled false

# Private endpoint in ARO spoke
az network private-endpoint create \
  --name pe-acr-prod \
  --resource-group rg-aro-network \
  --vnet-name aro-spoke-vnet \
  --subnet private-endpoint-subnet \
  --private-connection-resource-id $(az acr show --name acrprodaro --query id -o tsv) \
  --group-id registry \
  --connection-name pe-acr-conn

# Grant ARO pull access
az role assignment create \
  --assignee <aro-kubelet-identity> \
  --role AcrPull \
  --scope $(az acr show --name acrprodaro --query id -o tsv)

Apply OpenShift Security Context Constraints (SCC)

Never run pods as root. Use the restricted-v2 SCC (default in OpenShift 4.11+):

apiVersion: v1
kind: Pod
spec:
  securityContext:
    runAsNonRoot: true
    # No runAsUser: restricted-v2 assigns a UID from the namespace's allocated range
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        readOnlyRootFilesystem: true

Enable Microsoft Defender for Containers

az security pricing create \
  --name Containers \
  --tier Standard

Defender for Containers provides runtime threat detection, vulnerability scanning for images in ACR, and Kubernetes audit log analysis — all surfaced in Microsoft Defender for Cloud.

4. Observability Best Practices

Forward logs to Azure Monitor / Log Analytics

# Prerequisite carried over from the workload identity section
# (this flag does not by itself enable container insights)
az aro update \
  --resource-group rg-aro \
  --name aro-prod \
  --enable-managed-identity

# Deploy the monitoring add-on via Helm
helm repo add microsoft https://microsoft.github.io/charts/repo
helm install azuremonitor-containers \
  microsoft/azuremonitor-containers \
  --set omsagent.secret.wsid=<workspace-id> \
  --set omsagent.secret.key=<workspace-key> \
  --namespace kube-system

Use Azure Monitor alerts for cluster health

Alert	Metric	Threshold
Node CPU pressure	`cpuUsageNanoCores`	> 85% for 5 min
Node memory pressure	`memoryWorkingSetBytes`	> 80% of capacity
Pod restart loop	`restartCount`	> 5 in 10 min
PVC near full	`pvUsedBytes`	> 85% of capacity
Node not ready	`nodeCondition`	NotReady > 2 min

5. Day-2 Operations Best Practices

Cluster upgrade strategy

ARO manages the control plane upgrade automatically — you control timing for worker nodes:

# Check available upgrade versions
az aro get-upgrade-versions \
  --resource-group rg-aro \
  --name aro-prod

# Schedule upgrade in maintenance window
az aro update \
  --resource-group rg-aro \
  --name aro-prod \
  --version 4.14.12

Use the EUS (Extended Update Support) channel for production clusters. It allows staying on a minor version for up to 18 months while still receiving security patches, which avoids the churn of following every minor release (OpenShift ships one roughly every four months).

Worker node upgrade — use surge capacity

# MachineConfigPool surge upgrade strategy
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  maxUnavailable: 1          # Upgrade one node at a time

Upgrade workers one node at a time to maintain application availability — pods are gracefully drained before each node reboots into the new RHCOS version.
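The effect of `maxUnavailable` on a rollout can be sketched as simple batching: nodes are drained and rebooted in groups no larger than the limit. This is an illustration of the idea, not the machine-config-operator's actual scheduling logic; `upgrade_batches` and the node names are hypothetical.

```python
def upgrade_batches(nodes, max_unavailable=1):
    """Split nodes into the groups that are drained and rebooted together."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return [nodes[i:i + max_unavailable]
            for i in range(0, len(nodes), max_unavailable)]


workers = [f"worker-{n}" for n in range(1, 7)]

for step, batch in enumerate(upgrade_batches(workers, max_unavailable=1), start=1):
    print(f"step {step}: drain and reboot {batch}")

print(len(upgrade_batches(workers, max_unavailable=2)), "steps with maxUnavailable: 2")
```

A higher limit shortens the rollout but removes more capacity at once, so the setting trades upgrade time against headroom.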

Summary — ARO Best Practice Checklist

Category	Practice
Network	Private cluster — no public API or ingress
Network	Egress via Azure Firewall with FQDN allow-list
Network	DNS Private Resolver for private endpoint resolution
Network	Worker subnet `/22` or larger — never resize after
Availability	Workers spread across AZs via 3 MachineSets
Availability	Cluster autoscaler min 3 per zone
Availability	Premium_ZRS disks for stateful workloads
Availability	Zone-redundant Azure Load Balancer
Security	Workload Identity — no credentials in pods
Security	Key Vault + CSI driver — no base64 secrets
Security	ACR via private endpoint — no public pull
Security	SCC restricted-v2 — no root containers
Security	Defender for Containers enabled
Observability	Container Insights → Log Analytics
Observability	Azure Monitor alerts on node and pod health
Operations	EUS channel for production stability
Operations	maxUnavailable: 1 for worker upgrades

MCP Operations Server: AI-Enabled Managed Ops Explained

April 25, 2026 techhadoop ai, AKS, azure ai, cloud, devops, kubernetes, technology

To bridge your local Python code to a production-ready AKS environment, you need a Dockerfile that doesn’t just run the code, but does so securely and efficiently.

By 2026, the standard for MCP servers in production is to move away from STDIO (local command line) and use SSE (Server-Sent Events) over HTTP. This allows your AI agents to talk to the server over a network.

1. The Production Dockerfile

This Dockerfile uses a “non-root” user (security best practice) and installs the necessary drivers to talk to the Docker socket or Kubernetes API.

Dockerfile

			
# Use a lightweight Python base image
FROM python:3.12-slim
# Install system dependencies (curl for health checks)
RUN apt-get update && apt-get install -y \
    curl \
    && rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Create a non-root user for security
RUN groupadd -r mcpuser && useradd -r -g mcpuser mcpuser
# Copy requirements and install
# Note: includes 'mcp[cli]' for server capabilities
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy server code
COPY server.py .
# Give our non-root user access to the app folder
RUN chown -R mcpuser:mcpuser /app
USER mcpuser
# Expose the port for SSE/HTTP transport (Standard for 2026)
EXPOSE 8000
# Start the server using the FastMCP production runner
CMD ["python", "server.py", "--transport", "sse", "--port", "8000"]

		

2. The `requirements.txt`

You’ll need these specific libraries:

Plaintext

			
fastmcp>=1.0.0
docker>=7.0.0
kubernetes>=30.0.0
uvicorn # Required for high-performance HTTP transport

3. Deploying to AKS (The “Support” Strategy)

When you deploy this to your client’s AKS cluster, you’ll use a standard Kubernetes Deployment.

Why this is better for your role:

Scaling: If the dev team grows, you can scale the MCP server to 3 replicas so the AI assistant never lags.
Security: Instead of sharing your personal kubeconfig, the MCP server uses a ServiceAccount with “View Only” permissions. This means the AI can see the logs but can’t accidentally delete the production database.

4. How to Pitch the “AI Operations” Tier

You can now offer a new support tier called “AI-Enabled Managed Ops”:

“I’ve built a custom MCP Operations Server for our cluster. It allows our internal AI agents to perform health checks, retrieve logs, and analyze container stats using natural language. This doesn’t replace me; it allows me to respond to your requests 10x faster because the AI is doing the ‘data gathering’ for me inside our secure perimeter.”

One final piece of the puzzle

To finish the MCP server integration on AKS, the pod needs permission to “see” the other pods, which means granting it the right to “talk” to the Kubernetes API.

If you don’t do this, the AI will be “blind”: it will try to list pods and get a 403 Forbidden error.

1. The RBAC Strategy

We will use three Kubernetes objects:

ServiceAccount: The identity for your MCP pod.
ClusterRole: A set of rules that allow “Viewing” (reading pods, logs, and events).
ClusterRoleBinding: The “glue” that attaches the Role to the ServiceAccount.

2. The RBAC YAML (`mcp-rbac.yaml`)

YAML

			
# 1. The Identity
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mcp-server-sa
  namespace: default
---
# 2. The Permissions (Read-Only/Viewer)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mcp-pod-viewer
rules:
- apiGroups: [""]
  # Accessing 'pods' for list/get, and 'pods/log' specifically for tracing
  resources: ["pods", "pods/log", "pods/status", "events", "nodes", "services"]
  verbs: ["get", "list", "watch"]
---
# 3. The Connection
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: mcp-server-binding
subjects:
- kind: ServiceAccount
  name: mcp-server-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: mcp-pod-viewer
  apiGroup: rbac.authorization.k8s.io

		

3. Updating your Deployment

Finally, ensure your MCP server deployment uses this serviceAccountName:

YAML

			
spec:
  template:
    spec:
      serviceAccountName: mcp-server-sa
      containers:
      - name: mcp-server
        image: your-mcp-image:latest
        # ... other config ...

		

4. Why this is “Safe” for your Client

When you explain this to the company, emphasize these three points:

Namespace Scoping: Even though it’s a ClusterRole, you can swap it for a Role if you only want the AI to see specific namespaces (e.g., only production-frontend).
No Secrets Access: Notice that secrets is not in the list of resources. The AI literally cannot see the database passwords, even if it tries.
Auditability: Every time the AI agent asks the MCP server for logs, the Kubernetes audit log records that API call under the mcp-server-sa identity. With audit logging enabled on the cluster, you have a trail of everything the agent read.
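The "No Secrets Access" point can be checked mechanically, because RBAC rules are purely additive: a request is allowed only if some rule lists its API group, resource, and verb. The sketch below mirrors that matching for the mcp-pod-viewer rules shown earlier. It is a simplified model (no resourceNames or non-resource URLs), and `is_allowed` is a hypothetical helper, not a Kubernetes API.

```python
def is_allowed(rules, api_group, resource, verb):
    """Return True if any rule grants the verb on the resource."""
    def matches(values, wanted):
        return wanted in values or "*" in values

    return any(
        matches(rule["apiGroups"], api_group)
        and matches(rule["resources"], resource)
        and matches(rule["verbs"], verb)
        for rule in rules
    )


# The rules from the mcp-pod-viewer ClusterRole
MCP_POD_VIEWER = [{
    "apiGroups": [""],
    "resources": ["pods", "pods/log", "pods/status", "events", "nodes", "services"],
    "verbs": ["get", "list", "watch"],
}]

print(is_allowed(MCP_POD_VIEWER, "", "pods/log", "get"))   # reading logs
print(is_allowed(MCP_POD_VIEWER, "", "secrets", "get"))    # reading secrets
print(is_allowed(MCP_POD_VIEWER, "", "pods", "delete"))    # deleting pods
```

On a live cluster the same questions can be asked with `kubectl auth can-i` while impersonating the service account.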

Putting it all together

You now have the Terraform for infrastructure, the Python for the server, the Docker for the container, and the RBAC for security.

You’re ready to pitch this as a “Self-Healing AI Operations Layer.”

Feature	Kubenet	Azure CNI
IP assignment	NAT	Real VNet IP
Scalability	Better	Limited by subnet
Complexity	Lower	Higher
Use case	Small clusters	Enterprise

Feature	Disk	File
Access	Single pod	Multiple pods
Performance	High	Moderate

Understanding Azure RBAC vs Kubernetes RBAC

April 24, 2026 techhadoop AKS, azure azure, cloud, devops, kubernetes, technology

When explaining this to a client, it is helpful to describe it as the difference between who can touch the physical server (the building) versus who can edit the files on the computer inside (the office).

In 2026, the industry standard is to use Azure RBAC for both, but they still operate on two distinct “control planes.”

1. The Two Control Planes

In AKS, access is split into two layers:

The Azure Control Plane (Azure RBAC): This governs the “outside” of the cluster. It’s about the Kubernetes resource itself as it exists in your Azure portal.
The Kubernetes Control Plane (Kubernetes RBAC): This governs the “inside” of the cluster. It’s about the pods, namespaces, and deployments running on the nodes.

2. Side-by-Side Comparison

Feature	Azure RBAC	Kubernetes RBAC
Scope	Subscription / Resource Group / AKS Resource	Cluster / Namespace / Specific Pods
Managed Via	Azure Portal, CLI, Terraform	`kubectl`, YAML manifests, Helm
Typical Actions	Scaling nodes, Upgrading K8s version, Deleting the cluster.	Creating a Pod, Editing a Service, Viewing logs in a Namespace.
Identity Source	Microsoft Entra ID (Azure AD)	Service Accounts (or Entra ID via integration)

3. The “Hybrid” Option (Azure RBAC for K8s Authorization)

This is the most confusing part for beginners, but the most important for you to propose to your client.

You can now use Azure RBAC to manage internal Kubernetes permissions. Instead of writing complex RoleBinding YAML files for every user, you assign them a built-in Azure role that Kubernetes understands.

Key Built-in Roles (2026 Standards):

AKS RBAC Viewer: Can see resources in a namespace but can’t see secrets or change anything.
AKS RBAC Writer: Can deploy apps and edit resources.
AKS RBAC Admin: Full control over a namespace.
AKS RBAC Cluster Admin: The “God Mode” for the entire cluster.

4. How to Explain the Workflow to Your Manager

“Think of it like a bank:

Azure RBAC is the security guard at the front door. He checks your ID (Entra ID) and decides if you’re even allowed in the building. He also decides who can add more teller windows (Scale nodes) or renovate the lobby (Upgrade cluster).
Kubernetes RBAC is the permissions on the safe. Once you’re inside, it decides if you can open Drawer A (Namespace ‘Dev’) or Drawer B (Namespace ‘Prod’).”

Pro-Tip: Recommendation

If you want to provide “Gold Standard” support, propose disabling local accounts and moving entirely to Azure RBAC for Kubernetes Authorization. Why? Because when an employee leaves the company and their Entra ID (Azure AD) account is disabled or deleted, they can no longer obtain tokens for the Kubernetes cluster, so their access ends as soon as any token they already hold expires. No orphaned RoleBindings to worry about.

Understanding how a developer goes from their laptop to a running container in a secured AKS environment is the best way to prove the value of your setup.

Here is the step-by-step lifecycle of a developer’s request in a Zero-Trust AKS environment.

The Access Lifecycle (Step-by-Step)

1. Authentication (The Gatekeeper)

The developer doesn’t have a “Kubernetes password.” Instead, they run:

Bash

			
az login
az aks get-credentials --resource-group rg-prod --name aks-01

At this moment, Azure RBAC checks if their Entra ID account has permission to even download the cluster configuration.

2. Authorization (The Office Door)

The developer tries to deploy a new microservice:

Bash

kubectl apply -f my-app.yaml

The AKS API Server intercepts this. Since we are using Azure RBAC for Kubernetes Authorization, it checks the user’s Azure role assignments: “Does this user hold the ‘Azure Kubernetes Service RBAC Writer’ role (or an equivalent) scoped to the ‘production’ namespace?”

If Yes: The request proceeds.

If No: The request is blocked with a 403 Forbidden error.
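A developer can ask the same question up front instead of waiting for the 403. The sketch below wraps kubectl auth can-i in a small helper; the function name is made up for this example, and it only reads the yes/no answer that kubectl prints.

```shell
# Hypothetical pre-flight check: ask the API server whether the current
# identity may create Deployments in a namespace before running `kubectl apply`.
can_deploy() {
  namespace="$1"
  # `kubectl auth can-i` prints an answer that begins with "yes" or "no".
  answer="$(kubectl auth can-i create deployments --namespace "$namespace" 2>/dev/null)"
  case "$answer" in
    yes*) return 0 ;;
    *)    return 1 ;;
  esac
}
```

Typical use would be can_deploy production && kubectl apply -f my-app.yaml, so the apply is only attempted when the authorization check passes.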

3. Policy Validation (The Safety Inspector)

Before the pod is admitted and scheduled, Azure Policy (acting as an admission controller inside the cluster) evaluates the objects in my-app.yaml.

It checks: “Is this container trying to run as root? Does it have CPU limits?”

If the YAML is “lazy” (insecure) and the matching policies are assigned with a deny effect, Azure Policy rejects it immediately, even though the developer has “Writer” permissions.
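For illustration, here is a sketch of a manifest that answers both of those questions the right way. The generator function, pod name, and image are hypothetical, and the exact fields a cluster enforces depend on which policy definitions are assigned.

```shell
# Hypothetical manifest generator: prints a Pod that sets the fields the
# checks above ask about (non-root user, CPU and memory limits).
render_hardened_pod() {
  name="$1"; image="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  securityContext:
    runAsNonRoot: true        # refuse to start if the image runs as UID 0
  containers:
    - name: ${name}
      image: ${image}
      securityContext:
        allowPrivilegeEscalation: false
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 256Mi
EOF
}
```

Piping the output into kubectl apply --dry-run=server -f - sends it to the API server without persisting anything, which is a cheap way to see whether admission would accept it.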

4. Identity & Secrets (The Secure Handshake)

Once the pod starts, it needs to talk to the database.

The pod presents its Workload Identity (a federated token tied to a managed identity) to Azure Key Vault.
Key Vault verifies the identity, and the Secrets Store CSI Driver mounts the database connection string into the pod.
The password is never stored in the manifest, the container image, or the code repository; it only appears inside the running pod as a mounted volume.
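On the pod side, the handshake shows up as a CSI volume. The sketch below prints such a pod; the function name, pod name, and image are hypothetical, and it assumes a SecretProviderClass called azure-kv-provider exists in the production namespace.

```shell
# Hypothetical manifest generator: a Pod that mounts Key Vault secrets through
# the Secrets Store CSI Driver instead of carrying them in env vars or images.
render_secret_consumer_pod() {
  name="$1"; image="$2"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
  namespace: production
spec:
  containers:
    - name: ${name}
      image: ${image}
      volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets-store   # each secret appears as a file here
          readOnly: true
  volumes:
    - name: kv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: azure-kv-provider   # assumed to exist already
EOF
}
```

The application then reads the secret from a file under /mnt/secrets-store rather than from its own configuration.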

Summary Table for Your Proposal

To wrap this up for your client, you can present this “Success Path” to show them exactly what they are paying for:

Stage	Security Layer	Purpose
Login	Entra ID	Ensures only active employees can connect.
Action	Azure RBAC	Limits what a developer can do (e.g., Read vs. Write).
Deploy	Azure Policy	Forces best practices (No root, resource limits).
Connect	Workload Identity	Eliminates hardcoded passwords in the code.

Pro-Tip: The “Audit” Hook

Tell your client: “With this setup, and with the cluster’s audit logs sent to a Log Analytics workspace, we can generate a report at any time showing who accessed the production cluster and what they changed. This makes SOC2 or ISO27001 audits much easier to evidence.”

Deploy AKS Clusters with Terraform: Best Practices

April 24, 2026April 24, 2026 techhadoop AKS, azure ai, azure, cloud, devops, technology

To deploy a production-ready AKS cluster using Terraform, it is best practice to separate your Network (VNet/Subnet) from the AKS Cluster resource. This ensures that if you ever need to destroy the cluster, your networking infrastructure remains intact.

Here is a clean, modular example using the AzureRM provider.

1. The Provider Configuration

First, create a main.tf to define your requirements.

Terraform

			
terraform {
  required_version = ">= 1.5.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # Or 4.x if using the latest 2026 releases
    }
  }
}
provider "azurerm" {
  features {}
}

		

2. Networking Resources

AKS needs a dedicated subnet. We’ll use Azure CNI (Advanced Networking) as it’s the standard for enterprise security.

Terraform

			
resource "azurerm_resource_group" "aks_rg" {
  name     = "rg-production-aks"
  location = "East US"
}
resource "azurerm_virtual_network" "aks_vnet" {
  name                = "vnet-aks-prod"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  address_space       = ["10.0.0.0/16"]
}
resource "azurerm_subnet" "aks_subnet" {
  name                 = "snet-aks-nodes"
  resource_group_name  = azurerm_resource_group.aks_rg.name
  virtual_network_name = azurerm_virtual_network.aks_vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}

		

3. The AKS Cluster Resource

This block includes the security features we discussed: System Assigned Identity, Azure RBAC, and Azure Linux as the OS.

Terraform

			
resource "azurerm_kubernetes_cluster" "aks" {
  name                = "aks-prod-01"
  location            = azurerm_resource_group.aks_rg.location
  resource_group_name = azurerm_resource_group.aks_rg.name
  dns_prefix          = "aksprod"
  
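  # Enable Azure RBAC for Kubernetes Authorization (Entra ID integration).
  # Assumes azurerm 3.x; in 4.x the `managed` argument no longer exists.
  azure_active_directory_role_based_access_control {
    managed            = true
    azure_rbac_enabled = true
  }
  # Disabling local accounts relies on the Entra ID integration above
  local_account_disabled = true
  # Azure Policy add-on (admission-time checks)
  azure_policy_enabled = true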
  default_node_pool {
    name       = "systempool"
    node_count = 3
    vm_size    = "Standard_DS2_v2"
    vnet_subnet_id = azurerm_subnet.aks_subnet.id
    
    # Use Azure Linux for better security/performance
    os_sku = "AzureLinux" 
    
    # Enable auto-scaling for production
    # (azurerm 4.x renames this argument to auto_scaling_enabled)
    enable_auto_scaling = true
    min_count           = 3
    max_count           = 5
  }
  identity {
    type = "SystemAssigned"
  }
  network_profile {
    network_plugin    = "azure"
    load_balancer_sku = "standard"
    network_policy    = "azure" # Enables Kubernetes Network Policies
  }
  tags = {
    Environment = "Production"
    ManagedBy   = "Terraform"
  }
}

		

4. Essential Outputs

You’ll need the cluster configuration to connect via kubectl.

Terraform

			
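# Note: with local accounts disabled, expect this certificate to be empty,
# since authentication goes through Entra ID instead.
output "client_certificate" {
  value     = azurerm_kubernetes_cluster.aks.kube_config[0].client_certificate
  sensitive = true
}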
output "kube_config" {
  value     = azurerm_kubernetes_cluster.aks.kube_config_raw
  sensitive = true
}

		

Key Implementation Steps

Initialize: Run terraform init to download the Azure provider.
Plan: Run terraform plan -out=main.tfplan to preview the 4 resources being created.
Apply: Run terraform apply "main.tfplan".
Connect: Once finished, use the Azure CLI to get your credentials:

Bash

az aks get-credentials --resource-group rg-production-aks --name aks-prod-01
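Because the cluster above disables local accounts, the downloaded kubeconfig authenticates through Entra ID. The sketch below strings the connection steps together; it assumes the kubelogin plugin is installed, and the function name is invented for this example.

```shell
# Hypothetical helper: fetch credentials and verify access to the cluster.
# Usage: connect_cluster <resource-group> <cluster-name>
connect_cluster() {
  rg="$1"; cluster="$2"

  # Download (or refresh) the kubeconfig entry for the cluster.
  az aks get-credentials --resource-group "$rg" --name "$cluster" --overwrite-existing || return 1

  # Reuse the existing `az login` session instead of an interactive device-code prompt.
  kubelogin convert-kubeconfig -l azurecli || return 1

  # A simple read confirms that authentication and authorization both work.
  kubectl get nodes
}
```

For the cluster defined here the call would be connect_cluster rg-production-aks aks-prod-01.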

Why this is a “Support Pro” Move

By delivering this in Terraform, you are telling the company: “I don’t just click buttons in the portal. I provide Infrastructure as Code that is version-controlled, repeatable, and documented.” This makes it much easier to propose a “Disaster Recovery” service later on.

Integrating the Azure Key Vault (AKV) Secrets Store CSI Driver into your Terraform code is the final step in removing sensitive data (like database passwords or API keys) from your Kubernetes manifests.

Here is the additional code to enable the driver and set up the necessary permissions.

1. Enable the CSI Driver in AKS

In your azurerm_kubernetes_cluster resource block (from the previous code), you need to add the key_vault_secrets_provider block:

Terraform

			
resource "azurerm_kubernetes_cluster" "aks" {
  # ... existing config ...
  key_vault_secrets_provider {
    secret_rotation_enabled = true
    secret_rotation_interval = "2m"
  }
}

		

2. Create the Key Vault

You need a vault to actually store the secrets.

Terraform

			
resource "azurerm_key_vault" "kv" {
  name                        = "kv-prod-aks-01"
  location                    = azurerm_resource_group.aks_rg.location
  resource_group_name         = azurerm_resource_group.aks_rg.name
  enabled_for_disk_encryption = true
  tenant_id                   = data.azurerm_client_config.current.tenant_id
  sku_name                    = "standard"
  # Best practice: Don't use access policies, use RBAC
  enable_rbac_authorization = true 
}
data "azurerm_client_config" "current" {}

		

3. Link AKS to Key Vault (The “Magic” Link)

When you enable the CSI driver add-on, AKS creates a user-assigned managed identity for it. You must give that identity permission to read secrets from the Key Vault.

Terraform

			
# Grant the managed identity created for the AKS CSI driver add-on read access to secrets
resource "azurerm_role_assignment" "aks_kv_reader" {
  scope                = azurerm_key_vault.kv.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = azurerm_kubernetes_cluster.aks.key_vault_secrets_provider[0].secret_identity[0].object_id
}

		

4. Usage: The SecretProviderClass (K8s Manifest)

Terraform sets up the infrastructure, but you still need a small Kubernetes object to tell the pod which secrets to pull. You can apply this via kubectl or a Terraform kubernetes_manifest resource:

YAML

			
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-provider
  namespace: production
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "true"
    userAssignedIdentityID: "<AKS_CSI_CLIENT_ID>" # Output this from Terraform
    keyvaultName: "kv-prod-aks-01"
    objects:  |
      array:
        - |
          objectName: db-password
          objectType: secret
    tenantId: "<YOUR_TENANT_ID>"

		

Why this is a “Gold Standard” Setup

By using this approach, your Linux servers and Docker microservices become significantly more secure:

No “Cleartext” Secrets: Developers never see the production password.
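Auto-Rotation: If you change the password in the Azure Portal/Key Vault, the CSI driver refreshes the mounted file inside the running container at its next poll, roughly every 2 minutes with the interval configured above. The application still has to re-read the file to pick up the new value.
Audit Trail: With diagnostic logging enabled on the Key Vault, every secret read is recorded and can be queried in Azure Monitor.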

Pro-Tip for your Proposal

When talking to the client, use this phrasing:

“I am implementing a Zero-Trust Secret Architecture. This ensures that sensitive credentials never touch our code repository or container images, and they are rotated automatically to prevent long-term credential leakage.”

Understanding OADP: A Guide to OpenShift API for Data Protection

April 22, 2026May 2, 2026 techhadoop kubernetes, OCP azure, cloud, devops, kubernetes

Here’s a thorough explanation of OADP — what it is, how it works, and how to use it.

What OADP is

The OpenShift API for Data Protection (OADP) product safeguards customer applications on OpenShift Container Platform. It offers comprehensive disaster recovery protection, covering OpenShift Container Platform applications, application-related cluster resources, persistent volumes, and internal images. OADP is also capable of backing up both containerized applications and virtual machines. However, OADP does not serve as a disaster recovery solution for etcd or OpenShift Operators.

In plain terms: OADP is the application-layer backup tool for OCP. Where etcd backup protects the cluster skeleton (all resource definitions), OADP protects what’s running inside namespaces — the actual workloads and their data.

OADP is delivered as an open source operator that installs and configures Velero on the OpenShift platform, allowing users to back up and restore applications, persistent volumes, and custom resources across environments.

In short: OADP = Velero + OpenShift-specific plugins + OLM lifecycle management. Everything is driven by Kubernetes CRs.

What OADP protects

Data that can be protected with OADP includes Kubernetes resource objects, persistent volumes, and internal images. More specifically:

Kubernetes objects — all resources in selected namespaces: Deployments, Services, ConfigMaps, Secrets, Routes, PVCs, RoleBindings, etc.
Internal container images — images stored in the OCP internal registry (built by S2I/Tekton and not pushed externally)
Persistent volume data — via CSI snapshots, cloud-native snapshots, or file-system backup (Kopia)
OpenShift Virtualization VMs — OADP can quiesce VMs, snapshot their disks, and restore them fully

What it does NOT protect: OADP does not serve as a disaster recovery solution for etcd or OpenShift Operators. OADP support is applicable to customer workload namespaces and cluster scope resources. Full cluster backup and restore are not supported.

Core components

Component	Role
OADP Operator	Installs/manages Velero and all CRDs via OLM. Runs in `openshift-adp`
Velero	The backup engine — serialises K8s resources, coordinates PV backup
Node agent (Kopia)	DaemonSet on every node — handles file-level PV backup
`openshift` plugin	OCP-specific handling for Routes, SCCs, internal registry images
`csi` plugin	Integrates with CSI VolumeSnapshot API for fast PV snapshots

Step 1 — Install

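Install from OperatorHub or via CLI. The CLI route assumes the openshift-adp namespace and an OperatorGroup targeting it already exist; a Subscription on its own does not install the operator: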

			
cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: redhat-oadp-operator
  namespace: openshift-adp
spec:
  channel: stable-1.5
  name: redhat-oadp-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF

		

Step 2 — Configure via DataProtectionApplication CR

The DataProtectionApplication (DPA) is the master config CR. It tells OADP where to store backups, which plugins to load, and how to handle PV backup:

			
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa-cluster
  namespace: openshift-adp
spec:
  configuration:
    velero:
      defaultPlugins:
        - openshift   # required — handles Routes, SCCs, internal images
        - aws         # swap for gcp or azure as needed
        - csi         # enables CSI volume snapshots
    nodeAgent:
      enable: true
      uploaderType: kopia   # preferred over restic since OADP 1.3
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        credential:
          name: cloud-credentials
          key: cloud
        objectStorage:
          bucket: my-ocp-backups
          prefix: cluster-prod
        config:
          region: ca-central-1

		

For on-prem with ODF/NooBaa, use provider: aws with a custom s3Url pointing to the NooBaa S3 Route — no cloud account required.
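A sketch of what that backupLocations section might look like is below. The function name and arguments are hypothetical, the region value is only a placeholder, and the exact keys your OADP version expects should be checked against its documentation.

```shell
# Hypothetical generator: prints a DPA backupLocations section that points the
# aws provider at an S3-compatible endpoint such as NooBaa.
# Usage: render_noobaa_backup_location <bucket> <s3-url> [region]
render_noobaa_backup_location() {
  bucket="$1"; s3_url="$2"; region="${3:-us-east-1}"
  cat <<EOF
  backupLocations:
    - name: default
      velero:
        provider: aws
        default: true
        credential:
          name: cloud-credentials
          key: cloud
        objectStorage:
          bucket: ${bucket}
          prefix: cluster-prod
        config:
          profile: default
          region: ${region}           # placeholder; the plugin still expects a value
          s3Url: ${s3_url}
          s3ForcePathStyle: "true"    # path-style addressing for non-AWS endpoints
EOF
}
```

The output is indented to sit directly under spec: in the DataProtectionApplication shown above, replacing the AWS-specific block.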

Step 3 — Take backups

			
# One-time backup with a pre-hook to quiesce PostgreSQL
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-backup
  namespace: openshift-adp
spec:
  includedNamespaces: [my-app, my-app-db]
  excludedResources: [events, events.events.k8s.io]
  defaultVolumesToFsBackup: true   # Kopia for PVs
  storageLocation: default
  ttl: 720h0m0s                    # 30-day retention
  hooks:
    resources:
      - name: quiesce-db
        includedNamespaces: [my-app-db]
        labelSelector:
          matchLabels:
            app: postgresql
        pre:
          - exec:
              container: postgresql
              command: ["/bin/bash", "-c", "psql -c 'CHECKPOINT'"]
              timeout: 30s

		

You can schedule backups at specified intervals. You can use hooks to run commands in a container on a pod, for example fsfreeze to freeze a file system. You can configure a hook to run before or after a backup or restore.

			
# Scheduled daily backup
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: openshift-adp
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces: ["*"]
    excludedNamespaces: [openshift-*, kube-*, openshift-adp]
    defaultVolumesToFsBackup: true
    storageLocation: default
    ttl: 168h0m0s   # 7-day retention

		

Step 4 — Restore

			
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: my-app-restore
  namespace: openshift-adp
spec:
  backupName: my-app-backup
  restorePVs: true
  existingResourcePolicy: none   # skip resources that already exist

		

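For cross-cluster disaster recovery, point the destination cluster’s DPA at the same S3 bucket with accessMode: ReadOnly, then create the Restore CR once the backups have synced. Let the restore create the target namespace rather than pre-creating it: a freshly created namespace gets its own SCC UID/GID range annotations, which can conflict with the ownership of the restored data.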

PV backup — three strategies

The underlying mechanism within OADP that allows the backup and restore of persistent volumes is either Restic, Kopia, CSI snapshots, or CSI dataMover. Backups are incremental by default.

Strategy	Speed	Works on-prem	How
CSI snapshots	Fastest	Yes (Ceph RBD/FS)	Label a `VolumeSnapshotClass` with `velero.io/csi-volumesnapshot-class: "true"`
Native cloud snapshots	Fast	No	Configure `snapshotLocations` in DPA
Kopia (file-system backup)	Slower, incremental	Yes (any PV)	Set `defaultVolumesToFsBackup: true` in Backup CR

OADP 1.3 includes a built-in Data Mover that uses Kopia as the uploader mechanism to read snapshot data and write to a Unified Repository, allowing you to restore stateful applications from a remote object store if a failure or cluster corruption occurs.

Key limits and best practices

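Exclude events from backups, since they are transient; ReplicaSets owned by a Deployment are recreated automatically and can be skipped too. Keep pods in scope whenever you rely on hooks or file-system backup, because Velero finds the volumes to back up and runs hooks through the pod objects
Test restores monthly: an untested backup is not a backup
Pair with etcd backup: OADP covers application data, etcd covers the cluster skeleton; both are needed for full DR
Use hooks for stateful apps (databases, message queues) to get application-consistent rather than merely crash-consistent backups
Monitor Velero’s Prometheus metrics at /metrics on the Velero pod and alert on failures, for example a rising velero_backup_failure_total or velero_backup_partial_failure_total, or any Backup whose status.phase is not Completed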
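Alongside metrics, a quick command-line check can flag backups that did not finish cleanly. The sketch below is one way to do it; the function name is made up, and it treats anything other than Completed or InProgress as worth a look.

```shell
# Hypothetical helper: list backups whose phase is neither Completed nor InProgress.
# Usage: list_unhealthy_backups [namespace]
list_unhealthy_backups() {
  namespace="${1:-openshift-adp}"
  # Emit "name<TAB>phase" per backup, then filter on the phase column.
  oc get backups.velero.io -n "$namespace" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}' |
    awk -F '\t' '$2 != "Completed" && $2 != "InProgress" { print $1 ": " $2 }'
}
```

An empty result means every backup either completed or is still running; any line printed names a backup to investigate with oc describe.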

Azure DR Test: Restore with OpenShift & OADP

April 22, 2026April 22, 2026 techhadoop OCP azure, cloud, devops, kubernetes, technology

Here’s a realistic Azure-specific DR test using OpenShift Container Platform + OpenShift API for Data Protection (OADP).

We’ll simulate a namespace + data loss and walk through a full restore using Azure Blob + Disk snapshots.

Scenario (Azure DR test)

❌ my-app namespace deleted
❌ PVC + data gone
❌ Need full recovery from backup

Environment:

OADP configured with Azure Blob
CSI snapshots enabled (Azure Disk)

What we’re restoring

Kubernetes resources (deployments, services, routes)
Persistent volumes (via Azure snapshots)
Application data

Flow overview

			
Backup (Blob + Disk Snapshot)
        ↓
Namespace deleted ❌
        ↓
Velero Restore triggered
        ↓
Resources recreated
        ↓
PVC restored from snapshot
        ↓
App back online ✅

		

Step-by-step restore

Step 1: Confirm failure

oc get ns my-app

Should return an error similar to:

Error from server (NotFound): namespaces "my-app" not found

Step 2: List available backups

oc get backup -n openshift-adp

Example:

azure-backup   Completed

Step 3: Create restore

			
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: restore-my-app
  namespace: openshift-adp
spec:
  backupName: azure-backup
  includedNamespaces:
    - my-app

		

Apply:

oc apply -f restore.yaml

Step 4: Watch restore progress

oc get restore -n openshift-adp

Detailed:

oc describe restore restore-my-app -n openshift-adp
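For a scripted DR test, polling the phase is easier than re-running the commands by hand. The sketch below waits for a terminal phase; the function name and default timings are assumptions chosen for illustration.

```shell
# Hypothetical helper: poll a Restore until it reaches a terminal phase.
# Usage: wait_for_restore <restore-name> [attempts] [seconds-between-attempts]
wait_for_restore() {
  name="$1"; attempts="${2:-60}"; interval="${3:-10}"
  while [ "$attempts" -gt 0 ]; do
    phase="$(oc get restores.velero.io "$name" -n openshift-adp -o jsonpath='{.status.phase}')"
    case "$phase" in
      Completed)
        echo "restore $name: Completed"; return 0 ;;
      PartiallyFailed|Failed|FailedValidation)
        echo "restore $name: $phase" >&2; return 1 ;;
    esac
    attempts=$((attempts - 1))
    sleep "$interval"
  done
  echo "restore $name: timed out (last phase: ${phase:-unknown})" >&2
  return 1
}
```

In this walkthrough the call would be wait_for_restore restore-my-app, followed by the verification steps only if it returns success.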

Step 5: Verify namespace restored

oc get ns my-app

Then:

oc get pods -n my-app

Step 6: Verify PVC restoration

oc get pvc -n my-app

Check:

Status = Bound

Step 7: Verify Azure disk restore

In Azure:

az disk list --resource-group <rg>

You should see:

restored disk from snapshot

Step 8: Check application

oc get routes -n my-app

Test:

curl http://<route>

What just happened

OADP pulled metadata from Azure Blob
Recreated Kubernetes objects
Triggered Azure disk snapshot restore
Reattached volumes to pods

Full app recovery

Real-world variations

Case 1: Partial restore

Restore only one resource:

			
includedResources:
  - deployments

Case 2: Restore to different namespace

			
namespaceMapping:
  my-app: my-app-restore

Case 3: Restore without volumes

restorePVs: false

Azure-specific pitfalls

1. Missing snapshot permissions

→ restore fails silently or PVC stuck

2. Storage class mismatch

→ PVC stays Pending

3. Region mismatch

→ snapshot cannot attach

4. Private cluster networking

→ cannot reach Blob storage

Troubleshooting

Check restore logs

oc logs -n openshift-adp deployment/velero

Check events

oc get events -n my-app

Check PVC issues

oc describe pvc <pvc-name> -n my-app

Pro DR test (recommended)

Simulate:

Backup app
Delete namespace
Restore
Validate data integrity

Do this quarterly

Advanced Azure DR test

Try:

Restore to new cluster in different region
Reconnect DNS
Validate external integrations

Key takeaway

Azure DR = Blob (metadata) + Disk snapshot (data)
OADP restores both together
Works for full or partial recovery

Step-by-Step Guide to Install OADP on OpenShift

April 21, 2026May 19, 2026 techhadoop Uncategorized azure, cloud, devops, kubernetes, technology

Here’s a practical step-by-step OADP install for OpenShift, using AWS S3 as the backup location. This is the most common pattern and maps to Red Hat’s current OADP flow: install the OADP Operator, create the default credentials secret, then create a DataProtectionApplication (DPA). OADP is the supported OpenShift path for application backup/restore, and for PV snapshots your provider must support native snapshots or CSI snapshots. (Red Hat Documentation)

1. Prereqs

You need:

cluster-admin access
an S3 bucket
AWS credentials with access to the bucket
snapshot support if you want PV snapshots
oc logged into the cluster

OADP also requires a default credentials secret during installation. (Red Hat Documentation)

2. Create the OADP namespace

oc create namespace openshift-adp

Red Hat’s OADP examples use openshift-adp as the namespace. (Red Hat Documentation)

3. Install the OADP Operator

In the OpenShift web console:

go to Operators → OperatorHub
search for OADP
open OpenShift API for Data Protection
click Install
install it into openshift-adp

Wait for the operator pod to be running:

oc get pods -n openshift-adp

The Red Hat flow is to install the OADP Operator first, then configure credentials and the DPA. (Red Hat Documentation)

4. Create the AWS credentials file

Create a local file named credentials-velero:

			
cat <<'EOF' > credentials-velero
[default]
aws_access_key_id=YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key=YOUR_AWS_SECRET_ACCESS_KEY
EOF

Red Hat documents this credentials-velero pattern for AWS-backed OADP installs. (Red Hat Documentation)

5. Create the default OADP secret

Create the required secret in openshift-adp:

			
oc create secret generic cloud-credentials \
  -n openshift-adp \
  --from-file cloud=./credentials-velero

For AWS, the default secret name is cloud-credentials. Red Hat notes that the DPA install expects a default secret; otherwise installation fails. (Red Hat Documentation)

6. Create the DataProtectionApplication

Apply a DPA like this:

			
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: dpa
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        provider: aws
        default: true
        objectStorage:
          bucket: YOUR_S3_BUCKET
          prefix: ocp-backups
        config:
          region: us-east-1
  snapshotLocations:
    - velero:
        provider: aws
        config:
          region: us-east-1
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
        - csi

		

Apply it:

oc apply -f dpa.yaml

The DPA is the main OADP custom resource that wires backup storage and snapshot locations, and current OpenShift docs describe these OADP objects as the supported app backup path. (Red Hat Documentation)

7. Wait for OADP to become ready

Check the DPA and pods:

			
oc get dpa -n openshift-adp
oc get pods -n openshift-adp

You want the DPA to report a Reconciled condition of True, and the backup storage location to show Available, before creating backups. Red Hat’s backup flow requires the DataProtectionApplication to be in this ready state before backup CRs are used. (Red Hat Documentation)
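Rather than re-running the get commands until things look right, the wait can be scripted. The sketch below is one way to do it, assuming the DPA exposes a Reconciled condition; the function name and the five-minute timeout are arbitrary choices for this example.

```shell
# Hypothetical helper: block until the DPA has reconciled, then show the
# backup storage locations so their phase can be checked.
# Usage: wait_for_oadp [dpa-name]
wait_for_oadp() {
  dpa="${1:-dpa}"
  oc wait "dataprotectionapplications.oadp.openshift.io/$dpa" -n openshift-adp \
    --for=condition=Reconciled --timeout=5m || return 1
  # The location should report Available once the bucket and credentials check out.
  oc get backupstoragelocations.velero.io -n openshift-adp
}
```

If the wait times out, oc describe dpa -n openshift-adp usually shows the reason in its status conditions.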

8. Create your first backup

Once OADP is ready, back up a namespace:

			
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: app-backup
  namespace: openshift-adp
spec:
  includedNamespaces:
    - my-app
  snapshotVolumes: true
  ttl: 720h

		

Apply it:

oc apply -f backup.yaml

OADP uses Velero backup CRs for application backup and supports filtering by namespace, labels, or resource type. (Red Hat Documentation)

9. Check backup status

			
oc get backup -n openshift-adp
oc describe backup app-backup -n openshift-adp

This confirms whether the backup finished and whether volume snapshots were taken.

10. Optional: schedule automatic backups

			
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: openshift-adp
spec:
  schedule: "0 2 * * *"
  template:
    includedNamespaces:
      - my-app
    snapshotVolumes: true
    ttl: 720h

		

Apply it:

oc apply -f schedule.yaml

OADP supports scheduled Velero backups through Schedule objects. (Red Hat Documentation)

11. Common mistakes

No default cloud-credentials secret
wrong bucket region
no snapshot support for your storage class
assuming OADP backs up etcd; it does not
installing into a namespace with an overly long name can cause secret-labeling issues in some OADP cases. (Red Hat Documentation)

12. Minimal install checklist

			
oc create namespace openshift-adp
# install OADP Operator from OperatorHub
oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=./credentials-velero
oc apply -f dpa.yaml
oc get dpa -n openshift-adp
oc apply -f backup.yaml
oc get backup -n openshift-adp

		
