Secure Banking Chatbot Architecture on Azure

Here’s a reference architecture for a banking chatbot on Azure OpenAI that’s designed for security, grounding, auditability, and human handoff.

Architecture

Customers / Bank staff
Web / Mobile / Contact-center UI
API Gateway / WAF
Chat Orchestrator (App Service / AKS)
├─ Microsoft Entra ID auth
├─ session state + rate limiting
├─ prompt assembly + policy checks
├─ tool calling / workflow engine
├─ PII masking / redaction
└─ escalation to human agent
          +--------------------+----------------------+
          |                    |                      |
          v                    v                      v
   Azure OpenAI         Azure AI Search      Core banking tools/APIs
(chat + embeddings)    (hybrid RAG index)    (CRM, accounts, cards,
          |                    |              loans, fraud, ticketing)
          |                    v
          |            Indexed bank knowledge
          |            ├─ policies / FAQs
          |            ├─ product docs
          |            ├─ procedures / SOPs
          |            └─ secure document ACLs
          v
   Response composer
   ├─ citations
   ├─ confidence scoring
   ├─ compliance banners
   └─ allowed action filtering
          |
          v
Customer response / human handoff

Supporting services:
- Azure Key Vault
- Azure Monitor / App Insights / Log Analytics
- Microsoft Sentinel
- Private endpoints / VNet integration
- Blob / SharePoint / SQL ingestion pipeline

This structure follows Microsoft’s current baseline enterprise chat architecture for Azure, where the application layer sits in front of the model and retrieval services, uses private networking, and keeps orchestration separate from the model itself. Azure also recommends Azure AI Search as the retrieval layer for RAG, with support for hybrid retrieval, document-level security trimming, and private endpoints. (Microsoft Learn)

What each layer does

1. Channels and identity
Customers access the bot through mobile banking, web banking, or a contact-center console. Use Microsoft Entra ID for workforce users and your bank’s customer identity stack for retail users, then pass identity and entitlement context to the orchestrator. Microsoft recommends Entra-based authentication and role-based access for Azure AI Search because it provides centralized identity, conditional access, and stronger audit trails. (Microsoft Learn)

2. Chat orchestrator
This is the most important layer. It handles conversation memory, prompt templates, rate limiting, policy checks, tool access, and handoff to a human agent. Microsoft’s baseline Azure chat reference architecture puts this orchestration layer between the UI and Azure OpenAI rather than letting clients call the model directly. (Microsoft Learn)
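One of the orchestrator's duties from the diagram above, PII masking, can be sketched as a small redaction pass. This is a minimal illustration with hand-written patterns; a production bank would typically use a dedicated PII-detection service rather than regexes:

```python
import re

# Illustrative redaction rules only; real deployments would use a
# dedicated PII-detection service instead of hand-written regexes.
PII_PATTERNS = {
    "CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> str:
    """Replace recognized PII spans with typed placeholders before the
    text is logged or sent to the model."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running the redactor over user input before prompt assembly keeps card numbers and similar identifiers out of prompts, logs, and model context.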

3. Azure OpenAI
Use one deployment for the chat model and another for embeddings. The chat model generates answers; the embedding model helps retrieve relevant knowledge chunks. Azure documents content filtering and abuse monitoring as built-in safety controls, which is especially important for regulated customer-facing use. (Microsoft Learn)
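Under the hood, retrieval compares the query's embedding vector against chunk embeddings, and cosine similarity is the usual relevance measure. A minimal sketch of the comparison, for illustration only:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare a query embedding to a chunk embedding; a higher
    cosine similarity means a more relevant chunk."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

In practice the embedding model produces these vectors and Azure AI Search performs the comparison at scale; this only shows the underlying idea.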

4. Azure AI Search for grounding
For banking, do not rely on the model’s memory for policies, fees, disclosures, or procedures. Put approved content into Azure AI Search and use hybrid retrieval so the chatbot answers with grounded content and citations. Microsoft’s current guidance explicitly recommends Azure AI Search for RAG and notes support for security trimming and private network isolation. (Microsoft Learn)

5. Banking systems and tools
The chatbot should not directly expose raw core banking systems to the model. Instead, the orchestrator should call tightly scoped internal APIs for approved actions like “show recent transactions,” “freeze card,” or “open a support case.” That way the model suggests the action, but the backend enforces the rules.
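The "backend enforces the rules" principle can be sketched as an explicit allow-list the orchestrator consults before executing any model-proposed tool call. Role and action names below are illustrative assumptions, not a real API:

```python
# Illustrative allow-list: which model-proposed actions each caller
# role may actually execute. Everything else is rejected or escalated.
ALLOWED_ACTIONS = {
    "retail_customer": {"show_recent_transactions", "freeze_card", "open_support_case"},
    "branch_employee": {"show_recent_transactions", "open_support_case", "lookup_customer"},
}

def authorize_tool_call(role: str, action: str) -> bool:
    """The model only *suggests* an action; this backend check decides
    whether it may run for the authenticated caller."""
    return action in ALLOWED_ACTIONS.get(role, set())
```

Because the check runs in the orchestrator, a prompt-injected or hallucinated tool call still cannot reach a banking API the caller is not entitled to.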

Banking-specific design principles

Grounded answers only for policy and product questions
Use RAG for fees, terms, product comparisons, and internal procedures. This reduces hallucinations and supports citations. Microsoft’s RAG guidance for Azure AI Search emphasizes grounding, citations, and security-aware retrieval. (Microsoft Learn)

Document-level access control
If the chatbot is used by employees, access trimming matters a lot. A branch employee should not retrieve internal audit documents just because they ask. Azure AI Search supports document-level access control and security trimming patterns tied to identity. (Microsoft Learn)
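Security trimming is typically enforced with a filter built from the caller's group memberships. A minimal sketch of the documented Azure AI Search filter pattern, assuming the index has a `group_ids` collection field (the field name is an assumption about your schema):

```python
def security_filter(user_group_ids: list[str]) -> str:
    """Build an OData filter restricting results to documents whose
    group_ids field overlaps the caller's groups, following the
    Azure AI Search security-trimming pattern with search.in()."""
    groups = ", ".join(user_group_ids)
    return f"group_ids/any(g: search.in(g, '{groups}'))"
```

The resulting string is passed as the query's filter, so trimming happens inside the search service rather than in prompt text.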

Private networking by default
For a bank, expose as little as possible publicly. Microsoft’s baseline Azure chat architecture uses private endpoints and VNet integration, and Azure OpenAI On Your Data guidance also calls out private networking and restricted access paths. (Microsoft Learn)

Human handoff for sensitive cases
For fraud claims, hardship, complaints, suspicious activity, or low-confidence responses, the bot should escalate to a human banker or contact-center agent instead of improvising.

Audit everything
Send logs, prompts, tool calls, retrieval events, and security events to Azure Monitor and Microsoft Sentinel. Sentinel is Microsoft’s cloud-native SIEM for detection, investigation, and response, which fits banking operational monitoring well. (Microsoft Learn)
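Each prompt, tool call, and retrieval event can be emitted as one structured record. A minimal sketch with illustrative field names; in production the record would be shipped to Log Analytics / Sentinel rather than returned as a string:

```python
import json
import datetime

def audit_event(kind: str, user_id: str, detail: dict) -> str:
    """Emit one structured audit record per prompt, tool call, or
    retrieval event as a JSON line suitable for log ingestion."""
    record = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "kind": kind,      # e.g. "prompt", "tool_call", "retrieval"
        "user": user_id,
        "detail": detail,
    }
    return json.dumps(record, sort_keys=True)
```

Structured, per-event records make it straightforward to write Sentinel detections over unusual tool-call patterns or retrieval activity.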

Recommended banking use cases

Good first-wave use cases:

  • product FAQs
  • branch and ATM help
  • card controls like freeze/unfreeze
  • loan application status
  • internal employee knowledge assistant
  • secure document Q&A for policies and procedures

Use more caution with:

  • personalized financial advice
  • transaction disputes
  • fraud investigations
  • credit decisions
  • anything that creates legal or regulatory commitments

Suggested deployment pattern

For a bank, I’d recommend this split:

Customer bot

  • retail/mobile/web channels
  • heavily restricted tools
  • strict content policy
  • human handoff early

Employee copilot

  • internal knowledge access
  • stronger retrieval permissions
  • workflow tools for CRM/ticketing
  • document-level access trimming

This split reduces risk because customer-facing and employee-facing requirements are usually very different.

Minimal Azure stack

  • Frontend: Web app, mobile app, or contact-center console
  • Orchestrator: Azure App Service or AKS
  • LLM: Azure OpenAI
  • Retrieval: Azure AI Search
  • Identity: Microsoft Entra ID
  • Secrets: Azure Key Vault
  • Monitoring: Azure Monitor + Application Insights + Log Analytics
  • Security ops: Microsoft Sentinel
  • Documents: Blob / SharePoint / SQL ingestion pipeline

This aligns closely with Microsoft’s baseline Foundry chat architecture and Azure AI Search RAG guidance. (Microsoft Learn)

Practical request flow

1. User asks: "What is my mortgage payoff amount?"
2. Orchestrator authenticates user and checks entitlements.
3. If answer needs bank data, orchestrator calls approved internal API.
4. If answer needs policy text, orchestrator queries Azure AI Search.
5. Azure OpenAI generates a grounded response using retrieved data.
6. Response includes citation or disclosure.
7. If confidence is low or request is high-risk, escalate to human agent.
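The branching in steps 2–7 can be sketched as a small routing function. The intent labels, confidence threshold, and return values are all illustrative assumptions:

```python
# Illustrative routing for the request flow above; thresholds and
# intent names would come from your own classifier and risk policy.
HIGH_RISK_INTENTS = {"fraud_claim", "dispute", "complaint", "hardship"}
CONFIDENCE_THRESHOLD = 0.7

def route(intent: str, needs_bank_data: bool, confidence: float) -> str:
    """Decide which path the orchestrator takes for one user turn."""
    if intent in HIGH_RISK_INTENTS or confidence < CONFIDENCE_THRESHOLD:
        return "escalate_to_human"
    if needs_bank_data:
        return "call_internal_api"   # e.g. mortgage payoff lookup
    return "query_ai_search"         # ground the answer in policy docs
```

The important property is that escalation is checked first, so low-confidence or high-risk turns never reach the data or generation paths.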

What I would avoid

  • letting the frontend call Azure OpenAI directly
  • storing sensitive long-term memory in prompts
  • giving the model direct unrestricted access to core banking systems
  • answering regulated policy questions without retrieval/citations
  • using only vector search when hybrid search is available
  • treating safety filters as the only compliance control

Best one-line summary

For a banking chatbot on Azure, the safest reference architecture is:

customer/app channel → secure orchestrator → Azure OpenAI + Azure AI Search → tightly scoped banking APIs, all behind private networking with identity-aware retrieval, full logging, and human escalation. (Microsoft Learn)

Optimizing GenAI Chatbots with Azure OpenAI

A solid GenAI chatbot on Azure OpenAI usually looks like this:

User
Web / Mobile / Teams UI
Backend API / Orchestrator
├─ Auth (Microsoft Entra ID)
├─ Prompt assembly + guardrails
├─ Conversation state
└─ Tool calling / business logic
Azure OpenAI
├─ Chat model
└─ Embedding model
Azure AI Search
├─ Keyword + vector + semantic retrieval
└─ Citations / grounding docs
Enterprise data sources
├─ Blob / SharePoint / SQL / Cosmos DB
└─ Ingestion + chunking pipeline

For Azure, the most common production pattern is RAG: the app retrieves relevant chunks from your data with Azure AI Search, then sends those chunks to Azure OpenAI so answers stay grounded instead of relying only on model memory. Microsoft specifically recommends Azure AI Search as an index store for RAG, and its current docs distinguish classic RAG from newer agentic retrieval patterns. (Microsoft Learn)

Core components

Frontend
A web app, mobile app, or Teams app handles chat UI, file uploads, citations, and feedback.

Backend / orchestrator
This is the “brain” of the app. It manages auth, session history, prompt templates, retrieval calls, tool use, rate limiting, and logging. In Microsoft’s baseline enterprise chat architecture, the app layer sits in front of the model and retrieval services rather than having the client talk to the model directly. (Microsoft Learn)

Azure OpenAI
Use one deployment for chat and usually another for embeddings. The chat model generates the answer; the embedding model converts documents and queries into vectors for retrieval. Azure OpenAI “On Your Data” exists as a simpler way to ground answers in enterprise content, though Microsoft labels that path as “classic.” (Microsoft Learn)

Azure AI Search
This is the retrieval layer. It supports vector search, semantic ranking, hybrid search, enrichment, and newer agentic retrieval features for chatbot scenarios. Microsoft’s current guidance says Azure AI Search is a recommended retrieval/index layer for RAG workloads. (Microsoft Learn)

Data ingestion pipeline
Documents from Blob, SharePoint, SQL, PDFs, and other sources get extracted, chunked, enriched, and indexed. Azure AI Search supports enrichment for content such as PDFs and images that are not searchable in raw form. (Microsoft Learn)
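The chunking step can be sketched as a fixed-size splitter with overlap. Sizes here are illustrative; real pipelines are usually tuned per corpus and often split on token counts or document structure instead:

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted document text into overlapping character
    chunks before indexing; overlap preserves context across cuts."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Overlapping chunks reduce the chance that a sentence relevant to a query is cut in half at a chunk boundary and lost to retrieval.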

Best-practice architecture

1. Start with RAG, not pure prompting

For an enterprise chatbot, keep company docs outside the prompt until query time. Store them in Azure AI Search, then retrieve only the relevant chunks for each question. Microsoft’s RAG guidance says this improves grounding and supports citations and better relevance. (Microsoft Learn)
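Query-time grounding then amounts to assembling a prompt from only the retrieved chunks, each tagged so the model can cite it. A minimal sketch (the chunk dict shape is an assumption):

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt from retrieved chunks only, tagging each
    with its id so the model can emit verifiable citations."""
    sources = "\n".join(f"[{c['id']}] {c['text']}" for c in chunks)
    return (
        "Answer using ONLY the sources below and cite them as [id].\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
```

Because the full corpus never enters the prompt, updating the knowledge base means reindexing documents, not rewriting prompts.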

2. Use hybrid retrieval

Use vector + keyword + semantic ranking together. Azure AI Search supports this combination, and it is usually stronger than relying on vectors alone for real-world business documents. (Microsoft Learn)
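Azure AI Search fuses the keyword and vector rankings server-side with Reciprocal Rank Fusion (RRF). A minimal sketch of the fusion idea itself, using the commonly cited default constant k=60:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document ids with Reciprocal
    Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both the keyword and vector lists outscores one that ranks highly in only a single list, which is why hybrid retrieval tends to beat either signal alone.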

3. Add identity-aware filtering

If different users should see different documents, put Microsoft Entra ID in front of the app and apply Azure AI Search security filters or document-level access trimming. Microsoft documents this specifically for Azure OpenAI On Your Data with Azure AI Search. (Microsoft Learn)

4. Separate conversation memory from knowledge retrieval

Keep short-term chat history in app storage, but keep source-of-truth business content in the search index. This avoids bloated prompts and makes updates to your knowledge base easier. Microsoft’s baseline chat architecture separates the app/orchestration layer from the grounding data layer. (Microsoft Learn)
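The separation can be sketched as a history window: recent turns stay in the prompt while source-of-truth content stays in the index. The window size and message shape below are illustrative assumptions:

```python
def trim_history(history: list[dict], max_turns: int = 6) -> list[dict]:
    """Keep the system message plus only the most recent turns;
    long-term knowledge lives in the search index, not the prompt."""
    system = [m for m in history if m["role"] == "system"]
    turns = [m for m in history if m["role"] != "system"]
    return system + turns[-max_turns:]
```

This keeps prompts bounded regardless of conversation length, while knowledge-base updates happen in the index without touching chat state.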

5. Prefer managed identity where possible

Microsoft’s Azure App Service RAG tutorial uses managed identities for passwordless authentication between services. That is the cleaner production pattern versus storing secrets in code. (Microsoft Learn)

Two good Azure patterns

Pattern A: Simpler RAG app

Use this when you want a straightforward chatbot fast.

App Service / AKS
Backend API
Azure AI Search
Azure OpenAI

This is the easier option and matches Microsoft’s tutorial-style architecture for grounded chat apps. (Microsoft Learn)

Pattern B: Agent-style chatbot

Use this when you need tool use, more complex reasoning, or multi-step workflows.

UI
Foundry Agent Service / custom orchestrator
├─ retrieval
├─ tools
├─ memory
└─ policy checks
Azure OpenAI + Azure AI Search + enterprise APIs

Microsoft’s current architecture guidance includes Foundry Agent Service and a baseline Foundry chat reference architecture for enterprise chat applications. (Microsoft Learn)

What I’d recommend

For most teams:

  • Frontend: React or Teams app
  • Backend: App Service or AKS
  • LLM: Azure OpenAI
  • Retrieval: Azure AI Search
  • Identity: Entra ID
  • Secrets: Key Vault
  • Telemetry: Application Insights / Azure Monitor
  • Documents: Blob + ingestion pipeline

That gives you a practical, scalable architecture without too much complexity. Azure AI Search is the natural retrieval layer, and Azure’s current enterprise chat reference architectures are built around that same idea. (Microsoft Learn)

Common mistakes

  • Letting the frontend call the model directly
  • Sending entire documents to the model instead of retrieving chunks
  • Skipping citations
  • Mixing access control into prompt text instead of enforcing it in retrieval
  • Using only vector search when hybrid retrieval would work better
  • Treating chat history as your knowledge base

Quick starter version

Users
Azure App Service
Backend API
├─ Entra ID auth
├─ prompt templates
├─ chat history store
└─ calls Azure AI Search
top-k chunks + citations
Azure OpenAI
answer to user

Understanding Azure AI & ML: A Comprehensive Guide

Azure has a huge ecosystem for AI and ML, and it’s designed so you can go from experiment → train → deploy → scale all inside one platform.

Here’s a clear, practical breakdown 👇


What “Azure AI & ML” actually means

It’s a collection of services from Microsoft that cover:

  • Data science & model training
  • Prebuilt AI APIs (vision, speech, language)
  • MLOps & deployment
  • Generative AI (LLMs, copilots)

Core service: Azure Machine Learning


This is the main platform for ML engineers and data scientists.

What it does:

  • Build & train models (Python, notebooks)
  • Manage datasets & experiments
  • AutoML (automated model selection and training)
  • Deploy models as APIs
  • Track experiments & metrics

👉 Think: end-to-end ML platform


Prebuilt AI services (no ML required)

Azure AI Services (formerly Cognitive Services)

Ready-to-use APIs:

Vision
  • Image recognition
  • OCR (read text from images)
Speech
  • Speech-to-text
  • Text-to-speech
Language
  • Sentiment analysis
  • Entity recognition
  • Translation

👉 Use when you don’t want to train models


Generative AI (LLMs)

Azure OpenAI Service

Gives access to:

  • GPT models
  • embeddings
  • chat completions

Use cases:

  • Chatbots
  • copilots
  • RAG systems
  • code generation

👉 Enterprise-grade access to OpenAI models, with Azure security and compliance controls


Model deployment

You can deploy models using:

  • REST APIs
  • Kubernetes (AKS)
  • Managed endpoints

Azure ML supports:

  • real-time inference
  • batch scoring
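A real-time managed endpoint is driven by a scoring script that Azure ML invokes through `init()` and `run()` entry points. A minimal sketch, where the stand-in lambda replaces the registered-model loading a real script would do in `init()`:

```python
import json

model = None  # loaded once per container in init()

def init():
    """Azure ML calls this once at container start; real scripts load
    the registered model (e.g. from the model directory) here."""
    global model
    model = lambda xs: [sum(xs)]  # stand-in for a trained model

def run(raw_data: str):
    """Azure ML calls this per request with the request body as a
    JSON string; the return value becomes the response payload."""
    data = json.loads(raw_data)
    return {"prediction": model(data["inputs"])}
```

The same script shape serves real-time inference; batch scoring instead processes files of inputs on a schedule.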

MLOps (very important)

Azure supports:

  • CI/CD pipelines (GitHub, Azure DevOps)
  • Model versioning
  • Monitoring & drift detection

👉 Production-grade ML lifecycle


Data layer

Works with:

  • Azure Blob Storage
  • Azure Data Lake
  • Synapse Analytics

Data pipelines feed ML models


Typical workflow

Data → Train model → Evaluate → Deploy → Monitor → Retrain

In Azure:

Data Lake → Azure ML → Endpoint → App/API

Example use cases

  • Fraud detection
  • Recommendation systems
  • Chatbots (LLM-based)
  • Computer vision apps
  • Predictive maintenance

When to use what

Need → Use

  • Train a custom model → Azure ML
  • Quick AI feature → Azure AI Services
  • ChatGPT-like app → Azure OpenAI
  • Production ML → Azure ML + MLOps

Real-world architecture

Frontend App
API Layer
Azure OpenAI / Azure ML Endpoint
Data Storage (Blob / Data Lake)

Key takeaway

  • Azure ML → build/train/deploy models
  • Azure AI Services → prebuilt AI APIs
  • Azure OpenAI → generative AI

Together = full AI platform