Azure DNS

🔵 DNS Architecture

Q: Can you explain the difference between Azure Public DNS and Azure Private DNS Zones, and when you would use each?

Azure Public DNS is used to host publicly resolvable domain names — for example, resolving www.yourcompany.com from the internet. Anyone on the internet can query it.

Azure Private DNS Zones, on the other hand, are only resolvable within a VNet or linked VNets. They are used for internal name resolution — for example, resolving a private endpoint for a storage account like mystorageaccount.privatelink.blob.core.windows.net from inside your network, without exposing it publicly.

You use Public DNS when you need external-facing resolution, and Private DNS Zones when you need secure, internal name resolution for resources that should never be reachable from the internet.


Q: How does DNS resolution work for a VM inside a VNet — what is the default behavior, and when would you override it?

By default, Azure provides a built-in DNS resolver at the special IP 168.63.129.16. Every VM in a VNet uses this address automatically. It can resolve Azure-internal hostnames and any Private DNS Zones linked to that VNet.

You would override this default when:

  • You need to resolve on-premises hostnames from Azure (hybrid scenarios)
  • You need conditional forwarding to route specific domain queries to specific DNS servers
  • You are using a centralized custom DNS server (e.g., a DNS forwarder VM or Azure DNS Private Resolver) to control and log all DNS traffic across the environment

In those cases, you configure a custom DNS server address at the VNet level, pointing VMs to your centralized resolver instead.
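
Since the later sections of these notes manage Azure with Terraform, here is a minimal sketch of that override — the resource names and resolver IP are hypothetical:

Terraform

resource "azurerm_virtual_network" "spoke" {
  name                = "vnet-spoke-a"
  resource_group_name = "rg-network"
  location            = "East US"
  address_space       = ["10.1.0.0/16"]

  # VMs in this VNet now query the central resolver instead of 168.63.129.16
  dns_servers = ["10.0.2.4"] # hypothetical IP of the hub DNS resolver/forwarder
}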


Q: What is conditional forwarding, and how would you set it up to resolve on-premises domain names from Azure?

Conditional forwarding is a DNS rule that says: “For queries matching this specific domain, forward them to this specific DNS server instead of resolving them normally.”

For example, if your on-premises domain is corp.contoso.local, you would configure your Azure DNS resolver to forward any query for corp.contoso.local to your on-premises DNS server IP.

The setup typically looks like this:

  • Deploy Azure DNS Private Resolver with an outbound endpoint in your Hub VNet
  • Create a DNS forwarding ruleset with a rule: corp.contoso.local → forward to on-premises DNS IP
  • Associate the ruleset with the relevant VNets
  • Ensure the on-premises DNS server can be reached over ExpressRoute or VPN
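
As a hedged Terraform sketch of that setup (the hub VNet, subnet, and on-premises DNS IP are placeholders; the outbound subnet must be delegated to Microsoft.Network/dnsResolvers):

Terraform

resource "azurerm_private_dns_resolver" "hub" {
  name                = "dnspr-hub"
  resource_group_name = "rg-hub-network"
  location            = "East US"
  virtual_network_id  = azurerm_virtual_network.hub.id
}

resource "azurerm_private_dns_resolver_outbound_endpoint" "out" {
  name                    = "out-ep"
  private_dns_resolver_id = azurerm_private_dns_resolver.hub.id
  location                = azurerm_private_dns_resolver.hub.location
  subnet_id               = azurerm_subnet.dns_outbound.id
}

resource "azurerm_private_dns_resolver_dns_forwarding_ruleset" "rules" {
  name                                       = "ruleset-onprem"
  resource_group_name                        = "rg-hub-network"
  location                                   = "East US"
  private_dns_resolver_outbound_endpoint_ids = [azurerm_private_dns_resolver_outbound_endpoint.out.id]
}

# "For corp.contoso.local, ask the on-premises DNS server"
resource "azurerm_private_dns_resolver_forwarding_rule" "corp" {
  name                      = "corp-contoso-local"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.rules.id
  domain_name               = "corp.contoso.local." # trailing dot is required
  enabled                   = true

  target_dns_servers {
    ip_address = "192.168.1.10" # placeholder on-premises DNS server IP
    port       = 53
  }
}

# Make the ruleset apply to the VNets that need it
resource "azurerm_private_dns_resolver_virtual_network_link" "spoke" {
  name                      = "link-spoke-a"
  dns_forwarding_ruleset_id = azurerm_private_dns_resolver_dns_forwarding_ruleset.rules.id
  virtual_network_id        = azurerm_virtual_network.spoke_a.id
}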

Q: A client reports that their Azure VM cannot resolve a private endpoint hostname. What are the first things you check?

I would systematically check the following:

  1. Private DNS Zone linkage — Is the Private DNS Zone (e.g., privatelink.blob.core.windows.net) linked to the VNet the VM is in? Without the link, the zone is invisible to that VNet.
  2. A record presence — Does the Private DNS Zone actually contain an A record pointing to the private endpoint’s IP?
  3. Custom DNS configuration — If the VNet uses a custom DNS server, is that server forwarding queries for privatelink.* domains to Azure’s resolver (168.63.129.16)? This is a very common misconfiguration.
  4. nslookup / dig from the VM — Run nslookup <hostname> on the VM to see what IP is being returned. If it returns the public IP instead of the private IP, the DNS zone is not being picked up correctly.
  5. Network connectivity — Even if DNS resolves correctly, confirm NSG rules and routing aren’t blocking traffic to the private endpoint IP.
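
Check 1 is also the most common fix. A minimal Terraform sketch of the zone plus the VNet link (the spoke VNet reference is a placeholder):

Terraform

resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = "rg-dns"
}

# Without this link, VMs in the spoke VNet cannot see the zone at all
resource "azurerm_private_dns_zone_virtual_network_link" "spoke" {
  name                  = "link-spoke-a"
  resource_group_name   = "rg-dns"
  private_dns_zone_name = azurerm_private_dns_zone.blob.name
  virtual_network_id    = azurerm_virtual_network.spoke_a.id
  registration_enabled  = false
}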

Q: How would you use Azure DNS Private Resolver, and how does it differ from a traditional DNS forwarder running on a VM?

Azure DNS Private Resolver is a fully managed, highly available DNS service that handles inbound and outbound DNS resolution without requiring you to manage VMs.

  • The inbound endpoint allows on-premises clients to send DNS queries into Azure and resolve Private DNS Zones — something that wasn’t possible before without a forwarder VM.
  • The outbound endpoint with forwarding rulesets allows Azure VMs to forward specific domain queries (e.g., on-premises domains) to external DNS servers.

Compared to a forwarder VM, DNS Private Resolver is:

  • Fully managed — no patching, no VM maintenance, no availability concerns
  • Scalable — handles high query volumes automatically
  • Integrated — natively understands Azure Private DNS Zones without extra configuration
  • More secure — no need to open management ports on a VM

The main reason teams still use forwarder VMs is legacy architecture or specific advanced configurations not yet supported by Private Resolver.


🔵 VNet Peering

Q: What is the difference between regional and global VNet peering? Are there any restrictions with global peering?

Regional VNet peering connects two VNets within the same Azure region. Global VNet peering connects VNets across different Azure regions.

Restrictions with global peering:

  • Basic Load Balancer — Resources behind a Basic Load Balancer in one VNet cannot be reached from a globally peered VNet. Standard Load Balancer works fine.
  • Latency — Global peering crosses region boundaries, so latency is higher than regional peering. You need to account for this in latency-sensitive workloads.
  • Cost — Global peering incurs data transfer charges in both directions, whereas regional peering charges are lower.
  • No transitive routing — Same as regional peering, traffic does not flow transitively through a peered VNet without additional configuration.

Q: Can peered VNets communicate transitively by default? How would you work around this?

No — transitive routing is not supported natively in VNet peering. If Spoke A is peered to the Hub, and Spoke B is peered to the Hub, Spoke A cannot reach Spoke B directly through the Hub by default.

To work around this, you have two main options:

  1. Azure Firewall or NVA in the Hub — Route traffic from Spoke A through the Hub firewall, which then forwards it to Spoke B. This requires User Defined Routes (UDRs) on both Spokes pointing their traffic to the firewall’s private IP as the next hop. This is the most common enterprise approach and has the added benefit of traffic inspection.
  2. Azure Virtual WAN — Virtual WAN supports transitive routing natively, making it a cleaner option when you have many Spokes and don’t want to manage UDRs manually.

Q: Spoke A and Spoke B are peered to the Hub. Can Spoke A reach Spoke B? What needs to be in place?

Not by default. To enable this:

  • Deploy Azure Firewall (or an NVA) in the Hub VNet
  • Create a UDR on Spoke A’s subnet with a route: destination = Spoke B’s address space, next hop = Azure Firewall private IP
  • Create a mirror UDR on Spoke B’s subnet: destination = Spoke A’s address space, next hop = Azure Firewall private IP
  • Configure Azure Firewall network rules to allow the traffic between Spoke A and Spoke B
  • Enable “Use Remote Gateway” or “Allow Gateway Transit” on the peering connections as needed for routing to propagate correctly

This gives you transitive connectivity with centralized inspection — a core benefit of Hub-and-Spoke.
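
One side of that UDR pair could look like this in Terraform — address spaces and the firewall IP are placeholders; Spoke B gets the mirror-image table:

Terraform

resource "azurerm_route_table" "spoke_a" {
  name                = "rt-spoke-a"
  location            = "East US"
  resource_group_name = "rg-spoke-a"

  # Spoke B's address space, with the Hub firewall as next hop
  route {
    name                   = "to-spoke-b"
    address_prefix         = "10.2.0.0/16"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.1.4" # Azure Firewall private IP (placeholder)
  }
}

resource "azurerm_subnet_route_table_association" "spoke_a" {
  subnet_id      = azurerm_subnet.spoke_a_workload.id
  route_table_id = azurerm_route_table.spoke_a.id
}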


Q: When would you choose VNet peering over VPN Gateway or ExpressRoute for VNet-to-VNet connectivity?

  • VNet Peering — Best for Azure-to-Azure connectivity. It uses the Microsoft backbone, offers the lowest latency, highest throughput, and is the simplest to configure. Use it whenever both VNets are in Azure.
  • VPN Gateway (VNet-to-VNet) — Used when you need encrypted tunnels between VNets, or when connecting across different Azure tenants/subscriptions where peering may be complex. Higher latency and limited throughput compared to peering.
  • ExpressRoute — Used for on-premises to Azure connectivity over a private, dedicated circuit. Not typically used for VNet-to-VNet unless traffic must flow through on-premises for compliance or inspection reasons.

In short: always prefer peering for Azure-to-Azure, and reserve VPN/ExpressRoute for hybrid or cross-tenant scenarios.


🔵 Hub-and-Spoke Network Design

Q: Explain the Hub-and-Spoke topology. What lives in the Hub, and what lives in the Spokes?

Hub-and-Spoke is a network design pattern where a central VNet (the Hub) acts as the connectivity and security backbone, and multiple Spoke VNets connect to it via peering.

The Hub hosts shared, centralized services:

  • Azure Firewall or NVA for traffic inspection and internet egress control
  • VPN Gateway or ExpressRoute Gateway for on-premises connectivity
  • Azure DNS Private Resolver
  • Bastion for secure VM access
  • Shared monitoring and logging infrastructure

The Spokes host workload-specific resources:

  • Application VMs, AKS clusters, App Services
  • Databases and storage
  • Each Spoke is isolated — it can only communicate outside its boundary through the Hub, which enforces security policies

This model gives you centralized governance and security without duplicating shared services in every workload environment.


Q: How do you enforce traffic inspection through the Hub for Spoke-to-internet traffic?

  • Deploy Azure Firewall in the Hub VNet
  • On each Spoke subnet, create a UDR with a default route: 0.0.0.0/0 → next hop = Azure Firewall private IP
  • This forces all outbound internet traffic from Spoke VMs through the firewall before it exits
  • On the Hub, configure Azure Firewall application and network rules to define what traffic is allowed out
  • Enable Azure Firewall DNS proxy if you want centralized DNS logging as well

For Spoke-to-Spoke, additional UDRs point inter-spoke traffic to the firewall as described earlier.
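
The default route itself is a small variation on the spoke-to-spoke table sketched earlier (firewall IP again a placeholder):

Terraform

resource "azurerm_route_table" "spoke_egress" {
  name                          = "rt-spoke-egress"
  location                      = "East US"
  resource_group_name           = "rg-spoke-a"
  disable_bgp_route_propagation = true # don't let learned routes bypass the firewall

  route {
    name                   = "default-via-firewall"
    address_prefix         = "0.0.0.0/0"
    next_hop_type          = "VirtualAppliance"
    next_hop_in_ip_address = "10.0.1.4" # Azure Firewall private IP (placeholder)
  }
}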


Q: A new business unit needs to be onboarded into your existing Hub-and-Spoke architecture. Walk me through the steps.

  1. IP planning — Allocate a non-overlapping address space for the new Spoke VNet from the enterprise IP plan
  2. Create the Spoke VNet — Deploy it in the appropriate subscription under the correct Management Group
  3. Establish peering — Create bidirectional peering between the new Spoke and the Hub (allow gateway transit on Hub side, use remote gateway on Spoke side if needed)
  4. Configure UDRs — Apply route tables on the Spoke subnets to direct internet and cross-spoke traffic through the Hub firewall
  5. DNS configuration — Point the Spoke VNet’s DNS settings to the centralized DNS Private Resolver in the Hub
  6. Firewall rules — Add rules in Azure Firewall to permit the business unit’s required traffic flows
  7. Azure Policy — Ensure the new subscription inherits enterprise policies (e.g., no public IPs, required tags, allowed regions)
  8. Private DNS Zone links — Link relevant Private DNS Zones to the new Spoke VNet for private endpoint resolution
  9. Connectivity testing — Validate DNS resolution, internet egress, and any required on-premises connectivity
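
Step 3, the peering pair, as a hedged Terraform sketch (all names are placeholders; use_remote_gateways assumes the Hub actually has a gateway):

Terraform

resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "peer-spokeC-to-hub"
  resource_group_name       = "rg-spoke-c"
  virtual_network_name      = "vnet-spoke-c"
  remote_virtual_network_id = azurerm_virtual_network.hub.id
  allow_forwarded_traffic   = true
  use_remote_gateways       = true # reach on-premises via the Hub's gateway
}

resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = "peer-hub-to-spokeC"
  resource_group_name       = "rg-hub"
  virtual_network_name      = "vnet-hub"
  remote_virtual_network_id = azurerm_virtual_network.spoke_c.id
  allow_forwarded_traffic   = true
  allow_gateway_transit     = true
}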

🔵 Landing Zones & Enterprise Network Governance

Q: What is an Azure Landing Zone, and how does networking fit into it?

An Azure Landing Zone is a pre-configured, governed Azure environment that provides the foundation for hosting workloads securely and at scale. It is designed following Microsoft’s Cloud Adoption Framework (CAF) and covers identity, governance, security, networking, and management.

Networking is one of the most critical components. In the CAF Landing Zone model:

  • A Connectivity subscription hosts the Hub VNet, gateways, firewall, and DNS infrastructure
  • Landing Zone subscriptions host Spoke VNets for individual workloads or business units
  • All networking is governed centrally — workload teams cannot create arbitrary public IPs or peer VNets outside the approved architecture
  • Azure Policy enforces these constraints automatically

Q: What role do Azure Policy and Management Groups play in enforcing network governance?

Management Groups create a hierarchy of subscriptions (e.g., Root → Platform → Landing Zones → Business Units). Policies applied at a Management Group level automatically inherit down to all subscriptions beneath it.

Azure Policy enforces guardrails such as:

  • Deny creation of public IP addresses in Spoke subscriptions
  • Require all VNets to use a specific custom DNS server
  • Deny VNet peering unless it connects to the approved Hub
  • Enforce NSG association on every subnet
  • Require private endpoints for PaaS services like Storage and SQL

Together, they ensure that even if a workload team has Contributor access to their subscription, they cannot violate the network architecture — the policies block non-compliant actions automatically.
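
As a concrete illustration of the first guardrail, a deny-public-IP definition assigned at a Management Group might look like this — a sketch; the Management Group reference is hypothetical:

Terraform

resource "azurerm_policy_definition" "deny_public_ip" {
  name                = "deny-public-ip"
  policy_type         = "Custom"
  mode                = "All"
  display_name        = "Deny public IP creation in landing zones"
  management_group_id = azurerm_management_group.landing_zones.id

  policy_rule = jsonencode({
    if = {
      field  = "type"
      equals = "Microsoft.Network/publicIPAddresses"
    }
    then = {
      effect = "deny"
    }
  })
}

resource "azurerm_management_group_policy_assignment" "deny_public_ip" {
  name                 = "deny-public-ip"
  management_group_id  = azurerm_management_group.landing_zones.id
  policy_definition_id = azurerm_policy_definition.deny_public_ip.id
}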


Q: How would you manage IP address space allocation across multiple subscriptions to avoid conflicts?

This is an area where discipline and tooling are both essential:

  • Centralized IP plan — Maintain a master IP address management (IPAM) document or tool (e.g., Azure’s native IPAM feature in preview, or third-party tools like Infoblox or NetBox) that tracks all allocated ranges across subscriptions
  • Non-overlapping ranges per Spoke — Assign each Landing Zone a dedicated, non-overlapping CIDR block from a master supernet (e.g., 10.0.0.0/8 split into /16 per region, then /24 per Spoke)
  • Azure Policy — Use policy to deny VNet creation if the address space conflicts with known ranges or falls outside the approved supernet
  • Automation — When onboarding new Landing Zones via Pulumi or other IaC, automatically pull the next available range from the IPAM system rather than relying on manual assignment

🔵 Hybrid DNS Resolution

Q: On-premises clients need to resolve privatelink.blob.core.windows.net. What DNS architecture changes are needed?

This is one of the most common hybrid DNS challenges. By default, privatelink.blob.core.windows.net resolves to a public IP from on-premises. To make it resolve to the private endpoint IP, you need:

On the Azure side:

  • Create a Private DNS Zone for privatelink.blob.core.windows.net and link it to the Hub VNet
  • Ensure the private endpoint A record is registered in the zone
  • Deploy Azure DNS Private Resolver with an inbound endpoint in the Hub VNet — this gives on-premises clients a routable IP to send DNS queries into Azure

On the on-premises side:

  • Configure your on-premises DNS server with a conditional forwarder: privatelink.blob.core.windows.net → forward to the DNS Private Resolver inbound endpoint IP
  • Ensure the inbound endpoint IP is reachable over ExpressRoute or VPN from on-premises

Result: On-premises clients query their local DNS → conditional forwarder redirects to Azure DNS Private Resolver → Private Resolver checks the linked Private DNS Zone → returns the private endpoint IP → traffic flows privately over ExpressRoute/VPN.
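
The one Azure-side piece not sketched earlier is the inbound endpoint. A minimal Terraform sketch, reusing the resolver from the conditional-forwarding example (the subnet is a placeholder and must be delegated to Microsoft.Network/dnsResolvers):

Terraform

resource "azurerm_private_dns_resolver_inbound_endpoint" "in" {
  name                    = "in-ep"
  private_dns_resolver_id = azurerm_private_dns_resolver.hub.id
  location                = azurerm_private_dns_resolver.hub.location

  ip_configurations {
    private_ip_allocation_method = "Dynamic"
    subnet_id                    = azurerm_subnet.dns_inbound.id
  }
}

# The on-premises conditional forwarder targets this IP
output "inbound_endpoint_ip" {
  value = azurerm_private_dns_resolver_inbound_endpoint.in.ip_configurations[0].private_ip_address
}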


Q: You’re migrating from a custom DNS forwarder VM to Azure DNS Private Resolver. How do you ensure zero DNS disruption?

  1. Deploy Private Resolver in parallel — Set up the inbound and outbound endpoints and configure forwarding rulesets to mirror the existing forwarder VM’s rules exactly
  2. Test thoroughly — Validate resolution of all key domains (on-premises, private endpoints, public) from test VMs pointing to the new resolver
  3. Staged migration — Update the custom DNS server setting on VNets one at a time, starting with non-production VNets, monitoring for any resolution failures
  4. Update on-premises conditional forwarders — Once Azure-side is validated, update on-premises DNS to point to the Private Resolver inbound endpoint instead of the old forwarder VM IP
  5. Monitor — Use Azure Monitor and DNS metrics on the Private Resolver to confirm query volumes are healthy
  6. Decommission the VM — Only after all VNets and on-premises forwarders are updated and validated, remove the forwarder VM

The key principle is run both in parallel, migrate gradually, and never cut over until validation is complete.


Enterprise RAG Pipeline & Internal AI Assistant Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


1. The Project Title

Enterprise RAG Pipeline & Internal AI Assistant Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


2. Impact-Driven Bullet Points

Use the C-A-R (Context-Action-Result) method. Choose 3-4 from this list:

  • Architecture: Architected and deployed a multi-stage data lake (Medallion Architecture) using ADLS Gen2 and Terraform, reducing data fragmentation across internal departments.
  • Orchestration: Developed automated Azure Data Factory (ADF) pipelines with event-based triggers to ingest and preprocess 5,000+ internal documents (PDF/Office) with 99% reliability.
  • AI Engineering: Built a Databricks processing engine to perform recursive character chunking and vector embedding using text-embedding-3-large, optimizing retrieval context for a GPT-4o powered chatbot.
  • Search Optimization: Implemented Hybrid Search (Vector + Keyword) and Semantic Ranking in Azure AI Search, improving answer relevance by 35% compared to traditional keyword-only search.
  • Security & Governance: Integrated Microsoft Entra ID and ACL-based Security Trimming to ensure the AI assistant respects document-level permissions, preventing unauthorized data exposure.
  • Cost Management: Optimized cloud spend by 40% through Databricks Serverless compute and automated ADLS Lifecycle Management policies (Hot-to-Cold tiering).

3. Skills Section (Keywords for ATS)

  • Cloud & Data: Azure Data Factory (ADF), ADLS Gen2, Azure Databricks, Spark (PySpark), Medallion Architecture, Delta Lake.
  • AI & Search: Retrieval-Augmented Generation (RAG), Azure AI Search, Vector Databases, Semantic Ranking, Hybrid Retrieval.
  • LLMs: Azure OpenAI (GPT-4o), Embeddings, Prompt Engineering, LangChain/LlamaIndex.
  • DevOps/IaC: Terraform, Azure DevOps (CI/CD), Managed Identities, Unity Catalog.

4. The “Interview Hook”

In your Professional Summary or Project Description, add one sentence that proves you know the real-world challenges of AI:

“Implemented a production-ready RAG system that solves for LLM hallucinations by enforcing strict citation requirements and PII redaction during the ingestion phase.”


Pro-Tip for 2026:

Hiring managers currently care deeply about “Day 2 Operations.” If they ask about this project in an interview, mention how you monitored it for Cost (Azure Budgets) and Quality (using an evaluation framework like Ragas or Azure AI Content Safety). This proves you aren’t just a “tutorial follower” but a production-ready engineer.

To deploy Azure Databricks using Terraform, you need to set up three main components: a Resource Group, the Databricks Workspace, and (optionally, but recommended if you use VNet injection) a Network Security Group. The example below covers the first two.

Below is a clean, modular example. This configuration uses the Premium pricing tier, which is required for Unity Catalog and the security trimming this project relies on.

1. The Terraform Configuration (main.tf)

Terraform

# 1. Define the Provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # Use the latest stable 3.x version
    }
  }
}

provider "azurerm" {
  features {}
}

# 2. Create a Resource Group
resource "azurerm_resource_group" "rg" {
  name     = "rg-databricks-internal-rag"
  location = "East US"
}

# 3. Create the Azure Databricks Workspace
resource "azurerm_databricks_workspace" "example" {
  name                = "dbw-internal-ai-processor"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "premium" # Premium is required for Unity Catalog & Security Trimming

  # Managed Resource Group is where Databricks puts its internal VMs/Storage
  managed_resource_group_name = "rg-dbw-managed-resources"

  tags = {
    Environment = "Dev"
    Project     = "Internal-RAG"
  }
}

# 4. Output the Workspace URL (You'll need this to log in)
output "databricks_host" {
  value = azurerm_databricks_workspace.example.workspace_url
}

2. Crucial Architecture Notes for Terraform

  • The Managed Resource Group: Databricks is a “Managed Service.” When you run the code above, Azure will automatically create a second resource group (e.g., rg-dbw-managed-resources). You should not manually delete items inside that group; Terraform and Databricks manage them.
  • SKU Selection:
    • Standard: Cheaper, but lacks advanced security.
    • Premium: Required if you want to use Unity Catalog or User-level Permissions, which are essential for your internal RAG system’s security.
  • VNet Injection (Advanced): If your company requires the Databricks clusters to stay entirely off the public internet, you would need to add a custom_parameters block to the resource to “inject” it into your own Virtual Network.
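
A hedged sketch of what that custom_parameters block looks like — the VNet, subnets, and NSG associations are placeholders, and both subnets must be delegated to Microsoft.Databricks/workspaces:

Terraform

resource "azurerm_databricks_workspace" "injected" {
  name                = "dbw-vnet-injected"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "premium"

  custom_parameters {
    virtual_network_id                                   = azurerm_virtual_network.spoke.id
    public_subnet_name                                   = azurerm_subnet.dbw_public.name
    private_subnet_name                                  = azurerm_subnet.dbw_private.name
    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.dbw_public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.dbw_private.id
    no_public_ip                                         = true # keep cluster nodes off the public internet
  }
}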

3. How to Deploy

  1. Initialize — downloads the Azure provider: terraform init
  2. Plan — preview what will be created: terraform plan
  3. Apply — deploy to Azure (you will be prompted to type yes): terraform apply

4. Next Steps for your RAG Pipeline

Once this workspace is deployed, you will likely want to use Terraform to create Personal Access Tokens (PATs) or Service Principals so that Azure Data Factory (ADF) has permission to trigger Databricks jobs.

II. Add the ADLS Storage Account and ADF to this same Terraform script so they are all connected automatically

This is the “Enterprise Starter Kit” in Terraform. This code creates the resource group, the ADLS Gen2 storage (the reservoir), the Data Factory (the pipes), and the Databricks Workspace (the engine), then connects them so they can talk to each other.

The Integrated Terraform Script

Terraform

# 1. Resource Group
resource "azurerm_resource_group" "main" {
  name     = "rg-internal-rag-prod"
  location = "East US"
}

# 2. ADLS Gen2 Storage (The Warehouse)
resource "azurerm_storage_account" "adls" {
  name                     = "stdatalakeprod001" # Must be unique globally
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # THIS TURNS IT INTO ADLS GEN2
}

resource "azurerm_storage_data_lake_gen2_filesystem" "raw" {
  name               = "bronze-raw"
  storage_account_id = azurerm_storage_account.adls.id
}

# 3. Azure Data Factory (The Orchestrator)
resource "azurerm_data_factory" "adf" {
  name                = "adf-internal-rag-orchestrator"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  identity {
    type = "SystemAssigned" # Allows ADF to talk to others without passwords
  }
}

# 4. Databricks Workspace (The Processor)
resource "azurerm_databricks_workspace" "dbw" {
  name                = "dbw-internal-ai-engine"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "premium" # Required for security trimming/Unity Catalog
}

# 5. CONNECTING THEM: ADF Linked Service to Databricks
# This tells ADF how to trigger Databricks jobs
resource "azurerm_data_factory_linked_service_azure_databricks" "adf_to_dbw" {
  name            = "LS_Databricks_Connection"
  data_factory_id = azurerm_data_factory.adf.id
  adb_domain      = "https://${azurerm_databricks_workspace.dbw.workspace_url}"

  # Using Managed Identity for security (no keys stored!)
  msi_work_space_resource_id = azurerm_databricks_workspace.dbw.id

  new_cluster_config {
    node_type             = "Standard_DS3_v2"
    cluster_version       = "13.3.x-scala2.12"
    min_number_of_workers = 1
    max_number_of_workers = 2
  }
}

Why this setup is “RAG-Ready”:

  • is_hns_enabled = true: This is the most critical line. Without it, you just have flat blob storage with no real directory hierarchy. This enables the high-speed folder structure (hierarchical namespace) for your data lake.
  • Managed Identity (SystemAssigned): Notice there are no passwords or “Access Keys” in this code. ADF uses its own identity to log into Databricks. This is the gold standard for internal security.
  • sku = "premium": In an internal environment, you need “Premium” so you can enable Unity Catalog. This ensures that if a user doesn’t have permission to see an HR file in the Data Lake, the Databricks AI code won’t be able to read it either.
  • Automatic Cluster: The new_cluster_config block tells Data Factory to spin up a Databricks cluster only when it’s needed and shut it down immediately after, saving your department significant money.

How to use this:

  1. Save this as main.tf.
  2. Run terraform init and terraform apply.
  3. Once finished, go to the Azure Portal and you will see your entire data ecosystem ready to be loaded with documents.

III. Terraform code to deploy the Azure AI Search service to complete the RAG stack

To complete your Internal RAG stack, we need to add Azure AI Search. In 2026, this service is the “Vector Database” that allows your chatbot to find the right paragraphs within your internal documents.

The Final Terraform Piece: Azure AI Search

Add this block to your existing main.tf file. It deploys the Search service with Semantic Ranking enabled—which is the “secret sauce” for high-accuracy internal bots.

Terraform

# 6. Azure AI Search Service
resource "azurerm_search_service" "search" {
  name                = "ais-internal-rag-lookup"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  # 'basic' or 'standard' is required for Semantic Ranker and Vector Search
  sku = "basic"

  # Enabling Semantic Search (Standard plan for the ranker)
  semantic_search_sku = "standard"

  # Identity for secure connection to ADLS and OpenAI
  identity {
    type = "SystemAssigned"
  }

  tags = {
    Project = "Internal-RAG"
  }
}

# Output the Search Admin Key (for your App/Chatbot)
output "search_admin_key" {
  value     = azurerm_search_service.search.primary_key
  sensitive = true
}

output "search_endpoint" {
  value = "https://${azurerm_search_service.search.name}.search.windows.net"
}

How it all fits together (The Finished Product)

You now have a fully automated infrastructure code that creates:

  1. ADLS Gen2: Where your raw internal PDFs and Docs are stored.
  2. Azure Data Factory: The engine that “watches” for new files and tells the system to update.
  3. Azure Databricks: The brain that cleans data and can perform complex AI “chunking” of your documents.
  4. Azure AI Search: The searchable index that your chatbot queries to get answers.

2026 Pro-Tip: “Integrated Vectorization”

In 2026, you don’t actually need to write Python code to turn text into vectors. You can configure Azure AI Search to connect directly to your ADLS (using the adlsgen2 data source type) and it will:

  • Automatically “crack” the PDFs.
  • Chunk the text.
  • Call Azure OpenAI to create vectors.
  • Keep the index updated whenever a file in ADLS changes.

Final Step for Security

Since this is an Internal system, remember to go into the Azure Portal (or add to Terraform) a Role Assignment. You must give the Search Service’s Managed Identity the role of Storage Blob Data Reader on your ADLS account so it can “read” the files to index them.
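
That role assignment can live in the same main.tf, reusing the resources defined above — a sketch:

Terraform

# Give AI Search permission to read files in the Data Lake for indexing
resource "azurerm_role_assignment" "search_to_adls" {
  scope                = azurerm_storage_account.adls.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.search.identity[0].principal_id
}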

IV. Terraform code to add the LLM to this stack as well

To finish the “Brain” of your RAG system, we will add Azure OpenAI. This provides the Embedding Model (to turn text into math/vectors) and the GPT Model (to actually talk to your employees).

In 2026, the best practice is to deploy these as “Account Deployments” within a single Azure OpenAI resource.


7. Azure OpenAI Resource & Models

Add this to your main.tf. This script creates the AI service and deploys two specific models:

  1. text-embedding-3-large: For high-accuracy vector search.
  2. gpt-4o: The “Omni” model for fast, reasoning-based chat.

Terraform

# 7. Azure OpenAI Service
resource "azurerm_cognitive_account" "openai" {
  name                = "oai-internal-rag-brain"
  location            = "East US" # Check availability for gpt-4o in your region
  resource_group_name = azurerm_resource_group.main.name
  kind                = "OpenAI"
  sku_name            = "S0"

  identity {
    type = "SystemAssigned"
  }
}

# Deploy the Embedding Model (The "Encoder")
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-large"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-large"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = 50 # Tokens per minute (adjust based on need)
  }
}

# Deploy the Chat Model (The "Talker")
resource "azurerm_cognitive_deployment" "gpt4" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-05-13"
  }

  sku {
    name     = "Standard"
    capacity = 30
  }
}

# Output the OpenAI Endpoint
output "openai_endpoint" {
  value = azurerm_cognitive_account.openai.endpoint
}

The Completed 2026 RAG Infrastructure Map

Now that your Terraform is complete, here is how the data flows through the resources you just built:

  1. Ingest: ADF detects a new PDF in ADLS Gen2 (Bronze).
  2. Process: Databricks chunks the text and uses the OpenAI Embedding Model to vectorize it.
  3. Index: The vectors and text are pushed into Azure AI Search.
  4. Query: An employee asks a question in your Chatbot UI.
  5. Retrieve: Azure AI Search finds the relevant chunks.
  6. Augment/Generate: Your app sends the chunks + the question to GPT-4o, which gives the final answer.

One Final Security Step (The “Handshake”)

For an Internal system, you must ensure Azure AI Search has permission to talk to Azure OpenAI to handle “Integrated Vectorization.”

You can add this Terraform block to grant that permission automatically:

Terraform

# Give AI Search permission to use OpenAI
resource "azurerm_role_assignment" "search_to_openai" {
  scope                = azurerm_cognitive_account.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = azurerm_search_service.search.identity[0].principal_id
}

Final Checklist for Deployment:

  • [ ] Region: Ensure all services (especially OpenAI and AI Search) are in the same region to minimize latency and data egress costs.
  • [ ] Quotas: Check your Azure subscription for “Cognitive Services” quotas; sometimes new subscriptions have a limit on GPT-4o capacity.
  • [ ] Cost: Remember to turn off your Databricks clusters when not in use—they are the most expensive part of this stack!

V. With the infrastructure defined, here is the Python/Spark code you’ll need inside Databricks to start processing those internal documents

To get your internal RAG system running, you need a Databricks Notebook that acts as the bridge between your documents in ADLS and the searchable index in Azure AI Search.

In 2026, the most efficient way to do this is using Spark for parallel processing and the langchain-azure-ai library for the AI logic.


1. Library Installation

Run this in the first cell of your Databricks notebook to install the 2026-standard libraries.

Python

%pip install -U langchain-azure-ai azure-search-documents azure-identity pypdf
dbutils.library.restartPython()

2. The Ingestion Script (PySpark)

This script performs the “Heavy Lifting”: it reads PDFs from your Bronze folder, chunks them, and sends them to the AI Search Index.

Python

import os
import io

from pypdf import PdfReader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings
from langchain_azure_ai.vectorstores import AzureAISearch

# --- Configuration ---
STORAGE_PATH = "abfss://bronze-raw@stdatalakeprod001.dfs.core.windows.net/hr-manuals/"
SEARCH_ENDPOINT = "https://ais-internal-rag-lookup.search.windows.net"
SEARCH_KEY = dbutils.secrets.get(scope="rag-scope", key="search-admin-key")

# 1. Load Data from ADLS
# Using Spark to list all PDF files in the lake
df = spark.read.format("binaryFile").option("pathGlobFilter", "*.pdf").load(STORAGE_PATH)

# 2. Extract and Chunk Text
# (Simplification: In prod, use 'spark-pdf' or 'Azure AI Document Intelligence')
def process_pdf(content):
    reader = PdfReader(io.BytesIO(content))
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""
    # Split into 1000-character chunks with overlap for context
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_text(text)

# 3. Create Embeddings & Push to Azure AI Search
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-large",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

vector_store = AzureAISearch(
    azure_search_endpoint=SEARCH_ENDPOINT,
    azure_search_key=SEARCH_KEY,
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)

# Convert PDF data to chunks and add to Vector Store
for row in df.collect():
    chunks = process_pdf(row.content)
    # Metadata helps with "Security Trimming" later
    metadata = [{"source": row.path, "id": f"{row.path}_{i}"} for i in range(len(chunks))]
    vector_store.add_texts(chunks, metadatas=metadata)

print("Indexing Complete!")

3. Key 2026 Features in this Code

  • RecursiveCharacterTextSplitter: This ensures that chunks don’t break in the middle of a word, which helps the LLM maintain “contextual integrity.”
  • text-embedding-3-large: We use the latest 2026 embedding model because it supports MRL (Matryoshka Representation Learning), allowing you to store smaller vectors without losing accuracy—saving you money on storage.
  • Managed Identity: In a real production setup, you would replace SEARCH_KEY with DefaultAzureCredential() so no keys are ever visible in the code.

4. How the Chatbot Queries this Data

Once that script runs, your “Search Index” is full. When a user asks a question, your backend app (likely a Python Web App or Azure Function) runs this simple logic:

Python

# The "Retriever"
query = "What is the policy on maternity leave?"
docs = vector_store.similarity_search(
    query,
    k=3,
    search_type="hybrid",  # 2026 standard: Vector + Keyword
)

# The "Generator"
# Pass 'docs' to GPT-4o to get the final human-friendly answer

Summary of your Pipeline

  1. ADF detects the file.
  2. Databricks (using the code above) reads the file.
  3. Databricks chunks it and calculates the math (embeddings).
  4. Azure AI Search stores the result.

Your infrastructure and code are now fully aligned.

VI. Setting up the User Interface (like a Streamlit or React app) to actually talk to this bot

For the final piece of your internal RAG system, you need a user-friendly interface. In 2026, there are two primary paths: the “Pro-Code” way (custom and flexible) and the “Low-Code” way (fast and integrated).


Option 1: The “Low-Code” Way (Azure AI Foundry)

If you want a professional-grade internal portal in under 10 minutes, use the Azure AI Foundry (formerly AI Studio) “Deploy to Web App” feature.

  1. Go to Azure AI Foundry: Open your project.
  2. Open Playground: Select the “Chat” playground.
  3. Setup RAG: Under “Add your data,” select your Azure AI Search index.
  4. Deploy: Click the Deploy to button and select “as a web app”.
    • Azure will automatically provision an Azure App Service.
    • It creates a beautiful, ChatGPT-like interface that is pre-connected to your data.
    • Security: It integrates with Microsoft Entra ID out of the box, so only your employees can log in.

Option 2: The “Pro-Code” Way (Python + Streamlit)

If you want to customize the branding or add specific internal tools (like a “Feedback” button), Streamlit is the industry standard for 2026.

The app.py logic:

Python

import streamlit as st
from langchain_azure_ai.chat_models import AzureChatOpenAI
from langchain_azure_ai.vectorstores import AzureAISearch
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings

st.set_page_config(page_title="Internal HR Assistant", layout="wide")
st.title("🤖 Internal Knowledge Bot")

# 1. Initialize Connection
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-3-large")
vector_store = AzureAISearch(
    azure_search_endpoint=st.secrets["SEARCH_ENDPOINT"],
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)
llm = AzureChatOpenAI(azure_deployment="gpt-4o", temperature=0)

# 2. Chat Interface
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask me about company policy..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # 3. The RAG Step: Retrieve relevant chunks
    docs = vector_store.similarity_search(prompt, k=3)
    context = "\n".join([d.page_content for d in docs])

    # 4. The RAG Step: Generate Answer
    full_prompt = f"Use the following context to answer: {context}\n\nQuestion: {prompt}"
    response = llm.invoke(full_prompt).content

    with st.chat_message("assistant"):
        st.markdown(response)
        with st.expander("View Sources"):
            for d in docs:
                st.write(f"- {d.metadata['source']}")

    st.session_state.messages.append({"role": "assistant", "content": response})

3. Critical 2026 UX Features for Internal Bots

To make your bot actually useful for employees, ensure your UI includes:

  • Citations: Never show an answer without a link to the source document in ADLS. This builds trust and allows employees to verify the info.
  • “I Don’t Know” Guardrail: Explicitly tell the LLM in the system prompt: “If the answer is not in the provided documents, say you don’t know. Do not make up facts.”
  • Feedback Loop: Add a Thumbs Up/Down on every response. You can pipe this feedback back into ADF to log which documents are confusing the AI.

Summary of your Journey

You have successfully moved from raw data to a live AI assistant:

  1. Infrastructure: Deployed via Terraform.
  2. Pipeline: Data moved by ADF into ADLS.
  3. Processing: Cleaned and Vectorized in Databricks.
  4. Retrieval: Indexed and ranked in Azure AI Search.
  5. Interface: Served via App Service or Streamlit.

VII. How to monitor the costs of this stack so you don’t get a surprise bill at the end of the month

Building a RAG system in 2026 is an exercise in consumption-based engineering. Because every PDF page processed and every chat message sent has a price tag, monitoring isn’t just “good hygiene”—it’s a requirement to keep the project alive.

Here is how to monitor and optimize the costs of your specific ADF-ADLS-Databricks-AI Search stack.


1. The “Big Three” Cost Drivers

In your internal RAG architecture, these three will typically account for 90% of your bill:

Service | The Cost Driver | 2026 Pro-Tip
Databricks | Cluster uptime (DBUs) | Use Serverless Compute for job runs. It scales to zero the second the processing finishes.
AI Search | Search Units (SUs) | Start with the Basic tier. Don’t move to Standard until your document count exceeds 15GB or 1 million chunks.
Azure OpenAI | Token Consumption | Use gpt-4o-mini for simple summarization and only use gpt-4o for complex reasoning to save up to 80% on tokens.

2. Setting Up “Hard” Guardrails (Azure Budgets)

Don’t wait for the monthly invoice. Set up an automated kill-switch.

  1. Create a Resource Group Budget: Put all your RAG resources (ADF, ADLS, etc.) in one Resource Group.
  2. Set Thresholds:
    • 50%: Send an email to the team.
    • 90%: Send a high-priority alert to the Manager.
    • 100% (The Nuclear Option): In 2026, you can trigger an Azure Automation Runbook that programmatically disables the Azure OpenAI API keys, instantly stopping further spending.
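
Those thresholds map directly onto a Terraform budget resource. A sketch — the amount and e-mail addresses are placeholders, and the 100% runbook wiring is left out:

Terraform

resource "azurerm_consumption_budget_resource_group" "rag" {
  name              = "budget-internal-rag"
  resource_group_id = azurerm_resource_group.main.id
  amount            = 500 # monthly USD cap (placeholder)
  time_grain        = "Monthly"

  time_period {
    start_date = "2026-01-01T00:00:00Z" # must be the first of a month
  }

  # 50%: email the team
  notification {
    enabled        = true
    threshold      = 50
    operator       = "GreaterThan"
    contact_emails = ["data-team@contoso.com"] # placeholder
  }

  # 90%: high-priority alert to the Manager
  notification {
    enabled        = true
    threshold      = 90
    operator       = "GreaterThan"
    contact_emails = ["manager@contoso.com"] # placeholder
  }
}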

3. Optimization Checklist by Service

Azure Data Factory (ADF)

  • Data Integration Units (DIUs): When copying files from SharePoint/On-prem to ADLS, ADF defaults to 4 DIUs. For small internal docs, manually set this to 2 to halve the copy cost.
  • Avoid Over-Polling: Set your triggers to “Tumbling Window” or “Storage Event” rather than “Schedule” (e.g., checking every 1 minute) to reduce trigger run costs.

Azure Databricks

  • Auto-Termination: Ensure your clusters are set to terminate after 10 minutes of inactivity.
  • Photon Engine: Turn on the Photon query engine. While it costs slightly more per hour, it processes data so much faster that the total cost of the job is usually lower.

Azure Data Lake (ADLS)

  • Lifecycle Management: Set a policy to move files from Hot to Cold storage if they haven’t been accessed in 30 days. Your “Raw/Bronze” data almost never needs to be in the Hot tier.
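
A sketch of that policy in Terraform against the storage account from the earlier script; this version uses last-modified time (30 days) so you don’t have to enable access-time tracking:

Terraform

resource "azurerm_storage_management_policy" "tiering" {
  storage_account_id = azurerm_storage_account.adls.id

  rule {
    name    = "bronze-to-cool"
    enabled = true

    filters {
      prefix_match = ["bronze-raw/"]
      blob_types   = ["blockBlob"]
    }

    actions {
      base_blob {
        tier_to_cool_after_days_since_modification_greater_than = 30
      }
    }
  }
}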

Azure AI Search

  • Image Cracking: If your PDFs contain images, “Document Cracking” costs extra ($1 per 1,000 images). If you don’t need to “read” charts or photos, disable image extraction in your indexer settings.

4. 2026 “FinOps for AI” Dashboard

The most effective way to stay under budget is to build a Cost Visibility Dashboard.

  • Tagging: Ensure every resource in your Terraform code has a Project: Internal-RAG and Department: HR tag.
  • Cost Analysis: Use the Azure Cost Management tool to group by “Tag.” This allows you to show your stakeholders exactly how much the “HR Bot” is costing per department.

Summary

If you are just starting, your daily “Idle” cost (with everything turned on but no one using it) should be roughly $5–$10/day. If it’s higher, check if a Databricks cluster was left running or if you deployed a “Standard” tier AI Search unnecessarily.


Azure Data Lake Storage (ADLS)

If ADF is the plumbing and Databricks is the engine, Azure Data Lake Storage (ADLS) Gen2 is the actual physical warehouse where everything is kept.

In 2026, it remains the standard for “Big Data” because it combines the cheap, limitless nature of Cloud Object Storage with the high-speed organization of a File System.


1. The Secret Sauce: Hierarchical Namespace (HNS)

Standard cloud storage (like Azure Blob or Amazon S3) is “flat.” If you have a file at /logs/2026/March/data.csv, the computer sees that whole string as one long name. To move a folder, it has to copy every single file inside it.

With ADLS Gen2, folders are “real” (Hierarchical Namespace).

  • Rename/Move: Renaming a folder with 10 million files is instantaneous because it just changes one reference, not 10 million files.
  • Performance: When a tool like Databricks or Spark asks for “all files in the March folder,” ADLS knows exactly where they are without searching through the entire lake.

2. The Storage Tiers (Cost Savings)

You don’t pay the same price for all data. ADLS allows you to move data between “Tiers” automatically based on how often you touch it:

  • Hot Tier: Highest cost to store, lowest cost to access. Use this for data you are actively processing in your RAG pipeline today.
  • Cool/Cold Tier: Lower storage cost, but you pay a fee to read it. Great for data from last month.
  • Archive Tier: Dirt cheap (pennies per GB). The data is “offline”—it can take a few hours to “rehydrate” it so you can read it again. Perfect for legal compliance backups.

3. Security (ACLs vs. RBAC)

For your Internal RAG system, this is the most important part of ADLS. It uses two layers of security:

  1. RBAC (Role-Based Access Control): Broad permissions (e.g., “John is a Storage Contributor”).
  2. ACLs (Access Control Lists): POSIX-style permissions. You can say “John can see the ‘Public’ folder, but only HR can see the ‘Salaries’ folder.” 2026 Update: Azure AI Search now “respects” these ACLs. If you index files from ADLS, the search results will automatically hide files that the logged-in user doesn’t have permission to see in the Data Lake.
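
A hedged Terraform sketch of that “Salaries”-style folder lock-down — the filesystem reference reuses the earlier script, and the HR group object ID is a placeholder:

Terraform

resource "azurerm_storage_data_lake_gen2_path" "salaries" {
  path               = "salaries"
  filesystem_name    = azurerm_storage_data_lake_gen2_filesystem.raw.name
  storage_account_id = azurerm_storage_account.adls.id
  resource           = "directory"

  # Only the HR security group can list and read this folder
  ace {
    scope       = "access"
    type        = "group"
    id          = "00000000-0000-0000-0000-000000000000" # HR group object ID (placeholder)
    permissions = "r-x"
  }
}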

4. ADLS Gen2 vs. Microsoft Fabric OneLake

You might hear about OneLake (the “OneDrive for data”). Here is how to tell them apart in 2026:

  • ADLS Gen2: The “Infrastructure” choice. You have full control over networking, encryption keys, and regions. Best for custom data engineering and Databricks heavy-lifters.
  • OneLake: The “SaaS” choice. It is actually built on top of ADLS, but it manages the folders and permissions for you automatically within Microsoft Fabric.

Summary Checklist

  • Format: Use Delta or Parquet for your “Silver” and “Gold” layers. These are compressed and optimized for AI and BI.
  • Structure: Always follow the Bronze -> Silver -> Gold folder structure to keep your lake from becoming a “data swamp.”
  • Access: Use Managed Identities so ADF and Databricks can talk to ADLS without you ever having to copy-paste a password or a secret key.


Azure Databricks

In 2026, Azure Databricks is much more than just a “data processing tool.” It is now positioned as a Data Intelligence Platform. While it’s still based on Apache Spark, it has evolved to use AI to help you manage your data, write your code, and govern your security.

Think of it as the high-performance engine of your data factory.


1. The Core Technology: Spark + Delta Lake

At its heart, Databricks does two things exceptionally well:

  • Apache Spark: A distributed computing engine. If you have 100TB of data, Databricks breaks it into 1,000 tiny pieces and processes them all at the same time across a “cluster” of computers.
  • Delta Lake: This is the storage layer that sits on top of your ADLS. It gives your “data lake” (files) the powers of a “database” (tables), allowing for things like Undo (Time Travel) and ACID transactions (ensuring data isn’t corrupted if a write fails).

2. New in 2026: The “Intelligence” Layer

The biggest shift recently is that Databricks now uses AI to run its own infrastructure:

  • Genie Code (formerly Databricks Assistant): An agentic AI built into the notebooks. You can type “Clean this table and create a vector index for my RAG bot,” and it will write and execute the Spark code for you.
  • Serverless Compute: You no longer need to “size” clusters (deciding how many CPUs/RAM). You just run your code, and Databricks instantly scales the hardware up or down, charging you only for the seconds the code is running.
  • Liquid Clustering: In the past, data engineers had to manually “partition” data to keep it fast. Now, Databricks uses AI to automatically reorganize data based on how you query it, making searches up to 12x faster.

3. How it fits your RAG System

For your internal chatbot, Databricks is the “Processor” that prepares your data for Azure AI Search:

  1. Parsing: It opens your internal PDFs/Word docs from ADLS.
  2. Chunking: It breaks the text into logical paragraphs.
  3. Embedding: It calls an LLM (like OpenAI) to turn those paragraphs into Vectors.
  4. Syncing: It pushes those vectors into your Search Index.

4. Databricks vs. The Competition (2026)

Feature | Azure Databricks | Microsoft Fabric | Azure SQL
Best For | Heavy Data Engineering & AI | Business Intelligence (BI) | App Backend / Small Data
Language | Python, SQL, Scala, R | Mostly SQL & Low-Code | SQL
Philosophy | “Open” (Files in your ADLS) | “SaaS” (Everything managed) | “Relational” (Strict tables)
Power | Unlimited (Petabyte scale) | High (Enterprise scale) | Medium (GB to low TB)

5. Unity Catalog (The “Traffic Cop”)

In an internal setting, Unity Catalog is the most important part of Databricks. It provides a single place to manage permissions. If you grant a user access to a table in Databricks, those permissions follow the data even if it’s moved or mirrored into other services like Power BI or Microsoft Fabric.

Summary

  • Use ADF to move the data.
  • Use ADLS to store the data.
  • Use Databricks to do the “heavy thinking,” cleaning, and AI vectorization.
  • Use Azure SQL / AI Search to give the data to your users/bot.

Azure data ecosystem

In the Azure data ecosystem, these four services form the “Modern Data Stack.” They work together to move, store, process, and serve data. If you think of your data as water, this ecosystem is the plumbing, the reservoir, the filtration plant, and the tap.


1. ADLS Gen2 (The Reservoir)

Azure Data Lake Storage Gen2 is the foundation. It is a highly scalable, cost-effective storage space where you keep all your data—structured (tables), semi-structured (JSON/Logs), and unstructured (PDFs/Images).

  • Role: The single source of truth (Data Lake).
  • Key Feature: Hierarchical Namespace. Unlike standard “flat” cloud storage, it allows for folders and subfolders, which makes data access much faster for big data analytics.
  • 2026 Context: It serves as the “Bronze” (Raw) and “Silver” (Filtered) layers in a Medallion Architecture.

2. ADF (The Plumbing & Orchestrator)

Azure Data Factory is the glue. It doesn’t “own” the data; it moves it from point A to point B and tells other services when to start working.

  • Role: ETL/ELT Orchestration. It pulls data from on-premises servers or APIs and drops it into ADLS.
  • Key Feature: Low-code UI. You build “Pipelines” using a drag-and-drop interface.
  • Integration: It often has a “trigger” that tells Databricks: “I just finished moving the raw files to ADLS, now go clean them.”

3. Azure Databricks (The Filtration Plant)

Azure Databricks is where the heavy lifting happens. It is an Apache Spark-based platform used for massive-scale data processing, data science, and machine learning.

  • Role: Transformation & Analytics. It takes the messy data from ADLS and turns it into clean, aggregated “Gold” data.
  • Key Feature: Notebooks. Engineers write code (Python, SQL, Scala) in a collaborative environment.
  • 2026 Context: It is the primary engine for Vectorization in RAG systems—turning your internal documents into mathematical vectors for AI Search.

4. Azure SQL (The Tap)

Azure SQL Database (or Azure Synapse) is the final destination for business users. While ADLS is great for “big data,” it’s not the best for a quick dashboard or a mobile app.

  • Role: Data Serving. It stores the final, “Gold” level data that has been cleaned and structured.
  • Key Feature: High Performance for Queries. It is optimized for Power BI reports and standard business applications.
  • Usage: After Databricks cleans the data, it saves the final results into Azure SQL so the CEO can see a dashboard the next morning.

How they work together (The Flow)

Step | Service | Action
1. Ingest | ADF | Copies logs from an on-prem server to the cloud.
2. Store | ADLS | Holds the raw .csv files in a “Raw” folder.
3. Process | Databricks | Reads the .csv, removes duplicates, and calculates monthly totals.
4. Serve | Azure SQL | The cleaned totals are loaded into a SQL table.
5. Visualize | Power BI | Connects to Azure SQL to show a “Sales Revenue” chart.

Summary Table

Service | Primary Skill Needed | Best For…
ADF | Logic / Drag-and-Drop | Moving data & scheduling tasks.
ADLS | Folder Organization | Storing massive amounts of any data type.
Databricks | Python / SQL / Spark | Complex math, AI, and cleaning big data.
Azure SQL | Standard SQL | Powering apps and BI dashboards.

To explain the pipeline between these four, we use the Medallion Architecture. This is the industry-standard way to move data from a “raw” state to an “AI-ready” or “Business-ready” state.


Phase 1: Ingestion (The “Collector”)

  • Services: ADF + ADLS Gen2 (Bronze Folder)
  • The Action: ADF acts as the trigger. It connects to your external source (like an internal SAP system, a REST API, or a local SQL Server).
  • The Result: ADF “copies” the data exactly as it is—warts and all—into the Bronze container of your ADLS.
  • Why? You always keep a raw copy. If your logic fails later, you don’t have to go back to the source; you just restart from the Bronze folder.

Phase 2: Transformation (The “Refinery”)

  • Services: Databricks + ADLS Gen2 (Silver Folder)
  • The Action: ADF sends a signal to Databricks to start a “Job.” Databricks opens the raw files from the Bronze folder.
    • It filters out null values.
    • It fixes date formats (e.g., changing 01-03-26 to 2026-03-01).
    • It joins tables together.
  • The Result: Databricks writes this “clean” data into the Silver container of your ADLS, usually in Delta format (a high-performance version of Parquet).

Phase 3: Aggregation & Logic (The “Chef”)

  • Services: Databricks + ADLS Gen2 (Gold Folder)
  • The Action: Databricks runs a second set of logic. Instead of just cleaning data, it calculates things. It creates “Gold” tables like Monthly_Sales_Summary or Employee_Vector_Embeddings.
  • The Result: These high-value tables are stored in the Gold container. This data is now perfect.

Phase 4: Serving (The “Storefront”)

  • Services: Azure SQL
  • The Action: ADF runs one final “Copy Activity.” It takes the small, aggregated tables from the Gold folder in ADLS and pushes them into Azure SQL Database.
  • The Result: Your internal dashboard (Power BI) or your Chatbot’s metadata storage connects to Azure SQL. Because the data is already cleaned and summarized, the dashboard loads instantly.

The Complete Workflow Summary

Stage | Data State | Tool in Charge | Where it Sits
Ingest | Raw / Messy | ADF | ADLS (Bronze)
Clean | Filtered / Standardized | Databricks | ADLS (Silver)
Compute | Aggregated / Business Logic | Databricks | ADLS (Gold)
Serve | Final Tables / Ready for UI | ADF | Azure SQL

How this connects to your RAG Chatbot:

In your specific case, Databricks is the MVP. It reads the internal PDFs from the Silver folder, uses an AI model to turn the text into Vectors, and then you can either store those vectors in Azure SQL (if they are small) or send them straight to Azure AI Search.

Azure AI Search

Azure AI Search (formerly known as Azure Cognitive Search) is a high-performance, “search-as-a-service” platform designed to help developers build rich search experiences over private, heterogeneous content.

In the era of Generative AI, it has become the industry standard for Retrieval-Augmented Generation (RAG), serving as the “knowledge base” that feeds relevant information to Large Language Models (LLMs) like GPT-4.


1. How It Works: The High-Level Flow

Azure AI Search acts as a middle layer between your raw data and your end-user application.

  1. Ingestion: It pulls data from sources like ADLS, Azure SQL, or Cosmos DB using “Indexers.”
  2. Enrichment (Cognitive Skills): During ingestion, it can use AI to “crack” documents—extracting text from images (OCR), detecting languages, or identifying key phrases.
  3. Indexing: It organizes this data into a highly optimized, searchable “Index.”
  4. Serving: Your app sends a query to the index and gets back ranked, relevant results.

2. Three Ways to Search

The real power of Azure AI Search is that it doesn’t just look for exact word matches; it understands intent.

Search Type | How it Works | Best For…
Keyword (BM25) | Traditional text matching. Matches “Apple” to “Apple.” | Exact terms, serial numbers, product names.
Vector Search | Uses mathematical “embeddings” to find conceptually similar items. | “Frigid weather” matching “cold temperatures.”
Hybrid Search | The Gold Standard. Runs Keyword and Vector search simultaneously and merges them. | Providing the most accurate, context-aware results.

Pro Tip: Azure AI Search offers Semantic Ranking, which uses a secondary deep-learning re-ranking model (adapted from Microsoft Bing) to re-rank the top results, ensuring the absolute best answer is at the very top.


3. Key Components

To set this up, you’ll interact with four main objects:

  • Data Source: The connection to your data (e.g., an Azure Blob Storage container).
  • Skillset: An optional set of AI steps (like “Translate” or “Chunking”) applied during indexing.
  • Index: The physical schema (the “table”) where the searchable data lives.
  • Indexer: The “engine” that runs on a schedule to keep the Index synced with the Data Source.

4. The “RAG” Connection

If you are building a chatbot, Azure AI Search is your Retriever.

  1. The user asks: “What is our policy on remote work?”
  2. Your app sends that question to Azure AI Search.
  3. The Search service finds the 3 most relevant paragraphs from your 500-page HR manual.
  4. Your app sends those 3 paragraphs to Azure OpenAI to summarize into a natural answer.

5. Why use it over a standard Database?

While SQL or Cosmos DB can do “searches,” Azure AI Search is specialized for:

  • Faceted Navigation: Those “Filter by Price” or “Filter by Category” sidebars you see on Amazon.
  • Synonyms: Knowing that “cell phone” and “mobile” mean the same thing.
  • Language Support: It handles word stemming and lemmatization for 50+ languages.
  • Scaling: It can handle millions of documents and thousands of queries per second without slowing down your primary database.

RAG (Retrieval-Augmented Generation)

To build a RAG (Retrieval-Augmented Generation) system using Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), and Azure AI Search, you are essentially creating a two-part machine: a Data Ingestion Pipeline (The “Factory”) and a Search & LLM Orchestrator (The “Brain”).

Here is the modern 2026 blueprint for setting this up.


1. The High-Level Architecture

  1. ADLS Gen2: Acts as your “Landing Zone” for raw documents (PDFs, Office docs, JSON).
  2. ADF: Orchestrates the movement of data and triggers the “cracking” (parsing) of documents.
  3. Azure AI Search: Stores the Vector Index. It breaks documents into chunks, turns them into math (embeddings), and stores them for retrieval.
  4. Azure OpenAI / AI Studio: The LLM that reads the retrieved chunks and answers the user.

2. Step 1: The Ingestion Pipeline (ADF + ADLS)

You don’t want to manually upload files. ADF automates the flow.

  • The Trigger: Set up a Storage Event Trigger in ADF. When a new PDF is dropped into your ADLS raw-data container, the pipeline starts.
  • The Activity: Use a Copy Activity or a Web Activity.
    • Modern Approach: In 2026, the most efficient way is to use the Azure AI Search “Indexer.” You don’t necessarily need to “move” the data with ADF; instead, use ADF to tell Azure AI Search: “Hey, new data just arrived in ADLS, go index it now.”
  • ADF Pipeline Logic:
    1. Wait for the file to land in ADLS.
    2. (Optional) Use an Azure Function or AI skillset to pre-process (e.g., stripping headers/footers).
    3. Call the Azure AI Search REST API to run the indexer (a minimal sketch of that call follows this list).
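
As a hedged sketch of that last step, here is the “Run Indexer” call as a plain REST request (the service name, indexer name, and key are placeholders); in ADF itself this is typically a Web Activity issuing the same POST with the same api-key header.

Python

# Sketch: trigger an Azure AI Search indexer run on demand.
# Service name, indexer name, and admin key below are placeholders.
import requests

url = (
    "https://<your-service>.search.windows.net"
    "/indexers/adls-indexer/run?api-version=2023-11-01"
)
response = requests.post(url, headers={"api-key": "<admin-api-key>"})
response.raise_for_status()  # 202 Accepted means the indexer run was queued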

3. Step 2: The “Smart” Indexing (Azure AI Search)

This is where your data becomes “AI-ready.” Inside Azure AI Search, you must configure:

  • Crack & Chunk: Don’t index a 100-page PDF as one block. Use the Markdown/Text Splitter skill to break it into chunks (e.g., 500 tokens each).
  • Vectorization: Add an Embedding Skill. This automatically sends your text chunks to an embedding model (like text-embedding-3-large) and saves the resulting vector in the index.
  • Knowledge Base (New for 2026): Use the Agentic Retrieval feature. This allows the search service to handle “multi-step” queries (e.g., “Compare the 2025 and 2026 health plans”) by automatically breaking them into sub-queries.
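
Here is a minimal sketch of the chunking piece using the azure-search-documents Python SDK; the /document/content source path and the chunks target name are assumptions, and maximum_page_length is measured in characters, so tune it toward your token budget.

Python

# Sketch: a text-splitting (chunking) skill for a skillset.
from azure.search.documents.indexes.models import (
    SplitSkill, InputFieldMappingEntry, OutputFieldMappingEntry,
)

chunking_skill = SplitSkill(
    description="Split documents into chunk-sized pages",
    text_split_mode="pages",    # fixed-size pages rather than sentence splits
    maximum_page_length=2000,   # characters per chunk, not tokens
    inputs=[InputFieldMappingEntry(name="text", source="/document/content")],
    outputs=[OutputFieldMappingEntry(name="textItems", target_name="chunks")],
)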

4. Step 3: The Chatbot Logic (The RAG Loop)

When a user asks a question, your chatbot follows this “Search -> Ground -> Answer” flow:

  1. User query: “What is our policy on remote work?”
  2. Search: the app sends the query to Azure AI Search using Hybrid Search (keyword + vector).
  3. Retrieve: Search returns the top 3-5 most relevant “chunks” of text.
  4. Augment: you create a prompt: “Answer the user based ONLY on this context: [Chunks]”
  5. Generate: Azure OpenAI generates a natural-language response.
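
Put together, the loop fits in a short script. This is a minimal sketch, not a production implementation: every endpoint, key, index, and deployment name below is a placeholder, it assumes an index of chunks with a content field, and it uses plain keyword search where production code would use hybrid search plus semantic ranking.

Python

# Minimal RAG loop sketch: Search -> Ground -> Answer.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search_client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="hr-docs",
    credential=AzureKeyCredential("<query-api-key>"),
)
llm = AzureOpenAI(
    azure_endpoint="https://<your-aoai>.openai.azure.com",
    api_key="<aoai-api-key>",
    api_version="2024-02-01",
)

question = "What is our policy on remote work?"

# Steps 2-3: search and retrieve the top chunks
chunks = [doc["content"] for doc in search_client.search(search_text=question, top=3)]

# Step 4: augment, grounding the model in the retrieved context only
system_prompt = "Answer the user based ONLY on this context:\n" + "\n---\n".join(chunks)

# Step 5: generate the natural-language answer
completion = llm.chat.completions.create(
    model="<chat-deployment-name>",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ],
)
print(completion.choices[0].message.content)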

5. Key 2026 Features to Use

  • Semantic Ranker: Always turn this on. It uses a high-powered model to re-sort your search results, ensuring the “Best” answer is actually #1 before it goes to the LLM.
  • Integrated Vectorization: In the past, you had to write custom Python code to create vectors. Now, Azure AI Search handles this internally via Integrated Vectorization—you just point it at your Azure OpenAI resource.
  • OneLake Integration: If you are using Microsoft Fabric, you can now link OneLake directly to AI Search without any ETL pipelines at all.

Why use ADF instead of just uploading to Search?

  • Cleanup: You can use ADF to remove PII (Personally Identifiable Information) before it ever hits the AI Search index.
  • Orchestration: If your data comes from 10 different SQL databases and 50 SharePoint folders, ADF is the natural tool for centralizing it in the Data Lake for indexing.

Building a data pipeline in Azure

Building a data pipeline in Azure using Azure Data Factory (ADF) and Azure Data Lake Storage (ADLS) is the “bread and butter” of modern cloud data engineering. Think of ADLS as your massive digital warehouse and ADF as the conveyor belts and robotic arms moving things around.

Here is the high-level workflow and the steps to get it running.


1. The Architecture

In a typical scenario, you move data from a source (like an on-premises SQL DB or an API) into ADLS, then process it.

Key Components:

  • Linked Services: Your “Connection Strings.” These store the credentials to talk to ADLS or your source.
  • Datasets: These point to specific folders or files within your Linked Service.
  • Pipelines: The logical grouping of activities (the workflow).
  • Activities: The individual actions (e.g., Copy Data, Databricks Notebook, Lookup).

2. Step-by-Step Implementation

Step 1: Set up the Storage (ADLS Gen2)

  1. In the Azure Portal, create a Storage Account.
  2. Crucial: Under the “Advanced” tab, ensure Hierarchical Namespace is enabled. This turns standard Blob storage into ADLS Gen2.
  3. Create a Container (e.g., raw-data).

Step 2: Create the Linked Service in ADF

  1. Open Azure Data Factory Studio.
  2. Go to the Manage tab (toolbox icon) > Linked Services > New.
  3. Search for Azure Data Lake Storage Gen2.
  4. Select your subscription and the storage account you created. Test the connection and click Create.

Step 3: Define your Datasets

You need a “Source” dataset (where data comes from) and a “Sink” dataset (where data goes).

  1. Go to the Author tab (pencil icon) > Datasets > New Dataset.
  2. Select Azure Data Lake Storage Gen2.
  3. Choose the format (Parquet and Delimited Text/CSV are most common).
  4. Point it to the specific file path in your ADLS container.

Step 4: Build the Pipeline

  1. In the Author tab, click the + icon > Pipeline.
  2. From the Activities menu, drag and drop the Copy Data activity onto the canvas.
  3. Source Tab: Select your source dataset.
  4. Sink Tab: Select your ADLS dataset.
  5. Mapping Tab: Click “Import Schemas” to ensure the columns align correctly.

3. Best Practices for ADLS Pipelines

  • Folder Structure: Use a “Medallion Architecture” (Bronze/Raw, Silver/Cleaned, Gold/Aggregated) within your ADLS containers to keep data organized.
  • Triggering: Don’t just run things manually. Use Schedule Triggers (time-based) or Storage Event Triggers (runs automatically when a file drops into ADLS).
  • Parameters: Avoid hardcoding file names. Use Parameters and Dynamic Content so one pipeline can handle multiple different files.

4. Example Formula for Dynamic Paths

If you want to organize your data by date automatically in ADLS, set the dataset’s Directory property to a dynamic content expression (ADF expressions start with @ and use straight single quotes):

@concat('raw/', formatDateTime(utcNow(), 'yyyy/MM/dd'))

This ensures that every time the pipeline runs, it creates a new folder for that day’s data.

Ingress and API Gateways

In the world of Kubernetes and OpenShift, both Ingress and API Gateways serve as the entry point for external traffic. While they overlap in functionality, they operate at different levels of the networking stack and offer different “intelligence” regarding how they handle requests.

Think of Ingress as a simple receptionist directing people to the right room, while an API Gateway is a concierge who also checks IDs, translates languages, and limits how many people enter at once.


1. What is Ingress?

Ingress is a native Kubernetes resource (Layer 7) that manages external access to services, typically HTTP and HTTPS.

  • Primary Job: Simple routing based on the URL path (e.g., /api) or the hostname (e.g., app.example.com).
  • Implementation: In OCP, this is usually handled by the OpenShift Ingress Controller (based on HAProxy) using Routes.
  • Pros: Lightweight, standard across Kubernetes, and built-in.
  • Cons: Limited “logic.” It’s hard to do complex things like rate limiting, authentication, or request transformation without custom annotations.

2. What is an API Gateway?

An API Gateway is a more sophisticated proxy that sits in front of your microservices to provide “cross-cutting concerns.”

  • Primary Job: API Management. It handles security, monitoring, and orchestration.
  • Key Features:
    • Authentication/Authorization: Validating JWT tokens or API keys before the request hits the service.
    • Rate Limiting: Ensuring one user doesn’t spam your backend.
    • Payload Transformation: Changing an XML request to JSON for a modern backend.
    • Circuit Breaking: Stopping traffic to a failing service to prevent a total system crash.
  • Examples: Kong, Tyk, Apigee, or the Red Hat 3scale API Management platform.

Key Comparison Table

  • OSI Layer: Ingress/Route works at Layer 7 (HTTP/S); an API Gateway adds application logic on top of Layer 7.
  • Main Goal: Ingress exposes services to the internet; an API Gateway protects and manages APIs.
  • Complexity: Ingress is low; an API Gateway is high.
  • Security: Ingress offers basic SSL/TLS termination; an API Gateway adds JWT, OAuth, mTLS, and IP whitelisting.
  • Traffic Control: Ingress does simple load balancing; an API Gateway adds rate limiting, quotas, and retries.
  • Cost: Ingress is usually free (built into OCP); an API Gateway often requires licensing or extra infrastructure.

When to use which?

  • Use Ingress/Routes when: You have a web application and just need to point a domain name to a service. It’s the “plumbing” of the cluster.
  • Use an API Gateway when: You are exposing APIs to third parties, need strict usage tracking (monetization), or want to centralize security logic so your developers don’t have to write auth code for every single microservice.

The “Modern” Middle Ground: Gateway API

There is a newer Kubernetes standard called the Gateway API. It is designed to replace Ingress by providing the power of an API Gateway (like header-based routing and traffic splitting) while remaining a standard part of the Kubernetes ecosystem. In OpenShift, you can enable the Gateway API through the Operator.

To help you see the evolution, here is how the “old” standard (Ingress) compares to the “new” standard (Gateway API).

1. The Traditional Ingress

Ingress is a single, “flat” resource. It’s simple but limited because the person who owns the app (the developer) and the person who owns the network (the admin) have to share the same file.

YAML

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
  • The Problem: If you want to do something fancy like a “Canary deployment” (sending 10% of traffic to a new version), you usually have to use messy, vendor-specific annotations.

2. The Modern Gateway API

The Gateway API breaks the configuration into pieces. This allows the Cluster Admin to define the entry point (the Gateway) and the Developer to define how their specific app is reached (the HTTPRoute).

The Admin’s Part (The Infrastructure):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: external-gateway
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: http
    protocol: HTTP
    port: 80

The Developer’s Part (The Logic & Traffic Splitting):

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
spec:
  parentRefs:
  - name: external-gateway
  hostnames:
  - "app.example.com"
  rules:
  - matches:
    - path: { type: PathPrefix, value: /api }
    backendRefs:
    - name: api-v1
      port: 80
      weight: 90  # 90% of traffic here
    - name: api-v2
      port: 80
      weight: 10  # 10% of traffic to the new version!

Summary of Differences

  • Structure: Ingress is monolithic (one file for everything); Gateway API is role-based (separate resources for admin vs. developer).
  • Traffic Splitting: Ingress requires non-standard annotations; Gateway API has weights/canary built in.
  • Extensibility: Ingress is limited; Gateway API is high (supports TCP, UDP, TLS, gRPC).
  • Portability: Ingress is high (but its annotations are not); Gateway API is very high (standardized across vendors).

Why OpenShift is moving this way

OpenShift has been moving toward the Gateway API because it solves the “annotation hell” that occurred when users tried to make basic Ingress act like a full API Gateway. It gives you the power of a professional gateway (like Kong or Istio) while staying within the native Kubernetes language.

In OpenShift 4.15 and later (reaching General Availability in 4.19), the Gateway API is managed by the Cluster Ingress Operator. Unlike standard Kubernetes where you might have to install many CRDs manually, OpenShift streamlines this by bundling the controller logic into its existing operators.

Here is the step-by-step process to enable and use it.


1. Enable the Gateway API CRDs

In newer versions of OCP, the CRDs are often present but “dormant” until a GatewayClass is created. The Ingress Operator watches for a specific controllerName to trigger the installation of the underlying proxy (which is Istio/Envoy in the Red Hat implementation).

Create the GatewayClass:

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: openshift-default
spec:
  controllerName: openshift.io/gateway-controller/v1

What happens next? The Ingress Operator will automatically detect this and start a deployment called istiod-openshift-gateway in the openshift-ingress namespace.


2. Set up a Wildcard Certificate (Required)

Unlike standard Routes, the Gateway API in OCP does not automatically generate a default certificate. You need to provide a TLS secret in the openshift-ingress namespace.

Bash

# Example: creating a self-signed wildcard certificate secret for testing
oc -n openshift-ingress create secret tls gwapi-wildcard \
  --cert=wildcard.crt --key=wildcard.key

3. Deploy the Gateway

The Gateway represents the actual “entry point” or load balancer.

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-gateway
  namespace: openshift-ingress
spec:
  gatewayClassName: openshift-default
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "*.apps.mycluster.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: gwapi-wildcard

4. Create an HTTPRoute (Developer Task)

Now that the “door” (Gateway) is open, a developer in a different namespace can “attach” their application to it.

YAML

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-app-route
  namespace: my-app-project
spec:
  parentRefs:
  - name: my-gateway
    namespace: openshift-ingress
  hostnames:
  - "myapp.apps.mycluster.com"
  rules:
  - backendRefs:
    - name: my-app-service
      port: 8080

Summary Checklist for the Interview

If you are asked how to set this up in an interview, remember these four pillars:

  1. Operator-Led: It’s managed by the Ingress Operator; no separate “Gateway Operator” is needed for the default Red Hat implementation.
  2. Implementation: OpenShift uses Envoy (via a lightweight Istio control plane) as the engine behind the Gateway API.
  3. Namespace: The Gateway object itself almost always lives in openshift-ingress.
  4. Service Type: Creating a Gateway usually triggers the creation of a Service type: LoadBalancer automatically.

Ingress vs Service mesh

Ingress and a service mesh solve different networking problems.

Ingress
Ingress is a Kubernetes API object for managing external access into the cluster, typically HTTP/HTTPS. It routes inbound requests based on hosts and paths to backend Services. Kubernetes now says Ingress is stable but frozen, and recommends the newer Gateway API for future development. (Kubernetes)

Service mesh
A service mesh is an infrastructure layer for service-to-service communication inside and around your app, adding things like traffic policy, observability, and zero-trust security without changing app code. In Istio, this includes traffic routing, retries, timeouts, fault injection, mTLS, authentication, and authorization. (Istio)

Practical difference
Think of it like this:

  • Ingress = the front door to your cluster
  • Service mesh = the road system and security checkpoints between services inside the cluster

Use Ingress when
You need:

  • a public endpoint for your app
  • host/path routing like api.example.com or /shop
  • TLS termination for incoming web traffic

That is the classic “internet → cluster → service” problem. (Kubernetes)

Use a service mesh when
You need:

  • service-to-service observability
  • mutual TLS between workloads
  • canary / weighted routing between versions
  • retries, timeouts, circuit breaking
  • policy and identity for east-west traffic
  • control over some outbound traffic too

Istio’s docs specifically describe percentage routing, version-aware routing, external service entries, retries, timeouts, and circuit breakers. (Istio)

Do they overlap?
A little. Both can influence traffic routing, but at different scopes:

  • Ingress mainly handles north-south traffic: outside users coming in
  • Service mesh mainly handles east-west traffic: service-to-service traffic inside the platform

A mesh can also handle ingress/egress via its own gateways, but that is a broader and heavier solution than plain Kubernetes Ingress. (Kubernetes)

Which should you choose?

  • For a simple web app exposing a few services: Ingress is usually enough.
  • For microservices that need security, tracing, traffic shaping, and resilience: service mesh is worth considering.
  • Many teams use both: one for external entry, one for internal communication.

One current note: for new Kubernetes edge-routing designs, Gateway API is the direction Kubernetes recommends over Ingress. (Kubernetes)

Here’s a concrete example.

Example app

Imagine an e-commerce app running on Kubernetes:

  • web-frontend
  • product-api
  • cart-api
  • checkout-api
  • payment-service
  • user-service

Customers come from the internet. The services call each other inside the cluster.

With Ingress only

Traffic flow:

Internet → Ingress controller → Kubernetes Service → Pods

Example:

  • shop.example.com goes to web-frontend
  • shop.example.com/api/* goes to product-api

What Ingress is doing here:

  • expose the app publicly
  • terminate TLS
  • route by host/path
  • maybe do some basic load balancing

So a request might go:

  1. User opens https://shop.example.com
  2. Ingress sends / to web-frontend
  3. web-frontend calls cart-api
  4. cart-api calls user-service
  5. checkout-api calls payment-service

The key point: Ingress mostly helps with step 1, the outside-in entry point. It does not, by itself, give you rich control/security/telemetry for steps 3–5. Ingress is for external access, and the Kubernetes project notes the API is stable but frozen, with Gateway API recommended for newer traffic-management work. (Kubernetes)

With Ingress + service mesh

Now add a mesh like Istio.

Traffic flow becomes:

Internet → Ingress/Gateway → web-frontend → mesh-controlled service-to-service traffic

Now you still have an entry point, but inside the cluster the mesh handles communication between services.

What the mesh adds:

  • mTLS between services
  • retries/timeouts
  • canary routing
  • traffic splitting
  • telemetry/tracing
  • authz policies between workloads

Example:

  • checkout-api sends 95% of traffic to payment-service v1 and 5% to payment-service v2
  • calls from cart-api to user-service get a 2-second timeout and one retry
  • only checkout-api is allowed to call payment-service
  • all service-to-service traffic is encrypted with mutual TLS

Those are standard service-mesh capabilities described in Istio’s traffic-management and security docs. (Istio)

Simple diagram

Ingress only

[User on Internet]
        |
        v
    [Ingress]
        |
        v
  [web-frontend]
        |
        v
[product-api] -> [cart-api] -> [checkout-api] -> [payment-service]

Ingress + service mesh

[User on Internet]
        |
        v
[Ingress / Gateway]
        |
        v
  [web-frontend]
        |
        v
---------------- service mesh inside the cluster ----------------
 [product-api] <-> [cart-api] <-> [checkout-api]
         \              |              /
          +-----> [user-service] <----+

 [checkout-api] --> [payment-service]

 mTLS, retries, tracing, canaries, policy
------------------------------------------------------------------

Real-world way teams choose

Use just Ingress when:

  • you have a small app
  • you mostly need public routing
  • internal service communication is simple
  • you do not need per-service security/policy

Add a service mesh when:

  • you have many microservices
  • debugging internal calls is hard
  • you need zero-trust service identity
  • you do canaries/traffic shaping often
  • you want consistent retries/timeouts/policies

One important 2026 note

For brand-new Kubernetes edge-routing setups, many teams are moving toward Gateway API instead of classic Ingress. Kubernetes recommends Gateway over Ingress for future-facing work, and Istio also supports Gateway API for traffic management. (Kubernetes)

Rule of thumb

  • Ingress/Gateway API: “How does traffic get into my cluster?”
  • Service mesh: “How do services inside my platform talk securely and reliably?”

Sidecar

A sidecar is a design pattern where a helper container runs alongside your main application container in the same pod (in Kubernetes) or on the same host, sharing the same network and storage.


Think of it like this: Your main app does its job, and the sidecar handles cross-cutting concerns without you changing the app’s code.


Common sidecar use cases:

  • Logging — sidecar collects and ships logs (e.g., Fluentd)
  • Security/mTLS — sidecar handles encryption between services (e.g., Envoy in a service mesh like Istio)
  • Monitoring — sidecar scrapes and exposes metrics (e.g., Prometheus exporters)
  • Config sync — sidecar pulls updated configs from Vault or a config server

In Kubernetes:

spec:
  containers:
  - name: my-app                 # main application container
    image: my-app:1.0            # placeholder image
  - name: log-shipper            # sidecar container
    image: fluent/fluentd:v1.16  # placeholder image

Both share the same pod, IP, and volumes.


Interview angle:

“I’ve worked with sidecars in Kubernetes — for example, using an Envoy proxy sidecar injected automatically by Istio to handle service-to-service encryption without touching application code.”