AKS

At its core, Azure Kubernetes Service (AKS) is Microsoft’s managed version of Kubernetes. It’s designed to take the “scary” parts of managing a container orchestration system—like setting up the brain of the cluster, patching servers, and handling scaling—and offload them to Azure so you can focus on your code.

Think of it as Kubernetes with a personal assistant.


1. How it Works (The Architecture)

AKS splits a cluster into two distinct parts:

  • The Control Plane (Managed by Azure): This is the “brain.” It manages the API server, the scheduler, and the cluster’s state. In AKS, Microsoft manages this for you for free (or for a small fee if you want a guaranteed Uptime SLA). You don’t have to worry about its health or security patching.
  • The Data Plane (Managed by You): These are the “worker nodes” (Virtual Machines) where your applications actually run. While you pay for these VMs, AKS makes it easy to add, remove, or update them with a single click or command.
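To see the control-plane/data-plane split in practice, here is a minimal Azure CLI sketch for standing up a cluster. The resource group, cluster name, region, and node count are all placeholders, not prescriptions:

```shell
# Create a resource group (name and region are placeholders)
az group create --name rg-demo --location eastus

# Create the cluster: Azure provisions and operates the managed control
# plane; the worker nodes below are the data plane you pay for
az aks create \
  --resource-group rg-demo \
  --name aks-demo \
  --node-count 2 \
  --generate-ssh-keys

# Fetch kubectl credentials so you can talk to the managed API server
az aks get-credentials --resource-group rg-demo --name aks-demo
```

Growing or shrinking the data plane later is equally simple, e.g. `az aks scale --resource-group rg-demo --name aks-demo --node-count 3`.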

2. Key Features (2026 Standards)

As of 2026, AKS has evolved into an “AI-ready” platform. Here are the standout features:

  • AKS Automatic: A newer “Zero-Ops” tier where Azure handles almost everything—node configuration, security hardening, and even choosing the right VM sizes based on your app’s needs.
  • Smart Scaling: It uses the Horizontal Pod Autoscaler (HPA) and Cluster Autoscaler to grow or shrink your infrastructure based on real-time traffic, saving you money during quiet hours.
  • AI & GPU Support: Native integration with the latest NVIDIA GPUs (like the NCv6 series) makes it a go-to for training LLMs or running AI inference.
  • Enterprise Security: It integrates directly with Microsoft Entra ID (formerly Azure AD), so you can control who accesses your cluster using the same identity they use for email.

3. Why Use It?

| Benefit | What it means for you |
| --- | --- |
| Reduced Overhead | You don’t have to be a Kubernetes expert to keep the cluster running. |
| Cost Efficiency | You only pay for the worker nodes you use. Features like Spot VMs can cut costs by up to 80% for non-critical tasks. |
| Fast Updates | Upgrading to a new Kubernetes version is a “push-button” experience rather than a manual nightmare. |
| Deep Integration | It plays perfectly with Azure Monitor, Azure Storage, and GitHub Actions for a smooth “code-to-cloud” pipeline. |

4. Common Use Cases

  • Microservices: Breaking down a giant app into smaller, independent pieces.
  • AI/ML Workflows: Running data-heavy training jobs and deploying models.
  • DevOps Pipelines: Using containers to ensure that “it works on my machine” also means “it works in production.”

Pro Tip: If you’re just starting out in 2026, look into Azure Linux 3.0 as your node operating system. It’s a lightweight, secure OS built by Microsoft specifically for AKS that boots faster and has a smaller attack surface than traditional Linux distros.

Starting a brand-new project in 2026 is an exciting time for AKS because the platform has moved from “manual infrastructure” to “intelligent automation.”

Here is the recommended blueprint for launching a fresh project today.


1. The Big Choice: Automatic vs. Standard

The first decision you’ll face in the Azure Portal is which “tier” to use.

| Feature | AKS Automatic (Recommended) | AKS Standard |
| --- | --- | --- |
| Philosophy | “Just run my code.” | “Give me all the knobs.” |
| Management | Azure manages nodes, scaling, and security. | You manage node pools and VM sizes. |
| Best For | New startups, rapid dev, and “Zero-Ops” teams. | Large enterprises with strict custom networking. |
| Security | Hardened by default (Azure Policy, Cilium). | Configurable (you must set the guardrails). |

Advice: For a brand-new project, start with AKS Automatic. It enforces modern best practices (like the Cilium network data plane) out of the box, saving you from “Day 2” configuration headaches.

An AKS Automatic cluster manages these elements for you:

  • Networking and security: Azure CNI Overlay powered by Cilium
  • Resource provisioning: automated node provisioning and scaling
  • On-demand scaling: scaling tools such as KEDA, HPA, and VPA
  • Kubernetes version upgrades: automatic updates for enhanced stability
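If you go the Automatic route, cluster creation collapses to a couple of commands. This is a sketch: the `--sku automatic` flag reflects how AKS Automatic is surfaced in recent Azure CLI versions (verify against yours), and all names are placeholders:

```shell
# Create an AKS Automatic cluster: node management, scaling, networking
# (CNI Overlay + Cilium), and version upgrades are handled by Azure
az aks create \
  --resource-group rg-demo \
  --name aks-auto-demo \
  --sku automatic

az aks get-credentials --resource-group rg-demo --name aks-auto-demo
```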

2. Setting Up Your Foundation (The 2026 Stack)

When configuring your new cluster, stick to these current standards:

  • The OS: Choose Azure Linux 3.0. It’s Microsoft’s own cloud-optimized distro. It’s faster and more secure than Ubuntu because it contains only the bare essentials needed to run containers.
  • Networking: Use Azure CNI Overlay. It allows you to scale to thousands of Pods without burning through your private IP address space—a common pitfall in older projects.
  • Identity: Enable Microsoft Entra Workload ID. Never use “Service Principals” or hardcoded secrets to let your app talk to a database. Workload ID gives your Pod its own managed identity.
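On a Standard cluster, the three recommendations above map onto concrete `az aks create` flags. A hedged sketch (names are placeholders; check flag availability against your CLI version):

```shell
# --os-sku AzureLinux           -> Azure Linux as the node OS
# --network-plugin-mode overlay -> Azure CNI Overlay (conserves private IPs)
# --network-dataplane cilium    -> Cilium-powered data plane
# --enable-workload-identity    -> Entra Workload ID instead of stored secrets
az aks create \
  --resource-group rg-demo \
  --name aks-standard-demo \
  --os-sku AzureLinux \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --enable-oidc-issuer \
  --enable-workload-identity \
  --generate-ssh-keys
```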

3. Integrating AI (KAITO)

If your new project involves AI (and most do in 2026), you’ll want to enable the Kubernetes AI Toolchain Operator (KAITO).

  • What it does: It simplifies running Large Language Models (LLMs) like Phi-4 or Falcon on your cluster.
  • Why use it: Instead of manually figuring out which GPU driver matches which model, KAITO automates the provisioning of GPU nodes and sets up the inference endpoint for you.
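Enabling KAITO is a single update on an existing cluster. Treat the exact flag as an assumption: at the time of writing it is exposed as the AI toolchain operator option, which in some CLI versions still requires the `aks-preview` extension:

```shell
# Enable the Kubernetes AI Toolchain Operator (KAITO) on an existing cluster
# (may require the aks-preview CLI extension; OIDC issuer must be enabled)
az aks update \
  --resource-group rg-demo \
  --name aks-demo \
  --enable-oidc-issuer \
  --enable-ai-toolchain-operator
```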

4. Cost Control for New Projects

Don’t let a “Hello World” app cost you hundreds.

  1. Use the “Dev/Test” Preset: This scales down redundancy to save money while you’re still building.
  2. Spot Instances: For background tasks or non-critical processing, use Spot node pools to get up to 80-90% off standard VM pricing.
  3. Vertical Pod Autoscaler (VPA): In 2026, the VPA is much smarter. It can watch your app and suggest (or automatically apply) the exact amount of CPU and RAM needed, so you aren’t paying for “idle” capacity.
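Point 2 above can be sketched as a dedicated Spot node pool added alongside your default pool. Names are placeholders; `--spot-max-price -1` means you pay the going spot rate, capped at the on-demand price, and the pool is never evicted purely for price reasons:

```shell
# Add a Spot-backed node pool for interruptible background work
az aks nodepool add \
  --resource-group rg-demo \
  --cluster-name aks-demo \
  --name spotpool \
  --priority Spot \
  --eviction-policy Delete \
  --spot-max-price -1 \
  --node-count 1
```

Schedule only fault-tolerant workloads here; Spot nodes can be reclaimed by Azure at any time.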

5. Your First Steps

To get moving, I recommend the following workflow:

  1. Install the Azure CLI (ensure it’s version 2.82+ for 2026 features).
  2. Create an ACR (Azure Container Registry): This is where you’ll store your app images.
  3. Deploy via “Automated Deployments”: Use the “Automated Deployments” tab in the AKS portal. You point it at your GitHub repo, and it will generate the GitHub Actions and Kubernetes YAML files for you.
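Steps 1–2 of that workflow look roughly like this. The registry name must be globally unique, so treat `myprojectacr` as a placeholder:

```shell
# Create a container registry for your app images
az acr create \
  --resource-group rg-demo \
  --name myprojectacr \
  --sku Basic

# Let the AKS cluster pull from the registry without storing credentials
az aks update \
  --resource-group rg-demo \
  --name aks-demo \
  --attach-acr myprojectacr
```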

Building a microservices architecture on AKS is a “textbook” use case, but in 2026, the strategy has shifted from just “managing containers” to “managing a distributed system.”

Since each service in your project will have its own lifecycle, database, and scaling needs, here is how you should structure your new AKS environment.


1. The Connectivity Layer: Service Mesh

In a microservices setup, the network becomes your biggest challenge. You shouldn’t have your services talking to each other using hardcoded internal URLs.

  • The 2026 Choice: Istio-based Service Mesh add-on for AKS.
  • Why: It provides “Mutual TLS” (mTLS) by default, meaning all communication between your microservices is encrypted automatically. It also gives you a “Service Graph,” allowing you to visualize exactly how traffic flows from Service A to Service B.
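Enabling the managed Istio add-on is a one-liner on an existing cluster. A sketch with placeholder names; the revision label value depends on the mesh version the add-on installs, so check `az aks show` for the actual revision:

```shell
# Enable the Istio-based service mesh add-on; sidecars then provide
# mTLS between services automatically
az aks mesh enable \
  --resource-group rg-demo \
  --name aks-demo

# Label a namespace so its pods get the Istio sidecar injected
# (asm-1-24 is a placeholder revision; use the one your cluster reports)
kubectl label namespace orders istio.io/rev=asm-1-24
```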

2. Global Traffic Routing (Ingress)

You need a “front door” to route external users to the correct microservice (e.g., api.myapp.com/orders goes to the Order Service).

  • Application Gateway for Containers (ALB): This is the modern evolution of the standard Ingress Controller. It’s a managed service that sits outside your cluster, handling SSL termination and Web Application Firewall (WAF) duties so your worker nodes don’t have to waste CPU on security overhead.

3. Data Persistence & State

The golden rule of microservices is one database per service.

  • Don’t run DBs inside AKS: While you can run SQL or MongoDB as a container, it’s a headache to manage.
  • The 2026 Way: Use Azure Cosmos DB or Azure SQL and connect them to your microservices using Service Connector. Service Connector handles the networking and authentication (via Workload ID) automatically, so your code doesn’t need to store connection strings or passwords.

4. Microservices Design Pattern (Dapr)

For a brand-new project, I highly recommend using Dapr (Distributed Application Runtime), which is an integrated extension in AKS.

Dapr provides “building blocks” as sidecars to your code:

  • Pub/Sub: Easily send messages between services (e.g., the “Order” service tells the “Email” service to send a receipt).
  • State Management: A simple API to save data without writing complex database drivers.
  • Resiliency: Automatically handles retries if one microservice is temporarily down.
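Installing Dapr on AKS goes through the cluster extension mechanism. A minimal sketch, assuming placeholder resource names:

```shell
# Install the Dapr extension on the cluster; the Dapr sidecar injector
# and control plane are then managed as an AKS extension
az k8s-extension create \
  --resource-group rg-demo \
  --cluster-name aks-demo \
  --cluster-type managedClusters \
  --name dapr \
  --extension-type Microsoft.Dapr
```

After this, annotating a Deployment with the Dapr annotations (`dapr.io/enabled: "true"`) attaches the sidecar that provides the pub/sub, state, and resiliency building blocks.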

5. Observability (The “Where is the Bug?” Problem)

With 10+ microservices, finding an error is like finding a needle in a haystack. You need a unified view.

  • Managed Prometheus & Grafana: AKS has a “one-click” onboarding for these. Prometheus collects metrics (CPU/RAM/Request counts), and Grafana gives you the dashboard.
  • Application Insights: Use this for “Distributed Tracing.” It allows you to follow a single user’s request as it travels through five different microservices, showing you exactly where it slowed down or failed.
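The “one-click” onboarding also exists in the CLI. A sketch, assuming you already have an Azure Managed Grafana instance whose resource ID you pass in (the ID below is a placeholder):

```shell
# Enable managed Prometheus metrics collection on the cluster and wire it
# to an existing Grafana instance for dashboards
az aks update \
  --resource-group rg-demo \
  --name aks-demo \
  --enable-azure-monitor-metrics \
  --grafana-resource-id "/subscriptions/<sub-id>/resourceGroups/rg-demo/providers/Microsoft.Dashboard/grafana/my-grafana"
```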

Summary Checklist for Your New Project

  1. Cluster: Create an AKS Automatic cluster with the Azure Linux 3.0 OS.
  2. Identity: Use Workload ID instead of secrets.
  3. Communication: Enable the Istio add-on and Dapr extension.
  4. Database: Use Cosmos DB for high-scale microservices.
  5. CI/CD: Use GitHub Actions with the “Draft” tool to generate your Dockerfiles and manifests automatically.

Azure Storage

Azure Storage is a highly durable, scalable, and secure cloud storage solution. In 2026, it has evolved significantly into an AI-ready foundational layer, optimized not just for simple files, but for the massive datasets required for training AI models and serving AI agents.

The platform is divided into several specialized “data services” depending on the type of data you are storing.


1. The Core Data Services

| Service | Data Type | Best For |
| --- | --- | --- |
| Blob Storage | Unstructured (objects) | Images, videos, backups, and AI training data lakes. |
| Azure Files | File shares (SMB/NFS) | Replacing on-premises file servers; “lift and shift” for legacy apps. |
| Azure Disks | Block storage | Persistent storage for Virtual Machines (OS and data disks). |
| Azure Tables | NoSQL key-value | Large-scale, schema-less structured data (e.g., user profiles). |
| Azure Queues | Messaging | Reliable messaging between different parts of an application. |

2. Modern Tiers (Cost vs. Speed)

You don’t pay the same price for data you use every second versus data you keep for 10 years. You choose an Access Tier to optimize your bill:

  • Premium: SSD-backed. Ultra-low latency for high-performance apps and AI inference.
  • Hot: For data you access frequently. Lower access cost, higher storage cost.
  • Cool (30-day minimum): For data like short-term backups. Lower storage cost, higher access cost.
  • Cold (90-day minimum): Optimized for rarely accessed data that still needs to be available immediately.
  • Archive (180-day minimum): Lowest cost, but data is “offline.” Rehydrating it takes hours.
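Moving a blob between tiers is a single CLI call per object; account, container, and blob names below are placeholders (auth flags omitted for brevity):

```shell
# Demote an old backup from Hot to Cool to cut storage cost
az storage blob set-tier \
  --account-name mystorageacct \
  --container-name backups \
  --name backup-2026-01.tar.gz \
  --tier Cool

# Later, archive it once it is kept purely for compliance
# (rehydrating from Archive takes hours)
az storage blob set-tier \
  --account-name mystorageacct \
  --container-name backups \
  --name backup-2026-01.tar.gz \
  --tier Archive
```

For fleets of blobs, a lifecycle management policy on the storage account does the same thing automatically by age.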

3. Redundancy: Protecting Your Data

Azure keeps multiple copies of your data to protect against hardware failure or entire data center disasters.

  • LRS (Locally Redundant): 3 copies in a single data center. Protects against disk failure.
  • ZRS (Zone-Redundant): 3 copies across 3 different data centers in one region. Protects against a data center fire/outage.
  • GRS (Geo-Redundant): Copies data to a secondary region hundreds of miles away. Protects against a major regional disaster.
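Redundancy is chosen at account creation time via the SKU. A sketch with placeholder names:

```shell
# Zone-redundant account: 3 copies spread across availability zones
# in one region (survives a single data center outage)
az storage account create \
  --resource-group rg-demo \
  --name mystorageacct \
  --location eastus \
  --sku Standard_ZRS

# For geo-redundancy, use --sku Standard_GRS instead
```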

🚀 What’s New in 2026?

  • Agentic Scale: Blob storage can now handle millions of small objects per second to support AI Agents that need to “remember” long-term context or retrieve data from RAG (Retrieval-Augmented Generation) stores.
  • Elastic SAN: A fully managed Storage Area Network (SAN) service that lets you pool block storage for different workloads, making it easier to migrate heavy SQL or Oracle databases to the cloud.
  • AI Integration: Native “hooks” for frameworks like LangChain and Microsoft Foundry, allowing AI models to read directly from your storage with minimal setup.

Choosing the right storage in 2026 depends on two main factors: the structure of your data and how your application (or AI agent) needs to access it.

Here is the 2026 decision framework to help you choose the right tool for the job.


🚦 The “Quick Decision” Tree

| If your project needs to… | The winner is… |
| --- | --- |
| Store millions of files for AI training or data lakes | Blob Storage (Data Lake Gen2) |
| Replace an on-premises file server (SMB/NFS) | Azure Files |
| Provide high-speed block storage for Virtual Machines | Managed Disks |
| Pool storage across many VMs/containers like a cloud SAN | Elastic SAN |
| Send messages between different microservices | Queue Storage |
| Store simple key-value data (user profiles, logs) | Table Storage |

🟦 1. Blob Storage: The AI & Big Data King

In 2026, Blob storage is no longer just for “backups.” It is the central engine for Agentic Scale—supporting AI agents that need to read massive amounts of context quickly.

  • Best For: Unstructured data (PDFs, Images, Parquet files).
  • Key Feature: Data Lake Storage Gen2. This adds a “Hierarchical Namespace” (real folders) to your blobs, which makes big data analytics and AI processing 10x faster.
  • 2026 Strategy: Use Cold Tier for data you only touch once a quarter but need available instantly for AI “Reasoning” tasks.

📂 2. Azure Files: The “Lift-and-Shift” Hero

If you have an existing application that expects a “Drive Letter” (like Z:\), use Azure Files.

  • Best For: Shared folders across multiple VMs or local office computers.
  • New in 2026: Elastic ZRS (Zone Redundant Storage). This provides ultra-high availability for mission-critical file shares without the complexity of managing your own cluster.
  • Performance: Use Premium Files if you are running active databases or high-transaction apps; use Standard for simple office document sharing.

💽 3. Managed Disks vs. Elastic SAN

This is the “local” vs “network” storage debate for your servers.

  • Managed Disks (The Individual): Use Premium SSD v2. It’s the modern standard because it allows you to scale IOPS and Throughput separately, so you don’t have to buy a “huge disk” just to get “high speed.”
  • Elastic SAN (The Pool): If you are migrating a massive environment from an on-premise SAN (like Dell EMC or NetApp), Elastic SAN lets you buy one large “pool” of performance and share it across all your VMs and Kubernetes clusters.

🔍 4. Specialized: Tables & Queues

These are “developer” storage types.

  • Azure Tables: Use this if Cosmos DB is too expensive for your needs. It’s a “no-frills” NoSQL database for billions of small, structured rows.
  • Azure Queues: Use this to decouple your app. If a user uploads a photo, put a message in the Queue. A “Worker” then sees that message and processes the photo. This prevents your app from crashing under heavy load.
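The photo-upload pattern in that last bullet can be sketched with the queue commands below. Storage account and queue names are placeholders, and auth flags are omitted for brevity:

```shell
# One-time setup: create the queue
az storage queue create --name photo-jobs --account-name mystorageacct

# Producer: the web app drops a message instead of processing inline
az storage message put \
  --queue-name photo-jobs \
  --account-name mystorageacct \
  --content '{"blob": "uploads/cat.jpg", "action": "thumbnail"}'

# Worker: pull a message, process the photo, then delete the message
az storage message get --queue-name photo-jobs --account-name mystorageacct
```

Because the queue absorbs bursts, a spike of uploads never overwhelms the worker; it just drains the backlog at its own pace.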

💡 Pro-Tip for 2026: The “Cost First” Choice

If you are still unsure, start with Blob Storage. It is the most flexible, has the best API support for modern AI frameworks (like LangChain), and offers the most aggressive cost-saving tiers (Cool, Cold, and Archive).

What is the “main action” your application will take? (e.g., “Scanning documents for AI,” “Hosting a website,” or “Running a heavy SQL database?”) I can give you the specific configuration for that.

Cost optimization in Azure is no longer just about “turning things off.” In 2026, it is a continuous lifecycle known as FinOps, focusing on three distinct phases: Inform (Visibility), Optimize (Rightsizing & Rates), and Operate (Governance).

Here is the strategic blueprint for optimizing your Azure spend today.


1. Inform: Get Full Visibility

You cannot optimize what you cannot see.

  • Tagging Enforcement: Use Azure Policy to require tags like Environment, Owner, and CostCenter. This allows you to group costs by department or project in Azure Cost Management.
  • Budget Alerts: Set thresholds at 50%, 80%, and 100% of your predicted monthly spend.
  • Azure Advisor Score: Check your “Cost Score” in Azure Advisor. It provides a “to-do list” of unused resources, such as unattached Managed Disks or idle ExpressRoute circuits.

2. Optimize: The Two-Pronged Approach

Optimization is divided into Usage (buying less) and Rate (paying less for what you use).

A. Usage Optimization (Rightsizing)

  • Shut Down Idle Resources: Azure Advisor flags VMs with <3% CPU usage. For Dev/Test environments, use Auto-shutdown or Azure Automation to turn VMs off at 7:00 PM and on at 7:00 AM.
  • Storage Tiering: Move data that hasn’t been touched in 30 days to the Cool tier, and data older than 180 days to the Archive tier. This can save up to 90% on storage costs.
  • B-Series VMs: For workloads with low average CPU but occasional spikes (like small web servers), use the B-Series (Burstable) instances to save significantly.
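The Dev/Test shutdown schedule from the first bullet is one command per VM. Names are placeholders; note that `az vm auto-shutdown` takes a UTC time and only handles the shutdown side, so the morning start needs separate automation:

```shell
# Schedule a nightly shutdown at 19:00 (UTC) for a Dev/Test VM
az vm auto-shutdown \
  --resource-group rg-dev \
  --name vm-dev-01 \
  --time 1900

# There is no built-in "auto-start"; trigger the morning boot from an
# Azure Automation runbook or a scheduled pipeline instead:
az vm start --resource-group rg-dev --name vm-dev-01
```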

B. Rate Optimization (Commitment Discounts)

In 2026, you choose your discount based on how much flexibility you need.

| Discount Type | Savings | Best For… |
| --- | --- | --- |
| Reserved Instances (RI) | Up to 72% | Static workloads. You commit to a specific VM type in a specific region for 1 or 3 years. |
| Savings Plan for Compute | Up to 65% | Dynamic workloads. A flexible $/hour commitment that applies across VM families and regions. |
| Azure Hybrid Benefit | Up to 85% | Using your existing Windows/SQL licenses in the cloud so you don’t pay for them twice. |
| Spot Instances | Up to 90% | Interruptible workloads like batch processing or AI model training. |

3. Operate: Modern 2026 Strategies

  • AI Cost Governance: With the rise of Generative AI, monitor your Azure OpenAI and AI Agent token usage. Use Rate Limiting on your APIs to prevent a runaway AI bot from draining your budget in a single night.
  • FinOps Automation: Use Azure Resource Graph to find “orphaned” resources (like Public IPs not attached to anything) and delete them automatically via Logic Apps.
  • Sustainability & Carbon Optimization: Use the Azure Carbon Optimization tool. Often, the most “green” resource (lowest carbon footprint) is also the most cost-efficient one.

✅ The “Quick Wins” Checklist

  1. [ ] Delete Unattached Disks: When you delete a VM, the disk often stays behind and keeps billing you.
  2. [ ] Switch to Savings Plans: If your RIs are expiring, move to a Savings Plan for easier management.
  3. [ ] Check for “Zombies”: Idle Load Balancers, VPN Gateways, and App Service Plans with zero apps.
  4. [ ] Rightsize your SQL: Switch from “DTU” to the vCore model for more granular scaling and Hybrid Benefit savings.

Pro Tip: Never buy a Reserved Instance (RI) for a server that hasn’t been rightsized first. If you buy a 3-year reservation for an oversized 16-core VM, you are “locking in” waste for 36 months!

To find the “low-hanging fruit” in your Azure environment, you can use Azure Resource Graph Explorer and Log Analytics.

Here are the specific KQL (Kusto Query Language) scripts to identify common waste areas.


1. Identify Orphaned Resources (Quickest Savings)

These resources are costing you money every hour but aren’t attached to anything. Run these in the Azure Resource Graph Explorer.

A. Unattached Managed Disks

```kusto
Resources
| where type has "microsoft.compute/disks"
| extend diskState = tostring(properties.diskState)
| where isempty(managedBy) and diskState == "Unattached"
| project name, resourceGroup, subscriptionId, location, diskSizeGB = toint(properties.diskSizeGB)
| order by diskSizeGB desc
```

B. Unattached Public IPs

```kusto
Resources
| where type == "microsoft.network/publicipaddresses"
| where isnull(properties.ipConfiguration) and isnull(properties.natGateway)
| project name, resourceGroup, subscriptionId, location, ipAddress = tostring(properties.ipAddress)
```

2. Identify Underutilized VMs (Rightsizing)

To run this, your VMs must be sending performance data to a Log Analytics Workspace. Use this to find VMs that are consistently running below 5% CPU.

KQL for Underutilized VMs (Last 7 Days):

```kusto
Perf
| where TimeGenerated > ago(7d)
| where ObjectName == "Processor" and CounterName == "% Processor Time" and InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue), MaxCPU = max(CounterValue) by Computer, _ResourceId
| where AvgCPU < 5
| order by AvgCPU asc
```

  • Action: If MaxCPU is also low, consider “downsizing” the VM (e.g., from a D4 to a D2) or switching it to a B-series.

3. Find Idle App Service Plans

App Service Plans cost money even if they have zero apps running on them. Run this in Resource Graph Explorer.

```kusto
resources
| where type =~ "microsoft.web/serverfarms"
| where properties.numberOfSites == 0
| project name, resourceGroup, subscriptionId, Sku = sku.name, Tier = sku.tier
```

4. Search for “Zombie” Network Interfaces

These don’t cost money directly, but they clutter your environment and use up IP addresses in your subnets.

```kusto
Resources
| where type =~ 'microsoft.network/networkinterfaces'
| where isnull(properties.virtualMachine)
| project name, resourceGroup, subscriptionId, location
```

💡 How to Automate This in 2026

Instead of running these manually, use Azure Workbooks.

  1. Search for “Workbooks” in the Azure Portal.
  2. Click Add > New.
  3. Add a “Query” cell and paste any of the KQL scripts above.
  4. Save the Workbook as “Monthly Cost Cleanup.”

Now, you can open this dashboard once a month and see exactly what needs to be deleted!

Setting up an automated alert for “Unattached Disks” is a brilliant move for cost governance. In Azure, this is handled by monitoring the Activity Log for a specific event: the “Delete Virtual Machine” action (which leaves the disk behind) or the “Detach Disk” action.

Here is the 2026 step-by-step guide to setting this up.


Step 1: Create an Action Group (The “Who” to notify)

Before you create the alert, you need to tell Azure how to contact you.

  1. Search for Monitor in the Azure Portal.
  2. Click Alerts > Action groups > + Create.
  3. Basics: Give it a name like CostAlertTeam.
  4. Notifications: Select Email/SMS message/Push/Voice.
  5. Enter your email address and name the notification EmailDevOps.
  6. Click Review + create.

Step 2: Create the Activity Log Alert (The “When”)

Now, we create the trigger that watches for disks being left alone.

  1. In Monitor, click Alerts > + Create > Alert rule.
  2. Scope: Select your Subscription.
  3. Condition: This is the most important part. Click + Add condition and search for:
    • Signal Name: Detach Disk (Microsoft.Compute/disks)
    • Alternative: You can also alert on Delete Virtual Machine, but “Detach Disk” is more accurate for catching orphaned resources.
  4. Refine the Logic: Under “Event initiated by,” you can leave it as “Any” or specify a specific automation service principal if you only want to catch manual detaches.

Step 3: Connect and Save

  1. Actions: Click Select action groups and choose the CostAlertTeam group you created in Step 1.
  2. Details: Name the rule Alert-Disk-Unattached.
  3. Severity: Set it to Informational (Sev 4) or Warning (Sev 3).
  4. Click Review + create.

💡 The “Pro” Way (2026 Strategy): Use Log Analytics

The method above tells you when a disk is detached, but it won’t tell you about disks that are already unattached. To catch those, use a Log Search Alert with a KQL query.

The KQL Query:

```kusto
// Run this every 24 hours to find any disk with no "managedBy" owner.
// Inside Log Analytics, Azure Resource Graph tables are reached via arg("").
arg("").resources
| where type has "microsoft.compute/disks"
| extend diskState = tostring(properties.diskState)
| where isempty(managedBy) and diskState == "Unattached"
| project name, resourceGroup, subscriptionId
```

Why this is better:

  • Activity Log Alerts are “reactive” (they fire only at the moment of the event).
  • Log Search Alerts are “proactive” (they scan your environment every morning and email you a list of every unattached disk, even if it was detached months ago).

✅ Summary of the Workflow

  1. Detach/Delete Event happens in your subscription.
  2. Activity Log captures the event.
  3. Azure Monitor sees the event matches your rule.
  4. Action Group sends you an email immediately.

While an immediate alert is great for a “fire-drill” response, a Weekly Summary Report is the gold standard for long-term cost governance. It keeps your inbox clean and ensures your team stays accountable for “disk hygiene.”

In 2026, the best way to do this without writing custom code is using Azure Logic Apps.


🛠️ The Architecture: “The Monday Morning Cleanup”

We will build a simple 3-step workflow that runs every Monday at 9:00 AM, queries for unattached disks, and sends you a formatted HTML table.

Step 1: Create the Logic App (Recurrence)

  1. Search for Logic Apps and create a new one (select Consumption plan for lowest cost).
  2. Open the Logic App Designer and select the Recurrence trigger.
  3. Set it to:
    • Interval: 1
    • Frequency: Week
    • On these days: Monday
    • At these hours: 9

Step 2: Run the KQL Query

  1. Add a new step and search for the Azure Monitor Logs connector.
  2. Select the action: Run query and visualize results.
  3. Configure the connection:
    • Subscription/Resource Group: Select your primary management group.
    • Resource Type: Log Analytics Workspace.
  4. The Query: Paste the “Orphaned Disk” query from earlier:

```kusto
Resources
| where type has "microsoft.compute/disks"
| extend diskState = tostring(properties.diskState)
| where isempty(managedBy) and diskState == "Unattached"
| project DiskName = name, ResourceGroup = resourceGroup, SizeGB = properties.diskSizeGB, Location = location
```
  5. Chart Type: Select HTML Table.

Step 3: Send the Email

  1. Add a final step: Office 365 Outlook – Send an email (V2).
  2. To: Your team’s email.
  3. Subject: ⚠️ Weekly Action: Unattached Azure Disks found
  4. Body:
    • Type some text like: “The following disks are currently unattached and costing money. Please delete them if they are no longer needed.”
    • From the Dynamic Content list, select Attachment Content (this is the HTML table from Step 2).

📊 Why this is the “Pro” Move

  • Zero Maintenance: Once it’s running, you never have to check the portal manually.
  • Low Cost: A Logic App running once a week costs roughly $0.02 per month.
  • Formatted for Humans: Instead of a raw JSON blob, you get a clean table that you can forward to project owners.

✅ Bonus: Add a “Delete” Link

If you want to be a 2026 power user, you can modify the KQL to include a “Deep Link” directly to each disk in the Azure Portal:

```kusto
| extend PortalLink = strcat("https://portal.azure.com/#@yourtenant.onmicrosoft.com/resource", id)
| project DiskName, SizeGB, PortalLink
```

Now, you can click the link in your email and delete the disk in seconds.

Combining the different “zombie” resources into one report is the most efficient way to manage your Azure hygiene.

By using the union operator in KQL, we can create a single list of various resource types that are currently costing you money without providing value.


1. The “Ultimate Zombie” KQL Query

Copy and paste this query into your Logic App or Azure Resource Graph Explorer. It looks for unattached disks, unassociated IPs, and empty App Service Plans all at once.

```kusto
// Query for orphaned disks
Resources
| where type has "microsoft.compute/disks"
| extend diskState = tostring(properties.diskState)
| where isempty(managedBy) and diskState == "Unattached"
| project Name = name, Type = "Orphaned Disk", Detail = strcat(properties.diskSizeGB, " GB"), ResourceGroup = resourceGroup, SubscriptionId = subscriptionId
| union (
    // Query for unassociated public IPs
    Resources
    | where type == "microsoft.network/publicipaddresses"
    | where isnull(properties.ipConfiguration) and isnull(properties.natGateway)
    | project Name = name, Type = "Unattached IP", Detail = tostring(properties.ipAddress), ResourceGroup = resourceGroup, SubscriptionId = subscriptionId
)
| union (
    // Query for empty App Service Plans (costly!)
    Resources
    | where type =~ "microsoft.web/serverfarms"
    | where properties.numberOfSites == 0
    | project Name = name, Type = "Empty App Service Plan", Detail = strcat(sku.tier, " - ", sku.name), ResourceGroup = resourceGroup, SubscriptionId = subscriptionId
)
| union (
    // Query for idle load balancers (no backend pool members)
    Resources
    | where type == "microsoft.network/loadbalancers"
    | where array_length(properties.backendAddressPools) == 0
    | project Name = name, Type = "Idle Load Balancer", Detail = "No Backend Pools", ResourceGroup = resourceGroup, SubscriptionId = subscriptionId
)
| order by Type asc
```

2. Updating Your Logic App Report

To make this work in your weekly email:

  1. Open your Logic App and update the “Run query” step with the new combined KQL above.
  2. Update the HTML Table: Since the new query uses consistent column names (Name, Type, Detail), your HTML table will now neatly list the different types of waste side-by-side.

3. Advanced 2026 Tip: Add “Potential Savings”

If you want to get your manager’s attention, you can add a “Estimated Monthly Waste” column. While KQL doesn’t know your exact billing, you can hardcode estimates:

```kusto
| extend MonthlyWaste = case(
    Type == "Orphaned Disk", 5.00,           // estimate $5 per month
    Type == "Unattached IP", 4.00,           // estimate $4 per month
    Type == "Empty App Service Plan", 50.00, // estimate $50+ for Standard and above
    0.00)
```

✅ Your “Monday Morning” Checklist

When you receive this email every Monday, follow this triage:

  • Disks: Delete immediately unless you specifically kept one as a “manual backup” (though you should use Azure Backup for that).
  • Public IPs: Delete. Unused Public IPs are charged by the hour in Azure.
  • App Service Plans: If you aren’t using them, scale them to the Free (F1) tier or delete them. These are often the biggest hidden costs.

To turn this report into a powerful leadership tool, we need to calculate the “Total Potential Monthly Savings.” This changes the conversation from “We have a few loose disks” to “We can save $800/month by clicking these buttons.”

Here is how to update your Logic App and KQL to include a summary total.


1. Updated “Master Zombie” Query (With Estimated Costs)

We will add a hidden cost value to every “zombie” found, then summarize the total at the very end.

```kusto
let RawData = Resources
| where type has "microsoft.compute/disks"
| extend diskState = tostring(properties.diskState)
| where isempty(managedBy) and diskState == "Unattached"
| project Name = name, Type = "Orphaned Disk", Detail = strcat(properties.diskSizeGB, " GB"), MonthlyWaste = 10.00, ResourceGroup = resourceGroup
| union (
    Resources
    | where type == "microsoft.network/publicipaddresses"
    | where isnull(properties.ipConfiguration) and isnull(properties.natGateway)
    | project Name = name, Type = "Unattached IP", Detail = tostring(properties.ipAddress), MonthlyWaste = 4.00, ResourceGroup = resourceGroup
)
| union (
    Resources
    | where type =~ "microsoft.web/serverfarms"
    | where properties.numberOfSites == 0
    | project Name = name, Type = "Empty App Service Plan", Detail = strcat(sku.tier, " - ", sku.name), MonthlyWaste = 55.00, ResourceGroup = resourceGroup
);
// This part creates the final list, with a summary row appended at the end
RawData
| order by Type asc
| union (
    RawData
    | summarize MonthlyWaste = sum(MonthlyWaste)
    | extend Name = "TOTAL POTENTIAL SAVINGS", Type = "---", Detail = "---", ResourceGroup = "---"
)
```

2. Formatting the Logic App Email

Since KQL doesn’t easily format currency, we’ll use the Logic App “Compose” action to make the final total stand out in your email.

  1. Run the Query: Use the Run query and list results action in Logic Apps with the KQL above.
  2. Shape the Output: Add a Data Operations - Create HTML table action (named Create_HTML_table) for the detail rows, plus a Data Operations - Compose action (named Total_Waste_Sum) that picks the MonthlyWaste value out of the “TOTAL POTENTIAL SAVINGS” row.
  3. The HTML Body: Use this template in the email body to make it look professional:

HTML

<h3>Azure Monthly Hygiene Report</h3>
<p>The following resources are identified as waste.
Cleaning these up will result in the estimated savings below.</p>
@{body('Create_HTML_table')}
<br>
<div style="background-color: #e1f5fe; padding: 15px; border-radius: 5px; border: 1px solid #01579b;">
<strong>💡 Quick Win Tip:</strong> Deleting these resources today
will save your department approx <strong>$@{outputs('Total_Waste_Sum')}</strong> per month.
</div>
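The final email is just string assembly: the HTML table from the query results plus the computed total. A local sketch of what the Logic App produces, with placeholder inputs standing in for the body('Create_HTML_table') and outputs('Total_Waste_Sum') tokens:

```python
def build_email(table_html, total_waste):
    """Assemble the hygiene-report email body (placeholders stand in for Logic App outputs)."""
    return (
        "<h3>Azure Monthly Hygiene Report</h3>"
        "<p>The following resources are identified as waste. "
        "Cleaning these up will result in the estimated savings below.</p>"
        f"{table_html}"
        '<div style="background-color: #e1f5fe; padding: 15px;">'
        f"<strong>Quick Win Tip:</strong> Deleting these resources today will save "
        f"your department approx <strong>${total_waste:.2f}</strong> per month.</div>"
    )

body = build_email("<table>...</table>", 69.0)
```

Formatting the currency in code (the :.2f) is exactly the step KQL makes awkward, which is why the Compose action exists.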

3. Why This Works in 2026

  • The “Nudge” Effect: By showing the total dollar amount at the bottom, you create a psychological incentive for resource owners to clean up.
  • Customizable Pricing: You can adjust the MonthlyWaste numbers in the KQL to match your specific Enterprise Agreement (EA) pricing.
  • Single Pane of Glass: You now have one query that covers Compute, Network, and Web services.

✅ Final Triage Steps

  • Review: If you see a “TOTAL POTENTIAL SAVINGS” of $0.00, congratulations! Your environment is clean.
  • Action: For the “Empty App Service Plans,” check if they are in a Free (F1) or Shared (D1) tier first—those don’t cost money, but they will still show up as “Empty.”

Azure 3-tier app

A clean Azure 3-tier app design is:

  1. Web tier for user traffic
  2. App tier for business logic and APIs
  3. Data tier for storage and databases

That matches Azure’s n-tier guidance, where logical layers are separated and can be deployed to distinct tiers for security, scale, and manageability. (Microsoft Learn)

Simple Azure design

Users
|
Azure Front Door / WAF
|
Web Tier
(App Service or VMSS)
|
App Tier
(App Service / AKS / VMSS)
|
Data Tier
(Azure SQL / Storage / Cache)

Better interview-ready version

Internet
|
Front Door + WAF
|
Application Gateway
|
---------------- Web Subnet ----------------
Web Tier
(App Service or VM Scale Set)
|
----------- App / API Private Subnet -------
App Tier
(App Service with VNet Integration / AKS / VMSS)
|
----------- Data Private Subnet ------------
Azure SQL / Storage / Redis / Key Vault
(Private Endpoints)

What I’d choose in Azure

For a modern Azure-native design, I’d usually use:

  • Front Door + WAF for global entry and protection
  • App Service for the web tier
  • App Service or AKS for the app/API tier
  • Azure SQL for the database
  • Key Vault for secrets
  • Private Endpoints for Key Vault and database access
  • VNet integration so the app tier can reach private resources inside the virtual network. Azure App Service supports VNet integration for reaching resources in or through a VNet, and Azure supports private endpoints for services like Key Vault. (Microsoft Learn)

Security design

A strong answer should include:

  • Put the web tier behind WAF
  • Keep the app tier private
  • Put the data tier behind Private Endpoints
  • Use Managed Identity from app tier to Key Vault and database where supported
  • Use NSGs and subnet separation
  • Disable public access on back-end services when possible. Azure’s secure n-tier App Service guidance specifically uses VNet integration and private endpoints to isolate traffic within the virtual network. (Microsoft Learn)

High availability and scaling

For resilience, I’d make the web and app tiers stateless, enable autoscaling, and run across multiple availability zones or multiple instances. Azure’s web app and Well-Architected guidance emphasizes designing for reliability, scalability, and secure operation. (Microsoft Learn)

2-minute interview answer

“I’d design the 3-tier app with a web tier, app tier, and data tier. User traffic would enter through Azure Front Door with WAF, then go to the web tier, typically App Service or VM Scale Sets. The web tier would call a private app tier that hosts the business logic. The app tier would connect to the data tier, such as Azure SQL, Storage, Redis, and Key Vault. I’d use VNet integration and private endpoints so the back-end services are not publicly exposed. For security, I’d separate tiers into subnets, apply NSGs, use Managed Identity for secret and database access, and store secrets in Key Vault. For reliability, I’d keep the web and app tiers stateless and scale them horizontally.” (Microsoft Learn)

Easy memory trick

Remember it as:

Ingress → Web → Logic → Data
and
Public only in front, private everywhere else


🧠 🧱 3-Tier Azure Diagram

✍️ Draw This on a Whiteboard

                 🌍 Internet
                      |
             Azure Front Door / WAF
                      |
              Application Gateway
                      |
        --------------------------------
        |        Web Tier (Public)     |
        |  App Service / VM Scale Set  |
        --------------------------------
                      |
        --------------------------------
        |        App Tier (Private)    |
        |  API / Backend / AKS         |
        --------------------------------
                      |
        --------------------------------
        |        Data Tier (Private)   |
        |  Azure SQL / Storage         |
        |  + Key Vault                 |
        --------------------------------

        (Private Endpoints + VNet Integration)



🎤 What to Say While Drawing

🟢 Step 1 — Entry Point

“This is a 3-tier architecture in Microsoft Azure. Traffic enters through Front Door with WAF for global routing and security.”


🟢 Step 2 — Web Tier

“The web tier handles user requests. It’s the only layer exposed publicly and is typically built using App Service or VM Scale Sets.”


🟢 Step 3 — App Tier

“The app tier contains business logic and APIs. It’s private and only accessible from the web tier.”


🟢 Step 4 — Data Tier

“The data tier includes services like Azure SQL, Storage, and Key Vault, all accessed via Private Endpoints so they are not exposed to the internet.”


🟢 Step 5 — Security

I use VNet integration and Private Endpoints so all backend communication stays inside Azure. I also use Managed Identity for secure access to Key Vault and databases, eliminating secrets.



🔐 Add These Details

Mention these to stand out:

  • NSGs between tiers
  • Private DNS for Private Endpoints
  • No public access on DB / Key Vault
  • Use Azure Key Vault for secrets
  • Identity via Microsoft Entra ID

⚡ Ultra-Simple Memory Trick

👉 Draw 3 boxes vertically:

Web (Public)
App (Private)
Data (Private)

Then add:

  • WAF on top
  • Private Endpoints at bottom

💬 30-Second Version

“I’d design a 3-tier app with a web tier, app tier, and data tier. Traffic enters through Front Door with WAF, hits the web tier, then flows to a private app tier and finally to a private data tier. I’d secure backend services using Private Endpoints and use Managed Identity for authentication, ensuring no secrets are stored and no backend services are publicly exposed.”


🧠 Why This Works in Interviews

You just demonstrated:

  • ✅ Architecture design
  • ✅ Security best practices
  • ✅ Networking (private endpoints, VNets)
  • ✅ Identity (Managed Identity)

How to Investigate a Cost or Usage Spike in Azure


Quick Decision Tree First

Do you know which service spiked?

  • Yes → Skip to Step 3
  • No → Start at Step 1

Step 1: Pinpoint the Spike in Cost Management

  • Azure Portal → Cost Management → Cost Analysis
  • Set view to Daily to find the exact day
  • Group by Service Name first → tells you what spiked
  • Then group by Resource → tells you which specific resource
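Step 1 boils down to a group-by over daily cost data: the largest (service, day) bucket tells you what spiked and when. A sketch over hypothetical cost-export rows (service names and amounts are made up for illustration):

```python
from collections import defaultdict

# Hypothetical daily cost rows (service, day, cost), as a Cost Analysis export might give you.
costs = [
    ("Storage", "2026-04-09", 12.0), ("Storage", "2026-04-10", 11.5),
    ("Azure SQL", "2026-04-09", 40.0), ("Azure SQL", "2026-04-10", 310.0),
]

by_service_day = defaultdict(float)
for service, day, cost in costs:
    by_service_day[(service, day)] += cost

# The largest single bucket points at what spiked and on which day.
spike = max(by_service_day, key=by_service_day.get)
print(spike)
```

This is the same "group by Service Name, then view Daily" drill-down the portal performs for you.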

Step 2: Narrow by Dimension

Keep drilling down by:

  • Resource Group
  • Resource type
  • Region (unexpected cross-region egress is a common hidden cost)
  • Meter (very granular — shows exactly what operation you’re being charged for)

Step 3: Go to the Offending Resource

Once you know what it is:

  Service               Where to look
  VM / VMSS             Check scaling events, uptime, instance count
  Storage               Check blob transactions, egress, data written
  Azure SQL / Synapse   Query history, DTU spikes, long-running queries
  ADF (Data Factory)    Pipeline run history — loops, retries, backfills
  Databricks            Cluster history — was a cluster left running?
  App Service           Scale-out events, request volume
  Azure Functions       Execution count — was something stuck in a loop?
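The table above is effectively a lookup: given the service that spiked, route to the right first check. Encoded as data (entries are illustrative and easy to extend), it can feed a triage runbook or script:

```python
# The triage table as a lookup; keys and guidance strings are illustrative.
TRIAGE = {
    "VM / VMSS": "Check scaling events, uptime, instance count",
    "Storage": "Check blob transactions, egress, data written",
    "Azure SQL / Synapse": "Query history, DTU spikes, long-running queries",
    "ADF (Data Factory)": "Pipeline run history: loops, retries, backfills",
    "Databricks": "Cluster history: was a cluster left running?",
    "App Service": "Scale-out events, request volume",
    "Azure Functions": "Execution count: was something stuck in a loop?",
}

def next_check(service):
    """Return the first thing to inspect for a spiked service."""
    return TRIAGE.get(service, "Start at Cost Analysis and drill down by resource")

print(next_check("Databricks"))
```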

Step 4: Check Activity Log

  • Monitor → Activity Log
  • Filter by the spike timeframe
  • Look for:
    • New resource deployments
    • Scaling events
    • Config changes
    • Who or what triggered it (user vs service principal)

This answers “what changed?”


Step 5: Check Azure Monitor Metrics

  • Go to the specific resource → Metrics
  • Look at usage metrics around the spike time:
    • CPU / memory
    • Data in/out (egress is often the culprit)
    • Request count
    • DTU / vCore usage

Correlate the metric spike timeline with the cost spike timeline.
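Correlating the two timelines can be done mechanically: bucket both series by hour and look for buckets where usage and cost exceed their thresholds together. A sketch on synthetic data (thresholds and values are illustrative):

```python
# Synthetic hourly series: CPU percent and cost per hour, with a spike at 13:00-14:00.
cpu = {h: 20 for h in range(24)}
cost = {h: 1.0 for h in range(24)}
for h in (13, 14):
    cpu[h], cost[h] = 95, 9.0

cpu_spike = {h for h, v in cpu.items() if v > 80}
cost_spike = {h for h, v in cost.items() if v > 5}

# Hours where the usage spike and the cost spike overlap are your window of interest.
overlap = sorted(cpu_spike & cost_spike)
print(overlap)
```

If the metric spike and the cost spike do not overlap, the cost driver is likely something metrics don't show (egress, snapshots, a different resource).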


Step 6: Check Logs (Log Analytics / KQL)

If you have Log Analytics workspace connected:

// Example: Find expensive or long-running operations
AzureActivity
| where TimeGenerated between (datetime(2026-04-01) .. datetime(2026-04-11))
| where ActivityStatusValue == "Success"
| summarize count() by OperationNameValue, ResourceGroup
| order by count_ desc
// Check for VM scaling events
AzureActivity
| where OperationNameValue contains "virtualMachines"
| where TimeGenerated > ago(7d)
| project TimeGenerated, Caller, OperationNameValue, ResourceGroup
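The first query's "summarize count() by OperationNameValue" is a plain frequency count, which you can reproduce locally when triaging an exported Activity Log. A sketch with hypothetical operation names:

```python
from collections import Counter

# Hypothetical Activity Log operation names (as OperationNameValue would appear).
events = [
    "Microsoft.Compute/virtualMachines/write",
    "Microsoft.Compute/virtualMachines/write",
    "Microsoft.Sql/servers/databases/write",
]

# Equivalent of: summarize count() by OperationNameValue | order by count_ desc
top = Counter(events).most_common(1)
print(top)
```

The most frequent operation during the spike window is usually the "what changed" answer.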

Step 7: Check for Common Culprits

These are the most frequent causes of unexpected spikes:

  • 🔁 Pipeline/job stuck in a loop (ADF, Functions, Logic Apps)
  • 💾 Unexpected data egress (cross-region or internet-bound traffic)
  • 📈 Auto-scaling that didn’t scale back down
  • 🗄️ Full table scan or bad query in SQL/Synapse
  • 🖥️ VM or cluster left running after a job
  • 📦 Historical data backfill triggered accidentally
  • 🔄 Snapshot or backup policy changed

The Mental Model

Cost Analysis (when + what?)
→ Drill by dimension (which resource?)
→ Activity Log (what changed?)
→ Metrics (how did usage behave?)
→ Logs/KQL (why did it happen?)

“Traffic Spike” or perhaps a “Burst” in resource usage

It sounds like you are describing a “Traffic Spike” or perhaps a “Burst” in resource usage. Since there are no alerts, you are in “detective mode,” looking for a silent surge that hasn’t crossed a threshold yet but is clearly visible in your telemetry.

If you heard “frost,” you might be referring to “Cold Start” spikes (common in Serverless/Functions) or a “Request Spike.” Here are the steps to track down the source of a sudden surge in Azure:


🔍 Step 1: Use Azure Monitor “Metrics Explorer”

Since you don’t have alerts, you need to visualize the spike to see its “shape.”

  1. Go to the resource (e.g., App Service, VM, or Load Balancer).
  2. Select Metrics from the left menu.
  3. Add the Request Count (for apps) or CPU/Network In (for VMs) metric.
  4. The Secret Step: Change the “Aggregation” to Count or Sum and look for the exact timestamp of the spike.
  5. Apply Splitting: Split the metric by “RemoteIP” or “Instance”. This tells you if the spike is coming from one specific user/IP or hitting one specific server.

🕵️ Step 2: Dig into Log Analytics (KQL)

If the metrics show a spike but not the “who,” you need the logs. This is where you find the “Source.”

  1. Go to Logs (Log Analytics Workspace).
  2. Run a query to find the top callers during that spike period.

Example KQL for App Gateways/Web Apps:

Code snippet

// Find the top 10 IP addresses causing the spike
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Set to your spike time
| where Category == "ApplicationGatewayAccessLog"
| summarize RequestCount = count() by clientIP_s
| top 10 by RequestCount

  • Result: If one IP address has 50,000 requests while others have 10, you’ve found a bot or a misconfigured client.
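The same "one IP dwarfs the rest" test can be expressed as an outlier check: compare each client's request count against the median. A sketch over synthetic access-log IPs (addresses and the 100x cutoff are illustrative):

```python
from collections import Counter
from statistics import median

# Synthetic client IPs from an access log: one bot, two normal clients.
ips = ["10.0.0.5"] * 50000 + ["10.0.0.6"] * 10 + ["10.0.0.7"] * 12

counts = Counter(ips)
med = median(counts.values())

# Flag any IP whose request count dwarfs the median client.
outliers = [ip for ip, n in counts.items() if n > 100 * med]
print(outliers)
```

A median-based cutoff is robust here because the outlier itself would wreck a mean-based one.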

🌐 Step 3: Check “Application Insights” (App Level)

If the spike is happening inside your application code (e.g., a “Cold Start” or a heavy API call):

  1. Go to Application Insights > Failures or Performance.
  2. Look at the “Top 10 Operations”.
  3. Check if a specific API endpoint (e.g., /api/export) suddenly jumped in volume.
  4. Use User Map to see if the traffic is coming from a specific geographic region (e.g., a sudden burst of traffic from a country you don’t usually service).

🗺️ Step 4: Network Watcher (Infrastructure Level)

If you suspect the spike is at the “packet” level (like a DDoS attempt or a backup job gone rogue):

  1. Go to Network Watcher > NSG Flow Logs.
  2. Use Traffic Analytics. It provides a map showing which VNets or Public IPs are sending the most data.
  3. Check for “Flows”: It will show you the “Source Port” and “Destination Port.” If you see a spike on Port 22 (SSH) or 3389 (RDP), someone is likely trying to brute-force your VMs.

🤖 Step 5: Check for “Auto-Scaling” Events

Sometimes the “spike” isn’t a problem, but a reaction.

  1. Go to Activity Log.
  2. Filter for “Autoscale” events.
  3. If the spike happened exactly when a new instance was added, the “spike” might actually be the resource “warming up” (loading caches, etc.), which can look like a surge in CPU or Disk I/O.

Summary Checklist:

  • Metrics Explorer: To see when it happened and how big it was.
  • Log Analytics (KQL): To find the specific Client IP or User Agent.
  • Traffic Analytics: To see if it was a Network-level burst.
  • Activity Log: To see if any Manual Changes or Scaling occurred at that exact second.

Another common real-world case is the “mystery spike”: a sudden cost spike or a request/throughput spike somewhere in your resource estate.

If there are no alerts firing, it means the spike either didn’t hit a specific threshold or was too brief to trigger a standard “Static” alert.


🏗️ Step 1: Establish the “When” and “What”

First, you need to see the “DNA” of the spike using Azure Monitor Metrics.

  • Look at the Graph: Is it a “Square” spike (starts and stops abruptly, like a scheduled job)? Or a “Needle” spike (hits a peak and drops, like a bot attack)?
  • Identify the Resource: Go to Metrics Explorer and check:
    • For VMs: Percentage CPU or Network In/Out.
    • For Storage/SQL: Transactions or DTU Consumption.
    • For App Services: Requests or Data In.
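The “square vs needle” distinction can be estimated from the raw samples: a square spike holds a plateau near its peak, a needle does not. A rough heuristic on synthetic data (the thresholds and the 0.9 plateau factor are arbitrary illustration, not an Azure feature):

```python
def spike_shape(samples, threshold):
    """Classify a spike as 'square' (sustained plateau) or 'needle' (brief peak)."""
    above = [v for v in samples if v > threshold]
    if not above:
        return "none"
    peak = max(above)
    # Count above-threshold samples that sit near the peak: a plateau has several.
    plateau = sum(1 for v in above if v > 0.9 * peak)
    return "square" if plateau >= 3 else "needle"

print(spike_shape([5, 5, 90, 91, 90, 5], threshold=50))  # sustained job-like spike
print(spike_shape([5, 5, 95, 5, 5, 5], threshold=50))    # brief attack-like spike
```

Square spikes point at scheduled jobs or backfills; needles point at bots, retries, or one-off bursts.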

🔍 Step 2: Finding the Source (The Detective Work)

Since you don’t know where it came from, you use “Splitting” and “Filtering” in Metrics Explorer.

  1. Split by Instance/Role: If you have 10 servers, split by InstanceName. Does only one server show the spike? If yes, it’s a local process (like a hanging Windows Update or a log-rotation fail).
  2. Split by Operation: For Storage or SQL, split by API Name. Is it GetBlob? PutBlob? This tells you if you are reading too much or writing too much.
  3. Split by Remote IP: If your load balancer shows the spike, split by ClientIP. If one IP has 100x the traffic of others, you’ve found your source.

🕵️ Step 3: Deep Dive with Log Analytics (KQL)

Metrics only show numbers. Logs show names. You need to run a KQL query in your Log Analytics Workspace.

Query to find “Who is talking to me”:

Code snippet

// This finds the top 5 callers during the spike window
AzureDiagnostics
| where TimeGenerated > datetime(2026-04-10T12:00:00Z) // Use your spike time
| summarize RequestCount = count() by clientIP_s, requestUri_s
| top 5 by RequestCount

  • Result: This will literally list the IP address and the specific URL they were hitting.

💰 Step 4: The “Cost” Investigation

If the spike is financial (a “Cost Spike”), you check Azure Cost Management.

  1. Cost Analysis: View cost by Resource. Did one specific Disk or Data Transfer cost jump?
  2. Check for “Orphaned” Resources: Sometimes a spike comes from a process that created 1,000 snapshots or temporary disks and forgot to delete them.

🤖 Step 5: Check the “Silent” Sources

If the metrics and logs don’t show an external attacker, check internal Azure “automated” sources:

  • Resource Graph: Check for “Change Tracking.” Did someone deploy code or change a firewall rule at that exact minute?
  • Backup/Recovery Services: A “huge spike” in disk I/O often aligns with a Storage Snapshot or an Azure Backup job starting.
  • Defender for Cloud: Even if you don’t have a “Metric Alert,” check the Security Alerts. Defender might have seen the spike and flagged it as “Suspicious PowerShell Activity” or “Port Scanning.”

✅ Summary Checklist

  Step            Action                                                Tool
  1. Visualize    See the shape and duration of the spike.              Metrics Explorer
  2. Isolate      Split metrics by IP or Instance.                      Metrics Explorer
  3. Identify     Run a query to find the specific Client IP or User.   Log Analytics (KQL)
  4. Correlate    Check if any “Deployments” happened at that time.     Activity Log / Change Analysis
  5. Network      Check for massive data transfers between regions.     Network Watcher / Traffic Analytics

How to prevent this next time? Once you find the source, create a Dynamic Threshold Alert. Unlike static alerts, these use AI to learn your “normal” pattern and will fire if a spike looks “unusual,” even if it doesn’t hit a high maximum number.
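The intuition behind a dynamic threshold is a learned baseline plus a deviation band. A toy version (mean plus k standard deviations over a trailing window) shows the idea; Azure's actual implementation is far more sophisticated, this is only the mental model:

```python
from statistics import mean, stdev

def is_unusual(history, latest, k=3):
    """Flag 'latest' if it deviates more than k standard deviations from the baseline."""
    mu, sigma = mean(history), stdev(history)
    return abs(latest - mu) > k * sigma

# A stable baseline around 100 requests/minute (synthetic data).
baseline = [100, 104, 98, 101, 99, 102, 100, 103]
print(is_unusual(baseline, 180))  # a spike relative to the learned pattern
print(is_unusual(baseline, 101))  # normal variation, no alert
```

This is why a dynamic alert fires on a spike that a static "alert above 500" rule would silently miss.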

Enterprise RAG Pipeline & Internal AI Assistant Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


1. The Project Title

Enterprise RAG Pipeline & Internal AI Assistant Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


2. Impact-Driven Bullet Points

Use the C-A-R (Context-Action-Result) method. Choose 3-4 from this list:

  • Architecture: Architected and deployed a multi-stage data lake (Medallion Architecture) using ADLS Gen2 and Terraform, reducing data fragmentation across internal departments.
  • Orchestration: Developed automated Azure Data Factory (ADF) pipelines with event-based triggers to ingest and preprocess 5,000+ internal documents (PDF/Office) with 99% reliability.
  • AI Engineering: Built a Databricks processing engine to perform recursive character chunking and vector embedding using text-embedding-3-large, optimizing retrieval context for a GPT-4o powered chatbot.
  • Search Optimization: Implemented Hybrid Search (Vector + Keyword) and Semantic Ranking in Azure AI Search, improving answer relevance by 35% compared to traditional keyword-only search.
  • Security & Governance: Integrated Microsoft Entra ID and ACL-based Security Trimming to ensure the AI assistant respects document-level permissions, preventing unauthorized data exposure.
  • Cost Management: Optimized cloud spend by 40% through Databricks Serverless compute and automated ADLS Lifecycle Management policies (Hot-to-Cold tiering).

3. Skills Section (Keywords for ATS)

  • Cloud & Data: Azure Data Factory (ADF), ADLS Gen2, Azure Databricks, Spark (PySpark), Medallion Architecture, Delta Lake.
  • AI & Search: Retrieval-Augmented Generation (RAG), Azure AI Search, Vector Databases, Semantic Ranking, Hybrid Retrieval.
  • LLMs: Azure OpenAI (GPT-4o), Embeddings, Prompt Engineering, LangChain/LlamaIndex.
  • DevOps/IaC: Terraform, Azure DevOps (CI/CD), Managed Identities, Unity Catalog.

4. The “Interview Hook”

In your Professional Summary or Project Description, add one sentence that proves you know the real-world challenges of AI:

“Implemented a production-ready RAG system that solves for LLM hallucinations by enforcing strict citation requirements and PII redaction during the ingestion phase.”


Pro-Tip for 2026:

Hiring managers currently care deeply about “Day 2 Operations.” If they ask about this project in an interview, mention how you monitored it for Cost (Azure Budgets) and Quality (using an evaluation framework like Ragas or Azure AI Content Safety). This proves you aren’t just a “tutorial follower” but a production-ready engineer.

To deploy Azure Databricks using Terraform, you need to set up three main components: a Resource Group, the Databricks Workspace, and the Network Security Group (optional but recommended).

Below is a clean, modular example. This configuration uses the “premium” pricing tier, which is required for Unity Catalog and user-level permissions; the cheaper “standard” tier is usually sufficient for internal RAG testing if you don’t need those features.

1. The Terraform Configuration (main.tf)

Terraform

# 1. Define the Provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # Use the latest stable 3.x version
    }
  }
}

provider "azurerm" {
  features {}
}

# 2. Create a Resource Group
resource "azurerm_resource_group" "rg" {
  name     = "rg-databricks-internal-rag"
  location = "East US"
}

# 3. Create the Azure Databricks Workspace
resource "azurerm_databricks_workspace" "example" {
  name                = "dbw-internal-ai-processor"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "premium" # Premium is required for Unity Catalog & Security Trimming

  # Managed Resource Group is where Databricks puts its internal VMs/Storage
  managed_resource_group_name = "rg-dbw-managed-resources"

  tags = {
    Environment = "Dev"
    Project     = "Internal-RAG"
  }
}

# 4. Output the Workspace URL (You'll need this to log in)
output "databricks_host" {
  value = azurerm_databricks_workspace.example.workspace_url
}

2. Crucial Architecture Notes for Terraform

  • The Managed Resource Group: Databricks is a “Managed Service.” When you run the code above, Azure will automatically create a second resource group (e.g., rg-dbw-managed-resources). You should not manually delete items inside that group; Terraform and Databricks manage them.
  • SKU Selection:
    • Standard: Cheaper, but lacks advanced security.
    • Premium: Required if you want to use Unity Catalog or User-level Permissions, which are essential for your internal RAG system’s security.
  • VNet Injection (Advanced): If your company requires the Databricks clusters to stay entirely off the public internet, you would need to add a custom_parameters block to the resource to “inject” it into your own Virtual Network.

3. How to Deploy

  1. Initialize: download the Azure providers with terraform init.
  2. Plan: preview what will be created with terraform plan.
  3. Apply: deploy to Azure with terraform apply (you will be prompted to type yes).

4. Next Steps for your RAG Pipeline

Once this workspace is deployed, you will likely want to use Terraform to create Personal Access Tokens (PATs) or Service Principals so that Azure Data Factory (ADF) has permission to trigger Databricks jobs.

II. Add the ADLS Storage Account and ADF to this same Terraform script so they are all connected automatically

This is the “Enterprise Starter Kit” in Terraform. This code creates the resource group, the ADLS Gen2 storage (the reservoir), the Data Factory (the pipes), and the Databricks Workspace (the engine), then connects them so they can talk to each other.

The Integrated Terraform Script

Terraform

# 1. Resource Group
resource "azurerm_resource_group" "main" {
  name     = "rg-internal-rag-prod"
  location = "East US"
}

# 2. ADLS Gen2 Storage (The Warehouse)
resource "azurerm_storage_account" "adls" {
  name                     = "stdatalakeprod001" # Must be unique globally
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # THIS TURNS IT INTO ADLS GEN2
}

resource "azurerm_storage_data_lake_gen2_filesystem" "raw" {
  name               = "bronze-raw"
  storage_account_id = azurerm_storage_account.adls.id
}

# 3. Azure Data Factory (The Orchestrator)
resource "azurerm_data_factory" "adf" {
  name                = "adf-internal-rag-orchestrator"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  identity {
    type = "SystemAssigned" # Allows ADF to talk to others without passwords
  }
}

# 4. Databricks Workspace (The Processor)
resource "azurerm_databricks_workspace" "dbw" {
  name                = "dbw-internal-ai-engine"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "premium" # Required for security trimming/Unity Catalog
}

# 5. CONNECTING THEM: ADF Linked Service to Databricks
# This tells ADF how to trigger Databricks jobs
resource "azurerm_data_factory_linked_service_azure_databricks" "adf_to_dbw" {
  name            = "LS_Databricks_Connection"
  data_factory_id = azurerm_data_factory.adf.id
  adb_domain      = "https://${azurerm_databricks_workspace.dbw.workspace_url}"

  # Using Managed Identity for security (no keys stored!)
  msi_work_space_resource_id = azurerm_databricks_workspace.dbw.id

  new_cluster_config {
    node_type             = "Standard_DS3_v2"
    cluster_version       = "13.3.x-scala2.12"
    min_number_of_workers = 1
    max_number_of_workers = 2
  }
}

Why this setup is “RAG-Ready”:

  • is_hns_enabled = true: This is the most critical line. Without it, you just have a slow, flat blob storage. This enables the high-speed folder structure for your data lake.
  • Managed Identity (SystemAssigned): Notice there are no passwords or “Access Keys” in this code. ADF uses its own identity to log into Databricks. This is the gold standard for internal security.
  • sku = "premium": In an internal environment, you need “Premium” so you can enable Unity Catalog. This ensures that if a user doesn’t have permission to see an HR file in the Data Lake, the Databricks AI code won’t be able to read it either.
  • Automatic Cluster: The new_cluster_config block tells Data Factory to spin up a Databricks cluster only when it’s needed and shut it down immediately after, saving your department significant money.

How to use this:

  1. Save this as main.tf.
  2. Run terraform init and terraform apply.
  3. Once finished, go to the Azure Portal and you will see your entire data ecosystem ready to be loaded with documents.

III. Terraform code to deploy the Azure AI Search service to complete the RAG stack

To complete your Internal RAG stack, we need to add Azure AI Search. In 2026, this service is the “Vector Database” that allows your chatbot to find the right paragraphs within your internal documents.

The Final Terraform Piece: Azure AI Search

Add this block to your existing main.tf file. It deploys the Search service with Semantic Ranking enabled—which is the “secret sauce” for high-accuracy internal bots.

Terraform

# 6. Azure AI Search Service
resource "azurerm_search_service" "search" {
  name                = "ais-internal-rag-lookup"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  # 'basic' or 'standard' is required for Semantic Ranker and Vector Search
  sku = "basic"

  # Enabling Semantic Search (Standard plan for the ranker)
  semantic_search_sku = "standard"

  # Identity for secure connection to ADLS and OpenAI
  identity {
    type = "SystemAssigned"
  }

  tags = {
    Project = "Internal-RAG"
  }
}

# Output the Search Admin Key (for your App/Chatbot)
output "search_admin_key" {
  value     = azurerm_search_service.search.primary_key
  sensitive = true
}

output "search_endpoint" {
  value = "https://${azurerm_search_service.search.name}.search.windows.net"
}

How it all fits together (The Finished Product)

You now have a fully automated infrastructure code that creates:

  1. ADLS Gen2: Where your raw internal PDFs and Docs are stored.
  2. Azure Data Factory: The engine that “watches” for new files and tells the system to update.
  3. Azure Databricks: The brain that cleans data and can perform complex AI “chunking” of your documents.
  4. Azure AI Search: The searchable index that your chatbot queries to get answers.

2026 Pro-Tip: “Integrated Vectorization”

In 2026, you don’t actually need to write Python code to turn text into vectors. You can configure Azure AI Search to connect directly to your ADLS (using the adlsgen2 data source type) and it will:

  • Automatically “crack” the PDFs.
  • Chunk the text.
  • Call Azure OpenAI to create vectors.
  • Keep the index updated whenever a file in ADLS changes.

Final Step for Security

Since this is an Internal system, remember to go into the Azure Portal (or add to Terraform) a Role Assignment. You must give the Search Service’s Managed Identity the role of Storage Blob Data Reader on your ADLS account so it can “read” the files to index them.

IV. Terraform code to add the LLM to this stack as well

To finish the “Brain” of your RAG system, we will add Azure OpenAI. This provides the Embedding Model (to turn text into math/vectors) and the GPT Model (to actually talk to your employees).

In 2026, the best practice is to deploy these as “Account Deployments” within a single Azure OpenAI resource.


7. Azure OpenAI Resource & Models

Add this to your main.tf. This script creates the AI service and deploys two specific models:

  1. text-embedding-3-large: For high-accuracy vector search.
  2. gpt-4o: The “Omni” model for fast, reasoning-based chat.

Terraform

# 7. Azure OpenAI Service
resource "azurerm_cognitive_account" "openai" {
  name                = "oai-internal-rag-brain"
  location            = "East US" # Check availability for gpt-4o in your region
  resource_group_name = azurerm_resource_group.main.name
  kind                = "OpenAI"
  sku_name            = "S0"

  identity {
    type = "SystemAssigned"
  }
}

# Deploy the Embedding Model (The "Encoder")
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-large"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-large"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = 50 # Capacity units of 1K tokens-per-minute each (adjust based on need)
  }
}

# Deploy the Chat Model (The "Talker")
resource "azurerm_cognitive_deployment" "gpt4" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-05-13"
  }

  sku {
    name     = "Standard"
    capacity = 30
  }
}

# Output the OpenAI Endpoint
output "openai_endpoint" {
  value = azurerm_cognitive_account.openai.endpoint
}

The Completed 2026 RAG Infrastructure Map

Now that your Terraform is complete, here is how the data flows through the resources you just built:

  1. Ingest: ADF detects a new PDF in ADLS Gen2 (Bronze).
  2. Process: Databricks chunks the text and uses the OpenAI Embedding Model to vectorize it.
  3. Index: The vectors and text are pushed into Azure AI Search.
  4. Query: An employee asks a question in your Chatbot UI.
  5. Retrieve: Azure AI Search finds the relevant chunks.
  6. Augment/Generate: Your app sends the chunks + the question to GPT-4o, which gives the final answer.
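Steps 5 and 6 reduce to: rank stored chunks by vector similarity to the question, then pack the winner(s) into the prompt sent to GPT-4o. A minimal sketch with toy 3-dimensional vectors; in the real system the vectors come from text-embedding-3-large and the ranking is done by Azure AI Search:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy index of (chunk text, embedding); contents and vectors are illustrative.
index = [
    ("PTO policy: 20 days per year.", [0.9, 0.1, 0.0]),
    ("VPN setup guide.",              [0.0, 0.2, 0.9]),
]
question = "How many vacation days do I get?"
question_vec = [0.8, 0.2, 0.1]  # stand-in for the question's embedding

# Retrieve: pick the chunk most similar to the question.
best = max(index, key=lambda item: cosine(question_vec, item[1]))

# Augment: ground the LLM in the retrieved chunk.
prompt = f"Answer using only this context:\n{best[0]}\n\nQuestion: {question}"
```

The grounding instruction in the prompt is what turns retrieval into hallucination control.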

One Final Security Step (The “Handshake”)

For an Internal system, you must ensure Azure AI Search has permission to talk to Azure OpenAI to handle “Integrated Vectorization.”

You can add this Terraform block to grant that permission automatically:

Terraform

# Give AI Search permission to use OpenAI
resource "azurerm_role_assignment" "search_to_openai" {
scope = azurerm_cognitive_account.openai.id
role_definition_name = "Cognitive Services OpenAI User"
principal_id = azurerm_search_service.search.identity[0].principal_id
}

Final Checklist for Deployment:

  • [ ] Region: Ensure all services (especially OpenAI and AI Search) are in the same region to minimize latency and data egress costs.
  • [ ] Quotas: Check your Azure subscription for “Cognitive Services” quotas; sometimes new subscriptions have a limit on GPT-4o capacity.
  • [ ] Cost: Remember to turn off your Databricks clusters when not in use—they are the most expensive part of this stack!

V. With the infrastructure defined, here is the Python/Spark code you’ll need inside Databricks to start processing those internal documents

To get your internal RAG system running, you need a Databricks Notebook that acts as the bridge between your documents in ADLS and the searchable index in Azure AI Search.

In 2026, the most efficient way to do this is using Spark for parallel processing and the langchain-azure-ai library for the AI logic.


1. Library Installation

Run this in the first cell of your Databricks notebook to install the 2026-standard libraries.

Python

%pip install -U langchain-azure-ai azure-search-documents azure-identity pypdf
dbutils.library.restartPython()

2. The Ingestion Script (PySpark)

This script performs the “Heavy Lifting”: it reads PDFs from your Bronze folder, chunks them, and sends them to the AI Search Index.

Python

import os

from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings
from langchain_azure_ai.vectorstores import AzureAISearch

# --- Configuration ---
STORAGE_PATH = "abfss://bronze-raw@stdatalakeprod001.dfs.core.windows.net/hr-manuals/"
SEARCH_ENDPOINT = "https://ais-internal-rag-lookup.search.windows.net"
SEARCH_KEY = dbutils.secrets.get(scope="rag-scope", key="search-admin-key")

# 1. Load Data from ADLS
# Using Spark to list all PDF files in the lake
df = spark.read.format("binaryFile").option("pathGlobFilter", "*.pdf").load(STORAGE_PATH)

# 2. Extract and Chunk Text
# (Simplification: in prod, use 'spark-pdf' or 'Azure AI Document Intelligence')
def process_pdf(content):
    import io
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(content))
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None on image-only pages
    # Split into 1000-character chunks with overlap for context
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_text(text)

# 3. Create Embeddings & Push to Azure AI Search
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-large",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

vector_store = AzureAISearch(
    threading=True,
    azure_search_endpoint=SEARCH_ENDPOINT,
    azure_search_key=SEARCH_KEY,
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)

# 4. Convert PDF data to chunks and add to the vector store
# (df.collect() pulls files to the driver; fine for small batches, distribute at scale)
for row in df.collect():
    chunks = process_pdf(row.content)
    # Metadata helps with "Security Trimming" later
    metadata = [{"source": row.path, "id": f"{row.path}_{i}"} for i in range(len(chunks))]
    vector_store.add_texts(chunks, metadatas=metadata)

print("Indexing Complete!")

3. Key 2026 Features in this Code

  • RecursiveCharacterTextSplitter: This ensures that chunks don’t break in the middle of a word, which helps the LLM maintain “contextual integrity.”
  • text-embedding-3-large: We use the latest 2026 embedding model because it supports MRL (Matryoshka Representation Learning), allowing you to store smaller vectors without losing accuracy—saving you money on storage.
  • Managed Identity: In a real production setup, you would replace SEARCH_KEY with DefaultAzureCredential() so no keys are ever visible in the code.

4. How the Chatbot Queries this Data

Once that script runs, your “Search Index” is full. When a user asks a question, your backend app (likely a Python Web App or Azure Function) runs this simple logic:

Python

# The "Retriever"
query = "What is the policy on maternity leave?"
docs = vector_store.similarity_search(
    query,
    k=3,
    search_type="hybrid",  # 2026 standard: Vector + Keyword
)

# The "Generator"
# Pass 'docs' to GPT-4o to get the final human-friendly answer

Summary of your Pipeline

  1. ADF detects the file.
  2. Databricks (using the code above) reads the file.
  3. Databricks chunks it and calculates the math (embeddings).
  4. Azure AI Search stores the result.

VI. With infrastructure and code fully aligned, setting up the User Interface (like a Streamlit or React app) to actually talk to this bot

For the final piece of your internal RAG system, you need a user-friendly interface. In 2026, there are two primary paths: the “Pro-Code” way (custom and flexible) and the “Low-Code” way (fast and integrated).


Option 1: The “Low-Code” Way (Azure AI Foundry)

If you want a professional-grade internal portal in under 10 minutes, use the Azure AI Foundry (formerly AI Studio) “Deploy to Web App” feature.

  1. Go to Azure AI Foundry: Open your project.
  2. Open Playground: Select the “Chat” playground.
  3. Setup RAG: Under “Add your data,” select your Azure AI Search index.
  4. Deploy: Click the Deploy to button and select “as a web app”.
    • Azure will automatically provision an Azure App Service.
    • It creates a beautiful, ChatGPT-like interface that is pre-connected to your data.
    • Security: It integrates with Microsoft Entra ID out of the box, so only your employees can log in.

Option 2: The “Pro-Code” Way (Python + Streamlit)

If you want to customize the branding or add specific internal tools (like a “Feedback” button), Streamlit is the industry standard for 2026.

The app.py logic:

Python

import streamlit as st
from langchain_azure_ai.chat_models import AzureChatOpenAI
from langchain_azure_ai.vectorstores import AzureAISearch
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings

st.set_page_config(page_title="Internal HR Assistant", layout="wide")
st.title("🤖 Internal Knowledge Bot")

# 1. Initialize Connection
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-3-large")
vector_store = AzureAISearch(
    azure_search_endpoint=st.secrets["SEARCH_ENDPOINT"],
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)
llm = AzureChatOpenAI(azure_deployment="gpt-4o", temperature=0)

# 2. Chat Interface
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask me about company policy..."):
    st.session_state.messages.append({"role": "user", "content": prompt})

    # 3. The RAG Step: Retrieve relevant chunks
    docs = vector_store.similarity_search(prompt, k=3)
    context = "\n".join([d.page_content for d in docs])

    # 4. The RAG Step: Generate Answer
    full_prompt = f"Use the following context to answer: {context}\n\nQuestion: {prompt}"
    response = llm.invoke(full_prompt).content

    with st.chat_message("assistant"):
        st.markdown(response)
        with st.expander("View Sources"):
            for d in docs:
                st.write(f"- {d.metadata['source']}")

    st.session_state.messages.append({"role": "assistant", "content": response})

3. Critical 2026 UX Features for Internal Bots

To make your bot actually useful for employees, ensure your UI includes:

  • Citations: Never show an answer without a link to the source document in ADLS. This builds trust and allows employees to verify the info.
  • “I Don’t Know” Guardrail: Explicitly tell the LLM in the system prompt: “If the answer is not in the provided documents, say you don’t know. Do not make up facts.”
  • Feedback Loop: Add a Thumbs Up/Down on every response. You can pipe this feedback back into ADF to log which documents are confusing the AI.
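The “I Don’t Know” guardrail above lives in the system prompt. A minimal sketch of how the messages could be assembled before calling the LLM (the exact prompt wording and the `build_messages` helper are illustrative, not a fixed API):

```python
# A hedged sketch: a grounded system prompt with an "I don't know" guardrail.
SYSTEM_PROMPT = (
    "You are an internal HR assistant. Answer ONLY from the provided context. "
    "If the answer is not in the provided documents, say you don't know. "
    "Do not make up facts. Cite the source document for every claim."
)

def build_messages(context: str, question: str) -> list[dict]:
    """Assemble the chat messages sent to the LLM."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_messages("Maternity leave is 16 weeks.", "What is the maternity policy?")
```

You would pass `messages` to your chat model in place of the single `full_prompt` string shown earlier.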

Summary of your Journey

You have successfully moved from raw data to a live AI assistant:

  1. Infrastructure: Deployed via Terraform.
  2. Pipeline: Data moved by ADF into ADLS.
  3. Processing: Cleaned and Vectorized in Databricks.
  4. Retrieval: Indexed and ranked in Azure AI Search.
  5. Interface: Served via App Service or Streamlit.

VII. How to monitor the costs of this stack so you don’t get a surprise bill at the end of the month

Building a RAG system in 2026 is an exercise in consumption-based engineering. Because every PDF page processed and every chat message sent has a price tag, monitoring isn’t just “good hygiene”—it’s a requirement to keep the project alive.

Here is how to monitor and optimize the costs of your specific ADF-ADLS-Databricks-AI Search stack.


1. The “Big Three” Cost Drivers

In your internal RAG architecture, these three will typically account for 90% of your bill:

| Service | The Cost Driver | 2026 Pro-Tip |
| --- | --- | --- |
| Databricks | Cluster uptime (DBUs) | Use Serverless Compute for job runs. It scales to zero the second the processing finishes. |
| AI Search | Search Units (SUs) | Start with the Basic tier. Don’t move to Standard until your document count exceeds 15GB or 1 million chunks. |
| Azure OpenAI | Token Consumption | Use gpt-4o-mini for simple summarization and only use gpt-4o for complex reasoning to save up to 80% on tokens. |

2. Setting Up “Hard” Guardrails (Azure Budgets)

Don’t wait for the monthly invoice. Set up an automated kill-switch.

  1. Create a Resource Group Budget: Put all your RAG resources (ADF, ADLS, etc.) in one Resource Group.
  2. Set Thresholds:
    • 50%: Send an email to the team.
    • 90%: Send a high-priority alert to the Manager.
    • 100% (The Nuclear Option): In 2026, you can trigger an Azure Automation Runbook that programmatically disables the Azure OpenAI API keys, instantly stopping further spending.

3. Optimization Checklist by Service

Azure Data Factory (ADF)

  • Data Integration Units (DIUs): When copying files from SharePoint/On-prem to ADLS, ADF defaults to 4 DIUs. For small internal docs, manually set this to 2 to halve the copy cost.
  • Avoid Over-Polling: Set your triggers to “Tumbling Window” or “Storage Event” rather than “Schedule” (e.g., checking every 1 minute) to reduce trigger run costs.
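In pipeline JSON, DIUs are set per Copy activity via the `dataIntegrationUnits` property. A hedged fragment (the activity name and the binary source/sink types are placeholders for your setup):

```json
{
  "name": "CopyDocsToLake",
  "type": "Copy",
  "typeProperties": {
    "source": { "type": "BinarySource" },
    "sink": { "type": "BinarySink" },
    "dataIntegrationUnits": 2
  }
}
```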

Azure Databricks

  • Auto-Termination: Ensure your clusters are set to terminate after 10 minutes of inactivity.
  • Photon Engine: Turn on the Photon query engine. While it costs slightly more per hour, it processes data so much faster that the total cost of the job is usually lower.

Azure Data Lake (ADLS)

  • Lifecycle Management: Set a policy to move files from Hot to Cold storage if they haven’t been accessed in 30 days. Your “Raw/Bronze” data almost never needs to be in the Hot tier.
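In the Terraform style used earlier, such a policy could be sketched like this. This is an assumption-laden fragment: the resource and attribute names follow the azurerm provider’s storage management policy resource (verify against your provider version), the storage account reference is a placeholder, and last-access-time tracking must be enabled on the account for the rule to work:

```terraform
# Move Bronze blobs to Cool storage after 30 days without access.
# (Illustrative sketch; "azurerm_storage_account.datalake" is a placeholder.)
resource "azurerm_storage_management_policy" "lake_lifecycle" {
  storage_account_id = azurerm_storage_account.datalake.id

  rule {
    name    = "bronze-to-cool"
    enabled = true
    filters {
      prefix_match = ["bronze-raw/"]
      blob_types   = ["blockBlob"]
    }
    actions {
      base_blob {
        tier_to_cool_after_days_since_last_access_time_greater_than = 30
      }
    }
  }
}
```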

Azure AI Search

  • Image Cracking: If your PDFs contain images, “Document Cracking” costs extra ($1 per 1,000 images). If you don’t need to “read” charts or photos, disable image extraction in your indexer settings.

4. 2026 “FinOps for AI” Dashboard

The most effective way to stay under budget is to build a Cost Visibility Dashboard.

  • Tagging: Ensure every resource in your Terraform code has a Project: Internal-RAG and Department: HR tag.
  • Cost Analysis: Use the Azure Cost Management tool to group by “Tag.” This allows you to show your stakeholders exactly how much the “HR Bot” is costing per department.

Summary

If you are just starting, your daily “Idle” cost (with everything turned on but no one using it) should be roughly $5–$10/day. If it’s higher, check if a Databricks cluster was left running or if you deployed a “Standard” tier AI Search unnecessarily.

Since we’ve covered the infrastructure, the code, the UI, and the costs, is there any specific part of the internal security (like preventing certain employees from seeing certain files) you’d like to dive into?

Azure Data Lake Storage (ADLS)

If ADF is the plumbing and Databricks is the engine, Azure Data Lake Storage (ADLS) Gen2 is the actual physical warehouse where everything is kept.

In 2026, it remains the standard for “Big Data” because it combines the cheap, limitless nature of Cloud Object Storage with the high-speed organization of a File System.


1. The Secret Sauce: Hierarchical Namespace (HNS)

Standard cloud storage (like Azure Blob or Amazon S3) is “flat.” If you have a file at /logs/2026/March/data.csv, the computer sees that whole string as one long name. To move a folder, it has to copy every single file inside it.

With ADLS Gen2, folders are “real” (Hierarchical Namespace).

  • Rename/Move: Renaming a folder with 10 million files is instantaneous because it just changes one reference, not 10 million files.
  • Performance: When a tool like Databricks or Spark asks for “all files in the March folder,” ADLS knows exactly where they are without searching through the entire lake.
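The difference can be shown with a toy model: a flat store must rewrite every key under the old prefix, while a hierarchical namespace updates a single directory reference. This is purely illustrative Python, not the real ADLS implementation:

```python
# Toy model of flat vs. hierarchical rename (illustrative only).

def rename_flat(store: dict, old: str, new: str) -> int:
    """Flat object store: every key under the prefix is rewritten. O(n) ops."""
    ops = 0
    for key in list(store):
        if key.startswith(old + "/"):
            store[new + key[len(old):]] = store.pop(key)
            ops += 1
    return ops

def rename_hns(dirs: dict, old: str, new: str) -> int:
    """Hierarchical namespace: one directory entry changes. O(1) ops."""
    dirs[new] = dirs.pop(old)
    return 1

flat = {f"logs/2026/March/file{i}.csv": b"" for i in range(1000)}
print(rename_flat(flat, "logs/2026/March", "logs/2026/Mar"))  # 1000 key rewrites

hns = {"logs/2026/March": ["file0.csv"]}
print(rename_hns(hns, "logs/2026/March", "logs/2026/Mar"))    # 1 pointer update
```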

2. The Storage Tiers (Cost Savings)

You don’t pay the same price for all data. ADLS allows you to move data between “Tiers” automatically based on how often you touch it:

  • Hot Tier: Highest cost to store, lowest cost to access. Use this for data you are actively processing in your RAG pipeline today.
  • Cool/Cold Tier: Lower storage cost, but you pay a fee to read it. Great for data from last month.
  • Archive Tier: Dirt cheap (pennies per GB). The data is “offline”—it can take a few hours to “rehydrate” it so you can read it again. Perfect for legal compliance backups.

3. Security (ACLs vs. RBAC)

For your Internal RAG system, this is the most important part of ADLS. It uses two layers of security:

  1. RBAC (Role-Based Access Control): Broad permissions (e.g., “John is a Storage Contributor”).
  2. ACLs (Access Control Lists): POSIX-style permissions. You can say “John can see the ‘Public’ folder, but only HR can see the ‘Salaries’ folder.”

2026 Update: Azure AI Search now “respects” these ACLs. If you index files from ADLS, the search results will automatically hide files that the logged-in user doesn’t have permission to see in the Data Lake.

4. ADLS Gen2 vs. Microsoft Fabric OneLake

You might hear about OneLake (the “OneDrive for data”). Here is how to tell them apart in 2026:

  • ADLS Gen2: The “Infrastructure” choice. You have full control over networking, encryption keys, and regions. Best for custom data engineering and Databricks heavy-lifters.
  • OneLake: The “SaaS” choice. It is actually built on top of ADLS, but it manages the folders and permissions for you automatically within Microsoft Fabric.

Summary Checklist

  • Format: Use Delta or Parquet for your “Silver” and “Gold” layers. These are compressed and optimized for AI and BI.
  • Structure: Always follow the Bronze -> Silver -> Gold folder structure to keep your lake from becoming a “data swamp.”
  • Access: Use Managed Identities so ADF and Databricks can talk to ADLS without you ever having to copy-paste a password or a secret key.

Are you planning to manage the ADLS folders yourself, or is your company moving toward a managed environment like Microsoft Fabric?

Azure Databricks

In 2026, Azure Databricks is much more than just a “data processing tool.” It is now positioned as a Data Intelligence Platform. While it’s still based on Apache Spark, it has evolved to use AI to help you manage your data, write your code, and govern your security.

Think of it as the high-performance engine of your data factory.


1. The Core Technology: Spark + Delta Lake

At its heart, Databricks does two things exceptionally well:

  • Apache Spark: A distributed computing engine. If you have 100TB of data, Databricks breaks it into 1,000 tiny pieces and processes them all at the same time across a “cluster” of computers.
  • Delta Lake: This is the storage layer that sits on top of your ADLS. It gives your “data lake” (files) the powers of a “database” (tables), allowing for things like Undo (Time Travel) and ACID transactions (ensuring data isn’t corrupted if a write fails).

2. New in 2026: The “Intelligence” Layer

The biggest shift recently is that Databricks now uses AI to run its own infrastructure:

  • Genie Code (formerly Databricks Assistant): An agentic AI built into the notebooks. You can type “Clean this table and create a vector index for my RAG bot,” and it will write and execute the Spark code for you.
  • Serverless Compute: You no longer need to “size” clusters (deciding how many CPUs/RAM). You just run your code, and Databricks instantly scales the hardware up or down, charging you only for the seconds the code is running.
  • Liquid Clustering: In the past, data engineers had to manually “partition” data to keep it fast. Now, Databricks uses AI to automatically reorganize data based on how you query it, making searches up to 12x faster.

3. How it fits your RAG System

For your internal chatbot, Databricks is the “Processor” that prepares your data for Azure AI Search:

  1. Parsing: It opens your internal PDFs/Word docs from ADLS.
  2. Chunking: It breaks the text into logical paragraphs.
  3. Embedding: It calls an LLM (like OpenAI) to turn those paragraphs into Vectors.
  4. Syncing: It pushes those vectors into your Search Index.

4. Databricks vs. The Competition (2026)

| Feature | Azure Databricks | Microsoft Fabric | Azure SQL |
| --- | --- | --- | --- |
| Best For | Heavy Data Engineering & AI | Business Intelligence (BI) | App Backend / Small Data |
| Language | Python, SQL, Scala, R | Mostly SQL & Low-Code | SQL |
| Philosophy | “Open” (Files in your ADLS) | “SaaS” (Everything managed) | “Relational” (Strict tables) |
| Power | Unlimited (Petabyte scale) | High (Enterprise scale) | Medium (GB to low TB) |

5. Unity Catalog (The “Traffic Cop”)

In an internal setting, Unity Catalog is the most important part of Databricks. It provides a single place to manage permissions. If you grant a user access to a table in Databricks, those permissions follow the data even if it’s moved or mirrored into other services like Power BI or Microsoft Fabric.

Summary

  • Use ADF to move the data.
  • Use ADLS to store the data.
  • Use Databricks to do the “heavy thinking,” cleaning, and AI vectorization.
  • Use Azure SQL / AI Search to give the data to your users/bot.

Azure data ecosystem

In the Azure data ecosystem, these four services form the “Modern Data Stack.” They work together to move, store, process, and serve data. If you think of your data as water, this ecosystem is the plumbing, the reservoir, the filtration plant, and the tap.


1. ADLS Gen2 (The Reservoir)

Azure Data Lake Storage Gen2 is the foundation. It is a highly scalable, cost-effective storage space where you keep all your data—structured (tables), semi-structured (JSON/Logs), and unstructured (PDFs/Images).

  • Role: The single source of truth (Data Lake).
  • Key Feature: Hierarchical Namespace. Unlike standard “flat” cloud storage, it allows for folders and subfolders, which makes data access much faster for big data analytics.
  • 2026 Context: It serves as the “Bronze” (Raw) and “Silver” (Filtered) layers in a Medallion Architecture.

2. ADF (The Plumbing & Orchestrator)

Azure Data Factory is the glue. It doesn’t “own” the data; it moves it from point A to point B and tells other services when to start working.

  • Role: ETL/ELT Orchestration. It pulls data from on-premises servers or APIs and drops it into ADLS.
  • Key Feature: Low-code UI. You build “Pipelines” using a drag-and-drop interface.
  • Integration: It often has a “trigger” that tells Databricks: “I just finished moving the raw files to ADLS, now go clean them.”

3. Azure Databricks (The Filtration Plant)

Azure Databricks is where the heavy lifting happens. It is an Apache Spark-based platform used for massive-scale data processing, data science, and machine learning.

  • Role: Transformation & Analytics. It takes the messy data from ADLS and turns it into clean, aggregated “Gold” data.
  • Key Feature: Notebooks. Engineers write code (Python, SQL, Scala) in a collaborative environment.
  • 2026 Context: It is the primary engine for Vectorization in RAG systems—turning your internal documents into mathematical vectors for AI Search.

4. Azure SQL (The Tap)

Azure SQL Database (or Azure Synapse) is the final destination for business users. While ADLS is great for “big data,” it’s not the best for a quick dashboard or a mobile app.

  • Role: Data Serving. It stores the final, “Gold” level data that has been cleaned and structured.
  • Key Feature: High Performance for Queries. It is optimized for Power BI reports and standard business applications.
  • Usage: After Databricks cleans the data, it saves the final results into Azure SQL so the CEO can see a dashboard the next morning.

How they work together (The Flow)

| Step | Service | Action |
| --- | --- | --- |
| 1. Ingest | ADF | Copies logs from an on-prem server to the cloud. |
| 2. Store | ADLS | Holds the raw .csv files in a “Raw” folder. |
| 3. Process | Databricks | Reads the .csv, removes duplicates, and calculates monthly totals. |
| 4. Serve | Azure SQL | The cleaned totals are loaded into a SQL table. |
| 5. Visualize | Power BI | Connects to Azure SQL to show a “Sales Revenue” chart. |

Summary Table

| Service | Primary Skill Needed | Best For… |
| --- | --- | --- |
| ADF | Logic / Drag-and-Drop | Moving data & scheduling tasks. |
| ADLS | Folder Organization | Storing massive amounts of any data type. |
| Databricks | Python / SQL / Spark | Complex math, AI, and cleaning big data. |
| Azure SQL | Standard SQL | Powering apps and BI dashboards. |

To explain the pipeline between these four, we use the Medallion Architecture. This is the industry-standard way to move data from a “raw” state to an “AI-ready” or “Business-ready” state.


Phase 1: Ingestion (The “Collector”)

  • Services: ADF + ADLS Gen2 (Bronze Folder)
  • The Action: ADF acts as the trigger. It connects to your external source (like an internal SAP system, a REST API, or a local SQL Server).
  • The Result: ADF “copies” the data exactly as it is—warts and all—into the Bronze container of your ADLS.
  • Why? You always keep a raw copy. If your logic fails later, you don’t have to go back to the source; you just restart from the Bronze folder.

Phase 2: Transformation (The “Refinery”)

  • Services: Databricks + ADLS Gen2 (Silver Folder)
  • The Action: ADF sends a signal to Databricks to start a “Job.” Databricks opens the raw files from the Bronze folder.
    • It filters out null values.
    • It fixes date formats (e.g., changing 01-03-26 to 2026-03-01).
    • It joins tables together.
  • The Result: Databricks writes this “clean” data into the Silver container of your ADLS, usually in Delta format (a high-performance version of Parquet).

Phase 3: Aggregation & Logic (The “Chef”)

  • Services: Databricks + ADLS Gen2 (Gold Folder)
  • The Action: Databricks runs a second set of logic. Instead of just cleaning data, it calculates things. It creates “Gold” tables like Monthly_Sales_Summary or Employee_Vector_Embeddings.
  • The Result: These high-value tables are stored in the Gold container. This data is now perfect.

Phase 4: Serving (The “Storefront”)

  • Services: Azure SQL
  • The Action: ADF runs one final “Copy Activity.” It takes the small, aggregated tables from the Gold folder in ADLS and pushes them into Azure SQL Database.
  • The Result: Your internal dashboard (Power BI) or your Chatbot’s metadata storage connects to Azure SQL. Because the data is already cleaned and summarized, the dashboard loads instantly.

The Complete Workflow Summary

| Stage | Data State | Tool in Charge | Where it Sits |
| --- | --- | --- | --- |
| Ingest | Raw / Messy | ADF | ADLS (Bronze) |
| Clean | Filtered / Standardized | Databricks | ADLS (Silver) |
| Compute | Aggregated / Business Logic | Databricks | ADLS (Gold) |
| Serve | Final Tables / Ready for UI | ADF | Azure SQL |

How this connects to your RAG Chatbot:

In your specific case, Databricks is the MVP. It reads the internal PDFs from the Silver folder, uses an AI model to turn the text into Vectors, and then you can either store those vectors in Azure SQL (if they are small) or send them straight to Azure AI Search.

Azure AI Search

Azure AI Search (formerly known as Azure Cognitive Search) is a high-performance, “search-as-a-service” platform designed to help developers build rich search experiences over private, heterogeneous content.

In the era of Generative AI, it has become the industry standard for Retrieval-Augmented Generation (RAG), serving as the “knowledge base” that feeds relevant information to Large Language Models (LLMs) like GPT-4.


1. How It Works: The High-Level Flow

Azure AI Search acts as a middle layer between your raw data and your end-user application.

  1. Ingestion: It pulls data from sources like ADLS, Azure SQL, or Cosmos DB using “Indexers.”
  2. Enrichment (Cognitive Skills): During ingestion, it can use AI to “crack” documents—extracting text from images (OCR), detecting languages, or identifying key phrases.
  3. Indexing: It organizes this data into a highly optimized, searchable “Index.”
  4. Serving: Your app sends a query to the index and gets back ranked, relevant results.

2. Three Ways to Search

The real power of Azure AI Search is that it doesn’t just look for exact word matches; it understands intent.

| Search Type | How it Works | Best For… |
| --- | --- | --- |
| Keyword (BM25) | Traditional text matching. Matches “Apple” to “Apple.” | Exact terms, serial numbers, product names. |
| Vector Search | Uses mathematical “embeddings” to find conceptually similar items. | “Frigid weather” matching “cold temperatures.” |
| Hybrid Search | The Gold Standard. Runs Keyword and Vector search simultaneously and merges them. | Providing the most accurate, context-aware results. |

Pro Tip: Azure AI Search also offers Semantic Ranking, which uses a secondary deep-learning model to re-rank the top results, ensuring the absolute best answer is at the very top.
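Hybrid search merges the keyword ranking and the vector ranking into one list; Azure AI Search does this with Reciprocal Rank Fusion (RRF). A minimal, self-contained sketch of RRF (the doc IDs are made up; k=60 is the conventional constant):

```python
# Minimal Reciprocal Rank Fusion (RRF) sketch: merge two rankings into one.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Score each doc as sum(1 / (k + rank)) across rankings; higher is better."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc-A", "doc-B", "doc-C"]   # BM25 order
vector_hits  = ["doc-B", "doc-D", "doc-A"]   # embedding-similarity order

print(rrf([keyword_hits, vector_hits]))
```

Notice that doc-B wins: it appears near the top of both rankings, which is exactly the behavior hybrid search rewards.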


3. Key Components

To set this up, you’ll interact with four main objects:

  • Data Source: The connection to your data (e.g., an Azure Blob Storage container).
  • Skillset: An optional set of AI steps (like “Translate” or “Chunking”) applied during indexing.
  • Index: The physical schema (the “table”) where the searchable data lives.
  • Indexer: The “engine” that runs on a schedule to keep the Index synced with the Data Source.

4. The “RAG” Connection

If you are building a chatbot, Azure AI Search is your Retriever.

  1. The user asks: “What is our policy on remote work?”
  2. Your app sends that question to Azure AI Search.
  3. The Search service finds the 3 most relevant paragraphs from your 500-page HR manual.
  4. Your app sends those 3 paragraphs to Azure OpenAI to summarize into a natural answer.

5. Why use it over a standard Database?

While SQL or Cosmos DB can do “searches,” Azure AI Search is specialized for:

  • Faceted Navigation: Those “Filter by Price” or “Filter by Category” sidebars you see on Amazon.
  • Synonyms: Knowing that “cell phone” and “mobile” mean the same thing.
  • Language Support: It handles word stemming and lemmatization for 50+ languages.
  • Scaling: It can handle millions of documents and thousands of queries per second without slowing down your primary database.

RAG (Retrieval-Augmented Generation)

To build a RAG (Retrieval-Augmented Generation) system using Azure Data Factory (ADF), Azure Data Lake Storage (ADLS), and Azure AI Search, you are essentially creating a two-part machine: a Data Ingestion Pipeline (The “Factory”) and a Search & LLM Orchestrator (The “Brain”).

Here is the modern 2026 blueprint for setting this up.


1. The High-Level Architecture

  1. ADLS Gen2: Acts as your “Landing Zone” for raw documents (PDFs, Office docs, JSON).
  2. ADF: Orchestrates the movement of data and triggers the “cracking” (parsing) of documents.
  3. Azure AI Search: Stores the Vector Index. It breaks documents into chunks, turns them into math (embeddings), and stores them for retrieval.
  4. Azure OpenAI / AI Studio: The LLM that reads the retrieved chunks and answers the user.

2. Step 1: The Ingestion Pipeline (ADF + ADLS)

You don’t want to manually upload files. ADF automates the flow.

  • The Trigger: Set up a Storage Event Trigger in ADF. When a new PDF is dropped into your ADLS raw-data container, the pipeline starts.
  • The Activity: Use a Copy Activity or a Web Activity.
    • Modern Approach: In 2026, the most efficient way is to use the Azure AI Search “Indexer.” You don’t necessarily need to “move” the data with ADF; instead, use ADF to tell Azure AI Search: “Hey, new data just arrived in ADLS, go index it now.”
  • ADF Pipeline Logic:
    1. Wait for the file in ADLS.
    2. (Optional) Use an Azure Function or AI Skillset to pre-process (e.g., stripping headers/footers).
    3. Call the Azure AI Search REST API to Run Indexer.

3. Step 2: The “Smart” Indexing (Azure AI Search)

This is where your data becomes “AI-ready.” Inside Azure AI Search, you must configure:

  • Crack & Chunk: Don’t index a 100-page PDF as one block. Use the Markdown/Text Splitter skill to break it into chunks (e.g., 500 tokens each).
  • Vectorization: Add an Embedding Skill. This automatically sends your text chunks to an embedding model (like text-embedding-3-large) and saves the resulting vector in the index.
  • Knowledge Base (New for 2026): Use the Agentic Retrieval feature. This allows the search service to handle “multi-step” queries (e.g., “Compare the 2025 and 2026 health plans”) by automatically breaking them into sub-queries.
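The chunking idea above can be sketched in plain Python. This is a simplified fixed-size splitter with overlap, not the actual Text Split skill (which splits on sentence and markdown boundaries):

```python
# Simplified chunker: fixed-size windows with overlap (illustrative only).

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into chunks; the overlap preserves context across chunk edges."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

pages = "word " * 400          # ~2000 characters of toy text
chunks = chunk_text(pages)
print(len(chunks), len(chunks[0]))  # prints "5 500"
```

Each chunk shares its first 100 characters with the tail of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.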

4. Step 3: The Chatbot Logic (The RAG Loop)

When a user asks a question, your chatbot follows this “Search -> Ground -> Answer” flow:

| Step | Action |
| --- | --- |
| 1. User Query | “What is our policy on remote work?” |
| 2. Search | App sends query to Azure AI Search using Hybrid Search (Keyword + Vector). |
| 3. Retrieve | Search returns the top 3–5 most relevant “chunks” of text. |
| 4. Augment | You create a prompt: “Answer the user based ONLY on this context: [Chunks]” |
| 5. Generate | Azure OpenAI generates a natural language response. |
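The Augment step is plain string assembly. A minimal sketch with made-up policy chunks (the `augment` helper name is illustrative):

```python
# Minimal sketch of the "Augment" step: ground the prompt in retrieved chunks.

def augment(question: str, chunks: list[str]) -> str:
    context = "\n---\n".join(chunks)
    return (
        "Answer the user based ONLY on this context:\n"
        f"{context}\n\n"
        f"Question: {question}"
    )

prompt = augment(
    "What is our policy on remote work?",
    ["Employees may work remotely up to 3 days per week.",
     "Remote work requires manager approval."],
)
```

The resulting `prompt` string is what gets sent to Azure OpenAI in the Generate step.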

5. Key 2026 Features to Use

  • Semantic Ranker: Always turn this on. It uses a high-powered model to re-sort your search results, ensuring the “Best” answer is actually #1 before it goes to the LLM.
  • Integrated Vectorization: In the past, you had to write custom Python code to create vectors. Now, Azure AI Search handles this internally via Integrated Vectorization—you just point it at your Azure OpenAI resource.
  • OneLake Integration: If you are using Microsoft Fabric, you can now link OneLake directly to AI Search without any ETL pipelines at all.
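Integrated vectorization is configured as a skill inside the indexer’s skillset. A hedged JSON sketch of an Azure OpenAI embedding skill (the resource URI is a placeholder and the input/output paths depend on your chunking setup; check the current REST API reference for exact field names):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
  "description": "Vectorize each chunk with text-embedding-3-large",
  "resourceUri": "https://<your-openai-resource>.openai.azure.com",
  "deploymentId": "text-embedding-3-large",
  "inputs": [
    { "name": "text", "source": "/document/pages/*" }
  ],
  "outputs": [
    { "name": "embedding", "targetName": "vector" }
  ]
}
```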

Why use ADF instead of just uploading to Search?

  • Cleanup: You can use ADF to remove PII (Personal Identifiable Information) before it ever hits the AI Search index.
  • Orchestration: If your data comes from 10 different SQL databases and 50 SharePoint folders, ADF is the only way to centralize it into the Data Lake for indexing.