Enterprise RAG Pipeline & Internal AI Assistant | Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


1. The Project Title

Enterprise RAG Pipeline & Internal AI Assistant | Azure Ecosystem: ADF, ADLS Gen2, Databricks, AI Search, OpenAI


2. Impact-Driven Bullet Points

Use the C-A-R (Context-Action-Result) method. Choose 3-4 from this list:

  • Architecture: Architected and deployed a multi-stage data lake (Medallion Architecture) using ADLS Gen2 and Terraform, reducing data fragmentation across internal departments.
  • Orchestration: Developed automated Azure Data Factory (ADF) pipelines with event-based triggers to ingest and preprocess 5,000+ internal documents (PDF/Office) with 99% reliability.
  • AI Engineering: Built a Databricks processing engine to perform recursive character chunking and vector embedding using text-embedding-3-large, optimizing retrieval context for a GPT-4o powered chatbot.
  • Search Optimization: Implemented Hybrid Search (Vector + Keyword) and Semantic Ranking in Azure AI Search, improving answer relevance by 35% compared to traditional keyword-only search.
  • Security & Governance: Integrated Microsoft Entra ID and ACL-based Security Trimming to ensure the AI assistant respects document-level permissions, preventing unauthorized data exposure.
  • Cost Management: Optimized cloud spend by 40% through Databricks Serverless compute and automated ADLS Lifecycle Management policies (Hot-to-Cold tiering).

3. Skills Section (Keywords for ATS)

  • Cloud & Data: Azure Data Factory (ADF), ADLS Gen2, Azure Databricks, Spark (PySpark), Medallion Architecture, Delta Lake.
  • AI & Search: Retrieval-Augmented Generation (RAG), Azure AI Search, Vector Databases, Semantic Ranking, Hybrid Retrieval.
  • LLMs: Azure OpenAI (GPT-4o), Embeddings, Prompt Engineering, LangChain/LlamaIndex.
  • DevOps/IaC: Terraform, Azure DevOps (CI/CD), Managed Identities, Unity Catalog.

4. The “Interview Hook”

In your Professional Summary or Project Description, add one sentence that proves you know the real-world challenges of AI:

“Implemented a production-ready RAG system that solves for LLM hallucinations by enforcing strict citation requirements and PII redaction during the ingestion phase.”


Pro-Tip for 2026:

Hiring managers currently care deeply about “Day 2 Operations.” If they ask about this project in an interview, mention how you monitored it for Cost (Azure Budgets) and Quality (using an evaluation framework like Ragas or Azure AI Content Safety). This proves you aren’t just a “tutorial follower” but a production-ready engineer.
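If you want a concrete feel for what such an evaluation checks, here is a minimal, self-contained sketch of one proxy metric (citation coverage: what fraction of answer sentences cite a source). The `[docN]` citation convention and the naive sentence splitting are illustrative assumptions, not part of Ragas or Azure AI Content Safety:

```python
# Sketch of a "Day 2" quality check: what share of answer sentences carry a
# citation marker like [doc1]? Frameworks such as Ragas automate much richer
# versions of this (faithfulness, answer relevancy); the [docN] convention
# here is purely an illustrative assumption.
import re

def citation_coverage(answer: str) -> float:
    """Share of sentences that cite at least one source document."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if re.search(r"\[doc\d+\]", s))
    return cited / len(sentences)

answer = (
    "Employees get 16 weeks of paid leave [doc1]. "
    "The policy was updated in 2024 [doc2]. "
    "Contact HR for edge cases."
)
print(f"Citation coverage: {citation_coverage(answer):.0%}")  # 2 of 3 sentences cited
```

Tracking a number like this per release is a cheap way to show an interviewer you monitored quality, not just uptime.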

To deploy Azure Databricks using Terraform, you need two main components: a Resource Group and the Databricks Workspace. (A Network Security Group is optional, and only needed if you plan to inject the workspace into your own VNet.)

Below is a clean, modular example. This configuration uses the "premium" pricing tier, which the security features in this guide (Unity Catalog, security trimming) require; "standard" is cheaper if you only need basic internal testing.

1. The Terraform Configuration (main.tf)

Terraform

# 1. Define the Provider
terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # Use the latest stable 3.x version
    }
  }
}

provider "azurerm" {
  features {}
}

# 2. Create a Resource Group
resource "azurerm_resource_group" "rg" {
  name     = "rg-databricks-internal-rag"
  location = "East US"
}

# 3. Create the Azure Databricks Workspace
resource "azurerm_databricks_workspace" "example" {
  name                = "dbw-internal-ai-processor"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "premium" # Premium is required for Unity Catalog & Security Trimming

  # Managed Resource Group is where Databricks puts its internal VMs/Storage
  managed_resource_group_name = "rg-dbw-managed-resources"

  tags = {
    Environment = "Dev"
    Project     = "Internal-RAG"
  }
}

# 4. Output the Workspace URL (you'll need this to log in)
output "databricks_host" {
  value = azurerm_databricks_workspace.example.workspace_url
}

2. Crucial Architecture Notes for Terraform

  • The Managed Resource Group: Databricks is a “Managed Service.” When you run the code above, Azure will automatically create a second resource group (e.g., rg-dbw-managed-resources). You should not manually delete items inside that group; Terraform and Databricks manage them.
  • SKU Selection:
    • Standard: Cheaper, but lacks advanced security.
    • Premium: Required if you want to use Unity Catalog or User-level Permissions, which are essential for your internal RAG system’s security.
  • VNet Injection (Advanced): If your company requires the Databricks clusters to stay entirely off the public internet, you would need to add a custom_parameters block to the resource to “inject” it into your own Virtual Network.

3. How to Deploy

  1. Initialize (downloads the Azure providers): terraform init
  2. Plan (previews what will be created): terraform plan
  3. Apply (deploys to Azure; you will be prompted to type yes): terraform apply

4. Next Steps for your RAG Pipeline

Once this workspace is deployed, you will likely want to use Terraform to create Personal Access Tokens (PATs) or Service Principals so that Azure Data Factory (ADF) has permission to trigger Databricks jobs.

II. Add the ADLS Storage Account and ADF to this same Terraform script so they are all connected automatically

This is the “Enterprise Starter Kit” in Terraform. This code creates the resource group, the ADLS Gen2 storage (the reservoir), the Data Factory (the pipes), and the Databricks Workspace (the engine), then connects them so they can talk to each other.

The Integrated Terraform Script

Terraform

# 1. Resource Group
resource "azurerm_resource_group" "main" {
  name     = "rg-internal-rag-prod"
  location = "East US"
}

# 2. ADLS Gen2 Storage (The Warehouse)
resource "azurerm_storage_account" "adls" {
  name                     = "stdatalakeprod001" # Must be globally unique
  resource_group_name      = azurerm_resource_group.main.name
  location                 = azurerm_resource_group.main.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  account_kind             = "StorageV2"
  is_hns_enabled           = true # THIS TURNS IT INTO ADLS GEN2
}

resource "azurerm_storage_data_lake_gen2_filesystem" "raw" {
  name               = "bronze-raw"
  storage_account_id = azurerm_storage_account.adls.id
}

# 3. Azure Data Factory (The Orchestrator)
resource "azurerm_data_factory" "adf" {
  name                = "adf-internal-rag-orchestrator"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name

  identity {
    type = "SystemAssigned" # Allows ADF to talk to the others without passwords
  }
}

# 4. Databricks Workspace (The Processor)
resource "azurerm_databricks_workspace" "dbw" {
  name                = "dbw-internal-ai-engine"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "premium" # Required for security trimming / Unity Catalog
}

# 5. CONNECTING THEM: ADF Linked Service to Databricks
# This tells ADF how to trigger Databricks jobs
resource "azurerm_data_factory_linked_service_azure_databricks" "adf_to_dbw" {
  name            = "LS_Databricks_Connection"
  data_factory_id = azurerm_data_factory.adf.id
  adb_domain      = "https://${azurerm_databricks_workspace.dbw.workspace_url}"

  # Using Managed Identity for security (no keys stored!)
  msi_work_space_resource_id = azurerm_databricks_workspace.dbw.id

  new_cluster_config {
    node_type             = "Standard_DS3_v2"
    cluster_version       = "13.3.x-scala2.12"
    min_number_of_workers = 1
    max_number_of_workers = 2
  }
}

Why this setup is “RAG-Ready”:

  • is_hns_enabled = true: This is the most critical line. Without it, you just have a slow, flat blob storage. This enables the high-speed folder structure for your data lake.
  • Managed Identity (SystemAssigned): Notice there are no passwords or “Access Keys” in this code. ADF uses its own identity to log into Databricks. This is the gold standard for internal security.
  • sku = "premium": In an internal environment, you need “Premium” so you can enable Unity Catalog. This ensures that if a user doesn’t have permission to see an HR file in the Data Lake, the Databricks AI code won’t be able to read it either.
  • Automatic Cluster: The new_cluster_config block tells Data Factory to spin up a Databricks cluster only when it’s needed and shut it down immediately after, saving your department significant money.

How to use this:

  1. Save this as main.tf.
  2. Run terraform init and terraform apply.
  3. Once finished, go to the Azure Portal and you will see your entire data ecosystem ready to be loaded with documents.

III. Terraform code to deploy the Azure AI Search service to complete the RAG stack

To complete your Internal RAG stack, we need to add Azure AI Search. In 2026, this service is the “Vector Database” that allows your chatbot to find the right paragraphs within your internal documents.

The Final Terraform Piece: Azure AI Search

Add this block to your existing main.tf file. It deploys the Search service with Semantic Ranking enabled—which is the “secret sauce” for high-accuracy internal bots.

Terraform

# 6. Azure AI Search Service
resource "azurerm_search_service" "search" {
  name                = "ais-internal-rag-lookup"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location

  # 'basic' or 'standard' is required for Semantic Ranker and Vector Search
  sku = "basic"

  # Enabling Semantic Search (Standard plan for the ranker)
  semantic_search_sku = "standard"

  # Identity for secure connections to ADLS and OpenAI
  identity {
    type = "SystemAssigned"
  }

  tags = {
    Project = "Internal-RAG"
  }
}

# Output the Search Admin Key (for your App/Chatbot)
output "search_admin_key" {
  value     = azurerm_search_service.search.primary_key
  sensitive = true
}

output "search_endpoint" {
  value = "https://${azurerm_search_service.search.name}.search.windows.net"
}

How it all fits together (The Finished Product)

You now have a fully automated infrastructure code that creates:

  1. ADLS Gen2: Where your raw internal PDFs and Docs are stored.
  2. Azure Data Factory: The engine that “watches” for new files and tells the system to update.
  3. Azure Databricks: The brain that cleans data and can perform complex AI “chunking” of your documents.
  4. Azure AI Search: The searchable index that your chatbot queries to get answers.

2026 Pro-Tip: “Integrated Vectorization”

In 2026, you don’t actually need to write Python code to turn text into vectors. You can configure Azure AI Search to connect directly to your ADLS (using the adlsgen2 data source type) and it will:

  • Automatically “crack” the PDFs.
  • Chunk the text.
  • Call Azure OpenAI to create vectors.
  • Keep the index updated whenever a file in ADLS changes.

Final Step for Security

Since this is an Internal system, remember to add a Role Assignment (in the Azure Portal, or in Terraform): grant the Search Service's Managed Identity the Storage Blob Data Reader role on your ADLS account so it can read the files it indexes.
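If you prefer to keep that permission in code, a Terraform sketch might look like the following; it assumes the resource names used earlier in this guide (azurerm_storage_account.adls, azurerm_search_service.search):

```terraform
# Give AI Search permission to read the documents it indexes
# (assumes the 'adls' and 'search' resources defined earlier in this guide)
resource "azurerm_role_assignment" "search_to_adls" {
  scope                = azurerm_storage_account.adls.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = azurerm_search_service.search.identity[0].principal_id
}
```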

IV. Terraform code to add the LLM to this stack as well

To finish the “Brain” of your RAG system, we will add Azure OpenAI. This provides the Embedding Model (to turn text into math/vectors) and the GPT Model (to actually talk to your employees).

In 2026, the best practice is to deploy these as “Account Deployments” within a single Azure OpenAI resource.


7. Azure OpenAI Resource & Models

Add this to your main.tf. This script creates the AI service and deploys two specific models:

  1. text-embedding-3-large: For high-accuracy vector search.
  2. gpt-4o: The “Omni” model for fast, reasoning-based chat.

Terraform

# 7. Azure OpenAI Service
resource "azurerm_cognitive_account" "openai" {
  name                = "oai-internal-rag-brain"
  location            = "East US" # Check availability for gpt-4o in your region
  resource_group_name = azurerm_resource_group.main.name
  kind                = "OpenAI"
  sku_name            = "S0"

  identity {
    type = "SystemAssigned"
  }
}

# Deploy the Embedding Model (The "Encoder")
resource "azurerm_cognitive_deployment" "embedding" {
  name                 = "text-embedding-3-large"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-large"
    version = "1"
  }

  sku {
    name     = "Standard"
    capacity = 50 # Capacity units of 1K tokens-per-minute each (adjust based on need)
  }
}

# Deploy the Chat Model (The "Talker")
resource "azurerm_cognitive_deployment" "gpt4" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-05-13"
  }

  sku {
    name     = "Standard"
    capacity = 30
  }
}

# Output the OpenAI Endpoint
output "openai_endpoint" {
  value = azurerm_cognitive_account.openai.endpoint
}

The Completed 2026 RAG Infrastructure Map

Now that your Terraform is complete, here is how the data flows through the resources you just built:

  1. Ingest: ADF detects a new PDF in ADLS Gen2 (Bronze).
  2. Process: Databricks chunks the text and uses the OpenAI Embedding Model to vectorize it.
  3. Index: The vectors and text are pushed into Azure AI Search.
  4. Query: An employee asks a question in your Chatbot UI.
  5. Retrieve: Azure AI Search finds the relevant chunks.
  6. Augment/Generate: Your app sends the chunks + the question to GPT-4o, which gives the final answer.

One Final Security Step (The “Handshake”)

For an Internal system, you must ensure Azure AI Search has permission to talk to Azure OpenAI to handle “Integrated Vectorization.”

You can add this Terraform block to grant that permission automatically:

Terraform

# Give AI Search permission to use OpenAI
resource "azurerm_role_assignment" "search_to_openai" {
scope = azurerm_cognitive_account.openai.id
role_definition_name = "Cognitive Services OpenAI User"
principal_id = azurerm_search_service.search.identity[0].principal_id
}

Final Checklist for Deployment:

  • [ ] Region: Ensure all services (especially OpenAI and AI Search) are in the same region to minimize latency and data egress costs.
  • [ ] Quotas: Check your Azure subscription for “Cognitive Services” quotas; sometimes new subscriptions have a limit on GPT-4o capacity.
  • [ ] Cost: Remember to turn off your Databricks clusters when not in use—they are the most expensive part of this stack!

V. With the infrastructure defined, the Python/Spark code you'll need inside Databricks to start processing those internal documents

To get your internal RAG system running, you need a Databricks Notebook that acts as the bridge between your documents in ADLS and the searchable index in Azure AI Search.

In 2026, the most efficient way to do this is using Spark for parallel processing and the langchain-azure-ai library for the AI logic.


1. Library Installation

Run this in the first cell of your Databricks notebook to install the 2026-standard libraries.

Python

%pip install -U langchain-azure-ai azure-search-documents azure-identity pypdf
dbutils.library.restartPython()

2. The Ingestion Script (PySpark)

This script performs the “Heavy Lifting”: it reads PDFs from your Bronze folder, chunks them, and sends them to the AI Search Index.

Python

import os
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings
from langchain_azure_ai.vectorstores import AzureAISearch

# --- Configuration ---
STORAGE_PATH = "abfss://bronze-raw@stdatalakeprod001.dfs.core.windows.net/hr-manuals/"
SEARCH_ENDPOINT = "https://ais-internal-rag-lookup.search.windows.net"
SEARCH_KEY = dbutils.secrets.get(scope="rag-scope", key="search-admin-key")

# 1. Load Data from ADLS
# Using Spark to list all PDF files in the lake
df = spark.read.format("binaryFile").option("pathGlobFilter", "*.pdf").load(STORAGE_PATH)

# 2. Extract and Chunk Text
# (Simplification: in prod, use 'spark-pdf' or 'Azure AI Document Intelligence')
def process_pdf(content):
    import io
    from pypdf import PdfReader

    reader = PdfReader(io.BytesIO(content))
    text = ""
    for page in reader.pages:
        text += page.extract_text() or ""  # extract_text() can return None

    # Split into 1000-character chunks with overlap for context
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    return splitter.split_text(text)

# 3. Create Embeddings & Push to Azure AI Search
embeddings = AzureOpenAIEmbeddings(
    azure_deployment="text-embedding-3-large",
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
)

vector_store = AzureAISearch(
    threading=True,
    azure_search_endpoint=SEARCH_ENDPOINT,
    azure_search_key=SEARCH_KEY,
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)

# Convert PDF data to chunks and add to the vector store
# (collect() is fine for a few thousand docs; parallelize at larger scale)
for row in df.collect():
    chunks = process_pdf(row.content)
    # Metadata helps with "Security Trimming" later
    metadata = [{"source": row.path, "id": f"{row.path}_{i}"} for i in range(len(chunks))]
    vector_store.add_texts(chunks, metadatas=metadata)

print("Indexing Complete!")

3. Key 2026 Features in this Code

  • RecursiveCharacterTextSplitter: This ensures that chunks don’t break in the middle of a word, which helps the LLM maintain “contextual integrity.”
  • text-embedding-3-large: We use the latest 2026 embedding model because it supports MRL (Matryoshka Representation Learning), allowing you to store smaller vectors without losing accuracy—saving you money on storage.
  • Managed Identity: In a real production setup, you would replace SEARCH_KEY with DefaultAzureCredential() so no keys are ever visible in the code.
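To demystify the splitter, here is a toy, dependency-free version of the sliding-window part of chunking. It deliberately ignores RecursiveCharacterTextSplitter's separator hierarchy (paragraphs, then sentences, then words) and exists purely to show why the overlap matters:

```python
# Toy illustration of chunking with overlap (what RecursiveCharacterTextSplitter
# does, minus its separator hierarchy): each chunk starts with the tail of the
# previous one, so text near a boundary never loses its surrounding context.
def chunk_with_overlap(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "".join("abcdefghij"[i % 10] for i in range(2500))
chunks = chunk_with_overlap(doc, chunk_size=1000, overlap=100)
print(len(chunks))     # 3 chunks cover 2500 characters
print(len(chunks[0]))  # 1000
```

The real splitter additionally tries to break on paragraph and sentence boundaries before falling back to a raw character cut, which is what keeps chunks from ending mid-word.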

4. How the Chatbot Queries this Data

Once that script runs, your “Search Index” is full. When a user asks a question, your backend app (likely a Python Web App or Azure Function) runs this simple logic:

Python

# The "Retriever"
query = "What is the policy on maternity leave?"
docs = vector_store.similarity_search(
    query,
    k=3,
    search_type="hybrid",  # 2026 standard: Vector + Keyword
)

# The "Generator"
# Pass 'docs' to GPT-4o to get the final human-friendly answer

Summary of your Pipeline

  1. ADF detects the file.
  2. Databricks (using the code above) reads the file.
  3. Databricks chunks it and calculates the math (embeddings).
  4. Azure AI Search stores the result.

Your infrastructure and code are now fully aligned.

VI. Setting up the User Interface (like a Streamlit or React app) to actually talk to this bot

For the final piece of your internal RAG system, you need a user-friendly interface. In 2026, there are two primary paths: the “Pro-Code” way (custom and flexible) and the “Low-Code” way (fast and integrated).


Option 1: The “Low-Code” Way (Azure AI Foundry)

If you want a professional-grade internal portal in under 10 minutes, use the Azure AI Foundry (formerly AI Studio) “Deploy to Web App” feature.

  1. Go to Azure AI Foundry: Open your project.
  2. Open Playground: Select the “Chat” playground.
  3. Setup RAG: Under “Add your data,” select your Azure AI Search index.
  4. Deploy: Click the Deploy to button and select “as a web app”.
    • Azure will automatically provision an Azure App Service.
    • It creates a beautiful, ChatGPT-like interface that is pre-connected to your data.
    • Security: It integrates with Microsoft Entra ID out of the box, so only your employees can log in.

Option 2: The “Pro-Code” Way (Python + Streamlit)

If you want to customize the branding or add specific internal tools (like a “Feedback” button), Streamlit is the industry standard for 2026.

The app.py logic:

Python

import streamlit as st
from langchain_azure_ai.chat_models import AzureChatOpenAI
from langchain_azure_ai.vectorstores import AzureAISearch
from langchain_azure_ai.embeddings import AzureOpenAIEmbeddings

st.set_page_config(page_title="Internal HR Assistant", layout="wide")
st.title("🤖 Internal Knowledge Bot")

# 1. Initialize Connection
embeddings = AzureOpenAIEmbeddings(azure_deployment="text-embedding-3-large")
vector_store = AzureAISearch(
    azure_search_endpoint=st.secrets["SEARCH_ENDPOINT"],
    index_name="internal-docs-index",
    embedding_function=embeddings.embed_query,
)
llm = AzureChatOpenAI(azure_deployment="gpt-4o", temperature=0)

# 2. Chat Interface
if "messages" not in st.session_state:
    st.session_state.messages = []

for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Ask me about company policy..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # 3. The RAG Step: Retrieve relevant chunks
    docs = vector_store.similarity_search(prompt, k=3)
    context = "\n".join([d.page_content for d in docs])

    # 4. The RAG Step: Generate Answer
    full_prompt = f"Use the following context to answer: {context}\n\nQuestion: {prompt}"
    response = llm.invoke(full_prompt).content

    with st.chat_message("assistant"):
        st.markdown(response)
        with st.expander("View Sources"):
            for d in docs:
                st.write(f"- {d.metadata['source']}")

    st.session_state.messages.append({"role": "assistant", "content": response})

3. Critical 2026 UX Features for Internal Bots

To make your bot actually useful for employees, ensure your UI includes:

  • Citations: Never show an answer without a link to the source document in ADLS. This builds trust and allows employees to verify the info.
  • “I Don’t Know” Guardrail: Explicitly tell the LLM in the system prompt: “If the answer is not in the provided documents, say you don’t know. Do not make up facts.”
  • Feedback Loop: Add a Thumbs Up/Down on every response. You can pipe this feedback back into ADF to log which documents are confusing the AI.
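These guardrails mostly live in the system prompt. Here is a hedged sketch of how you might assemble one; the wording and the chunk-dict shape ({id, source, text}) are illustrative assumptions, not a tested production prompt:

```python
# Sketch of a guardrailed RAG system prompt: citations are mandatory, and the
# model is told to refuse rather than guess. Wording is illustrative only.
def build_system_prompt(context_chunks: list[dict]) -> str:
    sources = "\n\n".join(
        f"[{c['id']}] (source: {c['source']})\n{c['text']}" for c in context_chunks
    )
    return (
        "You are an internal knowledge assistant.\n"
        "Answer ONLY from the documents below, and cite the [id] of every "
        "document you use.\n"
        "If the answer is not in the provided documents, say you don't know. "
        "Do not make up facts.\n\n"
        f"Documents:\n{sources}"
    )

prompt = build_system_prompt([
    {"id": "doc1",
     "source": "hr-manuals/leave-policy.pdf",
     "text": "Maternity leave is 16 weeks."},
])
print(prompt)
```

Because the sources are labeled inline, the model can emit `[doc1]`-style citations that your UI then turns into clickable links back to ADLS.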

Summary of your Journey

You have successfully moved from raw data to a live AI assistant:

  1. Infrastructure: Deployed via Terraform.
  2. Pipeline: Data moved by ADF into ADLS.
  3. Processing: Cleaned and Vectorized in Databricks.
  4. Retrieval: Indexed and ranked in Azure AI Search.
  5. Interface: Served via App Service or Streamlit.

VII. How to monitor the costs of this stack so you don’t get a surprise bill at the end of the month

Building a RAG system in 2026 is an exercise in consumption-based engineering. Because every PDF page processed and every chat message sent has a price tag, monitoring isn’t just “good hygiene”—it’s a requirement to keep the project alive.

Here is how to monitor and optimize the costs of your specific ADF-ADLS-Databricks-AI Search stack.


1. The “Big Three” Cost Drivers

In your internal RAG architecture, these three will typically account for 90% of your bill:

  • Databricks. Cost driver: cluster uptime (DBUs). 2026 pro-tip: use Serverless Compute for job runs; it scales to zero the second the processing finishes.
  • AI Search. Cost driver: Search Units (SUs). 2026 pro-tip: start with the Basic tier; don’t move to Standard until your document count exceeds 15 GB or 1 million chunks.
  • Azure OpenAI. Cost driver: token consumption. 2026 pro-tip: use gpt-4o-mini for simple summarization and reserve gpt-4o for complex reasoning to save up to 80% on tokens.
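To see why the model-routing tip matters, here is a back-of-the-envelope cost model. The per-1K-token prices are placeholder assumptions, not real Azure list prices; substitute your own rate card:

```python
# Back-of-the-envelope token cost model. PRICES ARE PLACEHOLDER ASSUMPTIONS
# for illustration only -- plug in your actual Azure OpenAI rate card.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.005, "gpt-4o-mini": 0.0003}   # hypothetical $/1K tokens
PRICE_PER_1K_OUTPUT = {"gpt-4o": 0.015, "gpt-4o-mini": 0.0012}  # hypothetical $/1K tokens

def chat_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one chat turn: prompt tokens in, answer tokens out."""
    return ((input_tokens / 1000) * PRICE_PER_1K_INPUT[model]
            + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT[model])

# 10,000 chats/month, ~2K prompt tokens (question + retrieved chunks), ~300 answer tokens
for model in ("gpt-4o", "gpt-4o-mini"):
    monthly = 10_000 * chat_cost(model, input_tokens=2_000, output_tokens=300)
    print(f"{model}: ${monthly:,.2f}/month")
```

Even with made-up prices, the shape of the result holds: the retrieved context dominates input tokens, so routing routine questions to the cheaper model is where the savings live.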

2. Setting Up “Hard” Guardrails (Azure Budgets)

Don’t wait for the monthly invoice. Set up an automated kill-switch.

  1. Create a Resource Group Budget: Put all your RAG resources (ADF, ADLS, etc.) in one Resource Group.
  2. Set Thresholds:
    • 50%: Send an email to the team.
    • 90%: Send a high-priority alert to the Manager.
    • 100% (The Nuclear Option): In 2026, you can trigger an Azure Automation Runbook that programmatically disables the Azure OpenAI API keys, instantly stopping further spending.

3. Optimization Checklist by Service

Azure Data Factory (ADF)

  • Data Integration Units (DIUs): When copying files from SharePoint/On-prem to ADLS, ADF defaults to 4 DIUs. For small internal docs, manually set this to 2 to halve the copy cost.
  • Avoid Over-Polling: Set your triggers to “Tumbling Window” or “Storage Event” rather than “Schedule” (e.g., checking every 1 minute) to reduce trigger run costs.

Azure Databricks

  • Auto-Termination: Ensure your clusters are set to terminate after 10 minutes of inactivity.
  • Photon Engine: Turn on the Photon query engine. While it costs slightly more per hour, it processes data so much faster that the total cost of the job is usually lower.

Azure Data Lake (ADLS)

  • Lifecycle Management: Set a policy to move files from Hot to Cold storage if they haven’t been accessed in 30 days. Your “Raw/Bronze” data almost never needs to be in the Hot tier.
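The lifecycle rule is just a date comparison that Azure applies automatically. A sketch of the decision it encodes, using the 30-day cutoff from above:

```python
# Sketch of the Hot-to-Cold decision an ADLS lifecycle policy automates:
# anything untouched for 30+ days moves to the cheaper Cold tier.
from datetime import date, timedelta

def target_tier(last_accessed: date, today: date, cutoff_days: int = 30) -> str:
    return "Cold" if (today - last_accessed) >= timedelta(days=cutoff_days) else "Hot"

today = date(2026, 3, 1)
print(target_tier(date(2026, 2, 25), today))  # Hot  (accessed 4 days ago)
print(target_tier(date(2026, 1, 10), today))  # Cold (untouched for 50 days)
```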

Azure AI Search

  • Image Cracking: If your PDFs contain images, “Document Cracking” costs extra ($1 per 1,000 images). If you don’t need to “read” charts or photos, disable image extraction in your indexer settings.

4. 2026 “FinOps for AI” Dashboard

The most effective way to stay under budget is to build a Cost Visibility Dashboard.

  • Tagging: Ensure every resource in your Terraform code has a Project: Internal-RAG and Department: HR tag.
  • Cost Analysis: Use the Azure Cost Management tool to group by “Tag.” This allows you to show your stakeholders exactly how much the “HR Bot” is costing per department.
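Here is what "group by tag" does to a raw cost export, using mock records; the record shape is an illustrative assumption, not the real Cost Management export schema:

```python
# Sketch of the "group by tag" view Azure Cost Management gives you, applied
# to mock cost rows: tagging is what turns a raw bill into per-project spend.
from collections import defaultdict

cost_records = [  # mock export rows: (resource, daily_cost_usd, tags)
    ("dbw-internal-ai-engine", 42.0, {"Project": "Internal-RAG"}),
    ("ais-internal-rag-lookup", 8.0, {"Project": "Internal-RAG"}),
    ("vm-legacy-reporting", 15.0, {"Project": "Reporting"}),
]

spend_by_project = defaultdict(float)
for _resource, cost, tags in cost_records:
    spend_by_project[tags.get("Project", "untagged")] += cost

print(dict(spend_by_project))  # {'Internal-RAG': 50.0, 'Reporting': 15.0}
```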

Summary

If you are just starting, your daily “Idle” cost (with everything turned on but no one using it) should be roughly $5–$10/day. If it’s higher, check if a Databricks cluster was left running or if you deployed a “Standard” tier AI Search unnecessarily.

Since we’ve covered the infrastructure, the code, the UI, and the costs, is there any specific part of the internal security (like preventing certain employees from seeing certain files) you’d like to dive into?
