Deploying RAG Infrastructure on Azure: A Step-by-Step Guide

Overview — What We’re Building

[ Documents ]→[ Ingestion Pipeline ]→[ AI Search + Embeddings ]
[ User ] → [ APIM ] → [ App Service / AKS ] → [ Azure OpenAI ]
[ Monitoring + Security ]

Prerequisites

# Tools needed
- Azure CLI (az)
- Terraform or Bicep (IaC)
- Docker
- Python 3.11+
- VS Code + Azure extension
# Azure services needed
- Azure Subscription
- Contributor or Owner role

Option A — Deploy with Terraform (Recommended)

Project Structure

rag-azure/
├── infra/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── modules/
│ │ ├── openai/
│ │ ├── ai_search/
│ │ ├── storage/
│ │ ├── app_service/
│ │ └── networking/
├── app/
│ ├── api/
│ │ ├── main.py
│ │ ├── retrieval.py
│ │ ├── generation.py
│ │ └── security.py
│ ├── ingestion/
│ │ ├── ingest.py
│ │ └── chunker.py
│ ├── Dockerfile
│ └── requirements.txt
├── scripts/
│ ├── deploy.sh
│ └── index_documents.sh
└── .github/
└── workflows/
└── deploy.yml

Step 1 — Core Infrastructure (Terraform)

# infra/main.tf

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
  }
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "stgtfstate"
    container_name       = "tfstate"
    key                  = "rag.tfstate"
  }
}

provider "azurerm" {
  features {}
}

# ── Resource Group ──────────────────────────────────────────
resource "azurerm_resource_group" "rag" {
  name     = "rg-${var.project}-${var.env}"
  location = var.location
  tags     = local.tags
}

# ── Virtual Network ─────────────────────────────────────────
resource "azurerm_virtual_network" "rag" {
  name                = "vnet-${var.project}-${var.env}"
  resource_group_name = azurerm_resource_group.rag.name
  location            = azurerm_resource_group.rag.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "app" {
  name                 = "snet-app"
  resource_group_name  = azurerm_resource_group.rag.name
  virtual_network_name = azurerm_virtual_network.rag.name
  address_prefixes     = ["10.0.1.0/24"]
  delegation {
    name = "app-service-delegation"
    service_delegation {
      name = "Microsoft.Web/serverFarms"
    }
  }
}

resource "azurerm_subnet" "private_endpoints" {
  name                 = "snet-pe"
  resource_group_name  = azurerm_resource_group.rag.name
  virtual_network_name = azurerm_virtual_network.rag.name
  address_prefixes     = ["10.0.2.0/24"]
}



Step 2 — Azure OpenAI

# infra/modules/openai/main.tf

resource "azurerm_cognitive_account" "openai" {
  name                = "oai-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  kind                = "OpenAI"
  sku_name            = "S0"

  # Required for Entra ID auth and private endpoints
  custom_subdomain_name = "oai-${var.project}-${var.env}"

  # Disable public access — private endpoint only
  public_network_access_enabled = false

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Deploy models
resource "azurerm_cognitive_deployment" "gpt4o" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-08-06"
  }

  scale {
    type     = "Standard"
    capacity = 40  # TPM in thousands
  }
}

resource "azurerm_cognitive_deployment" "embeddings" {
  name                 = "text-embedding-3-large"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-large"
    version = "1"
  }

  scale {
    type     = "Standard"
    capacity = 120
  }
}

# Private Endpoint
resource "azurerm_private_endpoint" "openai" {
  name                = "pe-openai-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-openai"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "openai-dns"
    private_dns_zone_ids = [var.openai_dns_zone_id]
  }
}



Step 3 — Azure AI Search

# infra/modules/ai_search/main.tf

resource "azurerm_search_service" "rag" {
  name                = "srch-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"  # Use standard for vector search
  replica_count       = 2           # 2+ replicas for read HA (3+ for the read-write SLA)
  partition_count     = 1

  # Disable API key auth — use Entra ID only
  local_authentication_enabled   = false
  public_network_access_enabled  = false

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Private Endpoint
resource "azurerm_private_endpoint" "search" {
  name                = "pe-search-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-search"
    private_connection_resource_id = azurerm_search_service.rag.id
    subresource_names              = ["searchService"]
    is_manual_connection           = false
  }

  # Link the privatelink.search.windows.net zone, as done for OpenAI above
  private_dns_zone_group {
    name                 = "search-dns"
    private_dns_zone_ids = [var.search_dns_zone_id]
  }
}



Step 4 — Storage Account (Document Store)

# infra/modules/storage/main.tf

resource "azurerm_storage_account" "docs" {
  name                     = "st${var.project}${var.env}"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "ZRS"        # Zone-redundant

  # Security settings
  public_network_access_enabled   = false
  allow_nested_items_to_be_public = false
  min_tls_version                 = "TLS1_2"
  shared_access_key_enabled       = false  # Entra ID only

  blob_properties {
    versioning_enabled = true              # Keep doc versions
    delete_retention_policy {
      days = 30
    }
  }

  identity {
    type = "SystemAssigned"
  }
}

# With shared keys disabled, Terraform's own data-plane calls must use
# Entra ID: set `storage_use_azuread = true` in the azurerm provider block.
resource "azurerm_storage_container" "documents" {
  name                  = "documents"
  storage_account_name  = azurerm_storage_account.docs.name
  container_access_type = "private"
}

resource "azurerm_storage_container" "processed" {
  name                  = "processed"
  storage_account_name  = azurerm_storage_account.docs.name
  container_access_type = "private"
}



Step 5 — Key Vault

# infra/modules/keyvault/main.tf

data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "rag" {
  name                = "kv-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "premium"   # HSM-backed keys

  # Disable public access
  public_network_access_enabled = false

  # Require RBAC (not access policies)
  enable_rbac_authorization = true

  purge_protection_enabled   = true
  soft_delete_retention_days = 90
}

# Private Endpoint
resource "azurerm_private_endpoint" "keyvault" {
  name                = "pe-kv-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-kv"
    private_connection_resource_id = azurerm_key_vault.rag.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }
}



Step 6 — App Service (RAG API)

# infra/modules/app_service/main.tf
resource "azurerm_service_plan" "rag" {
  name                = "asp-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  os_type             = "Linux"
  sku_name            = "P2v3"  # Production tier
}

resource "azurerm_linux_web_app" "rag_api" {
  name                = "app-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  service_plan_id     = azurerm_service_plan.rag.id

  # VNet integration
  virtual_network_subnet_id = var.app_subnet_id
  https_only                = true

  identity {
    type = "SystemAssigned"  # Managed Identity
  }

  site_config {
    always_on           = true
    http2_enabled       = true
    ftps_state          = "Disabled"
    minimum_tls_version = "1.2"

    # Pull from ACR with the managed identity — no registry password
    container_registry_use_managed_identity = true

    application_stack {
      docker_image_name   = "rag-api:latest"
      docker_registry_url = "https://${var.acr_name}.azurecr.io"
    }

    health_check_path = "/health"
  }

  app_settings = {
    # All values pulled from Key Vault via references
    "AZURE_OPENAI_ENDPOINT"                 = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/openai-endpoint/)"
    "SEARCH_ENDPOINT"                       = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/search-endpoint/)"
    "STORAGE_ACCOUNT_URL"                   = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/storage-url/)"
    "APPLICATIONINSIGHTS_CONNECTION_STRING" = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/appinsights-conn/)"
    "ENVIRONMENT"                           = var.env
  }
}

Step 7 — RBAC Assignments

# infra/rbac.tf
locals {
  # Principal IDs exposed as outputs by the app_service and ai_search modules
  app_principal_id    = module.app_service.principal_id
  search_principal_id = module.ai_search.principal_id
}

# App → OpenAI
resource "azurerm_role_assignment" "app_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.app_principal_id
}

# App → AI Search
# (Whatever identity runs ingestion also needs "Search Index Data Contributor")
resource "azurerm_role_assignment" "app_to_search" {
  scope                = module.ai_search.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = local.app_principal_id
}

# App → Storage
resource "azurerm_role_assignment" "app_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.app_principal_id
}

# App → Key Vault
resource "azurerm_role_assignment" "app_to_kv" {
  scope                = module.keyvault.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = local.app_principal_id
}

# Search → Storage (for indexer to read docs)
resource "azurerm_role_assignment" "search_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.search_principal_id
}

# Search → OpenAI (for integrated vectorization)
resource "azurerm_role_assignment" "search_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.search_principal_id
}

Step 8 — Create the AI Search Index

# scripts/create_index.py
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration,
    VectorSearchProfile, SemanticConfiguration,
    SemanticSearch, SemanticPrioritizedFields,
    SemanticField,
)

SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

# The caller needs the "Search Service Contributor" role to manage indexes
credential = DefaultAzureCredential()
index_client = SearchIndexClient(endpoint=SEARCH_ENDPOINT, credential=credential)

index = SearchIndex(
    name="rag-index",
    fields=[
        SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True),
        SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="source_file", type=SearchFieldDataType.String, filterable=True),
        SearchField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
        SearchField(name="sensitivity", type=SearchFieldDataType.String, filterable=True),
        SearchField(
            name="allowed_groups",
            type=SearchFieldDataType.Collection(SearchFieldDataType.String),
            filterable=True,
        ),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # text-embedding-3-large
            vector_search_profile_name="hnsw-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-algo")],
        profiles=[
            VectorSearchProfile(
                name="hnsw-profile",
                algorithm_configuration_name="hnsw-algo",
            )
        ],
    ),
    semantic_search=SemanticSearch(
        configurations=[
            SemanticConfiguration(
                name="semantic-config",
                prioritized_fields=SemanticPrioritizedFields(
                    content_fields=[SemanticField(field_name="content")]
                ),
            )
        ]
    ),
)

index_client.create_or_update_index(index)
print("✅ Index created")

Step 9 — Document Ingestion Pipeline

# app/ingestion/ingest.py
import hashlib
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.storage.blob import BlobServiceClient
from openai import AzureOpenAI

from chunker import chunk_document

STORAGE_URL = os.environ["STORAGE_ACCOUNT_URL"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

credential = DefaultAzureCredential()


def ingest_document(blob_name: str):
    # 1. Download from Blob Storage
    blob_client = BlobServiceClient(
        account_url=STORAGE_URL,
        credential=credential,
    ).get_blob_client("documents", blob_name)
    content = blob_client.download_blob().readall().decode("utf-8")

    # 2. Chunk the document
    chunks = chunk_document(content, chunk_size=512, overlap=50)

    # 3. Embed each chunk (managed-identity token, no API key)
    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        api_version="2024-06-01",
        azure_ad_token_provider=get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        ),
    )
    documents = []
    for i, chunk in enumerate(chunks):
        embedding = openai_client.embeddings.create(
            input=chunk,
            model="text-embedding-3-large",  # deployment name from Step 2
        ).data[0].embedding
        documents.append({
            "chunk_id": hashlib.md5(f"{blob_name}-{i}".encode()).hexdigest(),
            "content": chunk,
            "source_file": blob_name,
            "page_number": i,
            "embedding": embedding,
            "allowed_groups": get_document_acl(blob_name),       # from Purview / metadata
            "sensitivity": get_sensitivity_label(blob_name),     # helper defined elsewhere
        })

    # 4. Upload to AI Search (requires "Search Index Data Contributor")
    search_client = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name="rag-index",
        credential=credential,
    )
    search_client.upload_documents(documents)
    print(f"✅ Indexed {len(documents)} chunks from {blob_name}")
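The `chunk_document` call above comes from `app/ingestion/chunker.py`, which the guide never shows. A minimal sketch under stated assumptions — fixed-size word windows with overlap; a production chunker would count tokens and respect paragraph or heading boundaries:

```python
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks.

    chunk_size and overlap are counted in words here for simplicity;
    counting tokens (e.g. with tiktoken) would match the embedding model
    more closely.
    """
    words = text.split()
    if not words:
        return []
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; 50 words on a 512-word window (~10%) is a common starting point.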

Step 10 — RAG API (FastAPI)

# app/api/main.py
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from fastapi import FastAPI, Depends
from openai import AzureOpenAI
from pydantic import BaseModel

# verify_entra_token, sanitize_input, build_security_filter live in security.py;
# embed in retrieval.py; build_rag_prompt, SYSTEM_PROMPT, check_content_safety,
# log_interaction in generation.py
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]

app = FastAPI()
credential = DefaultAzureCredential()


class ChatRequest(BaseModel):
    query: str


@app.post("/chat")
async def chat(
    request: ChatRequest,
    user: dict = Depends(verify_entra_token),  # Auth middleware
):
    # 1. Validate input
    sanitized_query = sanitize_input(request.query)

    # 2. Embed query
    query_embedding = embed(sanitized_query)

    # 3. Retrieve with security filter
    user_groups = user.get("groups", [])
    security_filter = build_security_filter(user_groups, user["oid"])
    search_client = SearchClient(SEARCH_ENDPOINT, "rag-index", credential)
    results = search_client.search(
        search_text=sanitized_query,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=5,
            fields="embedding",
        )],
        filter=security_filter,
        query_type="semantic",
        semantic_configuration_name="semantic-config",
        top=5,
    )
    # Iterate the paged results once — a second pass would come back empty
    chunks, sources = [], []
    for r in results:
        chunks.append(r["content"])
        sources.append(r["source_file"])

    # 4. Generate answer
    context = "\n\n---\n\n".join(chunks)
    prompt = build_rag_prompt(sanitized_query, context)
    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        api_version="2024-06-01",
        azure_ad_token_provider=get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        ),
    )
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,  # Deterministic for RAG
        max_tokens=1000,
    )
    answer = response.choices[0].message.content

    # 5. Safety check output
    check_content_safety(answer)

    # 6. Audit log
    log_interaction(user["oid"], sanitized_query, sources, answer)

    return {"answer": answer, "sources": sources}
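Step 10 leans on `sanitize_input` and `build_security_filter` from `app/api/security.py` without showing them. A hedged sketch of both — the OData filter targets the `allowed_groups` field defined in Step 8, but treating the caller's own `oid` as an implicit group, and the exact sanitization policy, are assumptions of this sketch:

```python
import re


def sanitize_input(query: str, max_len: int = 1000) -> str:
    """Basic input hygiene: trim, cap length, strip control characters.

    A minimal sketch — pair it with Azure AI Content Safety prompt
    shields rather than relying on regex alone.
    """
    query = query.strip()[:max_len]
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", query)


def build_security_filter(user_groups: list[str], user_oid: str) -> str:
    """OData filter trimming results to chunks the caller may see.

    Matches the filterable `allowed_groups` collection from the Step 8
    index. Entra group IDs are GUIDs, so comma-joining them into
    search.in() is safe; including user_oid as an implicit "group" is an
    assumption of this sketch.
    """
    ids = [g for g in user_groups if g] + [user_oid]
    id_list = ",".join(ids)
    return f"allowed_groups/any(g: search.in(g, '{id_list}', ','))"
```

Passing this string as the `filter=` argument in Step 10 makes the trimming happen inside AI Search, so unauthorized chunks never reach the prompt.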

Step 11 — CI/CD Pipeline (GitHub Actions)

# .github/workflows/deploy.yml
name: Deploy RAG Infrastructure

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# Required for OIDC federated login
permissions:
  id-token: write
  contents: read

env:
  TF_VERSION: "1.6.0"
  ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
  ARM_USE_OIDC: "true"

jobs:
  terraform:
    name: Terraform Plan & Apply
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        run: terraform init
        working-directory: infra/

      - name: Terraform Plan
        run: terraform plan -out=tfplan
        working-directory: infra/

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply tfplan
        working-directory: infra/

  build-and-push:
    name: Build & Push Docker Image
    runs-on: ubuntu-latest
    needs: terraform
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Build Docker image
        run: docker build -t rag-api:${{ github.sha }} ./app

      - name: Push to ACR
        run: |
          az acr login --name ${{ secrets.ACR_NAME }}
          docker tag rag-api:${{ github.sha }} \
            ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
          docker push ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}

  deploy-app:
    name: Deploy to App Service
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Update App Service image
        run: |
          az webapp config container set \
            --name ${{ secrets.APP_NAME }} \
            --resource-group ${{ secrets.RG_NAME }} \
            --docker-custom-image-name \
              ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
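The workflow above builds `app/Dockerfile`, which the guide never shows. A minimal sketch, assuming a uvicorn entrypoint for the FastAPI app in `api/main.py` (the port and user name are illustrative):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer caches across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run as a non-root user
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000

# App Service health checks hit /health on this port
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```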

Deployment Commands

# 1. Login to Azure
az login
az account set --subscription "your-subscription-id"

# 2. Create Terraform state backend
az group create --name rg-tfstate --location eastus
az storage account create --name stgtfstate --resource-group rg-tfstate \
  --sku Standard_LRS
az storage container create --name tfstate \
  --account-name stgtfstate

# 3. Deploy infrastructure
cd infra/
terraform init
terraform plan -var-file="environments/prod.tfvars"
terraform apply -var-file="environments/prod.tfvars"

# 4. Index your documents
python scripts/create_index.py
python scripts/ingest_documents.py --container documents

# 5. Test the API
curl -X POST https://your-app.azurewebsites.net/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'

Production Checklist

Infrastructure
✅ All services deployed via IaC (Terraform / Bicep)
✅ Private endpoints on OpenAI, AI Search, Storage, Key Vault
✅ Public access disabled on all backend services
✅ Managed Identity — zero hardcoded secrets
Security
✅ Entra ID auth on API
✅ Document-level ACL enforced at retrieval
✅ Content Safety on input + output
✅ Key Vault for all secrets
✅ WAF on Front Door / APIM
Reliability
✅ AI Search replica_count ≥ 2
✅ App Service always_on = true
✅ Health check endpoint configured
✅ Auto-scaling rules set
Monitoring
✅ Application Insights connected
✅ Log Analytics workspace active
✅ Alerts on latency, error rate, cost
✅ Audit logs to Log Analytics
CI/CD
✅ Terraform state in remote backend
✅ OIDC auth (no client secrets in GitHub)
✅ PR plan, main apply workflow
✅ Image tagged with git SHA

This gives you a fully production-ready, secure RAG deployment on Azure — infrastructure as code, zero hardcoded secrets, private networking, and document-level access control from day one.
