Overview — What We’re Building
[ Documents ] → [ Ingestion Pipeline ] → [ AI Search + Embeddings ]
                                                    ↓
  [ User ] → [ APIM ] → [ App Service / AKS ] → [ Azure OpenAI ]
                                ↓
                  [ Monitoring + Security ]
Prerequisites
# Tools needed
- Azure CLI (az)
- Terraform or Bicep (IaC)
- Docker
- Python 3.11+
- VS Code + Azure extension

# Azure services needed
- Azure Subscription
- Contributor or Owner role
Option A — Deploy with Terraform (Recommended)
Project Structure
rag-azure/
├── infra/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── modules/
│   │   ├── openai/
│   │   ├── ai_search/
│   │   ├── storage/
│   │   ├── keyvault/
│   │   ├── app_service/
│   │   └── networking/
├── app/
│   ├── api/
│   │   ├── main.py
│   │   ├── retrieval.py
│   │   ├── generation.py
│   │   └── security.py
│   ├── ingestion/
│   │   ├── ingest.py
│   │   └── chunker.py
│   ├── Dockerfile
│   └── requirements.txt
├── scripts/
│   ├── deploy.sh
│   ├── create_index.py
│   ├── ingest_documents.py
│   └── index_documents.sh
└── .github/
    └── workflows/
        └── deploy.yml
Step 1 — Core Infrastructure (Terraform)
# infra/main.tf
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "~> 3.80"
}
}
backend "azurerm" {
resource_group_name = "rg-tfstate"
storage_account_name = "stgtfstate"
container_name = "tfstate"
key = "rag.tfstate"
}
}
provider "azurerm" {
features {}
}
# ── Resource Group ──────────────────────────────────────────
resource "azurerm_resource_group" "rag" {
name = "rg-${var.project}-${var.env}"
location = var.location
tags = local.tags
}
# ── Virtual Network ─────────────────────────────────────────
resource "azurerm_virtual_network" "rag" {
name = "vnet-${var.project}-${var.env}"
resource_group_name = azurerm_resource_group.rag.name
location = azurerm_resource_group.rag.location
address_space = ["10.0.0.0/16"]
}
resource "azurerm_subnet" "app" {
name = "snet-app"
resource_group_name = azurerm_resource_group.rag.name
virtual_network_name = azurerm_virtual_network.rag.name
address_prefixes = ["10.0.1.0/24"]
delegation {
name = "app-service-delegation"
service_delegation {
name = "Microsoft.Web/serverFarms"
}
}
}
resource "azurerm_subnet" "private_endpoints" {
name = "snet-pe"
resource_group_name = azurerm_resource_group.rag.name
virtual_network_name = azurerm_virtual_network.rag.name
address_prefixes = ["10.0.2.0/24"]
}
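A quick way to sanity-check the private networking later, once the private endpoints and DNS zones exist: from a host inside the VNet, a service's hostname should resolve to a private IP in snet-pe. A minimal sketch, run from a VM or container in the VNet; the hostname is a hypothetical account name from Step 2:

# check_dns.py (diagnostic sketch; hostname is an assumption)
import socket

host = "oai-myproject-prod.openai.azure.com"  # assumption: your account name
ip = socket.getaddrinfo(host, 443)[0][4][0]
print(f"{host} resolves to {ip}")  # inside the VNet, expect a 10.0.2.x address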
Step 2 — Azure OpenAI
# infra/modules/openai/main.tf
resource "azurerm_cognitive_account" "openai" {
name = "oai-${var.project}-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
kind = "OpenAI"
sku_name = "S0"
# Disable public access — private endpoint only
public_network_access_enabled = false
identity {
type = "SystemAssigned"
}
tags = var.tags
}
# Deploy models
resource "azurerm_cognitive_deployment" "gpt4o" {
name = "gpt-4o"
cognitive_account_id = azurerm_cognitive_account.openai.id
model {
format = "OpenAI"
name = "gpt-4o"
version = "2024-08-06"
}
scale {
type = "Standard"
capacity = 40 # TPM in thousands
}
}
resource "azurerm_cognitive_deployment" "embeddings" {
name = "text-embedding-3-large"
cognitive_account_id = azurerm_cognitive_account.openai.id
model {
format = "OpenAI"
name = "text-embedding-3-large"
version = "1"
}
scale {
type = "Standard"
capacity = 120
}
}
# Private Endpoint
resource "azurerm_private_endpoint" "openai" {
name = "pe-openai-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
subnet_id = var.private_endpoint_subnet_id
private_service_connection {
name = "psc-openai"
private_connection_resource_id = azurerm_cognitive_account.openai.id
subresource_names = ["account"]
is_manual_connection = false
}
private_dns_zone_group {
name = "openai-dns"
private_dns_zone_ids = [var.openai_dns_zone_id]
}
}
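Because public network access is off and we never touch API keys, clients authenticate with Entra ID tokens. A minimal client sketch, assuming the azure-identity and openai packages and a hypothetical endpoint; the get_token_provider helper used later in the app can simply wrap get_bearer_token_provider:

# Entra ID auth against Azure OpenAI, no API keys (endpoint is an assumption)
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)
client = AzureOpenAI(
    azure_endpoint="https://oai-myproject-prod.openai.azure.com/",  # assumption
    azure_ad_token_provider=token_provider,
    api_version="2024-06-01",
)
resp = client.chat.completions.create(
    model="gpt-4o",  # the deployment name from Step 2, not the model family
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)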
Step 3 — Azure AI Search
# infra/modules/ai_search/main.tf
resource "azurerm_search_service" "rag" {
name = "srch-${var.project}-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
sku = "standard" # Use standard for vector search
replica_count = 2 # HA for production
partition_count = 1
# Disable API key auth — use Entra ID only
local_authentication_enabled = false
public_network_access_enabled = false
identity {
type = "SystemAssigned"
}
tags = var.tags
}
# Private Endpoint
resource "azurerm_private_endpoint" "search" {
name = "pe-search-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
subnet_id = var.private_endpoint_subnet_id
private_service_connection {
name = "psc-search"
private_connection_resource_id = azurerm_search_service.rag.id
subresource_names = ["searchService"]
is_manual_connection = false
  }

  private_dns_zone_group {
    name                 = "search-dns"
    # Assumes a privatelink.search.windows.net zone, mirroring the OpenAI module;
    # without a DNS zone group, the endpoint won't resolve from the VNet
    private_dns_zone_ids = [var.search_dns_zone_id]
  }
}
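With local_authentication_enabled = false, admin and query keys are rejected outright: every caller, including your own scripts, needs an Entra ID token plus an RBAC role from Step 7. A quick connectivity check, assuming the index from Step 8 already exists and a hypothetical endpoint:

# Entra-only access test (requires the Search Index Data Reader role)
from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://srch-myproject-prod.search.windows.net",  # assumption
    index_name="rag-index",
    credential=DefaultAzureCredential(),
)
print(client.get_document_count())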
Step 4 — Storage Account (Document Store)
# infra/modules/storage/main.tf
resource "azurerm_storage_account" "docs" {
name = "st${var.project}${var.env}"
resource_group_name = var.resource_group_name
location = var.location
account_tier = "Standard"
account_replication_type = "ZRS" # Zone-redundant
# Security settings
public_network_access_enabled = false
allow_nested_items_to_be_public = false
min_tls_version = "TLS1_2"
shared_access_key_enabled = false # Entra ID only
blob_properties {
versioning_enabled = true # Keep doc versions
delete_retention_policy {
days = 30
}
}
identity {
type = "SystemAssigned"
}
}
resource "azurerm_storage_container" "documents" {
name = "documents"
storage_account_name = azurerm_storage_account.docs.name
container_access_type = "private"
}
resource "azurerm_storage_container" "processed" {
name = "processed"
storage_account_name = azurerm_storage_account.docs.name
container_access_type = "private"
}
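Since shared_access_key_enabled = false, connection strings and SAS tokens won't work against this account; uploads also go through Entra ID. Note that whatever identity runs ingestion needs Storage Blob Data Contributor (Step 7 only grants the app read access). A sketch with a hypothetical account URL:

# Upload a source document with Entra ID auth (account URL is an assumption)
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://stmyprojectprod.blob.core.windows.net",  # assumption
    credential=DefaultAzureCredential(),
)
with open("refund-policy.pdf", "rb") as f:
    service.get_blob_client("documents", "refund-policy.pdf").upload_blob(
        f, overwrite=True
    )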
Step 5 — Key Vault
# infra/modules/keyvault/main.tf
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault" "rag" {
name = "kv-${var.project}-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
tenant_id = data.azurerm_client_config.current.tenant_id
sku_name = "premium" # HSM-backed keys
# Disable public access
public_network_access_enabled = false
# Require RBAC (not access policies)
enable_rbac_authorization = true
purge_protection_enabled = true
soft_delete_retention_days = 90
}
# Private Endpoint
resource "azurerm_private_endpoint" "keyvault" {
name = "pe-kv-${var.env}"
resource_group_name = var.resource_group_name
location = var.location
subnet_id = var.private_endpoint_subnet_id
private_service_connection {
name = "psc-kv"
private_connection_resource_id = azurerm_key_vault.rag.id
subresource_names = ["vault"]
is_manual_connection = false
  }

  private_dns_zone_group {
    name                 = "kv-dns"
    # Assumes a privatelink.vaultcore.azure.net zone, mirroring the OpenAI module
    private_dns_zone_ids = [var.kv_dns_zone_id]
  }
}
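Reading a secret then looks like this. Because RBAC authorization is enabled, the caller needs the Key Vault Secrets User role rather than an access policy; the vault URL is hypothetical:

# Fetch a secret via RBAC + Entra ID (vault URL is an assumption)
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

client = SecretClient(
    vault_url="https://kv-myproject-prod.vault.azure.net",  # assumption
    credential=DefaultAzureCredential(),
)
print(client.get_secret("openai-endpoint").value)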
Step 6 — App Service (RAG API)
# infra/modules/app_service/main.tf
resource "azurerm_service_plan" "rag" {
  name                = "asp-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  os_type             = "Linux"
  sku_name            = "P2v3" # Production tier
}

resource "azurerm_linux_web_app" "rag_api" {
  name                = "app-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  service_plan_id     = azurerm_service_plan.rag.id

  # VNet integration
  virtual_network_subnet_id = var.app_subnet_id
  https_only                = true

  identity {
    type = "SystemAssigned" # Managed Identity
  }

  site_config {
    always_on           = true
    http2_enabled       = true
    ftps_state          = "Disabled"
    minimum_tls_version = "1.2"

    # Pull from ACR with the Managed Identity (requires AcrPull, see Step 7)
    container_registry_use_managed_identity = true

    application_stack {
      docker_image_name   = "rag-api:latest" # relative to the registry below
      docker_registry_url = "https://${var.acr_name}.azurecr.io"
    }

    health_check_path = "/health"
  }

  app_settings = {
    # All values pulled from Key Vault via references
    "AZURE_OPENAI_ENDPOINT"                 = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/openai-endpoint/)"
    "SEARCH_ENDPOINT"                       = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/search-endpoint/)"
    "STORAGE_ACCOUNT_URL"                   = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/storage-url/)"
    "APPLICATIONINSIGHTS_CONNECTION_STRING" = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/appinsights-conn/)"
    "ENVIRONMENT"                           = var.env
  }
}
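The health_check_path setting means App Service pings that route and recycles instances that stop returning 2xx, so the API must expose a matching endpoint. A minimal sketch (the full app arrives in Step 10):

# app/api/main.py (excerpt): the target for App Service's health_check_path
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
def health() -> dict:
    # Keep this cheap, with no downstream calls, so probes don't cascade failures
    return {"status": "ok"}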
Step 7 — RBAC Assignments
# infra/rbac.tf
locals {
  # Assumes each module exports its identity's principal ID as an output
  # (output names here are illustrative)
  app_principal_id    = module.app_service.principal_id
  search_principal_id = module.ai_search.principal_id
}

# App → OpenAI
resource "azurerm_role_assignment" "app_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.app_principal_id
}

# App → AI Search (read-only: the API only queries the index)
resource "azurerm_role_assignment" "app_to_search" {
  scope                = module.ai_search.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = local.app_principal_id
}

# App → Storage
resource "azurerm_role_assignment" "app_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.app_principal_id
}

# App → Key Vault
resource "azurerm_role_assignment" "app_to_kv" {
  scope                = module.keyvault.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = local.app_principal_id
}

# App → ACR (pull images with Managed Identity; pairs with
# container_registry_use_managed_identity in Step 6 — module name illustrative)
resource "azurerm_role_assignment" "app_to_acr" {
  scope                = module.acr.id
  role_definition_name = "AcrPull"
  principal_id         = local.app_principal_id
}

# Search → Storage (for indexer to read docs)
resource "azurerm_role_assignment" "search_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.search_principal_id
}

# Search → OpenAI (for integrated vectorization)
resource "azurerm_role_assignment" "search_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.search_principal_id
}

# NOTE: whatever identity runs the ingestion pipeline also needs
# "Search Index Data Contributor" and "Storage Blob Data Contributor".
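To verify the assignments landed, you can list what a principal holds at the resource group scope. A sketch using the azure-mgmt-authorization package; the subscription, resource group, and principal IDs are placeholders:

# List role assignments for the app's managed identity (placeholder IDs)
from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient

client = AuthorizationManagementClient(
    DefaultAzureCredential(), "<subscription-id>"  # assumption
)
scope = "/subscriptions/<subscription-id>/resourceGroups/rg-myproject-prod"
for a in client.role_assignments.list_for_scope(
    scope, filter="principalId eq '<app-principal-id>'"
):
    print(a.role_definition_id)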
Step 8 — Create the AI Search Index
# scripts/create_index.py
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration, VectorSearchProfile,
    SemanticConfiguration, SemanticSearch, SemanticPrioritizedFields, SemanticField,
)

# e.g. https://srch-myproject-prod.search.windows.net
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

credential = DefaultAzureCredential()
index_client = SearchIndexClient(endpoint=SEARCH_ENDPOINT, credential=credential)

index = SearchIndex(
    name="rag-index",
    fields=[
        SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True),
        SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="source_file", type=SearchFieldDataType.String, filterable=True),
        SearchField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
        SearchField(name="sensitivity", type=SearchFieldDataType.String, filterable=True),
        SearchField(
            name="allowed_groups",
            type=SearchFieldDataType.Collection(SearchFieldDataType.String),
            filterable=True
        ),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # text-embedding-3-large
            vector_search_profile_name="hnsw-profile"
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-algo")],
        profiles=[VectorSearchProfile(
            name="hnsw-profile",
            algorithm_configuration_name="hnsw-algo"
        )]
    ),
    semantic_search=SemanticSearch(
        configurations=[SemanticConfiguration(
            name="semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")]
            )
        )]
    )
)

index_client.create_or_update_index(index)
print("✅ Index created")
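The allowed_groups field is what makes document-level security work: at query time the API turns the caller's Entra ID group memberships into an OData filter, so the index never returns chunks the user isn't cleared for. One way to implement the build_security_filter helper referenced in Step 10 (group IDs are GUIDs, so the comma join is safe):

# One possible implementation of the filter helper used in Step 10
def build_security_filter(user_groups: list[str], user_oid: str) -> str:
    # search.in matches a value against a list efficiently;
    # any() applies it across the allowed_groups collection
    ids = ",".join(user_groups + [user_oid])
    return f"allowed_groups/any(g: search.in(g, '{ids}'))"

# build_security_filter(["g1", "g2"], "oid")
# -> "allowed_groups/any(g: search.in(g, 'g1,g2,oid'))"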
Step 9 — Document Ingestion Pipeline
# app/ingestion/ingest.py
import hashlib
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.storage.blob import BlobServiceClient
from openai import AzureOpenAI

from chunker import chunk_document  # app/ingestion/chunker.py (sketch below)
# Helpers assumed to exist elsewhere in the repo (module name illustrative):
#   get_token_provider wraps azure.identity.get_bearer_token_provider;
#   get_document_acl / get_sensitivity_label read ACLs and labels
#   from Purview / blob metadata
from helpers import get_token_provider, get_document_acl, get_sensitivity_label

STORAGE_URL = os.environ["STORAGE_ACCOUNT_URL"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

credential = DefaultAzureCredential()

def ingest_document(blob_name: str):
    # 1. Download from Blob Storage
    blob_client = BlobServiceClient(
        account_url=STORAGE_URL, credential=credential
    ).get_blob_client("documents", blob_name)
    # Assumes plain-text blobs; PDFs and Office docs need text extraction first
    content = blob_client.download_blob().readall().decode("utf-8")

    # 2. Chunk the document
    chunks = chunk_document(content, chunk_size=512, overlap=50)

    # 3. Embed each chunk
    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        azure_ad_token_provider=get_token_provider(credential),
        api_version="2024-06-01",
    )
    documents = []
    for i, chunk in enumerate(chunks):
        embedding = openai_client.embeddings.create(
            input=chunk,
            model="text-embedding-3-large"
        ).data[0].embedding
        documents.append({
            "chunk_id": hashlib.md5(f"{blob_name}-{i}".encode()).hexdigest(),
            "content": chunk,
            "source_file": blob_name,
            "page_number": i,
            "embedding": embedding,
            "allowed_groups": get_document_acl(blob_name),  # from Purview / metadata
            "sensitivity": get_sensitivity_label(blob_name)
        })

    # 4. Upload to AI Search
    search_client = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name="rag-index",
        credential=credential
    )
    search_client.upload_documents(documents)
    print(f"✅ Indexed {len(documents)} chunks from {blob_name}")
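chunk_document comes from app/ingestion/chunker.py, which isn't shown above. A minimal word-based sliding-window sketch matching the chunk_size=512 / overlap=50 call; production code would count tokens and respect sentence or paragraph boundaries:

# app/ingestion/chunker.py: a minimal sketch
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    words = text.split()
    step = chunk_size - overlap  # slide forward, keeping `overlap` words of context
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks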
Step 10 — RAG API (FastAPI)
# app/api/main.py
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from fastapi import FastAPI, Depends, HTTPException
from openai import AzureOpenAI
from pydantic import BaseModel

# Assumed module layout (these files exist in app/api/, see Project Structure;
# which helper lives where is an assumption):
from retrieval import embed, build_security_filter
from generation import build_rag_prompt, SYSTEM_PROMPT
from security import (
    verify_entra_token, sanitize_input, get_token_provider,
    check_content_safety, log_interaction,
)

SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]

app = FastAPI()
credential = DefaultAzureCredential()

class ChatRequest(BaseModel):
    query: str

@app.post("/chat")
async def chat(
    request: ChatRequest,
    user: dict = Depends(verify_entra_token)  # Auth middleware
):
    # 1. Validate input
    sanitized_query = sanitize_input(request.query)

    # 2. Embed query
    query_embedding = embed(sanitized_query)

    # 3. Retrieve with security filter
    user_groups = user.get("groups", [])
    security_filter = build_security_filter(user_groups, user["oid"])

    search_client = SearchClient(SEARCH_ENDPOINT, "rag-index", credential)
    results = search_client.search(
        search_text=sanitized_query,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=5,
            fields="embedding"
        )],
        filter=security_filter,
        query_type="semantic",
        semantic_configuration_name="semantic-config",
        top=5
    )
    chunks, sources = [], []
    for r in results:  # paged iterator: consume it once
        chunks.append(r["content"])
        sources.append(r["source_file"])

    # 4. Generate answer
    context = "\n\n---\n\n".join(chunks)
    prompt = build_rag_prompt(sanitized_query, context)

    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        azure_ad_token_provider=get_token_provider(credential),
        api_version="2024-06-01",
    )
    response = openai_client.chat.completions.create(
        model="gpt-4o",  # deployment name from Step 2
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.0,  # Deterministic for RAG
        max_tokens=1000
    )
    answer = response.choices[0].message.content

    # 5. Safety check output
    check_content_safety(answer)

    # 6. Audit log
    log_interaction(user["oid"], sanitized_query, sources, answer)

    return {"answer": answer, "sources": sources}
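SYSTEM_PROMPT and build_rag_prompt live in app/api/generation.py and aren't shown above; the key design point is that the model is told to answer only from the retrieved context. A sketch of what they might look like:

# app/api/generation.py: illustrative only; tune the wording for your domain
SYSTEM_PROMPT = (
    "You answer strictly from the provided context. "
    "If the context does not contain the answer, say you don't know. "
    "Cite the source file for every claim."
)

def build_rag_prompt(query: str, context: str) -> str:
    # Context first, question last, so the model reads the evidence before answering
    return f"Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"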
Step 11 — CI/CD Pipeline (GitHub Actions)
# .github/workflows/deploy.yml
name: Deploy RAG Infrastructure

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# OIDC federation requires the workflow to mint ID tokens
permissions:
  id-token: write
  contents: read

env:
  TF_VERSION: "1.6.0"
  ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
  ARM_USE_OIDC: "true"  # let the azurerm provider authenticate via OIDC

jobs:
  terraform:
    name: Terraform Plan & Apply
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Terraform Init
        run: terraform init
        working-directory: infra/

      - name: Terraform Plan
        run: terraform plan -out=tfplan
        working-directory: infra/

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply tfplan
        working-directory: infra/

  build-and-push:
    name: Build & Push Docker Image
    runs-on: ubuntu-latest
    needs: terraform
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)  # required before az acr login
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Build Docker image
        run: docker build -t rag-api:${{ github.sha }} ./app

      - name: Push to ACR
        run: |
          az acr login --name ${{ secrets.ACR_NAME }}
          docker tag rag-api:${{ github.sha }} \
            ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
          docker push ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}

  deploy-app:
    name: Deploy to App Service
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Update App Service image
        run: |
          az webapp config container set \
            --name ${{ secrets.APP_NAME }} \
            --resource-group ${{ secrets.RG_NAME }} \
            --docker-custom-image-name \
              ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
Deployment Commands
# 1. Login to Azure
az login
az account set --subscription "your-subscription-id"

# 2. Create Terraform state backend
az group create --name rg-tfstate --location eastus
az storage account create --name stgtfstate --resource-group rg-tfstate \
  --sku Standard_LRS   # storage account names are globally unique; pick your own
az storage container create --name tfstate \
  --account-name stgtfstate

# 3. Deploy infrastructure
cd infra/
terraform init
terraform plan -var-file="environments/prod.tfvars"
terraform apply -var-file="environments/prod.tfvars"

# 4. Index your documents
python scripts/create_index.py
python scripts/ingest_documents.py --container documents

# 5. Test the API
curl -X POST https://your-app.azurewebsites.net/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'
Production Checklist
Infrastructure
✅ All services deployed via IaC (Terraform / Bicep)
✅ Private endpoints on OpenAI, AI Search, Storage, Key Vault
✅ Public access disabled on all backend services
✅ Managed Identity — zero hardcoded secrets

Security
✅ Entra ID auth on API
✅ Document-level ACL enforced at retrieval
✅ Content Safety on input + output
✅ Key Vault for all secrets
✅ WAF on Front Door / APIM

Reliability
✅ AI Search replica_count ≥ 2
✅ App Service always_on = true
✅ Health check endpoint configured
✅ Auto-scaling rules set

Monitoring
✅ Application Insights connected
✅ Log Analytics workspace active
✅ Alerts on latency, error rate, cost
✅ Audit logs retained in Log Analytics

CI/CD
✅ Terraform state in remote backend
✅ OIDC auth (no client secrets in GitHub)
✅ PR plan, main apply workflow
✅ Image tagged with git SHA
This gives you a fully production-ready, secure RAG deployment on Azure — infrastructure as code, zero hardcoded secrets, private networking, and document-level access control from day one.