Deploying RAG Infrastructure on Azure: A Step-by-Step Guide

Overview — What We’re Building

[ Documents ]→[ Ingestion Pipeline ]→[ AI Search + Embeddings ]
[ User ] → [ APIM ] → [ App Service / AKS ] → [ Azure OpenAI ]
[ Monitoring + Security ]

Prerequisites

# Tools needed
- Azure CLI (az)
- Terraform or Bicep (IaC)
- Docker
- Python 3.11+
- VS Code + Azure extension
# Azure services needed
- Azure Subscription
- Contributor or Owner role

Option A — Deploy with Terraform (Recommended)

Project Structure

rag-azure/
├── infra/
│ ├── main.tf
│ ├── variables.tf
│ ├── outputs.tf
│ ├── modules/
│ │ ├── openai/
│ │ ├── ai_search/
│ │ ├── storage/
│ │ ├── app_service/
│ │ └── networking/
├── app/
│ ├── api/
│ │ ├── main.py
│ │ ├── retrieval.py
│ │ ├── generation.py
│ │ └── security.py
│ ├── ingestion/
│ │ ├── ingest.py
│ │ └── chunker.py
│ ├── Dockerfile
│ └── requirements.txt
├── scripts/
│ ├── deploy.sh
│ └── index_documents.sh
└── .github/
└── workflows/
└── deploy.yml

Step 1 — Core Infrastructure (Terraform)

# infra/main.tf

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.80"
    }
  }
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"
    storage_account_name = "stgtfstate"
    container_name       = "tfstate"
    key                  = "rag.tfstate"
  }
}

provider "azurerm" {
  features {}
}

# ── Resource Group ──────────────────────────────────────────
resource "azurerm_resource_group" "rag" {
  name     = "rg-${var.project}-${var.env}"
  location = var.location
  tags     = local.tags
}

# ── Virtual Network ─────────────────────────────────────────
resource "azurerm_virtual_network" "rag" {
  name                = "vnet-${var.project}-${var.env}"
  resource_group_name = azurerm_resource_group.rag.name
  location            = azurerm_resource_group.rag.location
  address_space       = ["10.0.0.0/16"]
}

resource "azurerm_subnet" "app" {
  name                 = "snet-app"
  resource_group_name  = azurerm_resource_group.rag.name
  virtual_network_name = azurerm_virtual_network.rag.name
  address_prefixes     = ["10.0.1.0/24"]
  delegation {
    name = "app-service-delegation"
    service_delegation {
      name = "Microsoft.Web/serverFarms"
    }
  }
}

resource "azurerm_subnet" "private_endpoints" {
  name                 = "snet-pe"
  resource_group_name  = azurerm_resource_group.rag.name
  virtual_network_name = azurerm_virtual_network.rag.name
  address_prefixes     = ["10.0.2.0/24"]
}



Step 2 — Azure OpenAI

# infra/modules/openai/main.tf

resource "azurerm_cognitive_account" "openai" {
  name                = "oai-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  kind                = "OpenAI"
  sku_name            = "S0"

  # Required for Entra ID auth and private endpoints
  custom_subdomain_name = "oai-${var.project}-${var.env}"

  # Disable public access — private endpoint only
  public_network_access_enabled = false

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Deploy models
resource "azurerm_cognitive_deployment" "gpt4o" {
  name                 = "gpt-4o"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "gpt-4o"
    version = "2024-08-06"
  }

  scale {
    type     = "Standard"
    capacity = 40  # TPM in thousands
  }
}

resource "azurerm_cognitive_deployment" "embeddings" {
  name                 = "text-embedding-3-large"
  cognitive_account_id = azurerm_cognitive_account.openai.id

  model {
    format  = "OpenAI"
    name    = "text-embedding-3-large"
    version = "1"
  }

  scale {
    type     = "Standard"
    capacity = 120
  }
}

# Private Endpoint
resource "azurerm_private_endpoint" "openai" {
  name                = "pe-openai-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-openai"
    private_connection_resource_id = azurerm_cognitive_account.openai.id
    subresource_names              = ["account"]
    is_manual_connection           = false
  }

  private_dns_zone_group {
    name                 = "openai-dns"
    private_dns_zone_ids = [var.openai_dns_zone_id]
  }
}



Step 3 — Azure AI Search

# infra/modules/ai_search/main.tf

resource "azurerm_search_service" "rag" {
  name                = "srch-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  sku                 = "standard"  # Use standard for vector search
  replica_count       = 2           # 2+ replicas for read HA (3+ for the read-write SLA)
  partition_count     = 1

  # Disable API key auth — use Entra ID only
  local_authentication_enabled   = false
  public_network_access_enabled  = false

  identity {
    type = "SystemAssigned"
  }

  tags = var.tags
}

# Private Endpoint
resource "azurerm_private_endpoint" "search" {
  name                = "pe-search-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-search"
    private_connection_resource_id = azurerm_search_service.rag.id
    subresource_names              = ["searchService"]
    is_manual_connection           = false
  }

  # Link the privatelink.search.windows.net zone, as done for OpenAI above
  private_dns_zone_group {
    name                 = "search-dns"
    private_dns_zone_ids = [var.search_dns_zone_id]
  }
}



Step 4 — Storage Account (Document Store)

# infra/modules/storage/main.tf

resource "azurerm_storage_account" "docs" {
  name                     = "st${var.project}${var.env}"
  resource_group_name      = var.resource_group_name
  location                 = var.location
  account_tier             = "Standard"
  account_replication_type = "ZRS"        # Zone-redundant

  # Security settings
  public_network_access_enabled   = false
  allow_nested_items_to_be_public = false
  min_tls_version                 = "TLS1_2"
  shared_access_key_enabled       = false  # Entra ID only

  blob_properties {
    versioning_enabled = true              # Keep doc versions
    delete_retention_policy {
      days = 30
    }
  }

  identity {
    type = "SystemAssigned"
  }
}

# With shared keys disabled, Terraform's own data-plane calls must use
# Entra ID: set `storage_use_azuread = true` in the azurerm provider block.
resource "azurerm_storage_container" "documents" {
  name                  = "documents"
  storage_account_name  = azurerm_storage_account.docs.name
  container_access_type = "private"
}

resource "azurerm_storage_container" "processed" {
  name                  = "processed"
  storage_account_name  = azurerm_storage_account.docs.name
  container_access_type = "private"
}



Step 5 — Key Vault

# infra/modules/keyvault/main.tf

data "azurerm_client_config" "current" {}

resource "azurerm_key_vault" "rag" {
  name                = "kv-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  tenant_id           = data.azurerm_client_config.current.tenant_id
  sku_name            = "premium"   # HSM-backed keys

  # Disable public access
  public_network_access_enabled = false

  # Require RBAC (not access policies)
  enable_rbac_authorization = true

  purge_protection_enabled   = true
  soft_delete_retention_days = 90
}

# Private Endpoint
resource "azurerm_private_endpoint" "keyvault" {
  name                = "pe-kv-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  subnet_id           = var.private_endpoint_subnet_id

  private_service_connection {
    name                           = "psc-kv"
    private_connection_resource_id = azurerm_key_vault.rag.id
    subresource_names              = ["vault"]
    is_manual_connection           = false
  }
}



Step 6 — App Service (RAG API)

# infra/modules/app_service/main.tf
resource "azurerm_service_plan" "rag" {
  name                = "asp-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  os_type             = "Linux"
  sku_name            = "P2v3"  # Production tier
}

resource "azurerm_linux_web_app" "rag_api" {
  name                = "app-${var.project}-${var.env}"
  resource_group_name = var.resource_group_name
  location            = var.location
  service_plan_id     = azurerm_service_plan.rag.id

  # VNet integration
  virtual_network_subnet_id = var.app_subnet_id
  https_only                = true

  identity {
    type = "SystemAssigned"  # Managed Identity
  }

  site_config {
    always_on           = true
    http2_enabled       = true
    ftps_state          = "Disabled"
    minimum_tls_version = "1.2"

    # Pull from ACR with the managed identity — no registry password
    container_registry_use_managed_identity = true

    application_stack {
      docker_image_name   = "rag-api:latest"
      docker_registry_url = "https://${var.acr_name}.azurecr.io"
    }

    health_check_path = "/health"
  }

  app_settings = {
    # All values pulled from Key Vault via references
    "AZURE_OPENAI_ENDPOINT"                 = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/openai-endpoint/)"
    "SEARCH_ENDPOINT"                       = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/search-endpoint/)"
    "STORAGE_ACCOUNT_URL"                   = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/storage-url/)"
    "APPLICATIONINSIGHTS_CONNECTION_STRING" = "@Microsoft.KeyVault(SecretUri=${var.kv_uri}secrets/appinsights-conn/)"
    "ENVIRONMENT"                           = var.env
  }
}

Step 7 — RBAC Assignments

# infra/rbac.tf
locals {
  # Principal IDs exposed as outputs by the app_service and ai_search modules
  app_principal_id    = module.app_service.principal_id
  search_principal_id = module.ai_search.principal_id
}

# App → OpenAI
resource "azurerm_role_assignment" "app_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.app_principal_id
}

# App → AI Search
# (Whatever identity runs ingestion also needs "Search Index Data Contributor")
resource "azurerm_role_assignment" "app_to_search" {
  scope                = module.ai_search.id
  role_definition_name = "Search Index Data Reader"
  principal_id         = local.app_principal_id
}

# App → Storage
resource "azurerm_role_assignment" "app_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.app_principal_id
}

# App → Key Vault
resource "azurerm_role_assignment" "app_to_kv" {
  scope                = module.keyvault.id
  role_definition_name = "Key Vault Secrets User"
  principal_id         = local.app_principal_id
}

# Search → Storage (for indexer to read docs)
resource "azurerm_role_assignment" "search_to_storage" {
  scope                = module.storage.id
  role_definition_name = "Storage Blob Data Reader"
  principal_id         = local.search_principal_id
}

# Search → OpenAI (for integrated vectorization)
resource "azurerm_role_assignment" "search_to_openai" {
  scope                = module.openai.id
  role_definition_name = "Cognitive Services OpenAI User"
  principal_id         = local.search_principal_id
}

Step 8 — Create the AI Search Index

# scripts/create_index.py
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex, SearchField, SearchFieldDataType,
    VectorSearch, HnswAlgorithmConfiguration,
    VectorSearchProfile, SemanticConfiguration,
    SemanticSearch, SemanticPrioritizedFields,
    SemanticField,
)

SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

# The caller needs the "Search Service Contributor" role to manage indexes
credential = DefaultAzureCredential()
index_client = SearchIndexClient(endpoint=SEARCH_ENDPOINT, credential=credential)

index = SearchIndex(
    name="rag-index",
    fields=[
        SearchField(name="chunk_id", type=SearchFieldDataType.String, key=True),
        SearchField(name="content", type=SearchFieldDataType.String, searchable=True),
        SearchField(name="source_file", type=SearchFieldDataType.String, filterable=True),
        SearchField(name="page_number", type=SearchFieldDataType.Int32, filterable=True),
        SearchField(name="sensitivity", type=SearchFieldDataType.String, filterable=True),
        SearchField(
            name="allowed_groups",
            type=SearchFieldDataType.Collection(SearchFieldDataType.String),
            filterable=True,
        ),
        SearchField(
            name="embedding",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # text-embedding-3-large
            vector_search_profile_name="hnsw-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw-algo")],
        profiles=[
            VectorSearchProfile(
                name="hnsw-profile",
                algorithm_configuration_name="hnsw-algo",
            )
        ],
    ),
    semantic_search=SemanticSearch(
        configurations=[
            SemanticConfiguration(
                name="semantic-config",
                prioritized_fields=SemanticPrioritizedFields(
                    content_fields=[SemanticField(field_name="content")]
                ),
            )
        ]
    ),
)

index_client.create_or_update_index(index)
print("✅ Index created")

Step 9 — Document Ingestion Pipeline

# app/ingestion/ingest.py
import hashlib
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.storage.blob import BlobServiceClient
from openai import AzureOpenAI

from chunker import chunk_document

STORAGE_URL = os.environ["STORAGE_ACCOUNT_URL"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]

credential = DefaultAzureCredential()


def ingest_document(blob_name: str):
    # 1. Download from Blob Storage
    blob_client = BlobServiceClient(
        account_url=STORAGE_URL,
        credential=credential,
    ).get_blob_client("documents", blob_name)
    content = blob_client.download_blob().readall().decode("utf-8")

    # 2. Chunk the document
    chunks = chunk_document(content, chunk_size=512, overlap=50)

    # 3. Embed each chunk (managed-identity token, no API key)
    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        api_version="2024-06-01",
        azure_ad_token_provider=get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        ),
    )
    documents = []
    for i, chunk in enumerate(chunks):
        embedding = openai_client.embeddings.create(
            input=chunk,
            model="text-embedding-3-large",  # deployment name from Step 2
        ).data[0].embedding
        documents.append({
            "chunk_id": hashlib.md5(f"{blob_name}-{i}".encode()).hexdigest(),
            "content": chunk,
            "source_file": blob_name,
            "page_number": i,
            "embedding": embedding,
            "allowed_groups": get_document_acl(blob_name),       # from Purview / metadata
            "sensitivity": get_sensitivity_label(blob_name),     # helper defined elsewhere
        })

    # 4. Upload to AI Search (requires "Search Index Data Contributor")
    search_client = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name="rag-index",
        credential=credential,
    )
    search_client.upload_documents(documents)
    print(f"✅ Indexed {len(documents)} chunks from {blob_name}")
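The `chunk_document` call above comes from `app/ingestion/chunker.py`, which the guide never shows. A minimal sketch under stated assumptions — fixed-size word windows with overlap; a production chunker would count tokens and respect paragraph or heading boundaries:

```python
def chunk_document(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-window chunks.

    chunk_size and overlap are counted in words here for simplicity;
    counting tokens (e.g. with tiktoken) would match the embedding model
    more closely.
    """
    words = text.split()
    if not words:
        return []
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last window already covers the tail
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk; 50 words on a 512-word window (~10%) is a common starting point.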

Step 10 — RAG API (FastAPI)

# app/api/main.py
import os

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from fastapi import FastAPI, Depends
from openai import AzureOpenAI
from pydantic import BaseModel

# verify_entra_token, sanitize_input, build_security_filter live in security.py;
# embed in retrieval.py; build_rag_prompt, SYSTEM_PROMPT, check_content_safety,
# log_interaction in generation.py
SEARCH_ENDPOINT = os.environ["SEARCH_ENDPOINT"]
OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]

app = FastAPI()
credential = DefaultAzureCredential()


class ChatRequest(BaseModel):
    query: str


@app.post("/chat")
async def chat(
    request: ChatRequest,
    user: dict = Depends(verify_entra_token),  # Auth middleware
):
    # 1. Validate input
    sanitized_query = sanitize_input(request.query)

    # 2. Embed query
    query_embedding = embed(sanitized_query)

    # 3. Retrieve with security filter
    user_groups = user.get("groups", [])
    security_filter = build_security_filter(user_groups, user["oid"])
    search_client = SearchClient(SEARCH_ENDPOINT, "rag-index", credential)
    results = search_client.search(
        search_text=sanitized_query,
        vector_queries=[VectorizedQuery(
            vector=query_embedding,
            k_nearest_neighbors=5,
            fields="embedding",
        )],
        filter=security_filter,
        query_type="semantic",
        semantic_configuration_name="semantic-config",
        top=5,
    )
    # Iterate the paged results once — a second pass would come back empty
    chunks, sources = [], []
    for r in results:
        chunks.append(r["content"])
        sources.append(r["source_file"])

    # 4. Generate answer
    context = "\n\n---\n\n".join(chunks)
    prompt = build_rag_prompt(sanitized_query, context)
    openai_client = AzureOpenAI(
        azure_endpoint=OPENAI_ENDPOINT,
        api_version="2024-06-01",
        azure_ad_token_provider=get_bearer_token_provider(
            credential, "https://cognitiveservices.azure.com/.default"
        ),
    )
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        temperature=0.0,  # Deterministic for RAG
        max_tokens=1000,
    )
    answer = response.choices[0].message.content

    # 5. Safety check output
    check_content_safety(answer)

    # 6. Audit log
    log_interaction(user["oid"], sanitized_query, sources, answer)

    return {"answer": answer, "sources": sources}
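Step 10 leans on `sanitize_input` and `build_security_filter` from `app/api/security.py` without showing them. A hedged sketch of both — the OData filter targets the `allowed_groups` field defined in Step 8, but treating the caller's own `oid` as an implicit group, and the exact sanitization policy, are assumptions of this sketch:

```python
import re


def sanitize_input(query: str, max_len: int = 1000) -> str:
    """Basic input hygiene: trim, cap length, strip control characters.

    A minimal sketch — pair it with Azure AI Content Safety prompt
    shields rather than relying on regex alone.
    """
    query = query.strip()[:max_len]
    return re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", query)


def build_security_filter(user_groups: list[str], user_oid: str) -> str:
    """OData filter trimming results to chunks the caller may see.

    Matches the filterable `allowed_groups` collection from the Step 8
    index. Entra group IDs are GUIDs, so comma-joining them into
    search.in() is safe; including user_oid as an implicit "group" is an
    assumption of this sketch.
    """
    ids = [g for g in user_groups if g] + [user_oid]
    id_list = ",".join(ids)
    return f"allowed_groups/any(g: search.in(g, '{id_list}', ','))"
```

Passing this string as the `filter=` argument in Step 10 makes the trimming happen inside AI Search, so unauthorized chunks never reach the prompt.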

Step 11 — CI/CD Pipeline (GitHub Actions)

# .github/workflows/deploy.yml
name: Deploy RAG Infrastructure

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

# Required for OIDC federated login
permissions:
  id-token: write
  contents: read

env:
  TF_VERSION: "1.6.0"
  ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
  ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
  ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
  ARM_USE_OIDC: "true"

jobs:
  terraform:
    name: Terraform Plan & Apply
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: ${{ env.TF_VERSION }}

      - name: Terraform Init
        run: terraform init
        working-directory: infra/

      - name: Terraform Plan
        run: terraform plan -out=tfplan
        working-directory: infra/

      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: terraform apply tfplan
        working-directory: infra/

  build-and-push:
    name: Build & Push Docker Image
    runs-on: ubuntu-latest
    needs: terraform
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4

      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Build Docker image
        run: docker build -t rag-api:${{ github.sha }} ./app

      - name: Push to ACR
        run: |
          az acr login --name ${{ secrets.ACR_NAME }}
          docker tag rag-api:${{ github.sha }} \
            ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
          docker push ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}

  deploy-app:
    name: Deploy to App Service
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - name: Azure Login (OIDC)
        uses: azure/login@v1
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Update App Service image
        run: |
          az webapp config container set \
            --name ${{ secrets.APP_NAME }} \
            --resource-group ${{ secrets.RG_NAME }} \
            --docker-custom-image-name \
              ${{ secrets.ACR_NAME }}.azurecr.io/rag-api:${{ github.sha }}
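The workflow above builds `app/Dockerfile`, which the guide never shows. A minimal sketch, assuming a uvicorn entrypoint for the FastAPI app in `api/main.py` (the port and user name are illustrative):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer caches across code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Run as a non-root user
RUN useradd --create-home appuser
USER appuser

EXPOSE 8000

# App Service health checks hit /health on this port
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```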

Deployment Commands

# 1. Login to Azure
az login
az account set --subscription "your-subscription-id"

# 2. Create Terraform state backend
az group create --name rg-tfstate --location eastus
az storage account create --name stgtfstate --resource-group rg-tfstate \
  --sku Standard_LRS
az storage container create --name tfstate \
  --account-name stgtfstate

# 3. Deploy infrastructure
cd infra/
terraform init
terraform plan -var-file="environments/prod.tfvars"
terraform apply -var-file="environments/prod.tfvars"

# 4. Index your documents
python scripts/create_index.py
python scripts/ingest_documents.py --container documents

# 5. Test the API
curl -X POST https://your-app.azurewebsites.net/chat \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is our refund policy?"}'

Production Checklist

Infrastructure
✅ All services deployed via IaC (Terraform / Bicep)
✅ Private endpoints on OpenAI, AI Search, Storage, Key Vault
✅ Public access disabled on all backend services
✅ Managed Identity — zero hardcoded secrets
Security
✅ Entra ID auth on API
✅ Document-level ACL enforced at retrieval
✅ Content Safety on input + output
✅ Key Vault for all secrets
✅ WAF on Front Door / APIM
Reliability
✅ AI Search replica_count ≥ 2
✅ App Service always_on = true
✅ Health check endpoint configured
✅ Auto-scaling rules set
Monitoring
✅ Application Insights connected
✅ Log Analytics workspace active
✅ Alerts on latency, error rate, cost
✅ Audit logs to Log Analytics
CI/CD
✅ Terraform state in remote backend
✅ OIDC auth (no client secrets in GitHub)
✅ PR plan, main apply workflow
✅ Image tagged with git SHA

This gives you a fully production-ready, secure RAG deployment on Azure — infrastructure as code, zero hardcoded secrets, private networking, and document-level access control from day one.
