A solid GenAI chatbot on Azure OpenAI usually looks like this:
User
  ↓
Web / Mobile / Teams UI
  ↓
Backend API / Orchestrator
  ├─ Auth (Microsoft Entra ID)
  ├─ Prompt assembly + guardrails
  ├─ Conversation state
  └─ Tool calling / business logic
  ↓
Azure OpenAI
  ├─ Chat model
  └─ Embedding model
  ↓
Azure AI Search
  ├─ Keyword + vector + semantic retrieval
  └─ Citations / grounding docs
  ↓
Enterprise data sources
  ├─ Blob / SharePoint / SQL / Cosmos DB
  └─ Ingestion + chunking pipeline
On Azure, the most common production pattern is retrieval-augmented generation (RAG): the app retrieves relevant chunks from your data with Azure AI Search, then sends those chunks to Azure OpenAI so answers stay grounded instead of relying only on model memory. Microsoft specifically recommends Azure AI Search as an index store for RAG, and its current docs distinguish classic RAG from newer agentic retrieval patterns. (Microsoft Learn)
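The core of that flow can be sketched as a small prompt-assembly step in the backend: retrieved chunks go into the prompt at query time, tagged with their source so the model can cite them. This is an illustrative sketch, not Azure SDK code; the `Chunk` type and instruction wording are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """One retrieved passage plus its source, so the answer can cite it."""
    source: str
    text: str

def build_grounded_prompt(question: str, chunks: list[Chunk]) -> str:
    """Assemble the prompt sent to the chat model: retrieved context first,
    then the user question, with an instruction to answer only from context."""
    context = "\n\n".join(f"[{c.source}]\n{c.text}" for c in chunks)
    return (
        "Answer using ONLY the sources below. Cite sources like [name].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

In production the chunks would come from an Azure AI Search query and the prompt would go to an Azure OpenAI chat deployment; the shape of the assembly step stays the same.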
Core components
Frontend
A web app, mobile app, or Teams app handles chat UI, file uploads, citations, and feedback.
Backend / orchestrator
This is the “brain” of the app. It manages auth, session history, prompt templates, retrieval calls, tool use, rate limiting, and logging. In Microsoft’s baseline enterprise chat architecture, the app layer sits in front of the model and retrieval services rather than having the client talk to the model directly. (Microsoft Learn)
Azure OpenAI
Use one deployment for chat and usually another for embeddings. The chat model generates the answer; the embedding model converts documents and queries into vectors for retrieval. Azure OpenAI “On Your Data” exists as a simpler way to ground answers in enterprise content, though Microsoft labels that path as “classic.” (Microsoft Learn)
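The embedding model's role is easy to see with the similarity math behind vector retrieval: documents and queries become vectors, and retrieval ranks documents by how close their vectors are to the query vector. A minimal sketch of that comparison (cosine similarity, computed by the search service in practice, not by your app):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity between two embedding vectors: 1.0 means same direction,
    0.0 means unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```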
Azure AI Search
This is the retrieval layer. It supports vector search, semantic ranking, hybrid search, enrichment, and newer agentic retrieval features for chatbot scenarios. Microsoft’s current guidance says Azure AI Search is a recommended retrieval/index layer for RAG workloads. (Microsoft Learn)
Data ingestion pipeline
Documents from Blob, SharePoint, SQL, PDFs, and other sources get extracted, chunked, enriched, and indexed. Azure AI Search supports enrichment for content such as PDFs and images that are not searchable in raw form. (Microsoft Learn)
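The chunking step in that pipeline is often a simple fixed-size split with overlap, so a sentence that straddles a chunk boundary is still retrievable from either side. A minimal sketch (the 500/50 sizes are illustrative defaults, not a Microsoft recommendation; production pipelines often split on token counts or document structure instead of characters):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split extracted document text into overlapping fixed-size chunks
    ready for embedding and indexing."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```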
Best-practice architecture
1. Start with RAG, not pure prompting
For an enterprise chatbot, keep company docs outside the prompt until query time. Store them in Azure AI Search, then retrieve only the relevant chunks for each question. Microsoft’s RAG guidance says this improves grounding and supports citations and better relevance. (Microsoft Learn)
2. Use hybrid retrieval
Use vector + keyword + semantic ranking together. Azure AI Search supports this combination, and it is usually stronger than relying on vectors alone for real-world business documents. (Microsoft Learn)
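Azure AI Search merges the keyword and vector result lists with Reciprocal Rank Fusion (RRF). A sketch of the idea, to show why hybrid beats either list alone: a document ranked well by both retrievers accumulates score from both lists. (The `k = 60` constant is the common RRF default; this is a conceptual illustration, not the service's internal code.)

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists (e.g. keyword and vector) with
    Reciprocal Rank Fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears in both lists ("b" below) outranks documents that appear in only one, even if it was never first.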
3. Add identity-aware filtering
If different users should see different documents, put Microsoft Entra ID in front of the app and apply Azure AI Search security filters or document-level access trimming. Microsoft documents this specifically for Azure OpenAI On Your Data with Azure AI Search. (Microsoft Learn)
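The trimming itself is usually an OData filter attached to each search request, matching the caller's group IDs against a collection field on each document. A sketch of building that filter (the `group_ids` field name is an assumption about your index schema; the `any`/`search.in` pattern follows the documented Azure AI Search security-trimming approach):

```python
def security_filter(user_group_ids: list[str]) -> str:
    """Build an Azure AI Search OData filter that trims results to documents
    whose group_ids field overlaps the caller's Entra ID group memberships."""
    ids = ", ".join(user_group_ids)
    return f"group_ids/any(g: search.in(g, '{ids}'))"
```

The backend resolves the user's groups from their Entra ID token and passes this string as the `filter` on every query, so access control is enforced at retrieval time rather than in the prompt.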
4. Separate conversation memory from knowledge retrieval
Keep short-term chat history in app storage, but keep source-of-truth business content in the search index. This avoids bloated prompts and makes updates to your knowledge base easier. Microsoft’s baseline chat architecture separates the app/orchestration layer from the grounding data layer. (Microsoft Learn)
5. Prefer managed identity where possible
Microsoft’s Azure App Service RAG tutorial uses managed identities for passwordless authentication between services. That is a cleaner production pattern than storing connection strings or API keys in code. (Microsoft Learn)
Two good Azure patterns
Pattern A: Simpler RAG app
Use this when you want a straightforward chatbot fast.
App Service / AKS
  ↓
Backend API
  ↓
Azure AI Search
  ↓
Azure OpenAI
This is the easier option and matches Microsoft’s tutorial-style architecture for grounded chat apps. (Microsoft Learn)
Pattern B: Agent-style chatbot
Use this when you need tool use, more complex reasoning, or multi-step workflows.
UI
  ↓
Foundry Agent Service / custom orchestrator
  ├─ retrieval
  ├─ tools
  ├─ memory
  └─ policy checks
  ↓
Azure OpenAI + Azure AI Search + enterprise APIs
Microsoft’s current architecture guidance includes Foundry Agent Service and a baseline Foundry chat reference architecture for enterprise chat applications. (Microsoft Learn)
What I’d recommend
For most teams:
- Frontend: React or Teams app
- Backend: App Service or AKS
- LLM: Azure OpenAI
- Retrieval: Azure AI Search
- Identity: Entra ID
- Secrets: Key Vault
- Telemetry: Application Insights / Azure Monitor
- Documents: Blob + ingestion pipeline
That gives you a practical, scalable architecture without too much complexity. Azure AI Search is the natural retrieval layer, and Azure’s current enterprise chat reference architectures are built around that same idea. (Microsoft Learn)
Common mistakes
- Letting the frontend call the model directly
- Sending entire documents to the model instead of retrieving chunks
- Skipping citations
- Mixing access control into prompt text instead of enforcing it in retrieval
- Using only vector search when hybrid retrieval would work better
- Treating chat history as your knowledge base
Quick starter version
Users
  ↓
Azure App Service
  ↓
Backend API
  ├─ Entra ID auth
  ├─ prompt templates
  ├─ chat history store
  └─ calls Azure AI Search
  ↓
top-k chunks + citations
  ↓
Azure OpenAI
  ↓
answer to user
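The backend step of that starter flow can be sketched as one handler that takes the retrieval and generation calls as injected functions (so the sketch stays runnable without Azure credentials; in production `retrieve` would wrap an Azure AI Search query and `generate` an Azure OpenAI chat call — both names here are placeholders):

```python
from typing import Callable

def answer_question(
    question: str,
    retrieve: Callable[[str, int], list[dict]],  # wraps Azure AI Search
    generate: Callable[[str], str],              # wraps Azure OpenAI chat
    top_k: int = 3,
) -> dict:
    """Starter-flow handler: fetch top-k chunks, build a grounded prompt,
    call the model, and return the answer together with its citations."""
    chunks = retrieve(question, top_k)
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    prompt = f"Use only these sources:\n{context}\n\nQuestion: {question}"
    return {
        "answer": generate(prompt),
        "citations": [c["source"] for c in chunks],
    }
```

Keeping the dependencies injected like this also makes the orchestrator trivially unit-testable with stubs before any Azure resources exist.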