Understanding Azure AI Search: Vector Indexes Explained

Azure AI Search — Vector Indexes, Fields & Configurations Explained


The Big Picture

Think of Azure AI Search like a smart library system for AI:

Your Documents
↓ Convert to Vectors (embeddings)
↓ Store in Vector Index
↓ User asks a question → convert to vector → search → find similar docs
↓ Return most relevant results
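To make that flow concrete, here is a toy, fully local sketch of the same steps. The `toy_embed` function is a hypothetical stand-in for a real embedding model. It only counts words over a small vocabulary, so it captures word overlap, not meaning. What carries over to Azure AI Search is the shape of the pipeline: embed documents, store the vectors, embed the question with the same function, and rank by similarity.

```python
import math
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    words = tokenize(text)
    return [float(words.count(term)) for term in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


documents = {
    "refund_policy": "Refunds are available within 30 days of purchase.",
    "shipping_policy": "Orders ship within 2 business days.",
}

# The vocabulary plays the role of the model's fixed set of output dimensions.
vocab = sorted({w for text in documents.values() for w in tokenize(text)})

# Convert every document to a vector and "store" it in an in-memory index.
index = {doc_id: toy_embed(text, vocab) for doc_id, text in documents.items()}

# Embed the question with the SAME function, then rank documents by similarity.
query_vec = toy_embed("When are refunds available?", vocab)
ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)

print(ranked[0])  # refund_policy
```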

These three concepts — Vector Indexes, Vector Fields, and Vector Search Configurations — are the three layers that make this work.


1. Vector Index

A Vector Index is the overall container — like a database table — that holds all your documents and their vector representations.

What it is

A named, structured storage unit in Azure AI Search where you define the schema (what fields exist) and store all your data.

Analogy

A regular index = a filing cabinet with labeled folders.
A vector index = a filing cabinet that also stores the “meaning fingerprint” of every document, so you can search by meaning, not just keywords.

How it looks (schema definition)

{
  "name": "my-document-index",
  "fields": [
    { "name": "chunk_id", "type": "Edm.String", "key": true },
    { "name": "text", "type": "Edm.String", "searchable": true },
    { "name": "source", "type": "Edm.String", "filterable": true },
    {
      "name": "embedding_vector",
      "type": "Collection(Edm.Single)",
      "dimensions": 1536,
      "vectorSearchProfile": "my-vector-profile"
    }
  ],
  "vectorSearch": { ... }
}
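If you generate the schema from code, the same definition can be built as a Python dict and serialized to JSON. In this sketch the `vectorSearch` section is filled with the HNSW algorithm and profile that appear later in this article. That choice is an assumption: any valid algorithm/profile pair works. The sketch also checks the one-key-field rule before serializing.

```python
import json

index_definition = {
    "name": "my-document-index",
    "fields": [
        {"name": "chunk_id", "type": "Edm.String", "key": True},
        {"name": "text", "type": "Edm.String", "searchable": True},
        {"name": "source", "type": "Edm.String", "filterable": True},
        {
            "name": "embedding_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile",
        },
    ],
    # Assumed values: the HNSW configuration described later in this article.
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-hnsw-config",
                "kind": "hnsw",
                "hnswParameters": {"metric": "cosine", "m": 4, "efConstruction": 400, "efSearch": 500},
            }
        ],
        "profiles": [{"name": "my-vector-profile", "algorithm": "my-hnsw-config"}],
    },
}

# An index needs exactly one key field: the unique ID of each document.
key_fields = [f["name"] for f in index_definition["fields"] if f.get("key")]
if len(key_fields) != 1:
    raise ValueError(f"expected exactly one key field, found {key_fields}")

body = json.dumps(index_definition, indent=2)  # request body for creating the index
```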

Key properties of a vector index

| Property | What it means |
|---|---|
| name | Unique identifier for the index |
| fields | All the columns of data stored |
| key ("key": true on one field) | Unique ID per document (like a primary key) |
| vectorSearch | Configuration for how vector search behaves |

Lifecycle

1. Create the index (define the schema)
2. Load documents plus their vectors
3. The index is ready to search
4. Query it anytime via the REST API
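The lifecycle maps onto three REST calls. This sketch only builds the requests with Python's standard `urllib.request` and does not send them. The service name `my-service` and the key placeholder are hypothetical. `2024-07-01` is one generally available API version that supports vector search. The vectors are truncated for brevity: a real upload or query must carry all 1,536 values.

```python
import json
import urllib.request

ENDPOINT = "https://my-service.search.windows.net"  # hypothetical service
API_KEY = "<admin-api-key>"                          # placeholder, not a real key
API_VERSION = "2024-07-01"
INDEX = "my-document-index"


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an Azure AI Search REST request."""
    return urllib.request.Request(
        f"{ENDPOINT}{path}?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": API_KEY},
        method=method,
    )


# 1. Create (or update) the index. Pass the full schema here; it is abbreviated in this sketch.
create = build_request("PUT", f"/indexes/{INDEX}", {"name": INDEX, "fields": []})

# 2. Load documents together with their precomputed vectors.
upload = build_request("POST", f"/indexes/{INDEX}/docs/index", {
    "value": [{"@search.action": "upload", "chunk_id": "1",
               "text": "Refunds are available within 30 days",
               "embedding_vector": [0.023, -0.841, 0.334]}]  # truncated vector
})

# 3-4. Once indexing finishes, query at any time.
search = build_request("POST", f"/indexes/{INDEX}/docs/search", {
    "vectorQueries": [{"kind": "vector", "vector": [0.021, -0.834, 0.291],  # truncated
                       "fields": "embedding_vector", "k": 5}]
})

# Sending requires a real service, e.g. urllib.request.urlopen(search).
for req in (create, upload, search):
    print(req.get_method(), req.full_url)
```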

2. Vector Fields

A Vector Field is a specific field inside the index that stores the actual vector (embedding) — the numerical representation of a piece of text’s meaning.

What it is

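A special type of field that holds a fixed-length list of floating-point numbers (e.g. 1,536 numbers for OpenAI’s text-embedding-ada-002 model). No single number maps to a readable feature. Together, the numbers place the text at a point in a high-dimensional space where texts with similar meanings sit close to each other.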

Analogy

Regular text field = stores “Refunds are available within 30 days”.
Vector field = stores [0.023, -0.841, 0.334, 0.012, …] (1,536 numbers representing the meaning of that sentence).

How a vector is generated

"Refunds are available within 30 days"
↓ Azure OpenAI embedding model
[0.023, -0.841, 0.334, 0.012, 0.776, ...] (1,536 floating-point numbers)
↓ Stored in the vector field
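Each chunk is uploaded as one document that pairs its text with its vector. In this sketch, `placeholder_embedding` and `make_chunk_document` are hypothetical helpers. The placeholder returns 1,536 pseudo-random floats seeded by the text, so it is deterministic but carries no meaning. In practice you would call your Azure OpenAI embedding deployment at that point.

```python
import random

DIMENSIONS = 1536  # must equal the "dimensions" value on the vector field


def placeholder_embedding(text: str) -> list[float]:
    """Hypothetical stand-in for the embedding model call (NOT a real embedding)."""
    rng = random.Random(text)  # same text -> same pseudo-random vector
    return [rng.uniform(-1.0, 1.0) for _ in range(DIMENSIONS)]


def make_chunk_document(chunk_id: str, text: str, source: str) -> dict:
    """Pair a chunk's text with its vector, shaped for an upload batch."""
    return {
        "@search.action": "upload",
        "chunk_id": chunk_id,
        "text": text,
        "source": source,
        "embedding_vector": placeholder_embedding(text),
    }


doc = make_chunk_document("refund-1", "Refunds are available within 30 days", "refund_policy.pdf")
print(len(doc["embedding_vector"]))  # 1536
```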

Vector field definition

{
  "name": "embedding_vector",
  "type": "Collection(Edm.Single)",
  "dimensions": 1536,
  "vectorSearchProfile": "my-vector-profile",
  "searchable": true,
  "retrievable": false
}

Key properties explained

| Property | What it means |
|---|---|
| type: Collection(Edm.Single) | Array of 32-bit floats: the vector itself |
| dimensions: 1536 | Must match the embedding model’s output size |
| vectorSearchProfile | Name of the vector search profile, which points to the algorithm config (see below) |
| searchable: true | Required for vector fields; lets the field be used in vector queries |
| retrievable: false | Don’t return the raw vector in results (saves bandwidth) |
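`Collection(Edm.Single)` means each dimension is stored as a 32-bit float. This quick sketch uses Python's standard `array` module (typecode `"f"` is a C float, 4 bytes on common platforms) to show what that implies for raw size and precision. The figure covers the raw vector only. The HNSW graph and any other copies of the data add more.

```python
from array import array

DIMENSIONS = 1536

# One vector stored as 32-bit floats, like Edm.Single.
vector = array("f", [0.0] * DIMENSIONS)
bytes_per_vector = vector.itemsize * len(vector)
print(bytes_per_vector)  # 6144

# Rough raw size in GiB for one million chunks, before index overhead.
print(bytes_per_vector * 1_000_000 / 1024**3)

# 32-bit storage rounds values: 0.1 cannot be held exactly.
print(array("f", [0.1])[0] == 0.1)  # False
```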

Common embedding model dimensions

| Model | Dimensions |
|---|---|
| Azure OpenAI text-embedding-ada-002 | 1,536 |
| Azure OpenAI text-embedding-3-small | 1,536 |
| Azure OpenAI text-embedding-3-large | 3,072 |
| sentence-transformers (local) | 384 or 768 |

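Why dimensions (and models) must match

Document embedded with ada-002 → 1,536-dimensional vector
Query embedded with ada-002 → 1,536-dimensional vector
✅ Same model, same space → similarity search works

Document embedded with ada-002 → 1,536-dimensional vector
Query embedded with text-embedding-3-large → 3,072-dimensional vector
❌ Dimension mismatch → the service rejects the query with an error

Document embedded with ada-002 → 1,536-dimensional vector
Query embedded with text-embedding-3-small → 1,536-dimensional vector
❌ Same length but a different embedding space → the query runs, yet the results are meaningless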
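A client-side guard can catch the first kind of mistake before a request goes out. This is a sketch with hypothetical names (`MODEL_DIMENSIONS`, `check_query_vector`), using the dimensions from the table above. Note what it cannot catch: a text-embedding-3-small query passes a length check against an ada-002 field.

```python
# Output sizes from the table above (dict name is hypothetical).
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_query_vector(query_vector: list[float], field_dimensions: int) -> None:
    """Reject a query vector whose length differs from the field's dimensions."""
    if len(query_vector) != field_dimensions:
        raise ValueError(
            f"query has {len(query_vector)} dimensions, field expects {field_dimensions}"
        )


field_dims = MODEL_DIMENSIONS["text-embedding-ada-002"]

# Passes the length check even though it is the WRONG model: lengths alone can't tell.
check_query_vector([0.0] * MODEL_DIMENSIONS["text-embedding-3-small"], field_dims)

try:
    check_query_vector([0.0] * MODEL_DIMENSIONS["text-embedding-3-large"], field_dims)
except ValueError as err:
    print(err)  # query has 3072 dimensions, field expects 1536
```

To guard against the same-length case, record which model produced the index (for example in your application config) and embed queries with that model only.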

3. Vector Search Configurations

Vector Search Configuration is where you define how the similarity search algorithm works — the engine under the hood that finds the closest vectors.

What it is

A set of rules and parameters that control the search algorithm, the mathematical method for comparing vectors, and performance vs accuracy trade-offs.

It has two parts

Part A — Algorithm Configuration

Defines which algorithm to use for finding similar vectors.

Azure AI Search supports two algorithms:

HNSW (Hierarchical Navigable Small World) — recommended for most use cases

{
  "name": "my-hnsw-config",
  "kind": "hnsw",
  "hnswParameters": {
    "metric": "cosine",
    "m": 4,
    "efConstruction": 400,
    "efSearch": 500
  }
}

Exhaustive KNN (K-Nearest Neighbors) — brute-force, checks every vector

{
  "name": "my-knn-config",
  "kind": "exhaustiveKnn",
  "exhaustiveKnnParameters": {
    "metric": "cosine"
  }
}
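"Brute force" here means scoring the query against every stored vector and keeping the top k. Below is a minimal sketch of that idea using cosine similarity. It is not the service's implementation. The per-query cost grows with (number of vectors × dimensions), which is why exhaustive KNN suits small indexes. The sketch assumes non-zero vectors.

```python
import heapq
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def exhaustive_knn(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score EVERY stored vector and keep the k most similar: exact results."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


vectors = {
    "a": [1.0, 0.0],
    "b": [0.8, 0.6],
    "c": [0.0, 1.0],
    "d": [-1.0, 0.0],
}
print(exhaustive_knn([1.0, 0.1], vectors, k=2))
```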

HNSW vs Exhaustive KNN

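| | HNSW | Exhaustive KNN |
|---|---|---|
| Speed | Very fast (navigates a graph of neighbors) | Slow (compares the query with every vector) |
| Accuracy | Approximate: high recall, tunable via efSearch, not guaranteed | Exact (100% recall) |
| Scale | Millions of vectors | Small datasets only |
| Use case | Production RAG | Testing, small indexes, ground truth for checking HNSW recall |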
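"Near-perfect" can be measured. Run the same query exactly and approximately, then compute recall@k: the fraction of the true top k that the approximate search returned. The result lists below are hypothetical. For the exact side, recent API versions also accept an `exhaustive` flag on a vector query against an HNSW field. Check that it is available in the API version you target.

```python
def recall_at_k(approximate_ids: list[str], exact_ids: list[str]) -> float:
    """Fraction of the exact top-k results that the approximate search also found."""
    if not exact_ids:
        return 1.0
    return len(set(approximate_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical results for one query with k = 5.
exact = ["doc1", "doc2", "doc3", "doc4", "doc5"]   # exhaustive KNN: ground truth
approx = ["doc1", "doc2", "doc3", "doc5", "doc9"]  # HNSW: missed doc4
print(recall_at_k(approx, exact))  # 0.8
```

Averaging this over a sample of real queries gives a practical basis for tuning efSearch.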

HNSW parameters explained

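| Parameter | What it controls | Allowed range (default) |
|---|---|---|
| metric | How similarity is measured (cosine, euclidean, dotProduct) | (cosine) |
| m | Bi-directional links per node: higher = better recall but more memory | 4 to 10 (4) |
| efConstruction | Candidate-list size while building the graph: higher = better index quality, slower indexing | 100 to 1,000 (400) |
| efSearch | Candidate-list size at query time: higher = more accurate results, slower queries | 100 to 1,000 (500) |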
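Out-of-range values can be rejected before the index request is sent. `hnsw_config` below is a hypothetical helper that builds the algorithm entry in the JSON shape shown earlier and checks the ranges from the table. `METRICS` lists only the three metrics this article covers.

```python
# Ranges from the table above: (min, max).
HNSW_RANGES = {"m": (4, 10), "efConstruction": (100, 1000), "efSearch": (100, 1000)}
METRICS = {"cosine", "euclidean", "dotProduct"}


def hnsw_config(name: str, metric: str = "cosine", m: int = 4,
                ef_construction: int = 400, ef_search: int = 500) -> dict:
    """Build an HNSW algorithm entry, rejecting unsupported or out-of-range values."""
    if metric not in METRICS:
        raise ValueError(f"unsupported metric: {metric}")
    params = {"metric": metric, "m": m, "efConstruction": ef_construction, "efSearch": ef_search}
    for key, (low, high) in HNSW_RANGES.items():
        if not low <= params[key] <= high:
            raise ValueError(f"{key}={params[key]} is outside [{low}, {high}]")
    return {"name": name, "kind": "hnsw", "hnswParameters": params}


# Trade more memory and slower indexing for better recall.
print(hnsw_config("my-hnsw-high-recall", m=8, ef_construction=800))
```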

Part B — Vector Search Profile

A profile links a field to an algorithm config. This is what a vector field references.

{
  "vectorSearch": {
    "algorithms": [
      {
        "name": "my-hnsw-config",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      }
    ],
    "profiles": [
      {
        "name": "my-vector-profile",
        "algorithm": "my-hnsw-config"
      }
    ]
  }
}

The relationship:

Vector Field
└── references → Vector Search Profile
    └── references → Algorithm Config
        └── defines metric, m, ef values
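Because every link in that chain is just a name, a typo silently breaks it. This sketch follows the chain the same way the relationship above describes: field, then profile, then algorithm. It fails loudly on a dangling name. `resolve_algorithm` is a hypothetical helper, and the index dict is trimmed to the parts it reads.

```python
def resolve_algorithm(index: dict, field_name: str) -> dict:
    """Follow field -> profile -> algorithm by name, failing loudly on a broken link."""
    field = next(f for f in index["fields"] if f["name"] == field_name)
    profiles = {p["name"]: p for p in index["vectorSearch"]["profiles"]}
    algorithms = {a["name"]: a for a in index["vectorSearch"]["algorithms"]}

    profile = profiles.get(field["vectorSearchProfile"])
    if profile is None:
        raise KeyError(f"unknown profile {field['vectorSearchProfile']!r}")
    algorithm = algorithms.get(profile["algorithm"])
    if algorithm is None:
        raise KeyError(f"unknown algorithm {profile['algorithm']!r}")
    return algorithm


index = {
    "fields": [{"name": "embedding_vector", "vectorSearchProfile": "my-vector-profile"}],
    "vectorSearch": {
        "profiles": [{"name": "my-vector-profile", "algorithm": "my-hnsw-config"}],
        "algorithms": [{"name": "my-hnsw-config", "kind": "hnsw",
                        "hnswParameters": {"metric": "cosine", "m": 4}}],
    },
}
print(resolve_algorithm(index, "embedding_vector")["hnswParameters"]["metric"])  # cosine
```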

Similarity Metrics Explained

The metric property defines how distance between two vectors is calculated:

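| Metric | Formula idea | Best for |
|---|---|---|
| cosine | Angle between vectors (ignores their length) | Text similarity (most common) |
| euclidean | Straight-line distance | Models trained with L2 distance (e.g. some image embeddings) |
| dotProduct | Magnitude × cosine of the angle | Normalized (unit-length) vectors, where it ranks the same as cosine |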

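For RAG with text, cosine is almost always the right choice: it compares only the direction of two vectors, not their length. OpenAI embeddings are already normalized to length 1, so for them cosine and dotProduct produce the same ranking.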
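The three formulas are easy to compare on a pair of vectors that point the same way but differ in length. This plain-Python sketch illustrates the math only. It does not reproduce how Azure AI Search converts each metric into `@search.score`. It uses `math.hypot` with several arguments and `math.dist`, so it requires Python 3.8+.

```python
import math


def dot_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    return dot_product(a, b) / (math.hypot(*a) * math.hypot(*b))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.dist(a, b)


def unit(v: list[float]) -> list[float]:
    """Scale a vector to length 1."""
    length = math.hypot(*v)
    return [x / length for x in v]


a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction, twice the length
print(cosine(a, b))       # 1.0  -> identical direction
print(euclidean(a, b))    # 5.0  -> yet far apart in space
print(dot_product(a, b))  # 50.0 -> grows with magnitude

# On unit-length vectors, dot product equals cosine (up to float rounding).
print(dot_product(unit(a), unit(b)))
```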


How All Three Work Together

VECTOR INDEX "my-document-index"
├── fields
│   ├── chunk_id  (key)
│   ├── text
│   ├── source
│   └── embedding_vector  [VECTOR FIELD, dimensions: 1536]
│         └─► vectorSearchProfile: "my-vector-profile"  (link 1)
└── vectorSearch
    ├── profiles
    │   └── "my-vector-profile"  ◄── resolved by name (link 1)
    │         └─► algorithm: "my-hnsw-config"  (link 2)
    └── algorithms
        └── "my-hnsw-config"  ◄── resolved by name (link 2)
              kind: hnsw, metric: cosine
              m: 4, efConstruction: 400, efSearch: 500

A Complete Query Example

When a user asks “What is the refund policy?”:

1. Convert the question to a vector (with the same embedding model used for the documents)
   [0.021, -0.834, 0.291, ...] (1,536 numbers)
2. Send a vector query to Azure AI Search
   POST /indexes/my-document-index/docs/search?api-version=2024-07-01
   {
     "vectorQueries": [{
       "kind": "vector",
       "vector": [0.021, -0.834, 0.291, ...],
       "fields": "embedding_vector",
       "k": 5
     }]
   }
3. The HNSW algorithm runs
   → finds the 5 chunks whose vectors are most similar (cosine)
4. Azure AI Search returns the top matches (first three shown; scores are illustrative)
   → "refund_policy.pdf" chunk: score 0.97 ✅
   → "shipping_policy.pdf" chunk: score 0.61
   → "returns_guide.pdf" chunk: score 0.58
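Search results come back as a JSON object whose `value` array lists the matches in ranked order, each with an `@search.score`. This sketch parses such a body. The response text is invented to mirror the example above. `top_matches` and its `min_score` cut-off are hypothetical. The `source` field only appears if it is retrievable. Treat any fixed threshold with care, because scores are not calibrated across queries.

```python
import json

# Illustrative response body (values invented to match the example above).
response_text = """
{
  "value": [
    {"@search.score": 0.97, "chunk_id": "refund-1", "source": "refund_policy.pdf"},
    {"@search.score": 0.61, "chunk_id": "ship-3", "source": "shipping_policy.pdf"},
    {"@search.score": 0.58, "chunk_id": "returns-2", "source": "returns_guide.pdf"}
  ]
}
"""


def top_matches(body: str, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (source, score) pairs in ranked order, dropping matches below min_score."""
    results = json.loads(body)["value"]
    return [(r["source"], r["@search.score"]) for r in results if r["@search.score"] >= min_score]


print(top_matches(response_text, min_score=0.6))
# [('refund_policy.pdf', 0.97), ('shipping_policy.pdf', 0.61)]
```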

Key Takeaway

| Concept | Role | Analogy |
|---|---|---|
| Vector Index | Container for all data | Database table |
| Vector Field | Stores the meaning fingerprint | DNA of each document |
| Vector Search Config | Controls how similarity is found | Search engine settings |

Together they form a semantic search engine. Instead of matching keywords, Azure AI Search matches meaning, which is why vector indexes commonly serve as the retrieval layer of production RAG systems.
