Understanding Azure AI Search: Vector Indexes Explained

Azure AI Search — Vector Indexes, Fields & Configurations Explained

The Big Picture

Think of Azure AI Search like a smart library system for AI:

			
Your Documents
      ↓
Convert to Vectors (embeddings)
      ↓
Store in Vector Index
      ↓
User asks a question → convert to vector → search → find similar docs
      ↓
Return most relevant results

		

These three concepts — Vector Indexes, Vector Fields, and Vector Search Configurations — are the three layers that make this work.
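
The flow above can be sketched in a few lines of plain Python. This is a toy, in-memory illustration and not Azure AI Search itself: `toy_embed` is a hypothetical stand-in for a real embedding model (it only counts words from a tiny fixed vocabulary), and the "index" is just a list.

```python
import math

# Hypothetical stand-in for an embedding model: one number per vocabulary term.
VOCAB = ["refund", "days", "shipping", "delivery", "password", "reset"]

def toy_embed(text):
    """Turn text into a small vector by counting words that start with each term."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(sum(1 for w in words if w.startswith(term))) for term in VOCAB]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

documents = [
    "Refunds are available within 30 days.",
    "Shipping takes 3 to 5 days for standard delivery.",
    "Reset your password from the account page.",
]

# "Store in Vector Index": keep each text next to its vector.
index = [(doc, toy_embed(doc)) for doc in documents]

def search(question, k=2):
    """Embed the question, rank every stored vector, return the k best texts."""
    q = toy_embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

top = search("What is the refund policy?", k=1)
print(top)
```

A real system swaps `toy_embed` for an embedding model and the list for an Azure AI Search index, but the shape of the pipeline stays the same.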

1. Vector Index

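A Vector Index is the overall container, comparable to a database table, that holds all your documents and their vector representations. In Azure AI Search it is a regular search index whose schema includes at least one vector field.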

What it is

A named, structured storage unit in Azure AI Search where you define the schema (what fields exist) and store all your data.

Analogy

A regular index = a filing cabinet with labeled folders.
A vector index = a filing cabinet that also stores the “meaning fingerprint” of every document, so you can search by meaning, not just keywords.

How it looks (schema definition)

			
{
  "name": "my-document-index",
  "fields": [
    { "name": "chunk_id",         "type": "Edm.String", "key": true },
    { "name": "text",             "type": "Edm.String", "searchable": true },
    { "name": "source",           "type": "Edm.String", "filterable": true },
    { "name": "embedding_vector", "type": "Collection(Edm.Single)",
      "searchable": true,
      "dimensions": 1536,
      "vectorSearchProfile": "my-vector-profile"
    }
  ],
  "vectorSearch": { ... }
}

		

Key properties of a vector index

Property	What it means
name	Unique identifier for the index
fields	All the columns of data stored
key field	Unique ID per document (like a primary key)
vectorSearch	Configuration for how vector search behaves
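
Before sending a schema to the service, it can help to sanity-check it locally. The sketch below is only an illustration: `check_schema` is a hypothetical helper, not part of any SDK, and the `vectorSearch` section is a minimal stand-in built from the profile and algorithm names used later in this article.

```python
import json

schema = {
    "name": "my-document-index",
    "fields": [
        {"name": "chunk_id", "type": "Edm.String", "key": True},
        {"name": "text", "type": "Edm.String", "searchable": True},
        {"name": "source", "type": "Edm.String", "filterable": True},
        {
            "name": "embedding_vector",
            "type": "Collection(Edm.Single)",
            "searchable": True,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile",
        },
    ],
    # Minimal stand-in for the vectorSearch section (assumption for this sketch).
    "vectorSearch": {
        "algorithms": [{"name": "my-hnsw-config", "kind": "hnsw"}],
        "profiles": [{"name": "my-vector-profile", "algorithm": "my-hnsw-config"}],
    },
}

def check_schema(index):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    keys = [f["name"] for f in index["fields"] if f.get("key")]
    if len(keys) != 1:
        problems.append(f"expected exactly one key field, found {len(keys)}")
    profiles = {p["name"] for p in index.get("vectorSearch", {}).get("profiles", [])}
    for field in index["fields"]:
        if field["type"] != "Collection(Edm.Single)":
            continue  # only vector fields need the checks below
        if "dimensions" not in field:
            problems.append(f"{field['name']}: missing dimensions")
        if field.get("vectorSearchProfile") not in profiles:
            problems.append(f"{field['name']}: unknown vectorSearchProfile")
    return problems

problems = check_schema(schema)
body = json.dumps(schema, indent=2)  # JSON text ready to send as a request body
print(problems)
```

The checks mirror the table above: one key field, and every vector field pointing at a profile that actually exists.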

Lifecycle

			
Create index (define schema)
       ↓
Load documents + their vectors
       ↓
Index is ready to search
       ↓
Query it anytime via REST API
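
The first two lifecycle steps map onto two REST calls. The sketch below only builds the requests and does not send them. The endpoint, key, and `API_VERSION` value are placeholders and assumptions, so check them against your own service before use.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # assumption: use a version your service supports

def build_request(method, url, api_key, payload):
    """Create an unsent JSON request carrying the api-key header."""
    data = json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=data, headers=headers, method=method)

def create_index_request(endpoint, api_key, schema):
    """Step 1: create (or update) the index from its schema."""
    url = f"{endpoint}/indexes/{schema['name']}?api-version={API_VERSION}"
    return build_request("PUT", url, api_key, schema)

def upload_documents_request(endpoint, api_key, index_name, docs):
    """Step 2: load documents, each tagged with an upload action."""
    url = f"{endpoint}/indexes/{index_name}/docs/index?api-version={API_VERSION}"
    payload = {"value": [{"@search.action": "upload", **doc} for doc in docs]}
    return build_request("POST", url, api_key, payload)

# Placeholder values for illustration only.
endpoint = "https://example-service.search.windows.net"
create_req = create_index_request(endpoint, "PLACEHOLDER-KEY", {"name": "my-document-index", "fields": []})
upload_req = upload_documents_request(
    endpoint,
    "PLACEHOLDER-KEY",
    "my-document-index",
    [{"chunk_id": "1", "text": "Refunds are available within 30 days", "embedding_vector": [0.0] * 1536}],
)
print(create_req.get_method(), create_req.full_url)
print(upload_req.get_method(), upload_req.full_url)
```

Passing either request to `urllib.request.urlopen` would actually send it. The official SDKs wrap the same calls if you prefer not to build requests by hand.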

		

2. Vector Fields

A Vector Field is a specific field inside the index that stores the actual vector (embedding) — the numerical representation of a piece of text’s meaning.

What it is

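A special type of field that holds a list of floating-point numbers (e.g. 1,536 numbers for OpenAI’s text-embedding-ada-002 model). The individual numbers are not meaningful on their own; together they place the text at a point in space where texts with similar meaning sit close to each other.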

Analogy

Regular text field = stores “Refunds are available within 30 days”
Vector field = stores [0.023, -0.841, 0.334, 0.012, …] (1,536 numbers representing the meaning of that sentence)

How a vector is generated

			
"Refunds are available within 30 days"
              ↓
   Azure OpenAI Embedding Model
              ↓
[0.023, -0.841, 0.334, 0.012, 0.776, ...]
(1,536 floating-point numbers)
              ↓
   Stored in the vector field
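
In code, the embedding step is a single call. The sketch below assumes the `openai` Python package's client interface (`client.embeddings.create`), where with Azure OpenAI the `model` argument is your deployment name. To stay runnable offline it uses `FakeEmbeddingClient`, a hypothetical stand-in that copies the response shape and returns zeros rather than real embeddings.

```python
from types import SimpleNamespace

def embed_texts(client, deployment, texts):
    """Return one embedding (list of floats) per input text, in input order."""
    response = client.embeddings.create(model=deployment, input=texts)
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]

class FakeEmbeddingClient:
    """Offline stand-in mimicking the response shape; values are NOT real embeddings."""

    def __init__(self, dimensions):
        self.embeddings = self  # lets client.embeddings.create(...) resolve to create()
        self._dimensions = dimensions

    def create(self, model, input):
        data = [
            SimpleNamespace(index=i, embedding=[0.0] * self._dimensions)
            for i, _ in enumerate(input)
        ]
        return SimpleNamespace(data=data)

vectors = embed_texts(
    FakeEmbeddingClient(1536),
    "text-embedding-ada-002",  # deployment name, an assumption for this sketch
    ["Refunds are available within 30 days"],
)
print(len(vectors), len(vectors[0]))
```

With a real client in place of the fake one, each returned list is what gets written to the vector field.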

		

Vector field definition

			
{
  "name": "embedding_vector",
  "type": "Collection(Edm.Single)",
  "dimensions": 1536,
  "vectorSearchProfile": "my-vector-profile",
  "searchable": true,
  "retrievable": false
}

		

Key properties explained

Property	What it means
`type: Collection(Edm.Single)`	Array of 32-bit floats — the vector
`dimensions: 1536`	Must match the embedding model’s output size
`vectorSearchProfile`	Links to the algorithm config (see below)
`searchable: true`	This field can be used in vector queries
`retrievable: false`	Don’t return raw vector in results (saves bandwidth)

Common embedding model dimensions

Model	Dimensions
Azure OpenAI `text-embedding-ada-002`	1,536
Azure OpenAI `text-embedding-3-small`	1,536
Azure OpenAI `text-embedding-3-large`	3,072
sentence-transformers (local)	384 or 768
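
One way to keep the field definition and the model in step is to derive `dimensions` from a lookup. This is a small sketch: `MODEL_DIMENSIONS` copies the Azure OpenAI rows of the table above, and `vector_field` is a hypothetical helper, not an SDK function.

```python
# Default output sizes, taken from the table above.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def vector_field(name, model, profile):
    """Build a vector field definition whose dimensions match the chosen model."""
    if model not in MODEL_DIMENSIONS:
        raise KeyError(f"unknown embedding model: {model}")
    return {
        "name": name,
        "type": "Collection(Edm.Single)",
        "dimensions": MODEL_DIMENSIONS[model],
        "vectorSearchProfile": profile,
        "searchable": True,
        "retrievable": False,
    }

field = vector_field("embedding_vector", "text-embedding-3-large", "my-vector-profile")
print(field["dimensions"])
```

If you request a reduced output size from a model that supports it, the lookup would need that size instead of the default.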

Why dimensions must match

			
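Document embedded with ada-002   → 1536-dimensional vector
Query embedded with ada-002      → 1536-dimensional vector
✅ Same model, same space → similarity search works

Document embedded with ada-002   → 1536-dimensional vector
Query embedded with text-3-large → 3072-dimensional vector
❌ Length does not match the field's dimensions → the query is rejected

Document embedded with ada-002   → 1536-dimensional vector
Query embedded with text-3-small → 1536-dimensional vector
❌ Same length, different space → the query runs, but results are meaningless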
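
A cheap guard on the query side catches both kinds of mismatch before a request goes out. A length check alone is not enough, because two different models can produce vectors of the same length (ada-002 and text-embedding-3-small are both 1,536 in the table above). The sketch below is hypothetical: it assumes you record which model built the index somewhere in your own application.

```python
class EmbeddingMismatch(ValueError):
    """Raised when a query vector cannot be compared with the indexed vectors."""

def check_query_vector(vector, query_model, field_dimensions, index_model):
    """Return the vector unchanged if it is safe to search with, else raise."""
    if len(vector) != field_dimensions:
        raise EmbeddingMismatch(
            f"query vector has {len(vector)} values, field expects {field_dimensions}"
        )
    if query_model != index_model:
        raise EmbeddingMismatch(
            f"index was built with {index_model}, query was embedded with {query_model}"
        )
    return vector

ok = check_query_vector([0.0] * 1536, "text-embedding-ada-002", 1536, "text-embedding-ada-002")
print(len(ok))
```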

		

3. Vector Search Configurations

A Vector Search Configuration is where you define how the similarity search algorithm works: the engine under the hood that finds the closest vectors.

What it is

A set of rules and parameters that control the search algorithm, the mathematical method for comparing vectors, and performance vs accuracy trade-offs.

It has two parts

Part A — Algorithm Configuration

Defines which algorithm to use for finding similar vectors.

Azure AI Search supports two algorithms:

HNSW (Hierarchical Navigable Small World) — recommended for most use cases

			
{
  "name": "my-hnsw-config",
  "kind": "hnsw",
  "hnswParameters": {
    "metric": "cosine",
    "m": 4,
    "efConstruction": 400,
    "efSearch": 500
  }
}

		

Exhaustive KNN (K-Nearest Neighbors) — brute-force, checks every vector

			
{
  "name": "my-knn-config",
  "kind": "exhaustiveKnn",
  "exhaustiveKnnParameters": {
    "metric": "cosine"
  }
}

		

HNSW vs Exhaustive KNN

	HNSW	Exhaustive KNN
Speed	Very fast	Slow (checks everything)
Accuracy	Approximate (high recall, tunable)	Exact (100% recall)
Scale	Millions of vectors	Small datasets only
Use case	Production RAG	Testing / small indexes
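
To make "checks everything" concrete, here is a minimal pure-Python sketch of exhaustive KNN with cosine similarity. It illustrates the idea only, not how Azure AI Search implements it. The two-dimensional vectors are made up for readability.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def exhaustive_knn(query, vectors, k):
    """Score the query against every stored vector and keep the k best.

    Work grows linearly with the number of vectors, which is why this is
    exact but slow on large indexes.
    """
    scored = ((cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored)

vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
hits = exhaustive_knn([1.0, 0.1], vectors, k=2)
print([doc_id for _, doc_id in hits])
```

HNSW avoids the full scan by walking a graph of neighbouring vectors, trading a small chance of missing a true neighbour for much less work per query.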

HNSW parameters explained

Parameter	What it controls
`metric`	How similarity is measured (cosine, euclidean, dotProduct)
`m`	Number of links per node — higher = more accurate but uses more memory
`efConstruction`	Build-time accuracy — higher = better index quality, slower build
`efSearch`	Query-time accuracy — higher = more accurate results, slower query
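
A small builder can catch out-of-range parameters before the service does. Treat the limits below as assumptions: they reflect my reading of the REST documentation (`m` from 4 to 10, `efConstruction` and `efSearch` from 100 to 1000) and should be verified for your API version. `hnsw_config` is a hypothetical helper.

```python
# Assumed limits and defaults; verify against the REST reference for your API version.
HNSW_LIMITS = {"m": (4, 10), "efConstruction": (100, 1000), "efSearch": (100, 1000)}
HNSW_DEFAULTS = {"metric": "cosine", "m": 4, "efConstruction": 400, "efSearch": 500}
ALLOWED_METRICS = {"cosine", "euclidean", "dotProduct"}

def hnsw_config(name, **overrides):
    """Build an HNSW algorithm entry, rejecting unknown or out-of-range values."""
    unknown = set(overrides) - set(HNSW_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown HNSW parameters: {sorted(unknown)}")
    params = {**HNSW_DEFAULTS, **overrides}
    if params["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unsupported metric: {params['metric']}")
    for key, (low, high) in HNSW_LIMITS.items():
        if not low <= params[key] <= high:
            raise ValueError(f"{key}={params[key]} is outside {low}..{high}")
    return {"name": name, "kind": "hnsw", "hnswParameters": params}

config = hnsw_config("my-hnsw-config", efSearch=800)
print(config["hnswParameters"])
```

Raising `efSearch` as shown only affects query time; changing `m` or `efConstruction` affects how the graph is built.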

Part B — Vector Search Profile

A profile links a field to an algorithm config. This is what a vector field references.

			
{
  "vectorSearch": {
    "algorithms": [
      {
        "name": "my-hnsw-config",
        "kind": "hnsw",
        "hnswParameters": {
          "metric": "cosine",
          "m": 4,
          "efConstruction": 400,
          "efSearch": 500
        }
      }
    ],
    "profiles": [
      {
        "name": "my-vector-profile",
        "algorithm": "my-hnsw-config"
      }
    ]
  }
}

		

The relationship:

			
Vector Field
    └── references → Vector Search Profile
                          └── references → Algorithm Config
                                              └── defines metric, m, ef values
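
The chain of references can be followed in code, which is also a handy way to confirm an index definition is wired up correctly. `resolve_algorithm` below is a hypothetical helper that works on the index definition as a plain dictionary.

```python
def resolve_algorithm(index, field_name):
    """Follow field -> profile -> algorithm and return the algorithm config."""
    fields = {f["name"]: f for f in index["fields"]}
    profiles = {p["name"]: p for p in index["vectorSearch"]["profiles"]}
    algorithms = {a["name"]: a for a in index["vectorSearch"]["algorithms"]}

    profile_name = fields[field_name].get("vectorSearchProfile")
    if profile_name not in profiles:
        raise LookupError(f"field {field_name!r} references unknown profile {profile_name!r}")
    algorithm_name = profiles[profile_name]["algorithm"]
    if algorithm_name not in algorithms:
        raise LookupError(f"profile {profile_name!r} references unknown algorithm {algorithm_name!r}")
    return algorithms[algorithm_name]

index = {
    "fields": [
        {"name": "embedding_vector", "vectorSearchProfile": "my-vector-profile"},
    ],
    "vectorSearch": {
        "algorithms": [
            {"name": "my-hnsw-config", "kind": "hnsw", "hnswParameters": {"metric": "cosine"}},
        ],
        "profiles": [
            {"name": "my-vector-profile", "algorithm": "my-hnsw-config"},
        ],
    },
}

algorithm = resolve_algorithm(index, "embedding_vector")
print(algorithm["kind"], algorithm["hnswParameters"]["metric"])
```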

Similarity Metrics Explained

The metric property defines how distance between two vectors is calculated:

Metric	Formula idea	Best for
cosine	Angle between vectors	Text similarity (most common)
euclidean	Straight-line distance	Image embeddings
dotProduct	Magnitude × direction	Normalized vectors

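For RAG with text, cosine is almost always the right choice: it compares the direction of two vectors and ignores their magnitude. Azure OpenAI embedding models return unit-length vectors, and for unit-length vectors cosine and dotProduct produce the same ranking.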
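
The three metrics are short enough to write out. This sketch uses plain Python and made-up two-dimensional vectors. It shows that cosine ignores magnitude while euclidean distance and dot product do not, and that dot product equals cosine once vectors are normalized.

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def magnitude(v):
    return math.sqrt(dot_product(v, v))

def cosine_similarity(a, b):
    """Angle-based: 1.0 means same direction, regardless of magnitude."""
    return dot_product(a, b) / (magnitude(a) * magnitude(b))

def euclidean_distance(a, b):
    """Straight-line distance: 0.0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    """Scale a non-zero vector to unit length."""
    m = magnitude(v)
    return [x / m for x in v]

# Same direction, very different magnitude.
short_vec = [1.0, 2.0]
long_vec = [10.0, 20.0]

print("cosine:   ", cosine_similarity(short_vec, long_vec))
print("euclidean:", euclidean_distance(short_vec, long_vec))
print("dot:      ", dot_product(short_vec, long_vec))
print("dot after normalizing:", dot_product(normalize(short_vec), normalize(long_vec)))
```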

How All Three Work Together

			
┌─────────────────────────────────────────┐
│           VECTOR INDEX                  │
│   "my-document-index"                   │
│                                         │
│   Fields:                               │
│   ┌──────────┐  ┌───────────────────┐   │
│   │ chunk_id │  │ embedding_vector  │   │
│   │ text     │  │ (VECTOR FIELD)    │   │
│   │ source   │  │ dim: 1536         │   │
│   └──────────┘  │ profile: →────────┼───┼──┐
│                 └───────────────────┘   │  │
│                                         │  ▼
│   Vector Search Config:                 │  ┌────────────────────┐
│   ┌─────────────────────────────────┐   │  │  PROFILE           │
│   │ Algorithm: HNSW                 │◄──┼──┤  "my-vector-       │
│   │ metric: cosine                  │   │  │   profile"         │
│   │ m: 4, efConstruction: 400       │   │  └────────────────────┘
│   └─────────────────────────────────┘   │
└─────────────────────────────────────────┘

		

A Complete Query Example

When a user asks “What is the refund policy?”:

			
1. Convert question to vector
   [0.021, -0.834, 0.291, ...] (1536 numbers)
2. Send vector query to Azure AI Search
   POST /indexes/my-document-index/docs/search?api-version=2023-11-01
   {
     "vectorQueries": [{
       "kind": "vector",
       "vector": [0.021, -0.834, 0.291, ...],
       "fields": "embedding_vector",
       "k": 5
     }]
   }
3. HNSW algorithm runs
   → Finds 5 chunks whose vectors are most similar (cosine)
4. Returns top matches
   → "refund_policy.pdf" chunk: score 0.97  ✅
   → "shipping_policy.pdf" chunk: score 0.61
   → "returns_guide.pdf" chunk: score 0.58
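
Steps 2 and 4 can be sketched as two small Python functions: one builds the request body, the other reads the hits. The helper names are hypothetical. `sample_response` is hand-written to match the shape of a search response, reusing the scores from the walkthrough above, so it is not real service output.

```python
import json

def vector_query_body(vector, k=5, field="embedding_vector"):
    """Build the JSON body for a single-vector query."""
    return {
        "vectorQueries": [
            {"kind": "vector", "vector": list(vector), "fields": field, "k": k}
        ]
    }

def summarize_hits(response):
    """Return (source, score) pairs from a search response, best first."""
    hits = sorted(response["value"], key=lambda h: h["@search.score"], reverse=True)
    return [(h["source"], h["@search.score"]) for h in hits]

# Truncated vector for illustration; a real query needs all 1,536 values.
body = vector_query_body([0.021, -0.834, 0.291], k=5)

# Hand-written sample in the response shape (not real output).
sample_response = {
    "value": [
        {"@search.score": 0.61, "source": "shipping_policy.pdf"},
        {"@search.score": 0.97, "source": "refund_policy.pdf"},
        {"@search.score": 0.58, "source": "returns_guide.pdf"},
    ]
}

print(json.dumps(body))
print(summarize_hits(sample_response))
```

The matched chunks' text would then be passed to the language model as context, which is the "retrieval" half of RAG.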

		

Key Takeaway

Concept	Role	Analogy
Vector Index	Container for all data	Database table
Vector Field	Stores the meaning fingerprint	DNA of each document
Vector Search Config	Controls how similarity is found	Search engine settings

Together they form a semantic search engine: instead of matching only keywords, Azure AI Search matches meaning, which makes it a common retrieval layer for production RAG systems.

Infra Cloud Solutions
