Understanding LangChain: How It Works


The Big Picture

LangChain works by chaining together components — each component does one job, and they pass data to each other in a pipeline.

Input → [Component 1] → [Component 2] → [Component 3] → Output

Step-by-Step Execution Flow

┌─────────────────────────────────────────────────────┐
│ USER INPUT                                          │
│ "Summarize my uploaded PDF"                         │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 1. DOCUMENT LOADER                                  │
│ Reads PDF → extracts raw text                       │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 2. TEXT SPLITTER                                    │
│ Splits text into smaller chunks (e.g. 500           │
│ tokens each) so the LLM can process them            │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 3. EMBEDDINGS + VECTOR STORE                        │
│ Converts chunks into vectors → stores in DB         │
│ (enables semantic search later)                     │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 4. RETRIEVER                                        │
│ User asks question → finds most relevant chunks     │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 5. PROMPT TEMPLATE                                  │
│ Injects retrieved chunks + question into prompt     │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 6. LLM (Claude / GPT etc.)                          │
│ Generates answer based on context                   │
└──────────────────────┬──────────────────────────────┘
                       ▼
┌─────────────────────────────────────────────────────┐
│ 7. OUTPUT PARSER                                    │
│ Formats raw LLM response → structured output        │
└──────────────────────┬──────────────────────────────┘
                       ▼
                 FINAL ANSWER
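The chunking in step 2 is simple enough to sketch without any library. A minimal version, assuming chunk size is measured in characters rather than tokens, with a small overlap so context isn't cut clean at chunk boundaries:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks so boundary context isn't lost."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step back by `overlap` each time
    return chunks

doc = "x" * 1200
chunks = split_text(doc, chunk_size=500, overlap=50)
# → 3 chunks covering [0:500], [450:950], [900:1200]
```

Real splitters (like LangChain's RecursiveCharacterTextSplitter) are smarter about breaking on paragraph and sentence boundaries, but the overlap idea is the same.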

Core Mechanism 1 — Chains

A Chain is the most basic unit. It connects components in sequence.

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Step 1: Define a prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in simple terms.",
)

# Step 2: Connect prompt → LLM
# (llm is any LangChain-compatible model object you've already created)
chain = LLMChain(llm=llm, prompt=prompt)

# Step 3: Run it
result = chain.run("quantum computing")
# Output: "Quantum computing is..."

Data flows like this:

"quantum computing"
PromptTemplate fills in → "Explain quantum computing in simple terms."
LLM generates response
Output returned
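Stripped of the framework, that data flow is just function composition; a sketch with a stand-in function in place of a real model call:

```python
def fill_prompt(topic: str) -> str:
    """What PromptTemplate does: substitute the variable into the template."""
    return f"Explain {topic} in simple terms."

def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; just echoes the prompt it received."""
    return f"[answer to: {prompt}]"

def chain(topic: str) -> str:
    """The 'chain': output of one step becomes input of the next."""
    return fake_llm(fill_prompt(topic))

result = chain("quantum computing")
# → "[answer to: Explain quantum computing in simple terms.]"
```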

Core Mechanism 2 — Memory

Memory injects past conversation into every new prompt automatically.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()

# Turn 1
memory.save_context(
    {"input": "My name is Alex"},
    {"output": "Nice to meet you, Alex!"},
)

# Turn 2 — memory auto-injects history into next prompt
print(memory.load_memory_variables({}))
# → {"history": "Human: My name is Alex\nAI: Nice to meet you, Alex!"}

Internally, every prompt becomes:

[Past conversation history] ← injected by memory
[Current user message] ← new input
[LLM response]

Types of memory:

Type               | How it works
-------------------|----------------------------------------------
BufferMemory       | Stores the full raw conversation
SummaryMemory      | Summarizes old turns to save tokens
WindowMemory       | Keeps only the last N turns
VectorStoreMemory  | Retrieves semantically relevant past messages
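The differences come down to what each type keeps. WindowMemory, for example, is essentially a bounded queue; a sketch (not LangChain's actual implementation), assuming N counts whole turns of one human message plus one AI reply:

```python
from collections import deque

class WindowMemory:
    """Keeps only the last n conversation turns."""

    def __init__(self, n: int):
        self.turns = deque(maxlen=n)  # deque silently drops the oldest entry

    def save_context(self, user_msg: str, ai_msg: str) -> None:
        self.turns.append((user_msg, ai_msg))

    def load_history(self) -> str:
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

mem = WindowMemory(n=2)
mem.save_context("My name is Alex", "Nice to meet you, Alex!")
mem.save_context("I like hiking", "Hiking is great!")
mem.save_context("Any trail tips?", "Start with short loops.")
# Only the last 2 turns remain; the "My name is Alex" turn was dropped
```

This is exactly the WindowMemory trade-off from the table: bounded token cost, at the price of forgetting anything older than N turns.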

Core Mechanism 3 — Retrieval (RAG)

RAG (Retrieval-Augmented Generation) lets the LLM answer questions about YOUR data.

YOUR DATA (PDF, website, DB)
Split into chunks
Convert to vectors (embeddings)
Store in Vector DB (e.g. FAISS, Pinecone)
User asks: "What does page 5 say about revenue?"
Search vector DB → find top 3 relevant chunks
Inject chunks into prompt → LLM answers

from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
# Store documents as vectors
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())
# Retrieve relevant chunks for a query
retriever = vectorstore.as_retriever()
relevant_docs = retriever.get_relevant_documents("What is the revenue?")
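Under the hood, the retriever is doing nearest-neighbor search: embed the query, score it against every stored vector, return the top matches. A dependency-free sketch, with toy 3-dimensional vectors standing in for real embeddings (which typically have hundreds or thousands of dimensions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy "vector store": (chunk text, embedding) pairs
store = [
    ("Revenue grew 12% in Q3.",      [0.9, 0.1, 0.0]),
    ("The office moved to Toronto.", [0.0, 0.2, 0.9]),
    ("Profit margins held steady.",  [0.7, 0.3, 0.1]),
]

def retrieve(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A query vector pointing in the "finance" direction
top = retrieve([1.0, 0.1, 0.0], k=2)
# → the two finance-related chunks rank above the office one
```

Real vector DBs like FAISS and Pinecone do this same scoring, just with approximate-nearest-neighbor indexes so it stays fast over millions of chunks.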

Core Mechanism 4 — Agents

Agents are the most powerful part. The LLM dynamically decides which tools to use and in what order.

User: "Search the web for today's Bitcoin price and convert it to CAD"
Agent thinks: "I need 2 tools — web_search, then currency_converter"
Step 1: calls web_search("Bitcoin price today")
Step 2: reads result → $63,000 USD
Step 3: calls currency_converter(63000, "USD", "CAD")
Step 4: reads result → $86,000 CAD
Agent responds: "Bitcoin is ~$86,000 CAD today"

The internal reasoning loop (ReAct pattern):

Thought: What do I need to do?
Action: Call tool X with input Y
Observation: Tool returned Z
Thought: Now I need to...
Action: Call tool A with input B
...repeat until...
Final Answer: [complete response]
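A minimal version of that loop, with a scripted plan standing in for the LLM's Thought steps (a real agent asks the model which tool to call next at every iteration, and the 1.37 USD→CAD rate here is an assumed constant):

```python
# Toy tool registry; real agents would call a search API and an FX service
TOOLS = {
    "web_search": lambda q: "63000 USD",  # pretend search result
    "currency_converter": lambda amt: f"{float(amt.split()[0]) * 1.37:.0f} CAD",
}

# Scripted Thought → Action decisions standing in for the LLM's reasoning
SCRIPT = [
    ("web_search", "Bitcoin price today"),
    ("currency_converter", None),  # None means: use the previous observation
]

def run_agent() -> str:
    observation = None
    for tool_name, tool_input in SCRIPT:
        if tool_input is None:
            tool_input = observation            # feed last observation forward
        observation = TOOLS[tool_name](tool_input)  # Action → Observation
    return f"Bitcoin is ~${observation} today"      # Final Answer
```

The key structural point survives even in this toy: each iteration produces an observation that feeds the next decision, which is what lets real agents chain tools whose inputs depend on earlier results.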

How Components Connect — LCEL

Modern LangChain uses LCEL (LangChain Expression Language) — a clean pipe | syntax:

from langchain_core.runnables import RunnablePassthrough

# Build a RAG chain using pipes
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | output_parser
)

# Run it
rag_chain.invoke("What is the company's revenue?")

Each | passes the output of one component as input to the next — just like Unix pipes.
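That pipe syntax is ordinary Python operator overloading: each runnable implements __or__ so that a | b returns a new composed runnable. A toy reconstruction of the idea (not LangChain's actual Runnable class):

```python
class Step:
    """Wraps a function and composes with | like an LCEL runnable."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new Step that runs a first, then b
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)

fill  = Step(lambda q: f"Answer briefly: {q}")       # prompt template
model = Step(lambda p: f"[model output for '{p}']")  # stand-in LLM
parse = Step(lambda r: r.strip("[]"))                # output parser

pipeline = fill | model | parse
result = pipeline.invoke("What is LCEL?")
# → "model output for 'Answer briefly: What is LCEL?'"
```

Because composition yields another Step, pipelines nest and extend freely, which is the same property that lets LCEL chains be reused as components inside bigger chains.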


Full Internal Flow Summary

User Query
    │
    ▼
[Memory] → loads past conversation history
    │
    ▼
[Retriever] → finds relevant docs
    │
    ▼
[Prompt Template] ← fills in: query + docs + history
    │
    ▼
[LLM Model] → generates raw text
    │
    ▼
[Output Parser] → structures the response
    │
    ▼
[Memory] ← saves this turn to history
    │
    ▼
Final Response → User

Key Takeaway

LangChain works by breaking AI applications into modular, composable pieces — each doing one job well — and connecting them into powerful pipelines that can remember, retrieve, reason, and act.
