How LangChain Works — Deep Dive
The Big Picture
LangChain works by chaining together components — each component does one job, and they pass data to each other in a pipeline.
Input → [Component 1] → [Component 2] → [Component 3] → Output
Step-by-Step Execution Flow
```
┌─────────────────────────────────────────────────────┐
│  USER INPUT                                         │
│  "Summarize my uploaded PDF"                        │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  1. DOCUMENT LOADER                                 │
│  Reads PDF → extracts raw text                      │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  2. TEXT SPLITTER                                   │
│  Splits text into smaller chunks (e.g. 500          │
│  tokens each) so LLM can process them               │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  3. EMBEDDINGS + VECTOR STORE                       │
│  Converts chunks into vectors → stores in DB        │
│  (enables semantic search later)                    │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  4. RETRIEVER                                       │
│  User asks question → finds most relevant chunks    │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  5. PROMPT TEMPLATE                                 │
│  Injects retrieved chunks + question into prompt    │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  6. LLM (Claude / GPT etc.)                         │
│  Generates answer based on context                  │
└──────────────────────┬──────────────────────────────┘
                       ↓
┌─────────────────────────────────────────────────────┐
│  7. OUTPUT PARSER                                   │
│  Formats raw LLM response → structured output       │
└──────────────────────┬──────────────────────────────┘
                       ↓
                 FINAL ANSWER
```
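The splitting step can be sketched without LangChain at all. This is a simplified character-based chunker (real splitters such as `RecursiveCharacterTextSplitter` also respect separators and count tokens rather than characters); `split_text` is a hypothetical helper name, not a LangChain API:

```python
# Framework-free sketch of step 2 (text splitting): break a long text into
# overlapping chunks so each one fits in the LLM's context window.
# Chunk sizes are in characters here; real splitters usually count tokens.

def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters,
    repeating `overlap` characters between consecutive chunks so a
    sentence cut at a boundary still appears whole in one chunk."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "A" * 1200
chunks = split_text(doc, chunk_size=500, overlap=50)
print(len(chunks))  # → 3 chunks cover the 1200-character document
```

The overlap is the design choice worth noting: without it, a fact straddling a chunk boundary would be split across two chunks and might never be retrieved intact.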
Core Mechanism 1 — Chains
A Chain is the most basic unit. It connects components in sequence.
```python
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

# Step 1: Define a prompt template
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Explain {topic} in simple terms.",
)

# Step 2: Connect prompt → LLM
# (`llm` is assumed to be an already-constructed LLM instance)
chain = LLMChain(llm=llm, prompt=prompt)

# Step 3: Run it
result = chain.run("quantum computing")
# Output: "Quantum computing is..."
```
Data flows like this:
```
"quantum computing"
        ↓
PromptTemplate fills in → "Explain quantum computing in simple terms."
        ↓
LLM generates response
        ↓
Output returned
```
Core Mechanism 2 — Memory
Memory injects past conversation into every new prompt automatically.
```python
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()

# Turn 1
memory.save_context(
    {"input": "My name is Alex"},
    {"output": "Nice to meet you, Alex!"},
)

# Turn 2 — memory auto-injects history into the next prompt
print(memory.load_memory_variables({}))
# → {"history": "Human: My name is Alex\nAI: Nice to meet you, Alex!"}
```
Internally, every prompt becomes:
```
[Past conversation history]   ← injected by memory
[Current user message]        ← new input
[LLM response]
```
Types of memory:
| Type | How it works |
|---|---|
| `ConversationBufferMemory` | Stores the full raw conversation |
| `ConversationSummaryMemory` | Summarizes old turns to save tokens |
| `ConversationBufferWindowMemory` | Keeps only the last N turns |
| `VectorStoreRetrieverMemory` | Retrieves semantically relevant past messages |
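To make the injection mechanism concrete, here is a framework-free sketch of what a buffer memory does under the hood. The `BufferMemory` class here is illustrative, not LangChain's:

```python
# Minimal sketch of buffer-style memory: keep a running transcript and
# prepend it to every new prompt before it reaches the LLM.

class BufferMemory:
    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def save_context(self, user_input: str, ai_output: str) -> None:
        # Record one completed conversation turn
        self.turns.append((user_input, ai_output))

    def history(self) -> str:
        # Render the transcript in "Human: ... / AI: ..." form
        return "\n".join(f"Human: {u}\nAI: {a}" for u, a in self.turns)

    def build_prompt(self, new_message: str) -> str:
        # [Past conversation history] + [Current user message]
        return f"{self.history()}\nHuman: {new_message}\nAI:"

memory = BufferMemory()
memory.save_context("My name is Alex", "Nice to meet you, Alex!")
print(memory.build_prompt("What's my name?"))
```

Because the full transcript is re-sent on every turn, this style of memory grows linearly with conversation length, which is exactly why the summary and window variants in the table above exist.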
Core Mechanism 3 — Retrieval (RAG)
RAG = Retrieval-Augmented Generation. It lets the LLM answer questions about YOUR data by retrieving the most relevant passages at query time and feeding them to the model as context.
```
YOUR DATA (PDF, website, DB)
        ↓
Split into chunks
        ↓
Convert to vectors (embeddings)
        ↓
Store in Vector DB (e.g. FAISS, Pinecone)
        ↓
User asks: "What does page 5 say about revenue?"
        ↓
Search vector DB → find top 3 relevant chunks
        ↓
Inject chunks into prompt → LLM answers
```
```python
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

# Store documents as vectors
# (`docs` is assumed to be a list of already-loaded Document chunks)
vectorstore = FAISS.from_documents(docs, OpenAIEmbeddings())

# Retrieve relevant chunks for a query
retriever = vectorstore.as_retriever()
relevant_docs = retriever.get_relevant_documents("What is the revenue?")
```
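For intuition about what the vector store is doing, here is a toy, dependency-free version of the retrieval step. Bag-of-words counts stand in for learned embeddings, and all the names and sample chunks are made up for illustration:

```python
# Toy semantic search: embed chunks as vectors, then return the chunk whose
# vector is most similar (by cosine similarity) to the query vector.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in embedding: word-count vector (real systems use learned models)
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

chunks = [
    "revenue grew 20 percent in the last quarter",
    "the new office opened in Toronto",
    "quarterly revenue reached 5 million dollars",
]
query = "what was the revenue last quarter"

# Rank chunks by similarity to the query — the "Search vector DB" step
ranked = sorted(chunks, key=lambda c: cosine(embed(query), embed(c)), reverse=True)
print(ranked[0])  # → "revenue grew 20 percent in the last quarter"
```

The top-ranked chunks are what gets injected into the prompt template in the next step of the pipeline.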
Core Mechanism 4 — Agents
Agents are the most powerful part. The LLM dynamically decides which tools to use and in what order.
```
User: "Search the web for today's Bitcoin price and convert it to CAD"
        ↓
Agent thinks: "I need 2 tools — web_search, then currency_converter"
        ↓
Step 1: calls web_search("Bitcoin price today")
        ↓
Step 2: reads result → $63,000 USD
        ↓
Step 3: calls currency_converter(63000, "USD", "CAD")
        ↓
Step 4: reads result → $86,000 CAD
        ↓
Agent responds: "Bitcoin is ~$86,000 CAD today"
```
The internal reasoning loop (ReAct pattern):
```
Thought: What do I need to do?
Action: Call tool X with input Y
Observation: Tool returned Z
Thought: Now I need to...
Action: Call tool A with input B
...repeat until...
Final Answer: [complete response]
```
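The loop can be sketched in plain Python with stubbed tools and a scripted decision sequence standing in for the LLM. The tool names, price, and exchange rate are all made up for illustration; a real agent would let the model choose the next action at each step:

```python
# Skeleton of the ReAct pattern: alternate Thought → Action → Observation
# until enough information is gathered to produce a Final Answer.

def web_search(query: str) -> str:
    return "Bitcoin price: 63000 USD"      # stubbed tool result

def currency_converter(amount: float, src: str, dst: str) -> float:
    return amount * 1.37                   # stubbed USD→CAD exchange rate

TOOLS = {"web_search": web_search, "currency_converter": currency_converter}

def run_agent() -> str:
    # Thought: I need the current price → Action: call web_search
    observation = TOOLS["web_search"]("Bitcoin price today")
    # Observation: parse the number out of the tool's text output
    usd = float(observation.split()[2])
    # Thought: now convert it → Action: call currency_converter
    cad = TOOLS["currency_converter"](usd, "USD", "CAD")
    # Final Answer
    return f"Bitcoin is ~${cad:,.0f} CAD today"

print(run_agent())  # → "Bitcoin is ~$86,310 CAD today"
```

The key structural point: the agent's "reasoning" lives between tool calls, and each tool's output (the observation) becomes input to the next decision.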
How Components Connect — LCEL
Modern LangChain composes components with LCEL (LangChain Expression Language), a clean pipe (`|`) syntax:
```python
from langchain_core.runnables import RunnablePassthrough

# Build a RAG chain using pipes
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | output_parser
)

# Run it
rag_chain.invoke("What is the company's revenue?")
```
Each `|` passes the output of one component as input to the next — just like Unix pipes.
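The pipe mechanics are easy to reimplement in miniature. This sketch is not LangChain's actual `Runnable` (which supports batching, streaming, and async as well), but it shows how `|` composes callables left to right:

```python
# Minimal reimplementation of the LCEL pipe idea: a Runnable wraps a
# function, and `|` composes two Runnables into a new one.

class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other: "Runnable") -> "Runnable":
        # (a | b).invoke(x) == b.invoke(a.invoke(x))
        return Runnable(lambda x: other.invoke(self.invoke(x)))

# Three toy components standing in for prompt template, LLM, and parser
prompt = Runnable(lambda q: f"Explain {q} in simple terms.")
fake_llm = Runnable(lambda p: f"[LLM answer to: {p}]")
parser = Runnable(lambda s: s.strip())

chain = prompt | fake_llm | parser
print(chain.invoke("quantum computing"))
# → "[LLM answer to: Explain quantum computing in simple terms.]"
```

Python's `__or__` operator overload is what makes the pipe syntax possible: `a | b` simply builds a new `Runnable` whose `invoke` threads data through `a` and then `b`.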
Full Internal Flow Summary
```
User Query
    │
    ▼
[Memory] ───────────────────────────────────┐
    │                                       │
    ▼                                       │
[Retriever] → finds relevant docs           │
    │                                       │
    ▼                                       ▼
[Prompt Template] ← fills in: query + docs + history
    │
    ▼
[LLM Model] → generates raw text
    │
    ▼
[Output Parser] → structures the response
    │
    ▼
[Memory] ← saves this turn to history
    │
    ▼
Final Response → User
```
Key Takeaway
LangChain works by breaking AI applications into modular, composable pieces — each doing one job well — and connecting them into powerful pipelines that can remember, retrieve, reason, and act.