LangChain Components:
The Six Pillars You Need to Know
A structured deep-dive into Models, Prompts, Chains, Indexes, Memory & Agents — with code, diagrams, and the mental models behind each one.
LangChain isn't just another Python library — it's an orchestration framework that turns raw LLM API calls into structured, production-grade AI applications. Whether you're building a customer support bot, a document Q&A system, or an autonomous AI agent, LangChain gives you six core building blocks to do it.
This post breaks down all six components from the ground up — starting with the intuition, then the code, then the "why it matters" context. Let's go.
🤖 Models
In LangChain, "models" are the core interfaces through which you interact with AI models. Think of them as standardised adapters — you swap providers without rewriting your app logic.
The NLP pipeline that powers modern chatbots looks like this: raw user input → Natural Language Understanding (NLU) → Large Language Model (LLM) → response. LLMs are internet-scale models with billions of parameters (model weights often exceed 100 GB) and are typically accessed via server APIs rather than run locally.
LangChain wraps this pipeline into two major model families:
Language Models (LLMs)
Take a text prompt as input and return a text completion. Examples: GPT-4, Claude, Llama 3. Accessed via langchain_openai.ChatOpenAI or langchain_anthropic.ChatAnthropic.
Embedding Models
Transform text into a high-dimensional numerical vector. These vectors power semantic search — the backbone of Retrieval-Augmented Generation (RAG) systems.
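For the embedding side, here is a minimal sketch; the model name is illustrative, and it assumes the langchain-openai package with an OPENAI_API_KEY in the environment:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# embed_query returns a single vector; embed_documents returns one per text
vector = embeddings.embed_query("What is LangChain?")
print(len(vector))  # dimensionality of the vector space, e.g. 1536
```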
The key insight LangChain provides: a unified interface. You call .invoke() on any model regardless of whether it's OpenAI, Anthropic, or a locally hosted Ollama model. Here's what that looks like for both major providers:
```python
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

# GPT-4 at temperature=0 → deterministic, consistent output
model = ChatOpenAI(model='gpt-4', temperature=0)
result = model.invoke("Now divide the result by 1.5")
print(result.content)
```
```python
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv

load_dotenv()

model = ChatAnthropic(model='claude-3-opus-20240229')
result = model.invoke("Hi, who are you")
print(result.content)
```
The same .invoke() call works for OpenAI, Claude, Gemini, Mistral, and locally-run models. This is the core value of LangChain's model abstraction: your application logic stays unchanged when you switch providers or upgrade model versions.
📝 Prompts
A prompt is the instruction you send to an LLM. LangChain's prompt tooling transforms raw strings into structured, reusable, and context-aware templates — the foundation of prompt engineering at scale.
Think of it this way: ChatGPT (and other interfaces) have a hidden system prompt baked in that shapes every response. With LangChain, you become the prompt engineer — you control exactly what context, tone, and format the model receives.
LangChain offers three progressively more powerful patterns:
① Dynamic & Reusable Prompts
Use PromptTemplate to inject variables at runtime. Write the template once, reuse it infinitely with different inputs.
```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    'Summarize {topic} in {emotion} tone'
)

# Renders to: "Summarize Cricket in fun tone"
print(prompt.format(topic='Cricket', emotion='fun'))
```
② Role-Based Prompts (Chat Templates)
Modern LLMs like GPT-4 and Claude follow a system / user / assistant conversation structure. ChatPromptTemplate lets you encode multi-turn, role-aware conversations as templates.
```python
from langchain_core.prompts import ChatPromptTemplate

# Role-based template: from_messages takes a list of (role, template) pairs
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "Hi, you are an experienced {profession}"),
    ("user", "Tell me about {topic}"),
])

formatted = chat_prompt.format_messages(
    profession="Doctor",
    topic="Viral Fever",
)
```
③ Few-Shot Prompting
Instead of telling the model how to behave, you show it with examples. This technique dramatically improves accuracy on classification, extraction, and structured output tasks — especially when fine-tuning isn't an option.
The classic pattern: give the model 3–4 labelled examples in the prompt, then present the unlabelled input. The model infers the pattern.
```python
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Step 1: define labelled examples
examples = [
    {"input": "I was charged twice this month.", "output": "Billing Issue"},
    {"input": "The app crashes when I log in.", "output": "Technical Problem"},
    {"input": "Can you explain how to upgrade my plan?", "output": "General Inquiry"},
    {"input": "I need a refund for an unauthorized payment.", "output": "Billing Issue"},
]

# Step 2: define the per-example template
example_template = """Ticket: {input}
Category: {output}"""

# Step 3: build the full few-shot prompt
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=PromptTemplate(
        input_variables=["input", "output"],
        template=example_template,
    ),
    prefix="Classify tickets into: 'Billing Issue', 'Technical Problem', or 'General Inquiry'.\n\n",
    suffix="\nTicket: {user_input}\nCategory:",
    input_variables=["user_input"],
)
```
FewShotPromptTemplate operationalises this at the application layer.
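To sanity-check the rendered prompt, format it with a new ticket (the ticket text below is illustrative):

```python
print(few_shot_prompt.format(
    user_input="My invoice shows the wrong amount."
))
# Renders the instruction prefix, the four labelled examples, and the
# new ticket, ready to pass to model.invoke(...)
```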
🔗 Chains
A Chain is a pipeline — a sequence of steps where the output of one step becomes the input of the next. Chains replace manually stitching LLM calls together with structured, composable pipelines.
Without chains, connecting multiple LLM steps is brittle: you call LLM 1, extract its output, reformat, call LLM 2, and so on. With chains, this entire flow is declarative and composable.
A real-world use case: a 1000-token English document goes to LLM 1, which translates it to Hindi; LLM 2 then summarises the translation into a short (~100-token) output. Neither step has to be manually wired; the chain handles the handoff.
LangChain supports three chain patterns for complex workflows, plus a modern syntax for composing them:
Sequential Chain
Steps run one after another. Output of step N is input to step N+1. Use for: translate → summarise, extract → format → send.
Parallel Chain
Multiple LLMs run on the same input simultaneously, then their outputs are merged. Use for: multi-perspective analysis, A/B report generation.
Conditional Chain
The pipeline branches based on the output. Use for: "if feedback is 'good' → send Thank You; if 'bad' → send email escalation."
LCEL (LangChain Expression Language)
The modern way to build chains using the pipe operator |. Composable, async-ready, and streaming-compatible: prompt | model | parser.
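As a minimal LCEL sketch of the translate-then-summarise pipeline described earlier (the prompts and input text are illustrative; assumes langchain-openai):

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model='gpt-4', temperature=0)
parser = StrOutputParser()

translate = PromptTemplate.from_template('Translate to Hindi:\n\n{text}')
summarise = PromptTemplate.from_template('Summarize in one paragraph:\n\n{text}')

# Sequential chain in LCEL: the translation output feeds the summariser
chain = (
    translate | model | parser
    | (lambda hindi: {"text": hindi})  # re-wrap plain text for the next prompt
    | summarise | model | parser
)
print(chain.invoke({"text": "LangChain is an orchestration framework..."}))
```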
📚 Indexes & Retrieval (RAG)
Indexes connect your application to external knowledge — PDFs, websites, databases, rulebooks. This is the heart of Retrieval-Augmented Generation (RAG), the dominant pattern for grounding LLMs in private or up-to-date data.
LLMs are trained up to a knowledge cutoff and know nothing about your private documents. Indexes solve this by giving the model a way to look up relevant information before generating an answer.
The RAG pipeline has two distinct phases:
Phase 1 — Ingestion (offline): Your PDF (say, a 1000-page rulebook) is loaded by a DocumentLoader, sliced into chunks by a TextSplitter, converted into embedding vectors, and stored in a Vector Store (like FAISS, Chroma, or Pinecone).
Phase 2 — Retrieval (online): When a user asks a question, it's embedded into the same vector space, a semantic search finds the top-K most relevant chunks, those chunks are injected into the LLM prompt as context, and the model generates a grounded answer.
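A compact sketch of both phases; the file path, chunk sizes, and question are illustrative, and it assumes the langchain-community, langchain-text-splitters, faiss-cpu, and langchain-openai packages:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Phase 1 (offline ingestion): load -> split -> embed -> store
docs = PyPDFLoader("rulebook.pdf").load()  # hypothetical rulebook PDF
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Phase 2 (online retrieval): embed the question, fetch the top-K chunks
question = "What is the penalty for a no-ball?"
relevant = vector_store.as_retriever(search_kwargs={"k": 4}).invoke(question)

# Inject the retrieved chunks as context and generate a grounded answer
context = "\n\n".join(doc.page_content for doc in relevant)
answer = ChatOpenAI(model='gpt-4').invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```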
🧠 Memory
LLM API calls are stateless — every call is independent, with no knowledge of prior turns. Memory is LangChain's solution to building conversational continuity into what is fundamentally a memoryless system.
Here's the problem in concrete terms: you ask "Who is Narendra Modi?" and the LLM answers correctly. Then you immediately ask "How old is he?" — and the LLM has no idea who "he" refers to. Each API call starts fresh.
LangChain's memory layer solves this by injecting previous conversation history into each new API call. The key decision is how much history to inject — and that's where the four memory types differ:
| Memory Type | How It Works | Best For | Trade-off |
|---|---|---|---|
| ConversationBufferMemory | Stores the complete raw transcript of all turns | Short conversations, demos | Token count grows unbounded |
| ConversationBufferWindowMemory | Keeps only the last N interactions (sliding window) | Chat interfaces with token limits | Older context is silently dropped |
| Summarizer-Based Memory | Periodically summarises older turns into a compressed form | Long-running sessions, assistants | Summary fidelity depends on the model |
| Custom Memory | You define what state to store — user preferences, extracted facts, etc. | Advanced personalisation use cases | Requires explicit engineering effort |
In a long conversation, ConversationBufferMemory can easily consume 10k+ tokens per request. For production systems, ConversationBufferWindowMemory (last N turns) or summariser-based approaches are strongly preferred to keep latency and costs in check.
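A minimal sketch of the sliding-window idea, manually trimming history before each call (the window size and messages are illustrative):

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage

model = ChatOpenAI(model='gpt-4')
history = []  # full transcript, grows turn by turn
WINDOW = 6    # inject only the last 6 messages (3 user/assistant turns)

def chat(user_text: str) -> str:
    history.append(HumanMessage(content=user_text))
    # each API call is stateless, so we inject the trimmed window ourselves
    reply = model.invoke(history[-WINDOW:])
    history.append(AIMessage(content=reply.content))
    return reply.content

print(chat("Who is Narendra Modi?"))
print(chat("How old is he?"))  # "he" now resolves via the injected history
```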
🕵️ Agents
An Agent is an LLM + Reasoning + Tools. Where a chatbot responds, an agent acts. It decides which tool to call, calls it, observes the result, and iterates until the goal is reached.
The jump from chatbot to AI agent is significant. A chatbot (LLM + NLU + text generation) can only generate text. An AI agent is a chatbot with superpowers — it can query APIs, run calculators, search the web, book flights, and chain all of these actions together autonomously.
The key mechanism is the ReAct loop (Reasoning + Acting): the agent thinks step-by-step (Chain of Thought), decides on a tool action, observes the tool's output, and repeats until it can produce a final answer.
A concrete example: "What is today's temperature in Delhi multiplied by 3?" The agent reasons that it first needs the current temperature, calls a weather tool to fetch it, then calls a calculator tool to multiply the result by 3, and only then returns the final answer.
Notice what happened: the agent planned autonomously, called two different tools in sequence, and composed the result — all without any hardcoded if/else logic. This is why agents are the most powerful (and most complex) component in LangChain.
AgentExecutor is a direct implementation of this pattern.
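Here's a minimal sketch of that flow built on LangChain's ReAct helpers. The tool bodies are stubbed and hypothetical (a real get_temperature would call a weather API), and it assumes the langchainhub package for pulling the community ReAct prompt:

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_react_agent
from langchain import hub

@tool
def get_temperature(city: str) -> float:
    """Return the current temperature in Celsius for a city."""
    return 31.0  # stub; a real tool would query a weather API

@tool
def multiply(expression: str) -> float:
    """Multiply two numbers written as 'a * b'."""
    a, b = expression.split("*")
    return float(a) * float(b)

llm = ChatOpenAI(model='gpt-4', temperature=0)
tools = [get_temperature, multiply]

# hwchase17/react is a community ReAct prompt hosted on LangChain Hub
agent = create_react_agent(llm, tools, hub.pull("hwchase17/react"))
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke(
    {"input": "What is today's temperature in Delhi multiplied by 3?"}
)
print(result["output"])
```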
🗺️ The Big Picture: How All 6 Connect
These six components aren't independent features — they're a stack. A production AI application typically layers them like this:
Models and Prompts are the foundation. Chains orchestrate them. Indexes feed them external knowledge. Memory gives them state. Agents give them autonomy. Together, they turn a raw LLM API call into a complete AI application.
What's Next in This Series?
Now that the six components are clear, the next logical step is building real applications with them — a document Q&A bot using RAG, a customer support agent with tools, and a multi-agent workflow.
The key papers worth reading alongside this: the original ReAct paper (Yao et al., 2022), RAG (Lewis et al., 2020), and the GPT-3 few-shot learning paper (Brown et al., 2020) — all of which are directly reflected in the patterns LangChain implements.