LangChain Components:
The Six Pillars You Need to Know
A structured deep-dive into Models, Prompts, Chains, Indexes, Memory & Agents — with code, diagrams, and the mental models behind each one.
LangChain isn't just another Python library — it's an orchestration framework that turns raw LLM API calls into structured, production-grade AI applications. Whether you're building a customer support bot, a document Q&A system, or an autonomous AI agent, LangChain gives you six core building blocks to do it.
This post breaks down all six components from the ground up — starting with the intuition, then the code, then the "why it matters" context. Let's go.
🤖 Models
In LangChain, "models" are the core interfaces through which you interact with AI models. Think of them as standardised adapters — you swap providers without rewriting your app logic.
The NLP pipeline that powers modern chatbots looks like this: raw user input → Natural Language Understanding (NLU) → Large Language Model (LLM) → response. LLMs are internet-scale models with billions of parameters (model weights often exceed 100 GB) and are typically accessed via server APIs rather than run locally.
LangChain wraps this pipeline into two major model families:
Language Models (LLMs)
Take a text prompt as input and return a text completion. Examples: GPT-4, Claude, Llama 3. Accessed via langchain_openai.ChatOpenAI or langchain_anthropic.ChatAnthropic.
Embedding Models
Transform text into a high-dimensional numerical vector. These vectors power semantic search — the backbone of Retrieval-Augmented Generation (RAG) systems.
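For the embedding side, here is a minimal sketch; the model name is illustrative, and it assumes the langchain-openai package with an OPENAI_API_KEY in the environment:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# embed_query returns a single vector; embed_documents returns one per text
vector = embeddings.embed_query("What is LangChain?")
print(len(vector))  # dimensionality of the vector space, e.g. 1536
```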
The key insight LangChain provides: a unified interface. You call .invoke() on any model regardless of whether it's OpenAI, Anthropic, or a locally hosted Ollama model. Here's what that looks like for both major providers:
```python
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv

load_dotenv()

# GPT-4 at temperature=0 → deterministic, consistent output
model = ChatOpenAI(model='gpt-4', temperature=0)
result = model.invoke("Now divide the result by 1.5")
print(result.content)
```
```python
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv

load_dotenv()

model = ChatAnthropic(model='claude-3-opus-20240229')
result = model.invoke("Hi, who are you")
print(result.content)
```
The same .invoke() call works for OpenAI, Claude, Gemini, Mistral, and locally-run models. This is the core value of LangChain's model abstraction: your application logic stays unchanged when you switch providers or upgrade model versions.
📝 Prompts
A prompt is the instruction you send to an LLM. LangChain's prompt tooling transforms raw strings into structured, reusable, and context-aware templates — the foundation of prompt engineering at scale.
Think of it this way: ChatGPT (and other interfaces) have a hidden system prompt baked in that shapes every response. With LangChain, you become the prompt engineer — you control exactly what context, tone, and format the model receives.
LangChain offers three progressively more powerful patterns:
① Dynamic & Reusable Prompts
Use PromptTemplate to inject variables at runtime. Write the template once, reuse it infinitely with different inputs.
```python
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    'Summarize {topic} in {emotion} tone'
)

# Renders to: "Summarize Cricket in fun tone"
print(prompt.format(topic='Cricket', emotion='fun'))
```
② Role-Based Prompts (Chat Templates)
Modern LLMs like GPT-4 and Claude follow a system / user / assistant conversation structure. ChatPromptTemplate lets you encode multi-turn, role-aware conversations as templates.
```python
from langchain_core.prompts import ChatPromptTemplate

# Role-based template: from_messages takes a list of (role, template) pairs
chat_prompt = ChatPromptTemplate.from_messages([
    ("system", "Hi, you are an experienced {profession}"),
    ("user", "Tell me about {topic}"),
])

formatted = chat_prompt.format_messages(
    profession="Doctor",
    topic="Viral Fever",
)
```
③ Few-Shot Prompting
Instead of telling the model how to behave, you show it with examples. This technique dramatically improves accuracy on classification, extraction, and structured output tasks — especially when fine-tuning isn't an option.
The classic pattern: give the model 3–4 labelled examples in the prompt, then present the unlabelled input. The model infers the pattern.
```python
from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Step 1: define labelled examples
examples = [
    {"input": "I was charged twice this month.", "output": "Billing Issue"},
    {"input": "The app crashes when I log in.", "output": "Technical Problem"},
    {"input": "Can you explain how to upgrade my plan?", "output": "General Inquiry"},
    {"input": "I need a refund for an unauthorized payment.", "output": "Billing Issue"},
]

# Step 2: define the per-example template
example_template = """Ticket: {input}
Category: {output}"""

# Step 3: build the full few-shot prompt
few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=PromptTemplate(
        input_variables=["input", "output"],
        template=example_template,
    ),
    prefix="Classify tickets into: 'Billing Issue', 'Technical Problem', or 'General Inquiry'.\n\n",
    suffix="\nTicket: {user_input}\nCategory:",
    input_variables=["user_input"],
)
```
FewShotPromptTemplate operationalises this at the application layer.
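To sanity-check the rendered prompt, format it with a new ticket (the ticket text below is illustrative):

```python
print(few_shot_prompt.format(
    user_input="My invoice shows the wrong amount."
))
# Renders the instruction prefix, the four labelled examples, and the
# new ticket, ready to pass to model.invoke(...)
```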
🔗 Chains
A Chain is a pipeline — a sequence of steps where the output of one step becomes the input of the next. Chains replace manually stitching LLM calls together with structured, composable pipelines.
Without chains, connecting multiple LLM steps is brittle: you call LLM 1, extract its output, reformat, call LLM 2, and so on. With chains, this entire flow is declarative and composable.
A real-world use case: a 1000-token English document goes to LLM 1, which translates it to Hindi; LLM 2 then summarises the translation into a short (~100-token) output. Neither step has to be manually wired; the chain handles the handoff.
LangChain supports three chain patterns for complex workflows, plus a modern syntax for composing them:
Sequential Chain
Steps run one after another. Output of step N is input to step N+1. Use for: translate → summarise, extract → format → send.
Parallel Chain
Multiple LLMs run on the same input simultaneously, then their outputs are merged. Use for: multi-perspective analysis, A/B report generation.
Conditional Chain
The pipeline branches based on the output. Use for: "if feedback is 'good' → send Thank You; if 'bad' → send email escalation."
LCEL (LangChain Expression Language)
The modern way to build chains using the pipe operator |. Composable, async-ready, and streaming-compatible: prompt | model | parser.
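As a minimal LCEL sketch of the translate-then-summarise pipeline described earlier (the prompts and input text are illustrative; assumes langchain-openai):

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

model = ChatOpenAI(model='gpt-4', temperature=0)
parser = StrOutputParser()

translate = PromptTemplate.from_template('Translate to Hindi:\n\n{text}')
summarise = PromptTemplate.from_template('Summarize in one paragraph:\n\n{text}')

# Sequential chain in LCEL: the translation output feeds the summariser
chain = (
    translate | model | parser
    | (lambda hindi: {"text": hindi})  # re-wrap plain text for the next prompt
    | summarise | model | parser
)
print(chain.invoke({"text": "LangChain is an orchestration framework..."}))
```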
📚 Indexes & Retrieval (RAG)
Indexes connect your application to external knowledge — PDFs, websites, databases, rulebooks. This is the heart of Retrieval-Augmented Generation (RAG), the dominant pattern for grounding LLMs in private or up-to-date data.
LLMs are trained up to a knowledge cutoff and know nothing about your private documents. Indexes solve this by giving the model a way to look up relevant information before generating an answer.
The RAG pipeline has two distinct phases:
Phase 1 — Ingestion (offline): Your PDF (say, a 1000-page rulebook) is loaded by a DocumentLoader, sliced into chunks by a TextSplitter, converted into embedding vectors, and stored in a Vector Store (like FAISS, Chroma, or Pinecone).
Phase 2 — Retrieval (online): When a user asks a question, it's embedded into the same vector space, a semantic search finds the top-K most relevant chunks, those chunks are injected into the LLM prompt as context, and the model generates a grounded answer.
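A compact sketch of both phases; the file path, chunk sizes, and question are illustrative, and it assumes the langchain-community, langchain-text-splitters, faiss-cpu, and langchain-openai packages:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings, ChatOpenAI

# Phase 1 (offline ingestion): load -> split -> embed -> store
docs = PyPDFLoader("rulebook.pdf").load()  # hypothetical rulebook PDF
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
vector_store = FAISS.from_documents(chunks, OpenAIEmbeddings())

# Phase 2 (online retrieval): embed the question, fetch the top-K chunks
question = "What is the penalty for a no-ball?"
relevant = vector_store.as_retriever(search_kwargs={"k": 4}).invoke(question)

# Inject the retrieved chunks as context and generate a grounded answer
context = "\n\n".join(doc.page_content for doc in relevant)
answer = ChatOpenAI(model='gpt-4').invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.content)
```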
🧠 Memory
LLM API calls are stateless — every call is independent, with no knowledge of prior turns. Memory is LangChain's solution to building conversational continuity into what is fundamentally a memoryless system.
Here's the problem in concrete terms: you ask "Who is Narendra Modi?" and the LLM answers correctly. Then you immediately ask "How old is he?" — and the LLM has no idea who "he" refers to. Each API call starts fresh.
LangChain's memory layer solves this by injecting previous conversation history into each new API call. The key decision is how much history to inject — and that's where the four memory types differ:
| Memory Type | How It Works | Best For | Trade-off |
|---|---|---|---|
| ConversationBufferMemory | Stores the complete raw transcript of all turns | Short conversations, demos | Token count grows unbounded |
| ConversationBufferWindowMemory | Keeps only the last N interactions (sliding window) | Chat interfaces with token limits | Older context is silently dropped |
| Summarizer-Based Memory | Periodically summarises older turns into a compressed form | Long-running sessions, assistants | Summary fidelity depends on the model |
| Custom Memory | You define what state to store — user preferences, extracted facts, etc. | Advanced personalisation use cases | Requires explicit engineering effort |
In a long conversation, ConversationBufferMemory can easily consume 10k+ tokens per request. For production systems, ConversationBufferWindowMemory (last N turns) or summariser-based approaches are strongly preferred to keep latency and costs in check.
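A minimal sketch of the sliding-window idea, manually trimming history before each call (the window size and messages are illustrative):

```python
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage

model = ChatOpenAI(model='gpt-4')
history = []  # full transcript, grows turn by turn
WINDOW = 6    # inject only the last 6 messages (3 user/assistant turns)

def chat(user_text: str) -> str:
    history.append(HumanMessage(content=user_text))
    # each API call is stateless, so we inject the trimmed window ourselves
    reply = model.invoke(history[-WINDOW:])
    history.append(AIMessage(content=reply.content))
    return reply.content

print(chat("Who is Narendra Modi?"))
print(chat("How old is he?"))  # "he" now resolves via the injected history
```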
🕵️ Agents
An Agent is an LLM + Reasoning + Tools. Where a chatbot responds, an agent acts. It decides which tool to call, calls it, observes the result, and iterates until the goal is reached.
The jump from chatbot to AI agent is significant. A chatbot (LLM + NLU + text generation) can only generate text. An AI agent is a chatbot with superpowers — it can query APIs, run calculators, search the web, book flights, and chain all of these actions together autonomously.
The key mechanism is the ReAct loop (Reasoning + Acting): the agent thinks step-by-step (Chain of Thought), decides on a tool action, observes the tool's output, and repeats until it can produce a final answer.
A concrete example: "What is today's temperature in Delhi multiplied by 3?" The agent reasons that it first needs the current temperature, calls a weather tool to fetch it, then calls a calculator tool to multiply the result by 3, and only then returns the final answer.
Notice what happened: the agent planned autonomously, called two different tools in sequence, and composed the result — all without any hardcoded if/else logic. This is why agents are the most powerful (and most complex) component in LangChain.
AgentExecutor is a direct implementation of this pattern.
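Here's a minimal sketch of that flow built on LangChain's ReAct helpers. The tool bodies are stubbed and hypothetical (a real get_temperature would call a weather API), and it assumes the langchainhub package for pulling the community ReAct prompt:

```python
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_react_agent
from langchain import hub

@tool
def get_temperature(city: str) -> float:
    """Return the current temperature in Celsius for a city."""
    return 31.0  # stub; a real tool would query a weather API

@tool
def multiply(expression: str) -> float:
    """Multiply two numbers written as 'a * b'."""
    a, b = expression.split("*")
    return float(a) * float(b)

llm = ChatOpenAI(model='gpt-4', temperature=0)
tools = [get_temperature, multiply]

# hwchase17/react is a community ReAct prompt hosted on LangChain Hub
agent = create_react_agent(llm, tools, hub.pull("hwchase17/react"))
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

result = executor.invoke(
    {"input": "What is today's temperature in Delhi multiplied by 3?"}
)
print(result["output"])
```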
🗺️ The Big Picture: How All 6 Connect
These six components aren't independent features — they're a stack. A production AI application typically layers them like this:
Models and Prompts are the foundation. Chains orchestrate them. Indexes feed them external knowledge. Memory gives them state. Agents give them autonomy. Together, they turn a raw LLM API call into a complete AI application.
What's Next in This Series?
Now that the six components are clear, the next logical step is building real applications with them — a document Q&A bot using RAG, a customer support agent with tools, and a multi-agent workflow.
The key papers worth reading alongside this: the original ReAct paper (Yao et al., 2022), RAG (Lewis et al., 2020), and the GPT-3 few-shot learning paper (Brown et al., 2020) — all of which are directly reflected in the patterns LangChain implements.