Orchestration Options: LangChain, LlamaIndex, and Beyond
By Marcin Piekarski builtweb.com.au · Last Updated: 11 February 2026
TL;DR
Orchestration frameworks like LangChain, LlamaIndex, and Haystack simplify building complex AI workflows by providing abstractions for common patterns like retrieval, chaining, and agent behaviors. Each has different strengths: LangChain for general-purpose chains and agents, LlamaIndex for document indexing and RAG, Haystack for production search pipelines. Start with custom code for simple use cases, adopt frameworks when complexity grows, and avoid over-abstraction by understanding what's happening under the hood.
What Orchestration Frameworks Do
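Building AI applications involves more than a single chat-completion call such as client.chat.completions.create() (openai.ChatCompletion.create() in pre-1.0 versions of the OpenAI SDK). Real systems need to: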
- Chain multiple LLM calls together with context flow
- Retrieve relevant documents before generating answers (RAG)
- Parse outputs and call external tools based on model decisions
- Manage conversation memory across turns
- Handle errors, retries, and rate limits
- Log and monitor complex workflows
Orchestration frameworks abstract these patterns into reusable components. They save you from reinventing common patterns, but add dependencies and learning curves. The key question is whether the abstraction helps or hinders your specific use case.
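To make one of these patterns concrete, here is a minimal sketch of retrying with exponential backoff, written without any framework. The names `call_with_retries` and `flaky_call` are hypothetical, and catching every `Exception` is a simplification: real code would likely catch only the provider's rate-limit and timeout errors.

```python
import random
import time


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Usage with a hypothetical flaky call that fails twice, then succeeds.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


waits = []  # record the delays instead of actually sleeping
result = call_with_retries(flaky_call, sleep=waits.append)
print(result, len(waits))
```

Passing `sleep` as a parameter keeps the sketch fast to test. A framework's retry helper does roughly the same job with more configuration options.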
The Major Players
LangChain: The Swiss Army Knife
LangChain is the most popular general-purpose orchestration framework. It provides abstractions for chains (sequential operations), agents (LLMs that choose actions), memory systems, document loaders, and tool integration.
Simple chain example:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI  # requires the langchain-openai package

# Create a chain: prompt -> model -> parse output
prompt = ChatPromptTemplate.from_template("Translate {text} to {language}")
model = ChatOpenAI(model="gpt-4")  # reads OPENAI_API_KEY from the environment
chain = prompt | model | StrOutputParser()

# Run it
result = chain.invoke({"text": "Hello world", "language": "Spanish"})
# Typical output: "Hola mundo" (exact wording can vary between runs)
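To see roughly what the pipe syntax amounts to, here is a framework-free sketch of composing steps with the `|` operator. It illustrates the idea only and is not LangChain's implementation. The `Step` class and the fake model are hypothetical stand-ins so the example runs without an API key.

```python
class Step:
    """Wraps a function so steps can be composed with the | operator."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.fn(self.fn(value)))

    def invoke(self, value):
        return self.fn(value)


# Hypothetical stand-ins for a prompt template, a model, and an output parser.
prompt_step = Step(lambda d: "Translate {text} to {language}".format(**d))
fake_model = Step(lambda text: {"content": f"[model reply to: {text}]"})
parser_step = Step(lambda message: message["content"])

demo_chain = prompt_step | fake_model | parser_step
output = demo_chain.invoke({"text": "Hello world", "language": "Spanish"})
print(output)
```

Each step takes the previous step's output as its input, so a chain behaves like ordinary function composition with a consistent calling convention.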
RAG example:
from langchain.chains import RetrievalQA  # legacy chain; import path varies by LangChain version
from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load and chunk documents (`documents` is a list of Document objects loaded elsewhere)
splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
chunks = splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

# Create QA chain
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
    return_source_documents=True,
)
answer = qa.invoke({"query": "What are the main findings?"})
# answer["result"] holds the text; answer["source_documents"] holds the retrieved chunks
Pros:
- Extremely comprehensive - covers most use cases
- Large ecosystem of integrations (100+ LLM providers, vector stores, tools)
- Active community and frequent updates
- Good for rapid prototyping
Cons:
- Complex API with frequent breaking changes
- Heavy abstraction can obscure what's happening
- Performance overhead from generic abstractions
- Difficult to debug when things go wrong
- Documentation quality varies
Best for: Rapid prototyping, teams exploring multiple approaches, projects needing many integrations.
LlamaIndex: Built for RAG
LlamaIndex (formerly GPT Index) specializes in indexing and querying over documents. It excels at loading data from various sources, creating efficient indexes, and powering question-answering systems.
Basic RAG example:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI  # requires llama-index-llms-openai

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4"))
response = query_engine.query("Summarize the key points")
print(response.response)
print(response.source_nodes)  # See which chunks were used
Advanced indexing:
# Import paths assume llama-index 0.10+; older releases import from `llama_index` directly
from llama_index.core import KeywordTableIndex, TreeIndex, VectorStoreIndex
from llama_index.core.composability import ComposableGraph

# Create multiple indexes for different query types (`docs` is a list of loaded documents)
vector_index = VectorStoreIndex.from_documents(docs)
tree_index = TreeIndex.from_documents(docs)
keyword_index = KeywordTableIndex.from_documents(docs)

# Compose them under a root TreeIndex; the summaries tell the root what each child is for
graph = ComposableGraph.from_indices(
    TreeIndex,
    [vector_index, tree_index, keyword_index],
    index_summaries=["semantic search", "hierarchical", "keywords"],
)
query_engine = graph.as_query_engine()
response = query_engine.query("Complex question requiring multiple strategies")
Pros:
- Laser-focused on document indexing and retrieval
- Cleaner, more stable API than LangChain
- Excellent data connectors (databases, APIs, file formats)
- Built-in observability for understanding retrieval
- More opinionated, which means fewer decisions to make
Cons:
- Less flexible for non-RAG use cases
- Smaller ecosystem than LangChain
- Less suitable for agent-based workflows
- Fewer tool integrations
Best for: Document QA systems, knowledge bases, semantic search applications, teams focused on retrieval quality.
Haystack: Production-Grade Pipelines
Haystack, developed by deepset, emphasizes production-ready search and NLP pipelines. It's more opinionated about architecture and includes strong support for hybrid search and deployment.
RAG pipeline example:
# Note: this example uses the Haystack 1.x API (the farm-haystack package);
# Haystack 2.x uses a different, component-based API.
import os

from haystack import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate

# Setup document store (1536 matches the text-embedding-ada-002 vector size)
document_store = FAISSDocumentStore(embedding_dim=1536)
document_store.write_documents(documents)  # `documents` are loaded elsewhere

# Create retriever
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="text-embedding-ada-002",
    model_format="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
document_store.update_embeddings(retriever)

# Create prompt
template = PromptTemplate(
    prompt="Given the context: {join(documents)}\n\nAnswer: {query}",
    output_parser={"type": "AnswerParser"},
)
prompt_node = PromptNode(
    model_name_or_path="gpt-4",
    default_prompt_template=template,
    api_key=os.environ["OPENAI_API_KEY"],
)

# Build pipeline
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

# Run
result = pipeline.run(query="What are the findings?")
Pros:
- Designed for production from the ground up
- Strong support for hybrid search (keyword + semantic)
- Built-in evaluation and benchmarking tools
- RESTful API for deployment
- Clear pipeline abstraction
Cons:
- Steeper learning curve
- More rigid structure
- Smaller community than LangChain/LlamaIndex
- Less suitable for rapid experimentation
Best for: Production deployments, enterprise search systems, teams prioritizing stability and monitoring.
Custom Code: The Control Option
Sometimes the best framework is no framework. Simple use cases often don't justify the complexity.
Simple RAG without frameworks:
import numpy as np
from openai import OpenAI  # openai>=1.0 client
from sentence_transformers import SentenceTransformer

# Setup
client = OpenAI()  # reads OPENAI_API_KEY from the environment
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(query, documents, top_k=3):
    # Embed query and documents; unit-length vectors make the dot product equal cosine similarity
    query_emb = model.encode([query], normalize_embeddings=True)
    doc_embs = model.encode(documents, normalize_embeddings=True)
    # Cosine similarity
    scores = np.dot(query_emb, doc_embs.T)[0]
    top_indices = np.argsort(scores)[-top_k:][::-1]
    return [documents[i] for i in top_indices]

def answer_question(query, documents):
    # Retrieve relevant context
    context = retrieve(query, documents)
    # Generate answer
    prompt = f"Context: {' '.join(context)}\n\nQuestion: {query}\nAnswer:"
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Use it
docs = ["Python is a programming language.", "Python was created by Guido van Rossum."]
answer = answer_question("Who created Python?", docs)
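The retrieval step above ranks documents by cosine similarity. As a minimal sketch of why normalization matters, the following pure-Python example uses made-up two-dimensional vectors in place of real embeddings. It shows that a raw dot product can favor a long vector, while cosine similarity favors the vector that points in the most similar direction. The helper names `cosine` and `top_k_indices` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_indices(query_vec, doc_vecs, k=2):
    """Return indices of the k most similar document vectors, best first."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)[:k]


# Toy 2-D "embeddings" (made up for illustration).
query_vec = [1.0, 0.0]
doc_vecs = [
    [10.0, 10.0],  # long vector, 45 degrees away from the query
    [0.9, 0.1],    # short vector, nearly parallel to the query
    [0.0, 1.0],    # orthogonal to the query
]

raw_dot = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
ranked = top_k_indices(query_vec, doc_vecs)
print(raw_dot)  # the raw dot product is largest for the long vector
print(ranked)   # cosine ranks the nearly parallel vector first
```

In practice a vectorized library does this far faster, but the ranking logic is the same.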
Pros:
- Complete control and transparency
- No framework dependencies (only the libraries you choose, such as an API client and an embedding model)
- Easy to debug and modify
- No framework learning curve
- Better performance (no abstraction overhead)
Cons:
- Need to implement everything yourself
- Risk of bugs in custom code
- Must stay current with best practices
- Harder to onboard new team members
Best for: Simple use cases, teams with strong engineering skills, performance-critical applications, projects with unique requirements.
Comparison Framework
| Criterion | LangChain | LlamaIndex | Haystack | Custom |
|---|---|---|---|---|
| Learning Curve | Steep | Moderate | Steep | None |
| Flexibility | High | Medium | Medium | Maximum |
| RAG Quality | Good | Excellent | Excellent | Depends |
| Agent Support | Excellent | Limited | Limited | DIY |
| Production Ready | Requires work | Moderate | Excellent | Depends |
| Community | Largest | Large | Medium | N/A |
| Stability | Frequent changes | More stable | Very stable | You control |
| Best Use Case | Exploration | Document QA | Enterprise search | Simple/unique |
Decision Framework
Use LangChain if:
- You're exploring multiple approaches rapidly
- You need agent capabilities (tool use, reasoning loops)
- You want maximum ecosystem integration
- You have time to deal with API changes
Use LlamaIndex if:
- Your primary use case is RAG/document QA
- You want a focused, stable API
- Retrieval quality is your top priority
- You need strong data connector support
Use Haystack if:
- You're building production search systems
- You need hybrid search (keyword + semantic)
- Evaluation and monitoring are critical
- You want RESTful deployment built-in
Use custom code if:
- Your use case is simple (single LLM call, basic retrieval)
- You have specific performance requirements
- Your team has strong engineering capabilities
- You need complete control and transparency
Common Pitfalls
Over-abstraction: Frameworks can hide what's actually happening. When debugging, you're fighting both your code and the framework's abstractions. Start simple and add complexity only when needed.
Framework lock-in: Deep integration with a framework makes switching costly. Keep your business logic separate from framework-specific code.
Version churn: LangChain especially moves fast with breaking changes. Pin versions and test thoroughly before upgrading.
Performance overhead: Generic abstractions have costs. For high-throughput systems, measure whether framework overhead is acceptable.
Debugging difficulty: Stack traces through framework code are painful. Invest in good logging and understand the framework's execution model.
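For the measurement and logging advice above, one lightweight approach is to wrap each pipeline step in a timing decorator. This is a sketch using only the standard library. The `timed` decorator and the `retrieve_step` and `generate_step` functions are hypothetical placeholders for your own retrieval and generation code.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

timings = {}  # step name -> list of durations in seconds


def timed(step_name):
    """Decorator that logs and records how long each call to a step takes."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on failure, so slow errors are visible too.
                elapsed = time.perf_counter() - start
                timings.setdefault(step_name, []).append(elapsed)
                logger.info("%s took %.4fs", step_name, elapsed)
        return wrapper
    return decorator


# Hypothetical pipeline steps; in practice these would wrap retrieval and generation.
@timed("retrieve")
def retrieve_step(query):
    return ["doc about " + query]


@timed("generate")
def generate_step(query, context):
    return f"Answer to {query!r} using {len(context)} document(s)"


final_answer = generate_step("pricing", retrieve_step("pricing"))
print(final_answer)
```

Comparing these timings between a framework-based step and a hand-written one gives a rough estimate of the abstraction overhead for your workload.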
Practical Recommendations
Start small: Begin with custom code or the simplest framework approach. Add abstraction when complexity justifies it.
Understand the basics: Learn how embeddings, vector search, and prompt engineering work before adopting frameworks. This makes debugging possible.
Keep it modular: Separate your business logic from framework code. Use dependency injection so you can swap implementations.
Monitor everything: Frameworks make it easy to build complex systems. Make sure you can observe what's happening in production.
Test without the framework: Write integration tests that verify behavior without relying on framework internals. This protects against breaking changes.
Read the source: When documentation fails or bugs appear, framework source code is your friend. It's often clearer than the docs.
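The "keep it modular" and "test without the framework" recommendations can be sketched together. In the example below, business logic depends only on two small interfaces, so a LangChain, LlamaIndex, Haystack, or hand-written implementation could be injected behind them. All class names here are hypothetical, and the fakes stand in for real retrieval and generation.

```python
from typing import Protocol


class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...


class Generator(Protocol):
    def generate(self, prompt: str) -> str: ...


class QAService:
    """Business logic that depends only on the two interfaces above."""

    def __init__(self, retriever: Retriever, generator: Generator):
        self.retriever = retriever
        self.generator = generator

    def answer(self, question: str) -> str:
        context = "\n".join(self.retriever.retrieve(question))
        return self.generator.generate(f"Context:\n{context}\n\nQuestion: {question}")


# Test doubles: no framework, network, or API key needed.
class FakeRetriever:
    def retrieve(self, query: str) -> list[str]:
        return ["Python was created by Guido van Rossum."]


class EchoGenerator:
    def generate(self, prompt: str) -> str:
        return prompt.upper()


service = QAService(FakeRetriever(), EchoGenerator())
reply = service.answer("Who created Python?")
print("GUIDO VAN ROSSUM" in reply)
```

Swapping frameworks then means writing a new adapter class that satisfies `Retriever` or `Generator`, while `QAService` and its tests stay unchanged.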
The Bottom Line
Orchestration frameworks solve real problems, but they're not magic. LangChain offers breadth, LlamaIndex offers RAG excellence, Haystack offers production maturity. Custom code offers control.
The right choice depends on your use case complexity, team skills, and time constraints. Start with the simplest thing that works, measure carefully, and add abstraction only when the benefit is clear. The best framework is the one that disappears into the background and lets you focus on building value for users.
Frequently Asked Questions
Should I use LangChain or LlamaIndex for my RAG application?
If your primary use case is document question-answering and retrieval, LlamaIndex is the better choice with its focused, stable API. If you need agent capabilities, tool integration, and broader flexibility beyond RAG, LangChain offers more options. Many teams prototype with LangChain and refine with LlamaIndex.
When should I build custom orchestration instead of using a framework?
Build custom when your use case is simple (single LLM call with basic retrieval), you have strict performance requirements, or you need complete control for security and compliance. Frameworks add dependencies and abstraction overhead that may not be justified for straightforward applications.
How do I avoid vendor lock-in with orchestration frameworks?
Keep your business logic separate from framework-specific code. Use dependency injection so you can swap implementations. Write integration tests that verify behavior without relying on framework internals. This protects you when frameworks release breaking changes.
Is LangChain too complex for beginners?
LangChain has a steep learning curve and frequent API changes that can frustrate beginners. Start with the simplest approach that works for your use case, even plain API calls. Add framework abstraction only when complexity justifies it. LlamaIndex is generally easier to start with for RAG-focused projects.
About the Authors
Marcin Piekarski · Frontend Lead & AI Educator
Marcin is a Frontend Lead with 20+ years in tech. Currently building headless ecommerce at Harvey Norman (Next.js, Node.js, GraphQL). He created Field Guide to AI to help others understand AI tools practically—without the jargon.
Credentials & Experience:
- 20+ years web development experience
- Frontend Lead at Harvey Norman (10 years)
- Worked with: Gumtree, CommBank, Woolworths, Optus, M&C Saatchi
- Runs AI workshops for teams
- Founder of builtweb.com.au
- Daily AI tools user: ChatGPT, Claude, Gemini, AI coding assistants
- Specializes in React ecosystem: React, Next.js, Node.js
Prism AI · AI Research & Writing Assistant
Prism AI is the AI ghostwriter behind Field Guide to AI—a collaborative ensemble of frontier models (Claude, ChatGPT, Gemini, and others) that assist with research, drafting, and content synthesis. Like light through a prism, human expertise is refracted through multiple AI perspectives to create clear, comprehensive guides. All AI-generated content is reviewed, fact-checked, and refined by Marcin before publication.
Key Terms Used in This Guide
LangChain
An open-source framework for building applications powered by large language models, providing tools for chaining prompts, managing memory, connecting to external tools, and creating AI agents.
LlamaIndex
An open-source framework designed for building LLM applications that connect to your own data, with particular strength in retrieval-augmented generation (RAG) systems and data indexing.
Orchestration
The process of coordinating multiple AI components—model calls, tool integrations, data retrieval, and decision logic—into a coherent workflow that accomplishes complex multi-step tasks.
AI (Artificial Intelligence)
Making machines perform tasks that typically require human intelligence—like understanding language, recognizing patterns, or making decisions.