TL;DR

Orchestration frameworks like LangChain, LlamaIndex, and Haystack simplify building complex AI workflows by providing abstractions for common patterns like retrieval, chaining, and agent behaviors. Each has different strengths: LangChain for general-purpose chains and agents, LlamaIndex for document indexing and RAG, Haystack for production search pipelines. Start with custom code for simple use cases, adopt frameworks when complexity grows, and avoid over-abstraction by understanding what's happening under the hood.

What Orchestration Frameworks Do

Building AI applications involves more than calling openai.ChatCompletion.create(). Real systems need to:

  • Chain multiple LLM calls together with context flow
  • Retrieve relevant documents before generating answers (RAG)
  • Parse outputs and call external tools based on model decisions
  • Manage conversation memory across turns
  • Handle errors, retries, and rate limits
  • Log and monitor complex workflows

Orchestration frameworks abstract these patterns into reusable components. They save you from reinventing common patterns, but add dependencies and learning curves. The key question is whether the abstraction helps or hinders your specific use case.
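
To make that concrete, here is a minimal sketch of what just one of those concerns (retries and rate limits) looks like when handled by hand. It assumes the legacy pre-1.0 openai SDK that the later examples also use; the backoff numbers are arbitrary:

import time
import openai

def call_with_retries(messages, max_retries=3):
    # Back off exponentially on rate limits and transient API errors
    for attempt in range(max_retries):
        try:
            response = openai.ChatCompletion.create(model="gpt-4", messages=messages)
            return response.choices[0].message.content
        except (openai.error.RateLimitError, openai.error.APIError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

# Chaining two calls by hand, with context flowing from one to the next
summary = call_with_retries([{"role": "user", "content": "Summarize this report: ..."}])
actions = call_with_retries([{"role": "user", "content": f"List action items from: {summary}"}])

Multiply this by retrieval, tool calling, and memory, and the appeal of a framework becomes clear.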

The Major Players

LangChain: The Swiss Army Knife

LangChain is the most popular general-purpose orchestration framework. It provides abstractions for chains (sequential operations), agents (LLMs that choose actions), memory systems, document loaders, and tool integration.

Simple chain example:

from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema.output_parser import StrOutputParser

# Create a chain: prompt -> model -> parse output
prompt = ChatPromptTemplate.from_template("Translate {text} to {language}")
model = ChatOpenAI(model="gpt-4")
chain = prompt | model | StrOutputParser()

# Run it
result = chain.invoke({"text": "Hello world", "language": "Spanish"})
# Output: "Hola mundo"

RAG example:

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader

# Load and chunk documents
documents = TextLoader("report.txt").load()  # "report.txt" is a placeholder; any loader works
splitter = RecursiveCharacterTextSplitter(chunk_size=1000)
chunks = splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(chunks, embeddings)

# Create QA chain
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

result = qa({"query": "What are the main findings?"})
print(result["result"])
print(result["source_documents"])  # chunks the answer was grounded on
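
Agent example (a rough sketch using the classic initialize_agent API; exact module paths and agent types shift between LangChain versions):

from langchain.agents import initialize_agent, load_tools, AgentType
from langchain.chat_models import ChatOpenAI

# Give the model a tool it can decide to call
llm = ChatOpenAI(model="gpt-4", temperature=0)
tools = load_tools(["llm-math"], llm=llm)

agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True  # print each reasoning step and tool call
)

agent.run("What is 17% of 2450?")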

Pros:

  • Extremely comprehensive - covers most use cases
  • Large ecosystem of integrations (100+ LLM providers, vector stores, tools)
  • Active community and frequent updates
  • Good for rapid prototyping

Cons:

  • Complex API with frequent breaking changes
  • Heavy abstraction can obscure what's happening
  • Performance overhead from generic abstractions
  • Difficult to debug when things go wrong
  • Documentation quality varies

Best for: Rapid prototyping, teams exploring multiple approaches, projects needing many integrations.

LlamaIndex: Built for RAG

LlamaIndex (formerly GPT Index) specializes in indexing and querying over documents. It excels at loading data from various sources, creating efficient indexes, and powering question-answering systems.

Basic RAG example:

from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create index
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-4"))
response = query_engine.query("Summarize the key points")

print(response.response)
print(response.source_nodes)  # See which chunks were used
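
Retrieval behavior can also be tuned at query time. A small sketch, assuming the similarity_top_k and response_mode arguments from older llama_index releases:

# Retrieve more chunks and summarize them hierarchically into one answer
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="tree_summarize"
)
response = query_engine.query("Summarize the key points")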

Advanced indexing:

from llama_index import TreeIndex, KeywordTableIndex
from llama_index.composability import ComposableGraph

# Create multiple indexes for different query types
vector_index = VectorStoreIndex.from_documents(documents)
tree_index = TreeIndex.from_documents(documents)
keyword_index = KeywordTableIndex.from_documents(documents)

# Compose them
graph = ComposableGraph.from_indices(
    TreeIndex,
    [vector_index, tree_index, keyword_index],
    index_summaries=["semantic search", "hierarchical", "keywords"]
)

query_engine = graph.as_query_engine()
response = query_engine.query("Complex question requiring multiple strategies")

Pros:

  • Laser-focused on document indexing and retrieval
  • Cleaner, more stable API than LangChain
  • Excellent data connectors (databases, APIs, file formats)
  • Built-in observability for understanding retrieval
  • More opinionated design, so fewer decisions to make

Cons:

  • Less flexible for non-RAG use cases
  • Smaller ecosystem than LangChain
  • Less suitable for agent-based workflows
  • Fewer tool integrations

Best for: Document QA systems, knowledge bases, semantic search applications, teams focused on retrieval quality.

Haystack: Production-Grade Pipelines

Haystack, developed by deepset, emphasizes production-ready search and NLP pipelines. It's more opinionated about architecture and includes strong support for hybrid search and deployment.

RAG pipeline example:

from haystack import Pipeline
from haystack.nodes import EmbeddingRetriever, PromptNode, PromptTemplate, AnswerParser
from haystack.document_stores import FAISSDocumentStore
from haystack.utils import convert_files_to_docs

# Load documents and set up the document store
documents = convert_files_to_docs(dir_path="./data")  # "./data" is a placeholder directory
document_store = FAISSDocumentStore(embedding_dim=1536)
document_store.write_documents(documents)

# Create retriever
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="text-embedding-ada-002",
    model_format="openai"
)
document_store.update_embeddings(retriever)

# Create prompt
template = PromptTemplate(
    prompt="Given the context: {join(documents)}\n\nQuestion: {query}\nAnswer:",
    output_parser=AnswerParser()
)

prompt_node = PromptNode(
    model_name_or_path="gpt-4",
    default_prompt_template=template
)

# Build pipeline
pipeline = Pipeline()
pipeline.add_node(component=retriever, name="Retriever", inputs=["Query"])
pipeline.add_node(component=prompt_node, name="PromptNode", inputs=["Retriever"])

# Run
result = pipeline.run(query="What are the findings?")
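
Hybrid search pairs a keyword retriever with the embedding retriever and merges the ranked results. A rough sketch against the Haystack 1.x API, assuming an InMemoryDocumentStore with BM25 enabled (FAISS does not support keyword retrieval):

from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever, JoinDocuments

store = InMemoryDocumentStore(use_bm25=True, embedding_dim=1536)
store.write_documents(documents)

bm25 = BM25Retriever(document_store=store)
dense = EmbeddingRetriever(
    document_store=store,
    embedding_model="text-embedding-ada-002",
    model_format="openai"
)
store.update_embeddings(dense)

hybrid = Pipeline()
hybrid.add_node(component=bm25, name="BM25", inputs=["Query"])
hybrid.add_node(component=dense, name="Dense", inputs=["Query"])
hybrid.add_node(
    component=JoinDocuments(join_mode="reciprocal_rank_fusion"),  # merge both rankings
    name="Join",
    inputs=["BM25", "Dense"]
)
hybrid.add_node(component=prompt_node, name="PromptNode", inputs=["Join"])

result = hybrid.run(query="What are the findings?")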

Pros:

  • Designed for production from the ground up
  • Strong support for hybrid search (keyword + semantic)
  • Built-in evaluation and benchmarking tools
  • RESTful API for deployment
  • Clear pipeline abstraction

Cons:

  • Steeper learning curve
  • More rigid structure
  • Smaller community than LangChain/LlamaIndex
  • Less suitable for rapid experimentation

Best for: Production deployments, enterprise search systems, teams prioritizing stability and monitoring.

Custom Code: The Control Option

Sometimes the best framework is no framework. Simple use cases often don't justify the complexity.

Simple RAG without frameworks:

import openai
from sentence_transformers import SentenceTransformer
import numpy as np

# Setup
model = SentenceTransformer('all-MiniLM-L6-v2')

def retrieve(query, documents, top_k=3):
    # Embed query and documents (normalized so the dot product below is cosine similarity)
    query_emb = model.encode([query], normalize_embeddings=True)
    doc_embs = model.encode(documents, normalize_embeddings=True)

    # Cosine similarity
    scores = np.dot(query_emb, doc_embs.T)[0]
    top_indices = np.argsort(scores)[-top_k:][::-1]

    return [documents[i] for i in top_indices]

def answer_question(query, documents):
    # Retrieve relevant context
    context = retrieve(query, documents)

    # Generate answer
    prompt = f"Context: {' '.join(context)}\n\nQuestion: {query}\nAnswer:"

    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}]
    )

    return response.choices[0].message.content

# Use it
docs = ["Python is a programming language.", "Python was created by Guido van Rossum."]
answer = answer_question("Who created Python?", docs)

Pros:

  • Complete control and transparency
  • No framework dependency; only the libraries you choose
  • Easy to debug and modify
  • No framework learning curve
  • Better performance (no abstraction overhead)

Cons:

  • Need to implement everything yourself
  • Risk of bugs in custom code
  • Must stay current with best practices
  • Harder to onboard new team members

Best for: Simple use cases, teams with strong engineering skills, performance-critical applications, projects with unique requirements.

Comparison Framework

Criterion        | LangChain        | LlamaIndex  | Haystack          | Custom
Learning Curve   | Steep            | Moderate    | Steep             | None
Flexibility      | High             | Medium      | Medium            | Maximum
RAG Quality      | Good             | Excellent   | Excellent         | Depends
Agent Support    | Excellent        | Limited     | Limited           | DIY
Production Ready | Requires work    | Moderate    | Excellent         | Depends
Community        | Largest          | Large       | Medium            | N/A
Stability        | Frequent changes | More stable | Very stable       | You control
Best Use Case    | Exploration      | Document QA | Enterprise search | Simple/unique

Decision Framework

Use LangChain if:

  • You're exploring multiple approaches rapidly
  • You need agent capabilities (tool use, reasoning loops)
  • You want maximum ecosystem integration
  • You have time to deal with API changes

Use LlamaIndex if:

  • Your primary use case is RAG/document QA
  • You want a focused, stable API
  • Retrieval quality is your top priority
  • You need strong data connector support

Use Haystack if:

  • You're building production search systems
  • You need hybrid search (keyword + semantic)
  • Evaluation and monitoring are critical
  • You want RESTful deployment built-in

Use custom code if:

  • Your use case is simple (single LLM call, basic retrieval)
  • You have specific performance requirements
  • Your team has strong engineering capabilities
  • You need complete control and transparency

Common Pitfalls

Over-abstraction: Frameworks can hide what's actually happening. When debugging, you're fighting both your code and the framework's abstractions. Start simple and add complexity only when needed.

Framework lock-in: Deep integration with a framework makes switching costly. Keep your business logic separate from framework-specific code.
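
One way to limit lock-in is to hide the framework behind a small interface your application owns, so swapping frameworks only touches one adapter. A minimal sketch (the names here are illustrative, not from any framework):

from typing import List, Protocol

class Retriever(Protocol):
    # The only retrieval contract the rest of the application knows about
    def retrieve(self, query: str, top_k: int) -> List[str]: ...

class LangChainRetriever:
    # Thin adapter around a LangChain vector store; replacing the framework means rewriting only this class
    def __init__(self, vectorstore):
        self._vectorstore = vectorstore

    def retrieve(self, query: str, top_k: int) -> List[str]:
        docs = self._vectorstore.similarity_search(query, k=top_k)
        return [doc.page_content for doc in docs]

def answer_question(query: str, retriever: Retriever, llm) -> str:
    # Business logic depends only on the Retriever protocol and an injected llm callable
    context = "\n".join(retriever.retrieve(query, top_k=3))
    return llm(f"Context: {context}\n\nQuestion: {query}\nAnswer:")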

Version churn: LangChain especially moves fast with breaking changes. Pin versions and test thoroughly before upgrading.

Performance overhead: Generic abstractions have costs. For high-throughput systems, measure whether framework overhead is acceptable.

Debugging difficulty: Stack traces through framework code are painful. Invest in good logging and understand the framework's execution model.
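
Logging at the boundary between your code and the framework helps here: whatever the framework does internally, you keep a record of inputs, outputs, and latency that you control. A minimal sketch:

import logging
import time

logger = logging.getLogger("llm_calls")

def logged(call):
    # Wrap any chain.invoke / query_engine.query style callable
    def wrapper(inputs):
        start = time.time()
        output = call(inputs)
        logger.info("inputs=%r output=%r latency=%.2fs", inputs, output, time.time() - start)
        return output
    return wrapper

# Usage: result = logged(chain.invoke)({"text": "Hello world", "language": "Spanish"})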

Practical Recommendations

  1. Start small: Begin with custom code or the simplest framework approach. Add abstraction when complexity justifies it.

  2. Understand the basics: Learn how embeddings, vector search, and prompt engineering work before adopting frameworks. This makes debugging possible.

  3. Keep it modular: Separate your business logic from framework code. Use dependency injection so you can swap implementations.

  4. Monitor everything: Frameworks make it easy to build complex systems. Make sure you can observe what's happening in production.

  5. Test without the framework: Write integration tests that verify behavior without relying on framework internals (a sketch follows this list). This protects against breaking changes.

  6. Read the source: When documentation fails or bugs appear, framework source code is your friend. It's often clearer than the docs.
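
For point 5, a behavior-level test can fake both retrieval and the model, so it keeps passing across framework upgrades. A sketch that reuses the answer_question shape from the lock-in example above:

# No framework objects and no network calls; run with pytest
# Assumes answer_question(query, retriever, llm) from the lock-in sketch is importable
class FakeRetriever:
    def retrieve(self, query, top_k):
        return ["Python was created by Guido van Rossum."]

def fake_llm(prompt):
    return "Python was created by Guido van Rossum."

def test_answer_mentions_creator():
    answer = answer_question("Who created Python?", FakeRetriever(), fake_llm)
    assert "Guido van Rossum" in answer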

The Bottom Line

Orchestration frameworks solve real problems, but they're not magic. LangChain offers breadth, LlamaIndex offers RAG excellence, Haystack offers production maturity. Custom code offers control.

The right choice depends on your use case complexity, team skills, and time constraints. Start with the simplest thing that works, measure carefully, and add abstraction only when the benefit is clear. The best framework is the one that disappears into the background and lets you focus on building value for users.