cheat sheet
Haystack 2.x
Build production-grade LLM pipelines with Haystack 2.x. Covers components, the pipeline graph, indexing and querying, retrievers, generators, RAG patterns, and evaluation.
Haystack 2.x — Pipelines for LLM Applications
What it is
Haystack is an open-source Python framework from deepset for building production LLM applications around an explicit pipeline graph. Every step — document loading, splitting, embedding, retrieval, prompt building, generation, evaluation — is a typed Component with declared input and output sockets, and a Pipeline is a directed graph that connects those sockets. Haystack 2.x (released late 2024) is a ground-up rewrite of the original 1.x API: components are dataclass-like Python classes, pipelines are explicitly wired, and the framework is built for both indexing (writing documents) and querying (reading them) with the same primitives.
Compared to LangChain's LCEL pipe operator or LlamaIndex's query engines, Haystack's mental model is closer to a dataflow DAG: you wire component_a.output_socket → component_b.input_socket and the framework validates the connection types up-front. This makes Haystack pipelines easy to serialise to YAML, deploy as REST endpoints with hayhooks, and reason about in code review.
Install
Haystack ships as haystack-ai on PyPI (the older 1.x package was farm-haystack).
pip install haystack-ai
pip install "haystack-ai[chroma]"
pip install "haystack-ai[qdrant]"
pip install sentence-transformers
pip install anthropic-haystack
Output:
Successfully installed haystack-ai-2.x.x ...
haystack-aivsfarm-haystack— only install one.farm-haystackis the legacy 1.x package; 2.x lives inhaystack-aiwithfrom haystack import ...imports. Mixing them in the same environment causes import shadowing.
Quick example — indexing + querying
A minimal end-to-end RAG flow: index three documents into an in-memory store, then ask a question against them. The two pipelines share the same DocumentStore instance.
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
import os
store = InMemoryDocumentStore()
docs = [
Document(content="Haystack 2.x pipelines are directed graphs of typed components."),
Document(content="Each component declares input and output sockets."),
Document(content="Pipelines can be serialised to YAML and deployed via hayhooks."),
]
index_pipe = Pipeline()
index_pipe.add_component("embedder", SentenceTransformersDocumentEmbedder(model="BAAI/bge-small-en-v1.5"))
index_pipe.add_component("writer", DocumentWriter(document_store=store))
index_pipe.connect("embedder.documents", "writer.documents")
index_pipe.run({"embedder": {"documents": docs}})
template = """Answer the question using only the context.
Context:
{% for d in documents %}- {{ d.content }}
{% endfor %}
Question: {{ question }}
Answer:"""
query_pipe = Pipeline()
query_pipe.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
query_pipe.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
query_pipe.add_component("prompt", PromptBuilder(template=template))
query_pipe.add_component("llm", OpenAIGenerator(api_key_env_var="OPENAI_API_KEY", model="gpt-4o-mini"))
query_pipe.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipe.connect("retriever.documents", "prompt.documents")
query_pipe.connect("prompt.prompt", "llm.prompt")
q = "What can Haystack pipelines be serialised to?"
result = query_pipe.run({"text_embedder": {"text": q}, "prompt": {"question": q}})
print(result["llm"]["replies"][0])
Output:
Haystack pipelines can be serialised to YAML and deployed via hayhooks.
When / why to use it
- Building production RAG where you want explicit, debuggable component boundaries instead of nested chains.
- Indexing pipelines and query pipelines that need to share a
DocumentStoreconfiguration. - Serialising LLM apps to YAML so non-engineers can review or edit pipeline topology.
- Deploying pipelines as REST endpoints with hayhooks.
- Evaluation-driven RAG: the framework has first-class
Evaluatorcomponents for context relevance, faithfulness, and SAS. - Mixing closed (OpenAI, Anthropic, Cohere) and open (Hugging Face, vLLM, Ollama) models behind a uniform
Generatorinterface.
Common pitfalls
Socket name mismatch —
Pipeline.connect("a.foo", "b.bar")fails fast atconnect()time ifahas no output namedfooorbhas no input namedbar. The error includes the available socket names — read them carefully rather than guessing.
Embedder model mismatch between index and query — if you index with
bge-small-en-v1.5and query withall-MiniLM-L6-v2, vectors live in different spaces and retrieval returns garbage. Always use the samemodel=string inSentenceTransformersDocumentEmbedderandSentenceTransformersTextEmbedder.
PromptBuilderis Jinja2 —{{ var }}and{% for %}are Jinja, not Python f-strings. Forgetting to escape literal{causes silent template errors.
In-memory stores are not persistent —
InMemoryDocumentStorelives only for the process lifetime. For anything beyond a notebook, useChromaDocumentStore,QdrantDocumentStore,WeaviateDocumentStore, orElasticsearchDocumentStore.
Call
pipeline.draw("pipeline.png")to render a graphviz diagram of the wiring. Invaluable for code review of multi-stage RAG pipelines.
pipeline.dumps()returns YAML;Pipeline.loads(yaml_str)rebuilds the pipeline. Commit the YAML to git andloads()at startup to keep topology declarative.
Components — the atom of Haystack
A Component is a Python class decorated with @component, exposing run(...) whose parameter names become input sockets and whose return-typed dict becomes output sockets. Components are reusable across pipelines.
from haystack import component
from typing import List
@component
class UppercaseTagger:
"""Tag each document by uppercasing the first 20 characters of its content."""
@component.output_types(documents=List["Document"])
def run(self, documents: list):
for d in documents:
d.meta["tag"] = d.content[:20].upper()
return {"documents": documents}
The @component.output_types(...) decorator names and types each output socket. The run signature names each input socket and uses standard type hints — Haystack uses these to validate connect() calls.
Document — the data primitive
Documents carry content (text or bytes), meta (dict for filters and provenance), id (auto-generated SHA-256), and an embedding vector once embedded.
from haystack import Document
doc = Document(content="Hello world", meta={"source": "readme.md", "section": "intro"})
print(doc.id, doc.meta)
Output:
e0c9035898dd52fc65c41454cec9c4d2611bfb37 {'source': 'readme.md', 'section': 'intro'}
Document.meta is the standard place to put filtering metadata — every retriever accepts a filters= argument that operates on meta.
Pipeline — wiring components together
A Pipeline is a directed graph. add_component(name, instance) registers a component under a unique name; connect("a.out", "b.in") wires sockets. Inputs that are not connected to any upstream socket must be supplied at run() time via the {component_name: {socket_name: value}} dict.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
p = Pipeline()
p.add_component("prompt", PromptBuilder(template="Define {{ word }} in one sentence."))
p.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
p.connect("prompt.prompt", "llm.prompt")
print(p.run({"prompt": {"word": "monad"}})["llm"]["replies"][0])
Output:
A monad is a design pattern that wraps a value and a function for chaining computations
while controlling side effects, used heavily in functional programming.
Indexing pipelines
Indexing is the one-time (or incremental) flow that turns raw files into embedded Documents in a store. A typical chain: file source → converter → cleaner → splitter → embedder → writer.
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument, PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter, DocumentCleaner
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore
store = InMemoryDocumentStore()
idx = Pipeline()
idx.add_component("converter", TextFileToDocument())
idx.add_component("cleaner", DocumentCleaner(remove_empty_lines=True))
idx.add_component("splitter", DocumentSplitter(split_by="word", split_length=200, split_overlap=20))
idx.add_component("embedder", SentenceTransformersDocumentEmbedder(model="BAAI/bge-small-en-v1.5"))
idx.add_component("writer", DocumentWriter(document_store=store))
idx.connect("converter.documents", "cleaner.documents")
idx.connect("cleaner.documents", "splitter.documents")
idx.connect("splitter.documents", "embedder.documents")
idx.connect("embedder.documents", "writer.documents")
result = idx.run({"converter": {"sources": ["./docs/intro.txt", "./docs/api.txt"]}})
print(f"Wrote {result['writer']['documents_written']} chunks")
Output:
Wrote 42 chunks
Choosing a splitter
DocumentSplitter(split_by=...) accepts "word", "sentence", "passage", "page", or "function" (custom callable). split_overlap overlaps consecutive chunks — set ~10–20% of split_length to preserve context across boundaries.
For markdown and code, prefer
split_by="passage"(splits on\n\n) over"word"— it respects natural section boundaries.
Retrievers
Retrievers fetch the top-k documents for a query. Haystack has retriever components per store and per retrieval mode (BM25, dense embedding, hybrid).
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
store = InMemoryDocumentStore()
bm25 = InMemoryBM25Retriever(document_store=store, top_k=5)
embedding = InMemoryEmbeddingRetriever(document_store=store, top_k=5, filters={"field": "meta.section", "operator": "==", "value": "api"})
For other stores, swap in the matching retriever:
| Store | Embedding retriever |
|---|---|
| Chroma | ChromaEmbeddingRetriever |
| Qdrant | QdrantEmbeddingRetriever |
| Weaviate | WeaviateEmbeddingRetriever |
| Elasticsearch | ElasticsearchEmbeddingRetriever |
| pgvector | PgvectorEmbeddingRetriever |
Hybrid retrieval and reranking
Combine BM25 and dense retrieval with a DocumentJoiner, then rerank with a cross-encoder.
from haystack import Pipeline
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker
hybrid = Pipeline()
hybrid.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
hybrid.add_component("bm25", InMemoryBM25Retriever(document_store=store, top_k=10))
hybrid.add_component("dense", InMemoryEmbeddingRetriever(document_store=store, top_k=10))
hybrid.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion"))
hybrid.add_component("ranker", TransformersSimilarityRanker(model="BAAI/bge-reranker-base", top_k=5))
hybrid.connect("text_embedder.embedding", "dense.query_embedding")
hybrid.connect("bm25.documents", "joiner.documents")
hybrid.connect("dense.documents", "joiner.documents")
hybrid.connect("joiner.documents", "ranker.documents")
DocumentJoiner(join_mode="reciprocal_rank_fusion") is the standard hybrid fusion algorithm; "concatenate" or "merge" are alternatives when scores are comparable.
Generators
Generators wrap LLMs. Single-turn generators return replies: list[str]; chat generators accept and return ChatMessage objects.
from haystack.components.generators import OpenAIGenerator, HuggingFaceLocalGenerator
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage
text_gen = OpenAIGenerator(api_key_env_var="OPENAI_API_KEY", model="gpt-4o-mini")
print(text_gen.run(prompt="Name three databases.")["replies"][0])
chat_gen = OpenAIChatGenerator(model="gpt-4o-mini")
msgs = [
ChatMessage.from_system("You are concise."),
ChatMessage.from_user("Define vector similarity."),
]
print(chat_gen.run(messages=msgs)["replies"][0].text)
Output:
PostgreSQL, MongoDB, and SQLite.
Vector similarity measures how close two vectors are in an embedding space, typically via cosine or dot product.
For local models via Hugging Face Transformers:
local = HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct", task="text-generation")
local.warm_up()
print(local.run(prompt="What is RAG?")["replies"][0])
All generators support a streaming callback via
streaming_callback=. Useprint_streaming_chunkfromhaystack.components.generators.utilsfor stdout streaming during development.
RAG with sources and citations
Return both the answer and the source documents so the UI can render citations.
from haystack import Pipeline
from haystack.components.builders import PromptBuilder, AnswerBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
template = """Answer using only the context. If the answer is not present, say "I don't know".
{% for d in documents %}
[{{ loop.index }}] {{ d.content }}
{% endfor %}
Question: {{ question }}
"""
rag = Pipeline()
rag.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
rag.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store, top_k=4))
rag.add_component("prompt", PromptBuilder(template=template))
rag.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
rag.add_component("answer", AnswerBuilder())
rag.connect("text_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt.documents")
rag.connect("retriever.documents", "answer.documents")
rag.connect("prompt.prompt", "llm.prompt")
rag.connect("llm.replies", "answer.replies")
q = "What does DocumentSplitter do?"
out = rag.run({"text_embedder": {"text": q}, "prompt": {"question": q}, "answer": {"query": q}})
ans = out["answer"]["answers"][0]
print(ans.data)
for d in ans.documents:
print(f" - {d.meta.get('source', '?')}: {d.content[:60]}...")
Output:
DocumentSplitter chunks a document by word, sentence, passage, or page.
- splitter.md: DocumentSplitter accepts split_by, split_length, split_overlap...
- splitter.md: For markdown documents, split_by="passage" preserves boundaries...
AnswerBuilder packages the LLM reply and the retrieved documents into an Answer object, the standard return type for RAG pipelines.
Tool calling and agents
Haystack supports OpenAI-style tool calling through OpenAIChatGenerator plus ToolInvoker. Define a Tool from any Python function and add it to the chat generator.
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.tools import ToolInvoker
from haystack.tools import Tool
from haystack.dataclasses import ChatMessage
def get_weather(city: str) -> str:
"""Return the current weather for the given city."""
return f"In {city}: 21 C and sunny."
weather_tool = Tool(
name="get_weather",
description="Return current weather for a city.",
parameters={"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
function=get_weather,
)
chat = OpenAIChatGenerator(model="gpt-4o-mini", tools=[weather_tool])
invoker = ToolInvoker(tools=[weather_tool])
msgs = [ChatMessage.from_user("What is the weather in Amsterdam?")]
out = chat.run(messages=msgs)
if out["replies"][0].tool_calls:
tool_results = invoker.run(messages=out["replies"])["tool_messages"]
final = chat.run(messages=msgs + out["replies"] + tool_results)
print(final["replies"][0].text)
Output:
The current weather in Amsterdam is 21 C and sunny.
For multi-step agents that loop until done, wrap this in Agent from haystack.components.agents.Agent, which auto-handles the tool-call/tool-result cycle.
Evaluation
Haystack ships several evaluators that score pipeline output against ground-truth or against retrieved context.
from haystack.components.evaluators import (
ContextRelevanceEvaluator,
FaithfulnessEvaluator,
SASEvaluator,
)
ctx_eval = ContextRelevanceEvaluator()
result = ctx_eval.run(
questions=["What does Haystack do?"],
contexts=[["Haystack builds LLM pipelines."]],
)
print(result["individual_scores"])
faith = FaithfulnessEvaluator()
print(faith.run(
questions=["What does Haystack do?"],
contexts=[["Haystack builds LLM pipelines."]],
predicted_answers=["Haystack lets you train LLMs from scratch."],
)["individual_scores"])
sas = SASEvaluator(model="cross-encoder/stsb-distilroberta-base")
sas.warm_up()
print(sas.run(
ground_truth_answers=["Haystack builds LLM pipelines."],
predicted_answers=["Haystack is a framework for building LLM pipelines."],
)["score"])
Output:
[1.0]
[0.0]
0.91
SASEvaluator (Semantic Answer Similarity) is the standard semantic-match metric for QA. Pair these with EvaluationRunResult.aggregate_report() to produce a CSV for CI.
Serialising to YAML
Pipelines round-trip to YAML, which is the deployment unit for hayhooks.
yaml_str = rag.dumps()
print(yaml_str[:200])
from haystack import Pipeline
rag2 = Pipeline.loads(yaml_str)
YAML looks like:
components:
text_embedder:
type: haystack.components.embedders.SentenceTransformersTextEmbedder
init_parameters:
model: BAAI/bge-small-en-v1.5
connections:
- sender: text_embedder.embedding
receiver: retriever.query_embedding
Commit the YAML to git as a configuration artefact. CI loads it, runs
pipeline.warm_up(), and asserts the topology validates before deploy.
Deploying with hayhooks
hayhooks turns Haystack pipelines into REST endpoints with one command.
pip install hayhooks
hayhooks pipeline deploy-files -n my-rag ./pipelines/rag/
hayhooks run
Output:
INFO: Uvicorn running on http://0.0.0.0:1416
INFO: Pipeline 'my-rag' deployed at POST /my-rag/run
Each deployed pipeline accepts a JSON body matching the pipeline's required inputs and returns the pipeline's outputs as JSON.
Document stores — overview
| Store | Persistence | Filters | Hybrid search | Best for |
|---|---|---|---|---|
InMemoryDocumentStore | none | yes | yes (BM25 + dense) | Notebooks, tests, small demos |
ChromaDocumentStore | local SQLite/duckdb | yes | dense only | Local dev, single-node |
QdrantDocumentStore | server / embedded | rich | dense + sparse | Production, payload filtering |
WeaviateDocumentStore | server | rich | hybrid | Multi-tenant, GraphQL |
ElasticsearchDocumentStore | server | rich | BM25 + dense | Existing Elastic infra |
OpenSearchDocumentStore | server | rich | BM25 + dense | AWS-managed search |
PgvectorDocumentStore | Postgres | SQL | dense | Postgres-heavy stacks |
Real-world recipes
Recipe — re-index changed files only
Skip embedding cost by hashing file contents and writing only when meta["hash"] differs.
import hashlib
from pathlib import Path
from haystack import Document
def changed_docs(paths, store) -> list[Document]:
out = []
for p in paths:
text = Path(p).read_text()
h = hashlib.sha256(text.encode()).hexdigest()
existing = store.filter_documents(filters={"field": "meta.path", "operator": "==", "value": p})
if existing and existing[0].meta.get("hash") == h:
continue
out.append(Document(content=text, meta={"path": p, "hash": h}))
return out
new = changed_docs(["docs/intro.txt", "docs/api.txt"], store)
if new:
idx.run({"converter": {"sources": []}, "embedder": {"documents": new}})
Recipe — multi-query expansion before retrieval
Have the LLM rewrite the question into N variants, retrieve for each, then fuse.
from haystack.components.builders import PromptBuilder
expander_template = """Generate 3 paraphrases of this question, one per line:
{{ question }}"""
expander = Pipeline()
expander.add_component("prompt", PromptBuilder(template=expander_template))
expander.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
expander.connect("prompt.prompt", "llm.prompt")
variants = expander.run({"prompt": {"question": "What is hybrid retrieval?"}})["llm"]["replies"][0].splitlines()
all_docs = []
for v in variants:
all_docs.extend(rag.run({"text_embedder": {"text": v}})["retriever"]["documents"])
Recipe — guard against empty retrieval
If retrieval returns nothing, short-circuit to a "I don't know" answer rather than letting the LLM hallucinate.
out = rag.run(...)
docs = out["retriever"]["documents"]
if not docs or max(d.score for d in docs) < 0.3:
answer = "I could not find a relevant answer in the indexed documents."
else:
answer = out["llm"]["replies"][0]
Recipe — pipeline draw for code review
rag.draw("rag_pipeline.png")
Output:
(rag_pipeline.png written; renders the component graph via graphviz)
Attach the PNG to the PR — reviewers immediately see whether retriever, prompt, and answer-builder are wired correctly.
Recipe — async batch evaluation in CI
from haystack.evaluation import EvaluationRunResult
questions = ["What is RAG?", "What does the splitter do?"]
truths = ["Retrieval-augmented generation.", "Splits documents into chunks."]
predictions = [rag.run({...})["answer"]["answers"][0].data for _ in questions]
result = EvaluationRunResult(
run_name="ci_eval",
inputs={"questions": questions, "ground_truth_answers": truths, "predicted_answers": predictions},
results={"sas": sas.run(ground_truth_answers=truths, predicted_answers=predictions)},
)
print(result.aggregated_report())
Quick reference
| Task | Code |
|---|---|
| Install core | pip install haystack-ai |
| Create pipeline | p = Pipeline() |
| Add component | p.add_component("name", Instance()) |
| Connect sockets | p.connect("a.out", "b.in") |
| Run pipeline | p.run({"comp": {"socket": value}}) |
| Draw graph | p.draw("p.png") |
| Dump YAML | p.dumps() |
| Load YAML | Pipeline.loads(yaml) |
| Document | Document(content="...", meta={...}) |
| In-memory store | InMemoryDocumentStore() |
| BM25 retriever | InMemoryBM25Retriever(document_store=s) |
| Dense retriever | InMemoryEmbeddingRetriever(document_store=s) |
| Hybrid fusion | DocumentJoiner(join_mode="reciprocal_rank_fusion") |
| Reranker | TransformersSimilarityRanker(model="...") |
| Prompt | PromptBuilder(template="... {{ var }} ...") |
| OpenAI generator | OpenAIGenerator(model="gpt-4o-mini") |
| Chat generator | OpenAIChatGenerator(model="gpt-4o-mini") |
| Local generator | HuggingFaceLocalGenerator(model="...") |
| Answer builder | AnswerBuilder() |
| Context eval | ContextRelevanceEvaluator() |
| Faithfulness eval | FaithfulnessEvaluator() |
| Semantic similarity | SASEvaluator(model="cross-encoder/...") |
| Tool call | Tool(name=, description=, parameters=, function=) |
| Deploy as REST | hayhooks pipeline deploy-files -n name ./dir/ |