cheat sheet

Haystack 2.x

Build production-grade LLM pipelines with Haystack 2.x. Covers components, the pipeline graph, indexing and querying, retrievers, generators, RAG patterns, and evaluation.

Haystack 2.x — Pipelines for LLM Applications

What it is

Haystack is an open-source Python framework from deepset for building production LLM applications around an explicit pipeline graph. Every step — document loading, splitting, embedding, retrieval, prompt building, generation, evaluation — is a typed Component with declared input and output sockets, and a Pipeline is a directed graph that connects those sockets. Haystack 2.x (released late 2024) is a ground-up rewrite of the original 1.x API: components are dataclass-like Python classes, pipelines are explicitly wired, and the framework is built for both indexing (writing documents) and querying (reading them) with the same primitives.

Compared to LangChain's LCEL pipe operator or LlamaIndex's query engines, Haystack's mental model is closer to a dataflow DAG: you wire component_a.output_socket → component_b.input_socket and the framework validates the connection types up-front. This makes Haystack pipelines easy to serialise to YAML, deploy as REST endpoints with hayhooks, and reason about in code review.

Install

Haystack ships as haystack-ai on PyPI (the older 1.x package was farm-haystack).

bash
pip install haystack-ai

pip install "haystack-ai[chroma]"
pip install "haystack-ai[qdrant]"
pip install sentence-transformers
pip install anthropic-haystack

Output:

text
Successfully installed haystack-ai-2.x.x ...

haystack-ai vs farm-haystack — only install one. farm-haystack is the legacy 1.x package; 2.x lives in haystack-ai with from haystack import ... imports. Mixing them in the same environment causes import shadowing.

Quick example — indexing + querying

A minimal end-to-end RAG flow: index three documents into an in-memory store, then ask a question against them. The two pipelines share the same DocumentStore instance.

python
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack.components.writers import DocumentWriter
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack.document_stores.in_memory import InMemoryDocumentStore
import os

store = InMemoryDocumentStore()

docs = [
    Document(content="Haystack 2.x pipelines are directed graphs of typed components."),
    Document(content="Each component declares input and output sockets."),
    Document(content="Pipelines can be serialised to YAML and deployed via hayhooks."),
]

index_pipe = Pipeline()
index_pipe.add_component("embedder", SentenceTransformersDocumentEmbedder(model="BAAI/bge-small-en-v1.5"))
index_pipe.add_component("writer",   DocumentWriter(document_store=store))
index_pipe.connect("embedder.documents", "writer.documents")
index_pipe.run({"embedder": {"documents": docs}})

template = """Answer the question using only the context.

Context:
{% for d in documents %}- {{ d.content }}
{% endfor %}

Question: {{ question }}
Answer:"""

query_pipe = Pipeline()
query_pipe.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
query_pipe.add_component("retriever",     InMemoryEmbeddingRetriever(document_store=store))
query_pipe.add_component("prompt",        PromptBuilder(template=template))
query_pipe.add_component("llm",           OpenAIGenerator(api_key_env_var="OPENAI_API_KEY", model="gpt-4o-mini"))

query_pipe.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipe.connect("retriever.documents", "prompt.documents")
query_pipe.connect("prompt.prompt", "llm.prompt")

q = "What can Haystack pipelines be serialised to?"
result = query_pipe.run({"text_embedder": {"text": q}, "prompt": {"question": q}})
print(result["llm"]["replies"][0])

Output:

text
Haystack pipelines can be serialised to YAML and deployed via hayhooks.

When / why to use it

  • Building production RAG where you want explicit, debuggable component boundaries instead of nested chains.
  • Indexing pipelines and query pipelines that need to share a DocumentStore configuration.
  • Serialising LLM apps to YAML so non-engineers can review or edit pipeline topology.
  • Deploying pipelines as REST endpoints with hayhooks.
  • Evaluation-driven RAG: the framework has first-class Evaluator components for context relevance, faithfulness, and SAS.
  • Mixing closed (OpenAI, Anthropic, Cohere) and open (Hugging Face, vLLM, Ollama) models behind a uniform Generator interface.

Common pitfalls

Socket name mismatchPipeline.connect("a.foo", "b.bar") fails fast at connect() time if a has no output named foo or b has no input named bar. The error includes the available socket names — read them carefully rather than guessing.

Embedder model mismatch between index and query — if you index with bge-small-en-v1.5 and query with all-MiniLM-L6-v2, vectors live in different spaces and retrieval returns garbage. Always use the same model= string in SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder.

PromptBuilder is Jinja2{{ var }} and {% for %} are Jinja, not Python f-strings. Forgetting to escape literal { causes silent template errors.

In-memory stores are not persistentInMemoryDocumentStore lives only for the process lifetime. For anything beyond a notebook, use ChromaDocumentStore, QdrantDocumentStore, WeaviateDocumentStore, or ElasticsearchDocumentStore.

Call pipeline.draw("pipeline.png") to render a graphviz diagram of the wiring. Invaluable for code review of multi-stage RAG pipelines.

pipeline.dumps() returns YAML; Pipeline.loads(yaml_str) rebuilds the pipeline. Commit the YAML to git and loads() at startup to keep topology declarative.

Components — the atom of Haystack

A Component is a Python class decorated with @component, exposing run(...) whose parameter names become input sockets and whose return-typed dict becomes output sockets. Components are reusable across pipelines.

python
from haystack import component
from typing import List

@component
class UppercaseTagger:
    """Tag each document by uppercasing the first 20 characters of its content."""

    @component.output_types(documents=List["Document"])
    def run(self, documents: list):
        for d in documents:
            d.meta["tag"] = d.content[:20].upper()
        return {"documents": documents}

The @component.output_types(...) decorator names and types each output socket. The run signature names each input socket and uses standard type hints — Haystack uses these to validate connect() calls.

Document — the data primitive

Documents carry content (text or bytes), meta (dict for filters and provenance), id (auto-generated SHA-256), and an embedding vector once embedded.

python
from haystack import Document

doc = Document(content="Hello world", meta={"source": "readme.md", "section": "intro"})
print(doc.id, doc.meta)

Output:

text
e0c9035898dd52fc65c41454cec9c4d2611bfb37 {'source': 'readme.md', 'section': 'intro'}

Document.meta is the standard place to put filtering metadata — every retriever accepts a filters= argument that operates on meta.

Pipeline — wiring components together

A Pipeline is a directed graph. add_component(name, instance) registers a component under a unique name; connect("a.out", "b.in") wires sockets. Inputs that are not connected to any upstream socket must be supplied at run() time via the {component_name: {socket_name: value}} dict.

python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

p = Pipeline()
p.add_component("prompt", PromptBuilder(template="Define {{ word }} in one sentence."))
p.add_component("llm",    OpenAIGenerator(model="gpt-4o-mini"))
p.connect("prompt.prompt", "llm.prompt")

print(p.run({"prompt": {"word": "monad"}})["llm"]["replies"][0])

Output:

text
A monad is a design pattern that wraps a value and a function for chaining computations
while controlling side effects, used heavily in functional programming.

Indexing pipelines

Indexing is the one-time (or incremental) flow that turns raw files into embedded Documents in a store. A typical chain: file source → converter → cleaner → splitter → embedder → writer.

python
from haystack import Pipeline
from haystack.components.converters import TextFileToDocument, PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter, DocumentCleaner
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()

idx = Pipeline()
idx.add_component("converter", TextFileToDocument())
idx.add_component("cleaner",   DocumentCleaner(remove_empty_lines=True))
idx.add_component("splitter",  DocumentSplitter(split_by="word", split_length=200, split_overlap=20))
idx.add_component("embedder",  SentenceTransformersDocumentEmbedder(model="BAAI/bge-small-en-v1.5"))
idx.add_component("writer",    DocumentWriter(document_store=store))

idx.connect("converter.documents", "cleaner.documents")
idx.connect("cleaner.documents",   "splitter.documents")
idx.connect("splitter.documents",  "embedder.documents")
idx.connect("embedder.documents",  "writer.documents")

result = idx.run({"converter": {"sources": ["./docs/intro.txt", "./docs/api.txt"]}})
print(f"Wrote {result['writer']['documents_written']} chunks")

Output:

text
Wrote 42 chunks

Choosing a splitter

DocumentSplitter(split_by=...) accepts "word", "sentence", "passage", "page", or "function" (custom callable). split_overlap overlaps consecutive chunks — set ~10–20% of split_length to preserve context across boundaries.

For markdown and code, prefer split_by="passage" (splits on \n\n) over "word" — it respects natural section boundaries.

Retrievers

Retrievers fetch the top-k documents for a query. Haystack has retriever components per store and per retrieval mode (BM25, dense embedding, hybrid).

python
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever, InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()

bm25      = InMemoryBM25Retriever(document_store=store, top_k=5)
embedding = InMemoryEmbeddingRetriever(document_store=store, top_k=5, filters={"field": "meta.section", "operator": "==", "value": "api"})

For other stores, swap in the matching retriever:

StoreEmbedding retriever
ChromaChromaEmbeddingRetriever
QdrantQdrantEmbeddingRetriever
WeaviateWeaviateEmbeddingRetriever
ElasticsearchElasticsearchEmbeddingRetriever
pgvectorPgvectorEmbeddingRetriever

Hybrid retrieval and reranking

Combine BM25 and dense retrieval with a DocumentJoiner, then rerank with a cross-encoder.

python
from haystack import Pipeline
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker

hybrid = Pipeline()
hybrid.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
hybrid.add_component("bm25",          InMemoryBM25Retriever(document_store=store, top_k=10))
hybrid.add_component("dense",         InMemoryEmbeddingRetriever(document_store=store, top_k=10))
hybrid.add_component("joiner",        DocumentJoiner(join_mode="reciprocal_rank_fusion"))
hybrid.add_component("ranker",        TransformersSimilarityRanker(model="BAAI/bge-reranker-base", top_k=5))

hybrid.connect("text_embedder.embedding", "dense.query_embedding")
hybrid.connect("bm25.documents",  "joiner.documents")
hybrid.connect("dense.documents", "joiner.documents")
hybrid.connect("joiner.documents", "ranker.documents")

DocumentJoiner(join_mode="reciprocal_rank_fusion") is the standard hybrid fusion algorithm; "concatenate" or "merge" are alternatives when scores are comparable.

Generators

Generators wrap LLMs. Single-turn generators return replies: list[str]; chat generators accept and return ChatMessage objects.

python
from haystack.components.generators import OpenAIGenerator, HuggingFaceLocalGenerator
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.dataclasses import ChatMessage

text_gen = OpenAIGenerator(api_key_env_var="OPENAI_API_KEY", model="gpt-4o-mini")
print(text_gen.run(prompt="Name three databases.")["replies"][0])

chat_gen = OpenAIChatGenerator(model="gpt-4o-mini")
msgs = [
    ChatMessage.from_system("You are concise."),
    ChatMessage.from_user("Define vector similarity."),
]
print(chat_gen.run(messages=msgs)["replies"][0].text)

Output:

text
PostgreSQL, MongoDB, and SQLite.
Vector similarity measures how close two vectors are in an embedding space, typically via cosine or dot product.

For local models via Hugging Face Transformers:

python
local = HuggingFaceLocalGenerator(model="microsoft/Phi-3-mini-4k-instruct", task="text-generation")
local.warm_up()
print(local.run(prompt="What is RAG?")["replies"][0])

All generators support a streaming callback via streaming_callback=. Use print_streaming_chunk from haystack.components.generators.utils for stdout streaming during development.

RAG with sources and citations

Return both the answer and the source documents so the UI can render citations.

python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder, AnswerBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator

template = """Answer using only the context. If the answer is not present, say "I don't know".

{% for d in documents %}
[{{ loop.index }}] {{ d.content }}
{% endfor %}

Question: {{ question }}
"""

rag = Pipeline()
rag.add_component("text_embedder", SentenceTransformersTextEmbedder(model="BAAI/bge-small-en-v1.5"))
rag.add_component("retriever",     InMemoryEmbeddingRetriever(document_store=store, top_k=4))
rag.add_component("prompt",        PromptBuilder(template=template))
rag.add_component("llm",           OpenAIGenerator(model="gpt-4o-mini"))
rag.add_component("answer",        AnswerBuilder())

rag.connect("text_embedder.embedding", "retriever.query_embedding")
rag.connect("retriever.documents", "prompt.documents")
rag.connect("retriever.documents", "answer.documents")
rag.connect("prompt.prompt", "llm.prompt")
rag.connect("llm.replies",   "answer.replies")

q = "What does DocumentSplitter do?"
out = rag.run({"text_embedder": {"text": q}, "prompt": {"question": q}, "answer": {"query": q}})

ans = out["answer"]["answers"][0]
print(ans.data)
for d in ans.documents:
    print(f"  - {d.meta.get('source', '?')}: {d.content[:60]}...")

Output:

text
DocumentSplitter chunks a document by word, sentence, passage, or page.
  - splitter.md: DocumentSplitter accepts split_by, split_length, split_overlap...
  - splitter.md: For markdown documents, split_by="passage" preserves boundaries...

AnswerBuilder packages the LLM reply and the retrieved documents into an Answer object, the standard return type for RAG pipelines.

Tool calling and agents

Haystack supports OpenAI-style tool calling through OpenAIChatGenerator plus ToolInvoker. Define a Tool from any Python function and add it to the chat generator.

python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.tools import ToolInvoker
from haystack.tools import Tool
from haystack.dataclasses import ChatMessage

def get_weather(city: str) -> str:
    """Return the current weather for the given city."""
    return f"In {city}: 21 C and sunny."

weather_tool = Tool(
    name="get_weather",
    description="Return current weather for a city.",
    parameters={"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    function=get_weather,
)

chat = OpenAIChatGenerator(model="gpt-4o-mini", tools=[weather_tool])
invoker = ToolInvoker(tools=[weather_tool])

msgs = [ChatMessage.from_user("What is the weather in Amsterdam?")]
out  = chat.run(messages=msgs)
if out["replies"][0].tool_calls:
    tool_results = invoker.run(messages=out["replies"])["tool_messages"]
    final = chat.run(messages=msgs + out["replies"] + tool_results)
    print(final["replies"][0].text)

Output:

text
The current weather in Amsterdam is 21 C and sunny.

For multi-step agents that loop until done, wrap this in Agent from haystack.components.agents.Agent, which auto-handles the tool-call/tool-result cycle.

Evaluation

Haystack ships several evaluators that score pipeline output against ground-truth or against retrieved context.

python
from haystack.components.evaluators import (
    ContextRelevanceEvaluator,
    FaithfulnessEvaluator,
    SASEvaluator,
)

ctx_eval = ContextRelevanceEvaluator()
result = ctx_eval.run(
    questions=["What does Haystack do?"],
    contexts=[["Haystack builds LLM pipelines."]],
)
print(result["individual_scores"])

faith = FaithfulnessEvaluator()
print(faith.run(
    questions=["What does Haystack do?"],
    contexts=[["Haystack builds LLM pipelines."]],
    predicted_answers=["Haystack lets you train LLMs from scratch."],
)["individual_scores"])

sas = SASEvaluator(model="cross-encoder/stsb-distilroberta-base")
sas.warm_up()
print(sas.run(
    ground_truth_answers=["Haystack builds LLM pipelines."],
    predicted_answers=["Haystack is a framework for building LLM pipelines."],
)["score"])

Output:

text
[1.0]
[0.0]
0.91

SASEvaluator (Semantic Answer Similarity) is the standard semantic-match metric for QA. Pair these with EvaluationRunResult.aggregate_report() to produce a CSV for CI.

Serialising to YAML

Pipelines round-trip to YAML, which is the deployment unit for hayhooks.

python
yaml_str = rag.dumps()
print(yaml_str[:200])

from haystack import Pipeline
rag2 = Pipeline.loads(yaml_str)

YAML looks like:

yaml
components:
  text_embedder:
    type: haystack.components.embedders.SentenceTransformersTextEmbedder
    init_parameters:
      model: BAAI/bge-small-en-v1.5
connections:
  - sender: text_embedder.embedding
    receiver: retriever.query_embedding

Commit the YAML to git as a configuration artefact. CI loads it, runs pipeline.warm_up(), and asserts the topology validates before deploy.

Deploying with hayhooks

hayhooks turns Haystack pipelines into REST endpoints with one command.

bash
pip install hayhooks
hayhooks pipeline deploy-files -n my-rag ./pipelines/rag/
hayhooks run

Output:

text
INFO: Uvicorn running on http://0.0.0.0:1416
INFO: Pipeline 'my-rag' deployed at POST /my-rag/run

Each deployed pipeline accepts a JSON body matching the pipeline's required inputs and returns the pipeline's outputs as JSON.

Document stores — overview

StorePersistenceFiltersHybrid searchBest for
InMemoryDocumentStorenoneyesyes (BM25 + dense)Notebooks, tests, small demos
ChromaDocumentStorelocal SQLite/duckdbyesdense onlyLocal dev, single-node
QdrantDocumentStoreserver / embeddedrichdense + sparseProduction, payload filtering
WeaviateDocumentStoreserverrichhybridMulti-tenant, GraphQL
ElasticsearchDocumentStoreserverrichBM25 + denseExisting Elastic infra
OpenSearchDocumentStoreserverrichBM25 + denseAWS-managed search
PgvectorDocumentStorePostgresSQLdensePostgres-heavy stacks

Real-world recipes

Recipe — re-index changed files only

Skip embedding cost by hashing file contents and writing only when meta["hash"] differs.

python
import hashlib
from pathlib import Path
from haystack import Document

def changed_docs(paths, store) -> list[Document]:
    out = []
    for p in paths:
        text = Path(p).read_text()
        h = hashlib.sha256(text.encode()).hexdigest()
        existing = store.filter_documents(filters={"field": "meta.path", "operator": "==", "value": p})
        if existing and existing[0].meta.get("hash") == h:
            continue
        out.append(Document(content=text, meta={"path": p, "hash": h}))
    return out

new = changed_docs(["docs/intro.txt", "docs/api.txt"], store)
if new:
    idx.run({"converter": {"sources": []}, "embedder": {"documents": new}})

Recipe — multi-query expansion before retrieval

Have the LLM rewrite the question into N variants, retrieve for each, then fuse.

python
from haystack.components.builders import PromptBuilder

expander_template = """Generate 3 paraphrases of this question, one per line:
{{ question }}"""

expander = Pipeline()
expander.add_component("prompt", PromptBuilder(template=expander_template))
expander.add_component("llm",    OpenAIGenerator(model="gpt-4o-mini"))
expander.connect("prompt.prompt", "llm.prompt")

variants = expander.run({"prompt": {"question": "What is hybrid retrieval?"}})["llm"]["replies"][0].splitlines()
all_docs = []
for v in variants:
    all_docs.extend(rag.run({"text_embedder": {"text": v}})["retriever"]["documents"])

Recipe — guard against empty retrieval

If retrieval returns nothing, short-circuit to a "I don't know" answer rather than letting the LLM hallucinate.

python
out = rag.run(...)
docs = out["retriever"]["documents"]
if not docs or max(d.score for d in docs) < 0.3:
    answer = "I could not find a relevant answer in the indexed documents."
else:
    answer = out["llm"]["replies"][0]

Recipe — pipeline draw for code review

python
rag.draw("rag_pipeline.png")

Output:

text
(rag_pipeline.png written; renders the component graph via graphviz)

Attach the PNG to the PR — reviewers immediately see whether retriever, prompt, and answer-builder are wired correctly.

Recipe — async batch evaluation in CI

python
from haystack.evaluation import EvaluationRunResult

questions = ["What is RAG?", "What does the splitter do?"]
truths    = ["Retrieval-augmented generation.", "Splits documents into chunks."]
predictions = [rag.run({...})["answer"]["answers"][0].data for _ in questions]

result = EvaluationRunResult(
    run_name="ci_eval",
    inputs={"questions": questions, "ground_truth_answers": truths, "predicted_answers": predictions},
    results={"sas": sas.run(ground_truth_answers=truths, predicted_answers=predictions)},
)
print(result.aggregated_report())

Quick reference

TaskCode
Install corepip install haystack-ai
Create pipelinep = Pipeline()
Add componentp.add_component("name", Instance())
Connect socketsp.connect("a.out", "b.in")
Run pipelinep.run({"comp": {"socket": value}})
Draw graphp.draw("p.png")
Dump YAMLp.dumps()
Load YAMLPipeline.loads(yaml)
DocumentDocument(content="...", meta={...})
In-memory storeInMemoryDocumentStore()
BM25 retrieverInMemoryBM25Retriever(document_store=s)
Dense retrieverInMemoryEmbeddingRetriever(document_store=s)
Hybrid fusionDocumentJoiner(join_mode="reciprocal_rank_fusion")
RerankerTransformersSimilarityRanker(model="...")
PromptPromptBuilder(template="... {{ var }} ...")
OpenAI generatorOpenAIGenerator(model="gpt-4o-mini")
Chat generatorOpenAIChatGenerator(model="gpt-4o-mini")
Local generatorHuggingFaceLocalGenerator(model="...")
Answer builderAnswerBuilder()
Context evalContextRelevanceEvaluator()
Faithfulness evalFaithfulnessEvaluator()
Semantic similaritySASEvaluator(model="cross-encoder/...")
Tool callTool(name=, description=, parameters=, function=)
Deploy as RESThayhooks pipeline deploy-files -n name ./dir/