cheat sheet

haystack-ai

Package-level reference for haystack-ai on PyPI — install variants, the farm-haystack v1 → haystack-ai v2 rename, integrations, and alternative frameworks.

#pip#package#ai#rag#agentsupdated 05-31-2026

haystack-ai

What it is

haystack-ai is the PyPI distribution of Haystack 2.x, deepset's open-source Python framework for building LLM applications around an explicit pipeline graph. Components — document loaders, splitters, embedders, retrievers, generators, evaluators — are typed Python classes with declared input and output sockets; pipelines connect those sockets into a directed graph that the framework validates at construction time.

The package is the canonical install name since the 2.x rewrite. The older 1.x line lives on PyPI as farm-haystack and is in maintenance only — the architectures are not source-compatible.

Reach for haystack-ai when you want explicit DAG-style pipelines that serialise cleanly to YAML and deploy as REST endpoints (via hayhooks). Reach for langchain if you prefer the LCEL | pipe DSL, or llama-index if your workload is dominated by indexing and retrieval rather than full agent orchestration.

Install

bash
pip install haystack-ai

Output: (none — exits 0 on success)

bash
uv add haystack-ai

Output: dependency resolved + added to pyproject.toml

bash
poetry add haystack-ai

Output: updated lockfile + virtualenv install

bash
pip install farm-haystack       # legacy v1 line (do not start new projects on this)

Output: installs the deprecated 1.x architecture

Versioning & Python support

  • Current line is the 2.x series. Minor releases land roughly monthly and may add or rename components; pin a tight range for production deployments.
  • Recent versions support Python 3.9+. Pure-Python core, with component-specific extras pulling in heavier dependencies (torch, sentence-transformers, vector-DB clients).
  • The split between haystack-ai (core) and haystack-integrations (third-party connectors published under the haystack_integrations.* namespace) is the key architectural change in 2.x. Most integrations now ship as their own PyPI packages (chroma-haystack, qdrant-haystack, anthropic-haystack, cohere-haystack, …) rather than as extras on haystack-ai itself.
  • farm-haystack (the 1.x package) is unmaintained for new features. Security fixes only.

Package metadata

  • Maintainer: deepset and community contributors
  • Project home: github.com/deepset-ai/haystack
  • Integrations monorepo: github.com/deepset-ai/haystack-core-integrations
  • Docs: docs.haystack.deepset.ai
  • PyPI: pypi.org/project/haystack-ai
  • License: Apache-2.0
  • Governance: company-led (deepset) with open contributions; commercial deepset Cloud is the hosted offering
  • First released: haystack-ai since the 2.0 release in late 2024; farm-haystack line dates back to 2020
  • Downloads: millions per month across both packages, growing share on haystack-ai

Optional dependencies & extras

The haystack-ai core package keeps its dependency surface deliberately small. Heavyweight features ship as separate integration packages, each installable on its own:

  • chroma-haystack, qdrant-haystack, weaviate-haystack, pgvector-haystack, pinecone-haystack, elasticsearch-haystack, opensearch-haystack, mongodb-atlas-haystack — document-store integrations.
  • anthropic-haystack, cohere-haystack, mistral-haystack, google-ai-haystack, amazon-bedrock-haystack, nvidia-haystack — generator integrations.
  • sentence-transformers — usually installed directly for local embedding components.
  • fastembed-haystack, instructor-embedders-haystack — alternative embedder packs.
  • ragas-haystack, deepeval-haystack — evaluation glue.
  • hayhooks — REST/MCP serving for Haystack pipelines.
  • haystack-experimental — preview components that may move into core or get removed.

The base install pulls in openai, tenacity, pandas, jinja2, lazy-imports, and posthog among others. The integration packages each pull in their own SDKs.

Alternatives

PackageTrade-off
langchain / langchain-coreLCEL pipe-DSL plus a giant ecosystem. Use when you want the broadest integration coverage.
llama-indexIndexing- and retrieval-first abstractions. Use when RAG is the whole product.
dspy-aiProgrammatic prompt optimisation with dspy.Module. Use when you want to compile prompts.
semantic-kernelMicrosoft's planner-and-skill orchestration. Use in .NET-adjacent stacks.
autogen-agentchatMulti-agent conversations. Use when the design is agent-to-agent.
crewaiRole-based agent crews. Use for narrative multi-agent flows.
farm-haystack (legacy 1.x)Mature 1.x API. Use only to maintain an existing 1.x deployment.

Common gotchas

  1. haystack-ai (v2) vs farm-haystack (v1) is a full rewrite. Component classes, pipeline construction (Pipeline.add_component + Pipeline.connect vs the old add_node), and serialisation are all different. v1 tutorials and Stack Overflow answers do not apply to v2.
  2. Integrations live in separate packages. Do not pip install haystack-ai[chroma] (no such extra) — install chroma-haystack and import from haystack_integrations.document_stores.chroma.
  3. Pipeline socket types are strict. Connecting a List[Document] output to an input declared as str raises a PipelineConnectError at wiring time, not runtime. This is a feature, but surprising on first use.
  4. farm-haystack is still pip-installable, but it shares the haystack import name with v2's haystack namespace from haystack-ai. Never install both into the same environment.
  5. REST serving lives in hayhooks, not in haystack-ai. You install and run it separately to expose a pipeline as an HTTP endpoint.
  6. haystack-experimental is a moving target. Components there may get promoted (and renamed) into core, or removed. Pin the version if you depend on an experimental component.
  7. OpenTelemetry tracing is built in but opt-in. Production deployments that expect traces in their APM need to enable Haystack's tracing module and configure an OTel exporter.

Real-world recipes

The recipes below focus on the install / integration-package choices each pattern requires — the sections/frameworks/haystack companion covers components and the pipeline API in depth.

Minimal RAG pipeline (in-memory store) — uses only haystack-ai core, no integrations. Useful for tests.

python
from haystack import Pipeline, Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore

store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = [Document(content="HNSW is a graph-based ANN algorithm.")]
store.write_documents(doc_embedder.run(documents=docs)["documents"])

p = Pipeline()
p.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
p.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
p.add_component("prompt", PromptBuilder(template="Answer using:\n{{documents}}\nQ: {{question}}"))
p.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
p.connect("text_embedder.embedding", "retriever.query_embedding")
p.connect("retriever.documents", "prompt.documents")
p.connect("prompt.prompt", "llm.prompt")

answer = p.run({"text_embedder": {"text": "What is HNSW?"}, "prompt": {"question": "What is HNSW?"}})
print(answer["llm"]["replies"][0])

Output: a generated answer grounded in the retrieved document; the pipeline graph wires text → embedding → retriever → prompt → LLM with strict socket types

Production RAG with Qdrant document store — requires qdrant-haystack from the integrations namespace.

python
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever

store = QdrantDocumentStore(
    url="http://qdrant.internal:6333",
    api_key=os.environ["QDRANT_API_KEY"],
    index="kb",
    embedding_dim=384,
    recreate_index=False,
)
retriever = QdrantEmbeddingRetriever(document_store=store, top_k=10)

Output: the retriever queries Qdrant directly; integration packages live in haystack_integrations.* namespace, separate from haystack-ai core

Hybrid retrieval pipeline (BM25 + embedding + reranker) — branches the pipeline graph and joins via a reranker.

python
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker

p = Pipeline()
p.add_component("text_embedder", SentenceTransformersTextEmbedder(...))
p.add_component("dense_retriever", InMemoryEmbeddingRetriever(document_store=store, top_k=20))
p.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=store, top_k=20))
p.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=20))
p.add_component("ranker", TransformersSimilarityRanker(model="BAAI/bge-reranker-base", top_k=5))

p.connect("text_embedder.embedding", "dense_retriever.query_embedding")
p.connect("dense_retriever.documents", "joiner.documents")
p.connect("bm25_retriever.documents", "joiner.documents")
p.connect("joiner.documents", "ranker.documents")

Output: dense + BM25 results fused via RRF, then reranked by a cross-encoder; this is the canonical "good retrieval" pipeline

Conversational RAG with chat history — Haystack's ChatGenerator + ChatPromptBuilder carry message history through the graph.

python
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

prompt_builder = ChatPromptBuilder(template=[
    ChatMessage.from_system("You answer using only the provided documents."),
    ChatMessage.from_user("Documents:\n{{documents}}\n\nQ: {{question}}"),
])
chat_llm = OpenAIChatGenerator(model="gpt-4o")

Output: a chat-shaped LLM call; pair with a memory component (e.g. from haystack-experimental) to persist history across turns

Serialise and reload a pipelinePipeline.dumps() and Pipeline.loads() round-trip to YAML, including component arguments. Used by hayhooks to deploy pipelines as REST endpoints.

python
yaml_str = p.dumps()
with open("rag.yaml", "w") as f: f.write(yaml_str)

# Later, in another process:
restored = Pipeline.loads(open("rag.yaml").read())

Output: a YAML representation that captures the graph structure and every component's config; secrets must be supplied separately at load time

Production deployment

Haystack is library code — production deployment is a service that hosts your pipeline, typically built on FastAPI or via the official hayhooks REST/MCP wrapper.

Topology checklist:

ConcernApproach
Pipeline definitionPython code in version control, or YAML next to code
Servinghayhooks (REST/MCP) or your own FastAPI wrapper
Document storeexternal service (Qdrant, Weaviate, Chroma, Elastic, OpenSearch, pgvector)
Embeddingslocal (sentence-transformers) or remote (openai, cohere, voyageai) via component packs
Secretsenv vars surfaced via Secret.from_env_var(...)
TracingOpenTelemetry exporter via haystack.tracing
Evalragas-haystack or deepeval-haystack integration

hayhooks for REST exposure. hayhooks runs Haystack pipelines as HTTP endpoints, with OpenAPI schemas derived from pipeline input/output sockets. Install separately:

bash
pip install hayhooks
hayhooks deploy rag.yaml --name rag-v1
# POST http://localhost:8001/rag-v1/run with {"query": "..."}

Output: the pipeline is exposed at a REST endpoint with auto-generated OpenAPI; same wrapper exposes pipelines as MCP servers for agentic use

Secrets handling. Components that need API keys accept Secret objects, not raw strings. Read from env at runtime so secrets do not land in YAML:

python
from haystack.utils import Secret
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))

Output: the API key is fetched at run time from the env; YAML serialisations record only the env var name

Document store choice. The InMemoryDocumentStore is for tests. Production stores (one integration package each):

StorePackageWhen
Qdrantqdrant-haystackRust-backed; rich filtering
Weaviateweaviate-haystackhybrid + generative modules
Chromachroma-haystackembedded prototypes / small prod
Elasticsearchelasticsearch-haystackmature BM25 with vector add-on
OpenSearchopensearch-haystackAWS-native ES fork
pgvectorpgvector-haystackone less moving part if Postgres is already there
Pineconepinecone-haystackhosted only
MongoDB Atlasmongodb-atlas-haystackif MongoDB is already your store

Each integration package is installable on its own — no haystack-ai[chroma] shorthand.

OpenTelemetry tracing. Enable in code (or env) and configure an OTel exporter:

python
from haystack.tracing import enable_tracing
enable_tracing()

Output: each pipeline run emits a trace span with per-component child spans; configure OTEL_* env vars for the exporter

Version migration guide

The farm-haystack (v1) to haystack-ai (v2) split is the largest migration in the project's history. The two are not source-compatible.

v1 → v2 checklist:

  • Package name: farm-haystackhaystack-ai. Never install both into the same environment — they share the haystack import namespace.
  • Pipeline API: Pipeline.add_node(name=..., component=..., inputs=[...]) (string-based wiring) → Pipeline.add_component(name, component) + Pipeline.connect("a.out_socket", "b.in_socket") (typed wiring).
  • Component classes: EmbeddingRetriever, BM25Retriever, FARMReader, etc., from v1 are replaced by *EmbeddingRetriever, *BM25Retriever, generator components, etc. Names changed; the architecture is different.
  • Document stores: v1 in-process Stores are gone. Use external stores via integration packages.
  • REST API: v1 had built-in REST; v2 uses the separate hayhooks package.
  • YAML schema: v1 and v2 YAML are not compatible. Regenerate.

v2.x minor-to-minor. Releases land roughly monthly. Most changes are additive, but:

  • Components in haystack-experimental may be promoted to core (and renamed) or removed.
  • Component constructor arguments evolve; keep an eye on default-value changes.
  • Integration packages have their own release cadences — pin both haystack-ai and each integration package.

Pinning strategy. A reproducible setup pins each piece:

text
haystack-ai>=2.5,<2.6
qdrant-haystack>=4.0,<5.0
sentence-transformers>=3.0,<4.0

The integration-package version typically tracks haystack-ai minor; check each package's compatibility note.

Ecosystem integrations

Haystack 2.x's design philosophy is core minimal, integrations external. The integrations monorepo (haystack-core-integrations) ships dozens of packages:

Document stores: qdrant-haystack, weaviate-haystack, chroma-haystack, pgvector-haystack, pinecone-haystack, elasticsearch-haystack, opensearch-haystack, mongodb-atlas-haystack, astra-haystack, mariadb-haystack.

Generators / chat: anthropic-haystack, cohere-haystack, mistral-haystack, google-ai-haystack, amazon-bedrock-haystack, nvidia-haystack, ollama-haystack, together-haystack, groq-haystack.

Embedders: fastembed-haystack, instructor-embedders-haystack, jina-haystack, voyageai-haystack.

Eval: ragas-haystack, deepeval-haystack.

Tooling: hayhooks (REST/MCP serving), haystack-experimental (preview components).

Imports namespace. Integrations live under haystack_integrations.*:

python
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.generators.anthropic import AnthropicGenerator

Multi-language clients. Haystack 2.x is Python-only. Other-language clients call the hayhooks-deployed REST endpoint.

Troubleshooting common errors

  • PipelineConnectError: cannot connect 'a.out' to 'b.in' — socket types don't match. Inspect each component's Input/Output declarations; you may need an adapter component (e.g. DocumentJoiner).
  • ImportError: cannot import name 'X' from 'haystack' — v1 tutorial code on v2. Either install farm-haystack (legacy) or migrate.
  • No module named 'haystack_integrations' — you didn't install the integration package. pip install qdrant-haystack etc.
  • Both farm-haystack and haystack-ai installed — they fight over the haystack namespace. Uninstall one (pip uninstall farm-haystack haystack-ai) and reinstall.
  • Pipeline.run(...) raises on missing input — every input socket without a connection or default must be supplied in the run dict. Inspect with Pipeline.inputs().
  • Pipeline YAML loads but fails on run — secrets weren't supplied. Set the env vars referenced by Secret.from_env_var(...) before Pipeline.loads(...).
  • SentenceTransformersTextEmbedder raises on first use — call embedder.warm_up() once to load the model.

Performance tuning

LeverMechanismWhen it helps
Component warm_up()preload modelsreuse-once vs cold-start cost
Pipeline reuse across requestsconstruct once at startupevery web request
Async components (where available)non-blocking I/Oconcurrent users
Batch embeddersamortise model loadbulk indexing
External vector storeoffload retrievalscale beyond in-memory
Streaming generatorsprogressive outputUX latency
OpenAI / Anthropic prompt cachingreuse system promptsrepeated calls

Streaming. OpenAIGenerator and OpenAIChatGenerator support streaming_callback= for token-by-token responses. The callback runs in the LLM-call thread; keep it cheap.

Pipeline reuse. Pipeline objects are designed to be reused. Construct once at app startup; share across requests. Component state (embedded models, vector-store connections) lives for the process lifetime.

Heavy-component startup cost. SentenceTransformersTextEmbedder loads a model on first use unless you call warm_up(). In a web server, call warm_up() during startup to avoid paying for the first request.

Security considerations

Haystack's surface area depends on which components you wire — most security concerns inherit from the underlying SDKs (OpenAI, Anthropic, Qdrant, etc.).

  • Secrets handling. Components accept Secret objects, not raw strings. Use Secret.from_env_var(...) so YAML serialisations never contain plaintext keys.
  • Pipeline YAML in version control. YAML defines structure and component types — safe to commit. Secrets are referenced by env var name only.
  • Prompt injection via documents. Retrieved documents are interpolated into prompt templates. A document with "Ignore the above and..." can hijack the generator. Use prompt templates that fence document content (e.g. inside XML tags) and trim suspicious content.
  • Generator output as code. If your pipeline feeds LLM output to a code interpreter or shell, treat it as untrusted input. Use sandboxes (vercel-sandbox, e2b, microVMs).
  • Component-level traces. OpenTelemetry traces include component inputs and outputs. Configure trace sampling and PII scrubbing before exporting to a third-party APM.
  • Document store auth. Pass credentials via Secret; never bake into YAML. RBAC on the underlying store (Qdrant API keys, Weaviate OIDC, etc.) is your tenant isolation layer.
  • Self-update of haystack-experimental — preview components churn. Pin and audit before adopting in production.

When NOT to use this

Haystack 2.x is the right framework when you want explicit DAG-style pipelines with strict typed wiring. It's the wrong tool when:

  • You want the LCEL pipe DSL. LangChain's runnable | runnable syntax is more concise for linear chains.
  • Your workload is indexing-heavy. LlamaIndex has stronger indexing primitives (composable indexes, sub-question decomposition).
  • You need a massive integration ecosystem. LangChain still has the broadest provider coverage.
  • You're a one-component user. If you only need a retriever + LLM call, the SDKs directly (openai + qdrant-client) are 20 lines and no framework.
  • You want hosted-only, no Python. deepset Cloud is the managed offering; otherwise you're hosting Python.
  • Multi-agent narrative flows. crewai, autogen, or langgraph model agent-to-agent conversations more directly.

See also