cheat sheet
haystack-ai
Package-level reference for haystack-ai on PyPI — install variants, the farm-haystack v1 → haystack-ai v2 rename, integrations, and alternative frameworks.
haystack-ai
What it is
haystack-ai is the PyPI distribution of Haystack 2.x, deepset's open-source Python framework for building LLM applications around an explicit pipeline graph. Components — document loaders, splitters, embedders, retrievers, generators, evaluators — are typed Python classes with declared input and output sockets; pipelines connect those sockets into a directed graph that the framework validates at construction time.
The package is the canonical install name since the 2.x rewrite. The older 1.x line lives on PyPI as farm-haystack and is in maintenance only — the architectures are not source-compatible.
Reach for haystack-ai when you want explicit DAG-style pipelines that serialise cleanly to YAML and deploy as REST endpoints (via hayhooks). Reach for langchain if you prefer the LCEL | pipe DSL, or llama-index if your workload is dominated by indexing and retrieval rather than full agent orchestration.
Install
pip install haystack-ai
Output: (none — exits 0 on success)
uv add haystack-ai
Output: dependency resolved + added to pyproject.toml
poetry add haystack-ai
Output: updated lockfile + virtualenv install
pip install farm-haystack # legacy v1 line (do not start new projects on this)
Output: installs the deprecated 1.x architecture
Versioning & Python support
- Current line is the
2.xseries. Minor releases land roughly monthly and may add or rename components; pin a tight range for production deployments. - Recent versions support Python 3.9+. Pure-Python core, with component-specific extras pulling in heavier dependencies (
torch,sentence-transformers, vector-DB clients). - The split between
haystack-ai(core) andhaystack-integrations(third-party connectors published under thehaystack_integrations.*namespace) is the key architectural change in 2.x. Most integrations now ship as their own PyPI packages (chroma-haystack,qdrant-haystack,anthropic-haystack,cohere-haystack, …) rather than as extras onhaystack-aiitself. farm-haystack(the 1.x package) is unmaintained for new features. Security fixes only.
Package metadata
- Maintainer: deepset and community contributors
- Project home: github.com/deepset-ai/haystack
- Integrations monorepo: github.com/deepset-ai/haystack-core-integrations
- Docs: docs.haystack.deepset.ai
- PyPI: pypi.org/project/haystack-ai
- License: Apache-2.0
- Governance: company-led (deepset) with open contributions; commercial deepset Cloud is the hosted offering
- First released:
haystack-aisince the 2.0 release in late 2024;farm-haystackline dates back to 2020 - Downloads: millions per month across both packages, growing share on
haystack-ai
Optional dependencies & extras
The haystack-ai core package keeps its dependency surface deliberately small. Heavyweight features ship as separate integration packages, each installable on its own:
chroma-haystack,qdrant-haystack,weaviate-haystack,pgvector-haystack,pinecone-haystack,elasticsearch-haystack,opensearch-haystack,mongodb-atlas-haystack— document-store integrations.anthropic-haystack,cohere-haystack,mistral-haystack,google-ai-haystack,amazon-bedrock-haystack,nvidia-haystack— generator integrations.sentence-transformers— usually installed directly for local embedding components.fastembed-haystack,instructor-embedders-haystack— alternative embedder packs.ragas-haystack,deepeval-haystack— evaluation glue.hayhooks— REST/MCP serving for Haystack pipelines.haystack-experimental— preview components that may move into core or get removed.
The base install pulls in openai, tenacity, pandas, jinja2, lazy-imports, and posthog among others. The integration packages each pull in their own SDKs.
Alternatives
| Package | Trade-off |
|---|---|
langchain / langchain-core | LCEL pipe-DSL plus a giant ecosystem. Use when you want the broadest integration coverage. |
llama-index | Indexing- and retrieval-first abstractions. Use when RAG is the whole product. |
dspy-ai | Programmatic prompt optimisation with dspy.Module. Use when you want to compile prompts. |
semantic-kernel | Microsoft's planner-and-skill orchestration. Use in .NET-adjacent stacks. |
autogen-agentchat | Multi-agent conversations. Use when the design is agent-to-agent. |
crewai | Role-based agent crews. Use for narrative multi-agent flows. |
farm-haystack (legacy 1.x) | Mature 1.x API. Use only to maintain an existing 1.x deployment. |
Common gotchas
haystack-ai(v2) vsfarm-haystack(v1) is a full rewrite. Component classes, pipeline construction (Pipeline.add_component+Pipeline.connectvs the oldadd_node), and serialisation are all different. v1 tutorials and Stack Overflow answers do not apply to v2.- Integrations live in separate packages. Do not
pip install haystack-ai[chroma](no such extra) — installchroma-haystackand import fromhaystack_integrations.document_stores.chroma. - Pipeline socket types are strict. Connecting a
List[Document]output to an input declared asstrraises aPipelineConnectErrorat wiring time, not runtime. This is a feature, but surprising on first use. farm-haystackis still pip-installable, but it shares thehaystackimport name with v2'shaystacknamespace fromhaystack-ai. Never install both into the same environment.- REST serving lives in
hayhooks, not inhaystack-ai. You install and run it separately to expose a pipeline as an HTTP endpoint. haystack-experimentalis a moving target. Components there may get promoted (and renamed) into core, or removed. Pin the version if you depend on an experimental component.- OpenTelemetry tracing is built in but opt-in. Production deployments that expect traces in their APM need to enable Haystack's tracing module and configure an OTel exporter.
Real-world recipes
The recipes below focus on the install / integration-package choices each pattern requires — the sections/frameworks/haystack companion covers components and the pipeline API in depth.
Minimal RAG pipeline (in-memory store) — uses only haystack-ai core, no integrations. Useful for tests.
from haystack import Pipeline, Document
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
from haystack.document_stores.in_memory import InMemoryDocumentStore
store = InMemoryDocumentStore()
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
docs = [Document(content="HNSW is a graph-based ANN algorithm.")]
store.write_documents(doc_embedder.run(documents=docs)["documents"])
p = Pipeline()
p.add_component("text_embedder", SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2"))
p.add_component("retriever", InMemoryEmbeddingRetriever(document_store=store))
p.add_component("prompt", PromptBuilder(template="Answer using:\n{{documents}}\nQ: {{question}}"))
p.add_component("llm", OpenAIGenerator(model="gpt-4o-mini"))
p.connect("text_embedder.embedding", "retriever.query_embedding")
p.connect("retriever.documents", "prompt.documents")
p.connect("prompt.prompt", "llm.prompt")
answer = p.run({"text_embedder": {"text": "What is HNSW?"}, "prompt": {"question": "What is HNSW?"}})
print(answer["llm"]["replies"][0])
Output: a generated answer grounded in the retrieved document; the pipeline graph wires text → embedding → retriever → prompt → LLM with strict socket types
Production RAG with Qdrant document store — requires qdrant-haystack from the integrations namespace.
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore
from haystack_integrations.components.retrievers.qdrant import QdrantEmbeddingRetriever
store = QdrantDocumentStore(
url="http://qdrant.internal:6333",
api_key=os.environ["QDRANT_API_KEY"],
index="kb",
embedding_dim=384,
recreate_index=False,
)
retriever = QdrantEmbeddingRetriever(document_store=store, top_k=10)
Output: the retriever queries Qdrant directly; integration packages live in haystack_integrations.* namespace, separate from haystack-ai core
Hybrid retrieval pipeline (BM25 + embedding + reranker) — branches the pipeline graph and joins via a reranker.
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.joiners import DocumentJoiner
from haystack.components.rankers import TransformersSimilarityRanker
p = Pipeline()
p.add_component("text_embedder", SentenceTransformersTextEmbedder(...))
p.add_component("dense_retriever", InMemoryEmbeddingRetriever(document_store=store, top_k=20))
p.add_component("bm25_retriever", InMemoryBM25Retriever(document_store=store, top_k=20))
p.add_component("joiner", DocumentJoiner(join_mode="reciprocal_rank_fusion", top_k=20))
p.add_component("ranker", TransformersSimilarityRanker(model="BAAI/bge-reranker-base", top_k=5))
p.connect("text_embedder.embedding", "dense_retriever.query_embedding")
p.connect("dense_retriever.documents", "joiner.documents")
p.connect("bm25_retriever.documents", "joiner.documents")
p.connect("joiner.documents", "ranker.documents")
Output: dense + BM25 results fused via RRF, then reranked by a cross-encoder; this is the canonical "good retrieval" pipeline
Conversational RAG with chat history — Haystack's ChatGenerator + ChatPromptBuilder carry message history through the graph.
from haystack.components.generators.chat import OpenAIChatGenerator
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
prompt_builder = ChatPromptBuilder(template=[
ChatMessage.from_system("You answer using only the provided documents."),
ChatMessage.from_user("Documents:\n{{documents}}\n\nQ: {{question}}"),
])
chat_llm = OpenAIChatGenerator(model="gpt-4o")
Output: a chat-shaped LLM call; pair with a memory component (e.g. from haystack-experimental) to persist history across turns
Serialise and reload a pipeline — Pipeline.dumps() and Pipeline.loads() round-trip to YAML, including component arguments. Used by hayhooks to deploy pipelines as REST endpoints.
yaml_str = p.dumps()
with open("rag.yaml", "w") as f: f.write(yaml_str)
# Later, in another process:
restored = Pipeline.loads(open("rag.yaml").read())
Output: a YAML representation that captures the graph structure and every component's config; secrets must be supplied separately at load time
Production deployment
Haystack is library code — production deployment is a service that hosts your pipeline, typically built on FastAPI or via the official hayhooks REST/MCP wrapper.
Topology checklist:
| Concern | Approach |
|---|---|
| Pipeline definition | Python code in version control, or YAML next to code |
| Serving | hayhooks (REST/MCP) or your own FastAPI wrapper |
| Document store | external service (Qdrant, Weaviate, Chroma, Elastic, OpenSearch, pgvector) |
| Embeddings | local (sentence-transformers) or remote (openai, cohere, voyageai) via component packs |
| Secrets | env vars surfaced via Secret.from_env_var(...) |
| Tracing | OpenTelemetry exporter via haystack.tracing |
| Eval | ragas-haystack or deepeval-haystack integration |
hayhooks for REST exposure. hayhooks runs Haystack pipelines as HTTP endpoints, with OpenAPI schemas derived from pipeline input/output sockets. Install separately:
pip install hayhooks
hayhooks deploy rag.yaml --name rag-v1
# POST http://localhost:8001/rag-v1/run with {"query": "..."}
Output: the pipeline is exposed at a REST endpoint with auto-generated OpenAPI; same wrapper exposes pipelines as MCP servers for agentic use
Secrets handling. Components that need API keys accept Secret objects, not raw strings. Read from env at runtime so secrets do not land in YAML:
from haystack.utils import Secret
generator = OpenAIGenerator(api_key=Secret.from_env_var("OPENAI_API_KEY"))
Output: the API key is fetched at run time from the env; YAML serialisations record only the env var name
Document store choice. The InMemoryDocumentStore is for tests. Production stores (one integration package each):
| Store | Package | When |
|---|---|---|
| Qdrant | qdrant-haystack | Rust-backed; rich filtering |
| Weaviate | weaviate-haystack | hybrid + generative modules |
| Chroma | chroma-haystack | embedded prototypes / small prod |
| Elasticsearch | elasticsearch-haystack | mature BM25 with vector add-on |
| OpenSearch | opensearch-haystack | AWS-native ES fork |
| pgvector | pgvector-haystack | one less moving part if Postgres is already there |
| Pinecone | pinecone-haystack | hosted only |
| MongoDB Atlas | mongodb-atlas-haystack | if MongoDB is already your store |
Each integration package is installable on its own — no haystack-ai[chroma] shorthand.
OpenTelemetry tracing. Enable in code (or env) and configure an OTel exporter:
from haystack.tracing import enable_tracing
enable_tracing()
Output: each pipeline run emits a trace span with per-component child spans; configure OTEL_* env vars for the exporter
Version migration guide
The farm-haystack (v1) to haystack-ai (v2) split is the largest migration in the project's history. The two are not source-compatible.
v1 → v2 checklist:
- Package name:
farm-haystack→haystack-ai. Never install both into the same environment — they share thehaystackimport namespace. - Pipeline API:
Pipeline.add_node(name=..., component=..., inputs=[...])(string-based wiring) →Pipeline.add_component(name, component)+Pipeline.connect("a.out_socket", "b.in_socket")(typed wiring). - Component classes:
EmbeddingRetriever,BM25Retriever,FARMReader, etc., from v1 are replaced by*EmbeddingRetriever,*BM25Retriever, generator components, etc. Names changed; the architecture is different. - Document stores: v1 in-process Stores are gone. Use external stores via integration packages.
- REST API: v1 had built-in REST; v2 uses the separate
hayhookspackage. - YAML schema: v1 and v2 YAML are not compatible. Regenerate.
v2.x minor-to-minor. Releases land roughly monthly. Most changes are additive, but:
- Components in
haystack-experimentalmay be promoted to core (and renamed) or removed. - Component constructor arguments evolve; keep an eye on default-value changes.
- Integration packages have their own release cadences — pin both
haystack-aiand each integration package.
Pinning strategy. A reproducible setup pins each piece:
haystack-ai>=2.5,<2.6
qdrant-haystack>=4.0,<5.0
sentence-transformers>=3.0,<4.0
The integration-package version typically tracks haystack-ai minor; check each package's compatibility note.
Ecosystem integrations
Haystack 2.x's design philosophy is core minimal, integrations external. The integrations monorepo (haystack-core-integrations) ships dozens of packages:
Document stores: qdrant-haystack, weaviate-haystack, chroma-haystack, pgvector-haystack, pinecone-haystack, elasticsearch-haystack, opensearch-haystack, mongodb-atlas-haystack, astra-haystack, mariadb-haystack.
Generators / chat: anthropic-haystack, cohere-haystack, mistral-haystack, google-ai-haystack, amazon-bedrock-haystack, nvidia-haystack, ollama-haystack, together-haystack, groq-haystack.
Embedders: fastembed-haystack, instructor-embedders-haystack, jina-haystack, voyageai-haystack.
Eval: ragas-haystack, deepeval-haystack.
Tooling: hayhooks (REST/MCP serving), haystack-experimental (preview components).
Imports namespace. Integrations live under haystack_integrations.*:
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
from haystack_integrations.components.generators.anthropic import AnthropicGenerator
Multi-language clients. Haystack 2.x is Python-only. Other-language clients call the hayhooks-deployed REST endpoint.
Troubleshooting common errors
PipelineConnectError: cannot connect 'a.out' to 'b.in'— socket types don't match. Inspect each component'sInput/Outputdeclarations; you may need an adapter component (e.g.DocumentJoiner).ImportError: cannot import name 'X' from 'haystack'— v1 tutorial code on v2. Either installfarm-haystack(legacy) or migrate.No module named 'haystack_integrations'— you didn't install the integration package.pip install qdrant-haystacketc.- Both
farm-haystackandhaystack-aiinstalled — they fight over thehaystacknamespace. Uninstall one (pip uninstall farm-haystack haystack-ai) and reinstall. Pipeline.run(...)raises on missing input — every input socket without a connection or default must be supplied in the run dict. Inspect withPipeline.inputs().- Pipeline YAML loads but fails on run — secrets weren't supplied. Set the env vars referenced by
Secret.from_env_var(...)beforePipeline.loads(...). SentenceTransformersTextEmbedderraises on first use — callembedder.warm_up()once to load the model.
Performance tuning
| Lever | Mechanism | When it helps |
|---|---|---|
Component warm_up() | preload models | reuse-once vs cold-start cost |
| Pipeline reuse across requests | construct once at startup | every web request |
| Async components (where available) | non-blocking I/O | concurrent users |
| Batch embedders | amortise model load | bulk indexing |
| External vector store | offload retrieval | scale beyond in-memory |
| Streaming generators | progressive output | UX latency |
| OpenAI / Anthropic prompt caching | reuse system prompts | repeated calls |
Streaming. OpenAIGenerator and OpenAIChatGenerator support streaming_callback= for token-by-token responses. The callback runs in the LLM-call thread; keep it cheap.
Pipeline reuse. Pipeline objects are designed to be reused. Construct once at app startup; share across requests. Component state (embedded models, vector-store connections) lives for the process lifetime.
Heavy-component startup cost. SentenceTransformersTextEmbedder loads a model on first use unless you call warm_up(). In a web server, call warm_up() during startup to avoid paying for the first request.
Security considerations
Haystack's surface area depends on which components you wire — most security concerns inherit from the underlying SDKs (OpenAI, Anthropic, Qdrant, etc.).
- Secrets handling. Components accept
Secretobjects, not raw strings. UseSecret.from_env_var(...)so YAML serialisations never contain plaintext keys. - Pipeline YAML in version control. YAML defines structure and component types — safe to commit. Secrets are referenced by env var name only.
- Prompt injection via documents. Retrieved documents are interpolated into prompt templates. A document with
"Ignore the above and..."can hijack the generator. Use prompt templates that fence document content (e.g. inside XML tags) and trim suspicious content. - Generator output as code. If your pipeline feeds LLM output to a code interpreter or shell, treat it as untrusted input. Use sandboxes (
vercel-sandbox,e2b, microVMs). - Component-level traces. OpenTelemetry traces include component inputs and outputs. Configure trace sampling and PII scrubbing before exporting to a third-party APM.
- Document store auth. Pass credentials via
Secret; never bake into YAML. RBAC on the underlying store (Qdrant API keys, Weaviate OIDC, etc.) is your tenant isolation layer. - Self-update of
haystack-experimental— preview components churn. Pin and audit before adopting in production.
When NOT to use this
Haystack 2.x is the right framework when you want explicit DAG-style pipelines with strict typed wiring. It's the wrong tool when:
- You want the LCEL pipe DSL. LangChain's
runnable | runnablesyntax is more concise for linear chains. - Your workload is indexing-heavy. LlamaIndex has stronger indexing primitives (composable indexes, sub-question decomposition).
- You need a massive integration ecosystem. LangChain still has the broadest provider coverage.
- You're a one-component user. If you only need a retriever + LLM call, the SDKs directly (
openai+qdrant-client) are 20 lines and no framework. - You want hosted-only, no Python.
deepset Cloudis the managed offering; otherwise you're hosting Python. - Multi-agent narrative flows.
crewai,autogen, orlanggraphmodel agent-to-agent conversations more directly.
See also
- Frameworks: Haystack — components, pipelines, RAG patterns
- Concept: RAG — retrieval-augmented generation patterns
- Concept: agents — agent orchestration patterns