cheat sheet
chromadb
Package-level reference for chromadb on PyPI — install variants, server/client split, embedding-function extras, and alternative vector stores.
What it is
chromadb is the Python distribution of Chroma, an open-source vector database for AI applications. The package ships the embedded in-process engine, the persistent SQLite-backed store (DuckDB in releases before 0.4), an HTTP client/server, and a small library of pluggable embedding functions in a single import. The same chromadb.Client API works whether you are running in-memory for a notebook prototype or pointing at a remote Chroma server.
Reach for chromadb when you want zero-infrastructure RAG storage in Python and are happy to scale up later. Reach for qdrant-client, weaviate-client, or hosted services like Pinecone when you need strict multi-tenant isolation, advanced filtering, or production-grade clustering from day one.
Install
pip install chromadb
Output: (none — exits 0 on success)
uv add chromadb
Output: dependency resolved + added to pyproject.toml
poetry add chromadb
Output: updated lockfile + virtualenv install
pip install chromadb-client # thin HTTP-only client (no server, no embedding deps)
Output: installs the slim client that talks to a remote chroma run server
Versioning & Python support
- Chroma releases are pre-1.0 and move quickly: the 0.4.x → 0.5.x jump in 2024 changed the on-disk persistent-store layout and required a migration script. Always read the changelog before bumping.
- Recent versions support Python 3.8+ on Linux, macOS, and Windows. Wheels are published for common architectures; building from source needs a C++ toolchain because of the HNSW index code.
- Client/server version skew matters. If you run the Chroma server in Docker, the chromadb (or chromadb-client) version in your application should track the server's minor version. Cross-minor combinations sometimes work and sometimes fail with opaque protocol errors; pinning both to the same minor is the safe path.
- The roadmap targets a 1.0 once the storage format and tenant model stabilise; until then treat every minor as a potential breaking release in CI.
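The same-minor rule above can be checked at application startup rather than discovered through a failed request. The following is a minimal sketch, not part of chromadb: minor_of, same_minor, and assert_compatible are hypothetical helper names, and version strings are assumed to follow the usual major.minor.patch shape.

```python
from __future__ import annotations


def minor_of(version: str) -> tuple[int, int]:
    """Return (major, minor) from a version string such as '0.5.20'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def same_minor(client_version: str, server_version: str) -> bool:
    """True when both sides are on the same major.minor series."""
    return minor_of(client_version) == minor_of(server_version)


def assert_compatible(client_version: str, server_version: str) -> None:
    """Fail fast at startup instead of on the first unsupported request."""
    if not same_minor(client_version, server_version):
        raise RuntimeError(
            f"chromadb client {client_version} and server {server_version} "
            "are on different minor versions"
        )
```

One plausible wiring is to pass chromadb.__version__ as the client version and whatever version string your HTTP client reports for the server; how the server version is obtained depends on the release, so treat that part as an assumption to confirm.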
Package metadata
- Maintainer: Chroma (the company, formerly Chroma Inc.) and community contributors
- Project home: github.com/chroma-core/chroma
- Docs: docs.trychroma.com
- PyPI: pypi.org/project/chromadb
- License: Apache-2.0
- Governance: company-led with open contributions; commercial Chroma Cloud offering tracks the open core
- First released: 2022
- Downloads: several million per month on PyPI; the default vector store in many LangChain and LlamaIndex tutorials
Optional dependencies & extras
chromadb ships as one PyPI package that bundles the server, the local persistent store, and the in-process engine. There are no published feature extras in the usual chromadb[xxx] form — instead, optional functionality lives in companion packages and in the small built-in chromadb.utils.embedding_functions module, which loads its own extras lazily.
Common companions to install alongside:
- chromadb-client: slim HTTP-only client (no server, no embedding deps). Use it in lightweight containers that only talk to a remote chroma run server.
- sentence-transformers: local CPU/GPU embeddings via the built-in SentenceTransformerEmbeddingFunction.
- openai: required if you use the built-in OpenAIEmbeddingFunction.
- cohere, google-generativeai, voyageai: each backs the corresponding built-in embedding function.
- onnxruntime: used by the bundled default MiniLM embedding model.
- tiktoken: token counting when you mix Chroma with OpenAI chat models.
- langchain-chroma or llama-index-vector-stores-chroma: framework adapters in the LangChain / LlamaIndex ecosystems.
Alternatives
| Package | Trade-off |
|---|---|
| qdrant-client | Rust-backed Qdrant server with rich payload filtering and gRPC support. Use when you want a stronger production story than embedded Chroma. |
| weaviate-client | Schema-first, GraphQL-style queries, hybrid search out of the box. Use for hybrid (vector + BM25) workloads. |
| pymilvus | Milvus client. Use when you need very large-scale clustered vector storage. |
| pinecone-client | Fully-hosted SaaS, no self-hosting required. Use when you want to outsource ops. |
| lancedb | Embedded columnar vector DB on Lance/Arrow. Use when your data is already columnar and you want zero-copy queries. |
| faiss-cpu / faiss-gpu | Library, not a database: raw ANN indexes. Use when you only need similarity search, not metadata storage. |
Common gotchas
- 0.4.x → 0.5.x persistent-store migration. The on-disk format changed; existing chroma.sqlite3 stores need the maintainer-provided migration tool. Snapshot the directory before upgrading.
- Collection-API churn. Client.persist(), Client(persist_directory=...), and PersistentClient(path=...) have been the recommended entrypoints at different times. Pin a version and copy-paste from that version's docs, not Stack Overflow.
- Default embedding function downloads a model on first use. The bundled MiniLM ONNX model is fetched from a CDN; in air-gapped environments, pass an explicit embedding_function= or pre-cache the model.
- HNSW index parameters are set at collection creation. hnsw:space, hnsw:M, and hnsw:construction_ef cannot be retroactively changed without rebuilding the collection. Decide on cosine vs L2 distance up front.
- Tenancy model is recent and still maturing. Tenants and databases inside a single Chroma server are usable but underdocumented; production multi-tenant designs should test isolation carefully.
- Client/server version mismatch is silent. A chromadb-client from 2024 talking to a 2026 server may appear to work for add but fail on a newer query parameter. Match minor versions.
- In-memory Client() does not persist. Calling Client() with no path gives you an ephemeral store that vanishes on process exit. Use PersistentClient(path=...) or run the HTTP server for durability.
Real-world recipes
The recipes below are package-level vignettes — they focus on the install footprint and the client/server topology each pattern requires, rather than re-teaching collection methods (the companion sections/ai/chromadb covers the API surface).
Persistent local store for a single-process app: the smallest possible Chroma deployment. PersistentClient writes to a directory holding chroma.sqlite3 plus per-collection index segments; restarting the process re-opens the same data.
import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

client = chromadb.PersistentClient(path=".chroma")
emb = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
docs = client.get_or_create_collection("kb", embedding_function=emb)
docs.upsert(
    ids=["a", "b"],
    documents=["Chroma is an embedded vector DB.", "Pinecone is a hosted SaaS."],
    metadatas=[{"source": "intro"}, {"source": "intro"}],
)
print(docs.query(query_texts=["embedded database"], n_results=1))
Output: the closest match with its id, distance, and metadata; the .chroma/ directory holds chroma.sqlite3 plus per-collection index segment directories
Client-server split for multi-process serving — run chroma run --path .chroma --host 0.0.0.0 --port 8000 in one container, then point every app at it with the slim client. This is the only safe way to share a Chroma store across workers.
import chromadb
client = chromadb.HttpClient(host="chroma.internal", port=8000)
docs = client.get_collection("kb")
print(docs.count())
Output: count of documents in the remote collection; the worker image only needs chromadb-client (~5 MB) — no ONNX, no sentence-transformers, no server dependencies
Custom embedding function for an in-house model — Chroma's embedding-function interface is a duck-typed callable. Implement __call__(self, input: list[str]) -> list[list[float]] and pass it as embedding_function=.
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbedder(EmbeddingFunction):
    def __init__(self, model):
        self.model = model

    def __call__(self, input: Documents) -> Embeddings:
        # Assumes model.encode(...) returns a NumPy array, as sentence-transformers models do.
        return self.model.encode(input).tolist()

# my_local_model is a placeholder for your already-loaded in-house model.
client = chromadb.PersistentClient(path=".chroma")
col = client.get_or_create_collection(
    "docs",
    embedding_function=MyEmbedder(my_local_model),
    metadata={"hnsw:space": "cosine"},
)
Output: the collection materialises with a custom embedder; metadata pins the distance function at creation time (cannot be changed retroactively)
Hybrid filter + vector query — Chroma supports a structured where= filter over metadata and a where_document= substring/regex filter, applied as a prefilter to the ANN search.
results = col.query(
    query_texts=["how does HNSW work"],
    n_results=5,
    where={"section": {"$in": ["intro", "tuning"]}},
    where_document={"$contains": "HNSW"},
)
Output: the top-5 hits restricted to documents whose metadata section is intro or tuning AND whose body literally contains the substring HNSW
Multi-tenant via collection-per-tenant — Chroma's tenants/databases primitive is still maturing; a robust pattern today is one collection per tenant with a name prefix and shared embedding function.
def get_tenant_collection(tenant_id: str):
    return client.get_or_create_collection(
        name=f"tenant_{tenant_id}",
        embedding_function=emb,
    )
Output: every tenant's data lives in its own collection, with isolation at the collection boundary, so a faulty where clause cannot leak rows across tenants
Production deployment
The two production topologies are embedded persistent (single process owns the directory) and client-server (the chroma run HTTP server fronts a shared volume). Pick early — the on-disk format is the same but the failure modes are not.
Topology checklist:
| Concern | Embedded PersistentClient | Client-server (chroma run) |
|---|---|---|
| Concurrency | one writer only — file lock contention if you fork workers | many readers + writers via HTTP |
| Container image | full chromadb (~150 MB) | apps use chromadb-client (~5 MB), server image runs separately |
| Backups | snapshot the directory while idle | snapshot the volume; coordinate with server quiesce |
| Auth | none — process-local | static API key (via env var) or proxy fronting the server |
| Telemetry | sends usage pings unless disabled | same — set CHROMA_TELEMETRY=False |
| Failure mode if disk full | hangs on SQLite write | server returns 500, client retries |
Pinning client and server. When you run the server in Docker (chromadb/chroma:0.5.x), pin the application's chromadb-client to the same minor. Cross-minor combinations silently fail on newer query parameters (include=["embeddings"] was added late in 0.4.x; older clients ignore it).
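Telemetry opt-out. Chroma sends anonymous PostHog pings on startup. Disable it in code via Settings(anonymized_telemetry=False), or per process with CHROMA_TELEMETRY=False (upstream docs for some releases name the variable ANONYMIZED_TELEMETRY, so confirm it against your version). Required for many compliance reviews.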
Backups. The persistent directory is roughly safe to tar while the process is idle. For zero-downtime backups, run chroma run with a filesystem that supports atomic snapshots (ZFS, btrfs, LVM, EBS snapshot) and snapshot the volume rather than copying the live directory.
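For the idle-process case, the tar approach can be sketched with the standard library alone. This is an illustrative helper rather than a Chroma feature: snapshot_directory and its archive naming are assumptions, and it presumes the caller has already stopped or quiesced whatever process owns the directory.

```python
import tarfile
import time
from pathlib import Path


def snapshot_directory(store_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz of store_dir into backup_dir and return its path."""
    src = Path(store_dir).resolve()
    if not src.is_dir():
        raise FileNotFoundError(store_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    archive = dest / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths inside the archive relative to the store directory.
        tar.add(src, arcname=src.name)
    return archive
```

Keep backup_dir outside store_dir so the archive does not try to include itself.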
Multi-tenancy strategy. Two paths exist; both work in production today:
- Collection-per-tenant: strong isolation; tenant deletion is a single delete_collection. Limit: ~thousands of collections per server before metadata lookups slow down.
- Filter-per-tenant: one shared collection with tenant_id in metadata, queried via where={"tenant_id": "..."}. Cheaper at scale, but a missing where clause leaks rows. Add an assertion in your query wrapper.
The newer tenants/databases primitive (client.create_tenant(...)) is still maturing — test isolation explicitly before relying on it for regulated workloads.
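For the filter-per-tenant path, the query wrapper mentioned above might look like the sketch below. tenant_query is a hypothetical name; the collection argument is assumed to be any object with a Chroma-style query(...) method, and the caller's own filter is assumed to be a single where expression that can be combined under $and.

```python
from typing import Any, Optional


def tenant_query(
    collection: Any,
    tenant_id: str,
    query_texts: list,
    n_results: int = 5,
    where: Optional[dict] = None,
    **kwargs: Any,
):
    """Run a query that is always scoped to exactly one tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    tenant_clause = {"tenant_id": tenant_id}
    # Combine with the caller's filter instead of letting it replace the tenant scope.
    scoped = {"$and": [tenant_clause, where]} if where else tenant_clause
    return collection.query(
        query_texts=query_texts,
        n_results=n_results,
        where=scoped,
        **kwargs,
    )
```

Routing every read through one function like this turns a forgotten filter into a loud ValueError instead of a silent cross-tenant result.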
Index tuning & retrieval quality
Chroma uses HNSW (Hierarchical Navigable Small World) under the hood. The three parameters worth knowing are hnsw:space (distance metric), hnsw:M (graph connectivity), and hnsw:construction_ef (build-time candidate pool). All three are set at collection creation time and cannot be changed without rebuilding.
col = client.create_collection(
    name="tuned_kb",
    embedding_function=emb,
    metadata={
        "hnsw:space": "cosine",        # cosine | l2 | ip
        "hnsw:M": 32,                  # default 16; higher = better recall, more RAM
        "hnsw:construction_ef": 200,   # default 100; higher = better index, slower build
        "hnsw:search_ef": 100,         # default 10; raise at query time for higher recall
    },
)
Output: a collection whose HNSW index is built with a larger candidate pool and higher graph degree, trading more memory and higher p95 latency for better recall
Trade-off table:
| Parameter | Low value | High value | When to raise |
|---|---|---|---|
| M | 8–16 | 48–64 | corpora over ~1M vectors; recall plateau |
| construction_ef | 100 | 400+ | quality matters more than build time |
| search_ef | 10 | 100–500 | tuned per-query; p95 latency target |
Distance metric choice. For most modern sentence embeddings (MiniLM, BGE, OpenAI text-embedding-3-*), cosine is the right default: it compares direction only, so results do not depend on whether the model's vectors are unit-normalised. Use ip (inner product) only when vector magnitude is meant to influence ranking, which requires unnormalised embeddings, and l2 only when the embedding model recommends it.
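The relationship between the three metrics is easiest to see on a pair of small vectors. The sketch below assumes the distance definitions given in Chroma's documentation (squared L2, one minus inner product, one minus cosine similarity); the function names are illustrative and nothing here calls chromadb.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def ip_distance(a, b):
    return 1.0 - dot(a, b)


def squared_l2_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


raw_a, raw_b = [3.0, 4.0], [1.0, 0.0]
unit_a, unit_b = normalise(raw_a), normalise(raw_b)

# On unit vectors, ip matches cosine and squared L2 is exactly twice cosine,
# so all three metrics produce the same ranking.
same_on_unit = math.isclose(cosine_distance(unit_a, unit_b), ip_distance(unit_a, unit_b))
l2_is_double = math.isclose(
    squared_l2_distance(unit_a, unit_b), 2 * cosine_distance(unit_a, unit_b)
)

# On raw vectors, ip is driven by magnitude and diverges from cosine.
differs_on_raw = not math.isclose(cosine_distance(raw_a, raw_b), ip_distance(raw_a, raw_b))
```

In practice this means the metric choice only changes rankings when the stored vectors are not unit length.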
Hybrid filter + vector. Chroma applies metadata where= and document where_document= filters as prefilters — the ANN search runs over the filtered subset. This is fast when filters are selective (the search index is restricted to a small candidate set) and slow when filters match most of the corpus (the prefilter scan dominates).
Reranking pattern. Chroma does not ship a built-in reranker. The standard pattern is to over-retrieve (n_results=50), then rerank in your application with a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2 or Cohere Rerank), and keep the top 5–10.
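The over-retrieve-then-rerank flow can be sketched independently of any particular reranker. In the hypothetical helper below, score_fn stands in for the cross-encoder (any callable taking a query and a document and returning a number), and collection is assumed to expose a Chroma-style query(...) that returns nested ids and documents lists.

```python
from typing import Any, Callable, List, Tuple


def retrieve_and_rerank(
    collection: Any,
    query: str,
    score_fn: Callable[[str, str], float],
    fetch_k: int = 50,
    keep_k: int = 5,
) -> List[Tuple[str, str, float]]:
    """Fetch fetch_k candidates by vector search, then keep the keep_k best by score_fn."""
    result = collection.query(
        query_texts=[query],
        n_results=fetch_k,
        include=["documents"],
    )
    # Results are nested per query text; a single query means index 0.
    ids = result["ids"][0]
    docs = result["documents"][0]
    scored = [(doc_id, doc, score_fn(query, doc)) for doc_id, doc in zip(ids, docs)]
    # Higher score means more relevant.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:keep_k]
```

Scoring pairs one at a time keeps the sketch simple; a real cross-encoder is usually much faster when given all (query, document) pairs in one batch.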
Version migration guide
The 0.4.x → 0.5.x boundary is the largest on-disk change to date. Lesser bumps within 0.5.x also rename APIs.
0.4.x → 0.5.x checklist:
- On-disk persistent-store schema changed. Existing chroma.sqlite3 stores need the maintainer-provided migration script. Snapshot the directory first.
- Client(persist_directory=...) removed. Use PersistentClient(path=...) for embedded persistence, or HttpClient(host=..., port=...) for the server.
- Client.persist() removed. Persistent clients write through automatically; the no-op call was misleading and is gone.
- Settings(chroma_db_impl=...) removed. Backend is implicit from which client class you instantiate.
- Telemetry env var is CHROMA_TELEMETRY=False; older ANONYMIZED_TELEMETRY no longer applies.
0.5.x minor-to-minor:
- Tenants/databases primitive evolved across 0.5 minors: client.create_tenant(...) and client.create_database(...) signatures shifted. Pin if you depend on the new isolation model.
- get_or_create_collection metadata validation tightened: invalid hnsw:* keys now raise instead of being silently dropped.
- include= parameter on query() added options ("embeddings", "distances", "metadatas", "documents", "data"); older clients send unrecognised values and 0.5 servers may reject them.
Client/server pinning. Always run the same minor on both sides. The Docker image tag (chromadb/chroma:0.5.x) and the chromadb-client minor must match — cross-minor combinations sometimes work for add()/get() and silently fail on newer query parameters.
The roadmap targets a 1.0 once tenancy and the on-disk format stabilise; treat every pre-1.0 minor as a potential breaking release in CI.
Troubleshooting common errors
The error catalogue below is what trips up new users most often. Most are environmental rather than code bugs.
- DuplicateIDError on add(...). The ID already exists in the collection. Switch to upsert(...), which inserts or updates.
- InvalidCollectionException. The collection name doesn't exist or was deleted. Use get_or_create_collection for idempotent code.
- ValueError: Expected metadata to be a non-empty dict. Chroma rejects empty metadatas dicts. Either omit metadatas= entirely or pass at least one key per row.
- ConnectionError against chroma run. The server is not listening on the expected port, or the firewall is blocking it. curl http://chroma:8000/api/v1/heartbeat should return {"nanosecond heartbeat": ...}.
- Could not connect to tenant with HttpClient. The server is on an older minor that doesn't know about tenants. Either upgrade the server or instantiate without tenant=.
- Default embedder downloads ONNX model on first run. Air-gapped environments need the model pre-cached, or use a custom embedding_function=. Symptom: hang on first add() while a CDN times out.
- OperationalError: database is locked. Two PersistentClient instances opened the same directory. Embedded mode is single-writer; move to chroma run or coordinate access.
- Cross-version protocol error. A chromadb-client from 2024 hitting a 2026 server (or vice versa) fails with opaque 400 responses on newer query parameters. Match minors.
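The heartbeat check from the ConnectionError entry can be scripted so that workers wait for the server instead of crashing at startup. The sketch below uses only the standard library; wait_for_heartbeat and fetch_json are hypothetical helpers, and the response shape is taken from the curl example above.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=2.0):
    """GET url and decode the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_heartbeat(url, attempts=10, delay=1.0, fetch=fetch_json, sleep=time.sleep):
    """Poll url until it returns a heartbeat payload; True on success, False if it never does."""
    for attempt in range(1, attempts + 1):
        try:
            payload = fetch(url)
            if isinstance(payload, dict) and "nanosecond heartbeat" in payload:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts; ValueError covers bad JSON.
            pass
        if attempt < attempts:
            sleep(delay)
    return False
```

A call such as wait_for_heartbeat("http://chroma:8000/api/v1/heartbeat") would sit before the first HttpClient use; the fetch and sleep parameters exist so the loop can be exercised without a network.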
Performance tuning
Chroma's performance levers are about telling the engine what to skip more than telling it to go faster. The HNSW index handles vector search; everything else (filters, payload returns, embedding-function calls) is in your control.
| Lever | Mechanism | When it helps |
|---|---|---|
| HttpClient for shared workloads | server-process serialises writes | multi-worker apps |
| Higher hnsw:M and hnsw:construction_ef | better index | recall-bound queries |
| hnsw:search_ef per query | runtime quality knob | latency-budget tuning |
| Pre-filter with where= | shrink the search set | selective metadata predicates |
| include= query parameter | drop unused fields | reduce network payload |
| Batch add/upsert | amortise round-trips | bulk ingestion |
| Custom embedding_function | bypass default ONNX download | air-gapped / faster embedders |
| chromadb-client slim package | skip server deps in app images | smaller worker images |
Bulk ingestion pattern. Batch documents into upsert(...) calls of a few hundred at a time. Smaller batches are network-overhead-bound; larger batches occupy the server's index build budget and stall reads.
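def chunks(iterable, n):
    """Yield successive lists of at most n items."""
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf

# iter_rows() is a placeholder for your own row source; each row is a dict with "id", "text", and "meta".
for batch in chunks(iter_rows(), 500):
    col.upsert(
        ids=[r["id"] for r in batch],
        documents=[r["text"] for r in batch],
        metadatas=[r["meta"] for r in batch],
    )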
Output: streams documents into the collection in 500-row batches; the server returns once each batch is persisted
Query-time search_ef. Set per collection at creation, or tune per query if your version supports it. Higher search_ef improves recall at the cost of query latency, with diminishing returns; production deployments set this from a p95 latency budget rather than a fixed default.
Avoid the default embedder in production. The bundled MiniLM-via-ONNX function downloads a model on first call and re-loads it per process. Pass an explicit embedding_function= that uses a long-lived embedding service (OpenAI, Cohere, local sentence-transformers) — both faster and easier to upgrade.
Embeddings & chunking strategy
Chroma stores vectors and metadata; it does not produce embeddings (the bundled MiniLM ONNX function is a default for convenience, not a recommendation). The embedding choice dominates retrieval quality far more than HNSW tuning, so it deserves explicit attention.
Embedding-model choice. A short-list current at writing:
| Model | Dimensions | Cost | Strengths |
|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | local CPU/GPU | small, fast, the Chroma default |
| BAAI/bge-small-en-v1.5 | 384 | local | better on MTEB than MiniLM at same dim |
| BAAI/bge-large-en-v1.5 | 1024 | local GPU | strong open-weight; needs GPU at scale |
| text-embedding-3-small | 1536 (or shorter via Matryoshka) | OpenAI API | fast, accurate, hosted |
| text-embedding-3-large | 3072 | OpenAI API | best-in-class accuracy, dimensions matter for storage |
| voyage-3-large | 1024+ | Voyage AI API | strong on retrieval benchmarks |
| embed-multilingual-v3 | 1024 | Cohere API | non-English content |
Higher-dimensional embeddings improve recall but cost storage and RAM linearly. For corpora over ~1M chunks, prefer dim ≤ 1024 with quantization, or a Matryoshka model trimmed to a shorter dimension.
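The linear cost is easy to estimate before committing to a model. The sketch below counts raw vector bytes only, assuming 4-byte float32 values; it deliberately ignores HNSW graph links, metadata, and document text, so real usage will be higher. Both function names are illustrative.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = 4) -> int:
    """Bytes needed for the vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value


def to_gib(num_bytes: int) -> float:
    return num_bytes / 1024 ** 3


one_million = 1_000_000
estimates = {
    dim: round(to_gib(raw_vector_bytes(one_million, dim)), 2)
    for dim in (384, 1536, 3072)
}
```

For one million chunks this gives roughly 1.43 GiB at 384 dimensions, 5.72 GiB at 1536, and 11.44 GiB at 3072, which is why trimming dimensions matters at scale.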
Chunking decisions live upstream. Chroma stores whatever chunks you give it; bad chunking can't be fixed at retrieval time. The standard heuristics:
- 300–500 character chunks for narrow factual questions.
- 800–1500 character chunks for general RAG with modern long-context LMs.
- ~100 character overlap to avoid edge-case context truncation.
- Respect document structure (titles, sections); unstructured's chunk_by_title is a sensible default.
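The size and overlap heuristics above translate into a few lines of code. This is a deliberately naive character-based sketch (chunk_text is a hypothetical helper); it does not respect titles or sections, which is what a structure-aware chunker adds.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list:
    """Split text into chunk_size-character windows that share overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With chunk_size=4 and overlap=1, the string "abcdefghij" becomes "abcd", "defg", "ghij": each chunk repeats the last character of the previous one.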
HyDE / query rewriting. When user queries are short and noisy, embedding them directly gives weak retrieval. The Hypothetical Document Embeddings pattern (HyDE) asks an LM to generate a plausible answer, embeds that, and queries Chroma with the synthetic embedding — often better than embedding the raw query.
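A minimal HyDE sketch follows. generate_answer and embed are hypothetical callables standing in for your LM call and your embedding model, and collection is assumed to expose a Chroma-style query(...) that accepts query_embeddings.

```python
from typing import Any, Callable, List


def hyde_query(
    collection: Any,
    question: str,
    generate_answer: Callable[[str], str],
    embed: Callable[[str], List[float]],
    n_results: int = 5,
):
    """Query with the embedding of a generated answer instead of the raw question."""
    hypothetical = generate_answer(question)
    # embed must be the same model that produced the stored vectors.
    vector = embed(hypothetical)
    return collection.query(query_embeddings=[vector], n_results=n_results)
```

The generated answer may be factually wrong; it only needs to resemble the documents you hope to retrieve, and it should never be shown to the user as an answer.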
Parent-document retrieval. Embed small chunks for precision; return the parent document (or a wider window) to the LM for context. Store parent_id in metadata and look it up after retrieval.
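One way to sketch the lookup step is shown below. It assumes parents are stored in a second collection keyed by the same IDs used in each chunk's parent_id metadata; that layout, and the name fetch_parents, are illustrative choices rather than anything Chroma prescribes.

```python
from typing import Any, List, Tuple


def fetch_parents(
    child_collection: Any,
    parent_collection: Any,
    query: str,
    n_children: int = 10,
) -> List[Tuple[str, str]]:
    """Search small chunks, then return their distinct parent documents in hit order."""
    result = child_collection.query(
        query_texts=[query],
        n_results=n_children,
        include=["metadatas"],
    )
    parent_ids = []
    for meta in result["metadatas"][0]:
        parent_id = (meta or {}).get("parent_id")
        if parent_id and parent_id not in parent_ids:
            parent_ids.append(parent_id)
    if not parent_ids:
        return []
    parents = parent_collection.get(ids=parent_ids, include=["documents"])
    # Do not rely on get() preserving the requested order; re-map by ID.
    by_id = dict(zip(parents["ids"], parents["documents"]))
    return [(pid, by_id[pid]) for pid in parent_ids if pid in by_id]
```

Deduplicating parent IDs matters because several chunks of the same document often appear among the top hits.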
Security considerations
Chroma's default setup has minimal security — appropriate for embedded use, dangerous for a network-exposed server.
- No auth by default on chroma run. Place behind an authenticated reverse proxy (nginx with basic auth, an API gateway, or a VPC) before exposing to a network. Recent versions support a static API key via CHROMA_SERVER_AUTHN_PROVIDER / CHROMA_SERVER_AUTHN_CREDENTIALS; verify the version's docs.
- TLS not built in. The HTTP server speaks plaintext. Terminate TLS at a proxy.
- Telemetry phones home unless disabled with CHROMA_TELEMETRY=False. Required for many compliance reviews.
- Multi-tenant leakage risk under filter-per-tenant. A missing where={"tenant_id": ...} clause returns rows across tenants. Either wrap every query in code that enforces the filter, or use collection-per-tenant for hard isolation.
- Prompt injection via retrieved content. Documents in Chroma are returned verbatim to the LM. A malicious uploaded document can contain instructions the LM follows ("ignore previous instructions and..."). Validate upload provenance; consider a sanitisation pass on retrieved content before prompt assembly.
- PII in metadata. Metadata is returned in every query response; don't store anything you wouldn't want in logs.
- Backup security. The persistent directory is unencrypted SQLite plus index files. Encrypt at rest at the volume layer (LUKS, EBS encryption, etc.).
When NOT to use this
Chroma is the easiest vector DB to start with; it stops being the right tool at a clear scale boundary.
- Corpora over ~10 million vectors with strict p95 latency. Qdrant or Milvus have more mature distributed stories. Chroma works but tuning gets painful.
- Hybrid (BM25 + vector) is a core requirement. Weaviate ships hybrid out of the box; Chroma needs application-side BM25 (e.g. rank-bm25 + post-merge), which is more code.
- Strict multi-tenant isolation with thousands of tenants. Collection-per-tenant slows down past low thousands; filter-per-tenant is leaky if a query forgets the where clause. Postgres-with-pgvector or a hosted service may fit better.
- You want a fully-hosted, managed service. Pinecone, Weaviate Cloud, and Qdrant Cloud all front their own engines. Chroma Cloud exists but is younger.
- Sparse-vector workloads (e.g. SPLADE). Qdrant and Weaviate have first-class sparse support; Chroma is dense-only at this writing.
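To give a sense of how much code the application-side hybrid merge in the list above involves, here is one common way to do the post-merge step: reciprocal rank fusion (RRF). This is a swapped-in technique for illustration, not something Chroma or rank-bm25 provides; the inputs are assumed to be ID lists already ranked by each retriever, and k=60 is the conventional default rather than a tuned value.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[str]],
    k: int = 60,
    top_n: int = 10,
) -> List[str]:
    """Merge several ranked ID lists; each list contributes 1 / (k + rank) per ID."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:top_n]


vector_hits = ["a", "b", "c"]   # e.g. IDs from a vector query, best first
keyword_hits = ["b", "d", "a"]  # e.g. IDs from a BM25 ranking, best first
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

RRF only needs ranks, so it avoids having to put BM25 scores and vector distances on a common scale.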
See also
- AI: chromadb — collections, queries, embedding functions, framework integration
- Concept: RAG — retrieval-augmented generation patterns
- Concept: API — REST design fundamentals