cheat sheet

chromadb

Package-level reference for chromadb on PyPI — install variants, server/client split, embedding-function extras, and alternative vector stores.

chromadb

What it is

chromadb is the Python distribution of Chroma, an open-source vector database for AI applications. The package ships the embedded in-process engine, the persistent SQLite/DuckDB-backed store, an HTTP client/server, and a small library of pluggable embedding functions in a single import. The same chromadb.Client API works whether you are running in-memory for a notebook prototype or pointing at a remote Chroma server cluster.

Reach for chromadb when you want zero-infrastructure RAG storage in Python and are happy to scale up later. Reach for qdrant-client, weaviate-client, or hosted services like Pinecone when you need strict multi-tenant isolation, advanced filtering, or production-grade clustering from day one.

Install

bash
pip install chromadb

Output: (none — exits 0 on success)

bash
uv add chromadb

Output: dependency resolved + added to pyproject.toml

bash
poetry add chromadb

Output: updated lockfile + virtualenv install

bash
pip install chromadb-client      # thin HTTP-only client (no server, no embedding deps)

Output: installs the slim client that talks to a remote chroma run server

Versioning & Python support

  • Chroma releases are pre-1.0 and move quickly — the 0.4.x → 0.5.x jump in 2024 changed the on-disk persistent-store layout and required a migration script. Always read the changelog before bumping.
  • Recent versions support Python 3.8+ on Linux, macOS, and Windows. Wheels are published for common architectures; building from source needs a C++ toolchain because of the HNSW index code.
  • Client/server version skew matters. If you run the Chroma server in Docker, the chromadb (or chromadb-client) version in your application should track the server's minor version. Cross-minor combinations sometimes work and sometimes fail with opaque protocol errors — pinning both to the same minor is the safe path.
  • Roadmap targets a 1.0 once the storage format and tenant model stabilise; until then treat every minor as a potential breaking release in CI.

Package metadata

  • Maintainer: Chroma (the company, formerly Chroma Inc.) and community contributors
  • Project home: github.com/chroma-core/chroma
  • Docs: docs.trychroma.com
  • PyPI: pypi.org/project/chromadb
  • License: Apache-2.0
  • Governance: company-led with open contributions; commercial Chroma Cloud offering tracks the open core
  • First released: 2022
  • Downloads: multiple million per month on PyPI; the default vector store in many LangChain and LlamaIndex tutorials

Optional dependencies & extras

chromadb ships as one PyPI package that bundles the server, the local persistent store, and the in-process engine. There are no published feature extras in the usual chromadb[xxx] form — instead, optional functionality lives in companion packages and in the small built-in chromadb.utils.embedding_functions module, which loads its own extras lazily.

Common companions to install alongside:

  • chromadb-client — slim HTTP-only client (no server, no embedding deps). Use it in lightweight containers that only talk to a remote chroma run server.
  • sentence-transformers — local CPU/GPU embeddings via the built-in SentenceTransformerEmbeddingFunction.
  • openai — required if you use the built-in OpenAIEmbeddingFunction.
  • cohere, google-generativeai, voyageai — each backs the corresponding built-in embedding function.
  • onnxruntime — used by the bundled default MiniLM embedding model.
  • tiktoken — token counting when you mix Chroma with OpenAI chat models.
  • langchain-chroma or llama-index-vector-stores-chroma — framework adapters in the LangChain / LlamaIndex ecosystems.

Alternatives

PackageTrade-off
qdrant-clientRust-backed Qdrant server with rich payload filtering and gRPC support. Use when you want a stronger production story than embedded Chroma.
weaviate-clientSchema-first, GraphQL-style queries, hybrid search out of the box. Use for hybrid (vector + BM25) workloads.
pymilvusMilvus client. Use when you need very large-scale clustered vector storage.
pinecone-clientFully-hosted SaaS — no self-hosting required. Use when you want to outsource ops.
lancedbEmbedded columnar vector DB on Lance/Arrow. Use when your data is already columnar and you want zero-copy queries.
faiss-cpu / faiss-gpuLibrary, not a database — raw ANN indexes. Use when you only need similarity search, not metadata storage.

Common gotchas

  1. 0.4.x → 0.5.x persistent-store migration. The on-disk format changed; existing chroma.sqlite3 stores need the maintainer-provided migration tool. Snapshot the directory before upgrading.
  2. Collection-API churn. Client.persist(), Client(persistence_dir=...), and PersistentClient(path=...) have been the recommended entrypoints at different times. Pin a version and copy-paste from that version's docs, not Stack Overflow.
  3. Default embedding function downloads a model on first use. The bundled MiniLM ONNX model is fetched from a CDN — in air-gapped environments, pass an explicit embedding_function= or pre-cache the model.
  4. HNSW index parameters are set at collection creation. hnsw:space, hnsw:M, and hnsw:construction_ef cannot be retroactively changed without rebuilding the collection. Decide on cosine vs L2 distance up front.
  5. Tenancy model is recent and still maturing. Tenants and databases inside a single Chroma server are usable but underdocumented; production multi-tenant designs should test isolation carefully.
  6. Client/server version mismatch is silent. A chromadb-client from 2024 talking to a 2026 server may appear to work for add but fail on a newer query parameter. Match minor versions.
  7. In-memory Client() does not persist. Calling Client() with no path gives you an ephemeral store that vanishes on process exit. Use PersistentClient(path=...) or run the HTTP server for durability.

Real-world recipes

The recipes below are package-level vignettes — they focus on the install footprint and the client/server topology each pattern requires, rather than re-teaching collection methods (the companion sections/ai/chromadb covers the API surface).

Persistent local store for a single-process app — the smallest possible Chroma deployment. PersistentClient writes to a directory of SQLite + parquet shards; restarting the process re-opens the same data.

python
import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

client = chromadb.PersistentClient(path=".chroma")
emb = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
docs = client.get_or_create_collection("kb", embedding_function=emb)
docs.upsert(
    ids=["a", "b"],
    documents=["Chroma is an embedded vector DB.", "Pinecone is a hosted SaaS."],
    metadatas=[{"source": "intro"}, {"source": "intro"}],
)
print(docs.query(query_texts=["embedded database"], n_results=1))

Output: the closest match with its id, distance, and metadata; the .chroma/ directory holds chroma.sqlite3 plus per-collection parquet segments

Client-server split for multi-process serving — run chroma run --path .chroma --host 0.0.0.0 --port 8000 in one container, then point every app at it with the slim client. This is the only safe way to share a Chroma store across workers.

python
import chromadb

client = chromadb.HttpClient(host="chroma.internal", port=8000)
docs = client.get_collection("kb")
print(docs.count())

Output: count of documents in the remote collection; the worker image only needs chromadb-client (~5 MB) — no ONNX, no sentence-transformers, no server dependencies

Custom embedding function for an in-house model — Chroma's embedding-function interface is a duck-typed callable. Implement __call__(self, input: list[str]) -> list[list[float]] and pass it as embedding_function=.

python
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings

class MyEmbedder(EmbeddingFunction):
    def __init__(self, model):
        self.model = model
    def __call__(self, input: Documents) -> Embeddings:
        return self.model.encode(input).tolist()

client = chromadb.PersistentClient(path=".chroma")
col = client.get_or_create_collection(
    "docs",
    embedding_function=MyEmbedder(my_local_model),
    metadata={"hnsw:space": "cosine"},
)

Output: the collection materialises with a custom embedder; metadata pins the distance function at creation time (cannot be changed retroactively)

Hybrid filter + vector query — Chroma supports a structured where= filter over metadata and a where_document= substring/regex filter, applied as a prefilter to the ANN search.

python
results = col.query(
    query_texts=["how does HNSW work"],
    n_results=5,
    where={"section": {"$in": ["intro", "tuning"]}},
    where_document={"$contains": "HNSW"},
)

Output: the top-5 hits restricted to documents whose metadata section is intro or tuning AND whose body literally contains the substring HNSW

Multi-tenant via collection-per-tenant — Chroma's tenants/databases primitive is still maturing; a robust pattern today is one collection per tenant with a name prefix and shared embedding function.

python
def get_tenant_collection(tenant_id: str):
    return client.get_or_create_collection(
        name=f"tenant_{tenant_id}",
        embedding_function=emb,
    )

Output: every tenant's data lives in its own collection, with hard isolation at the storage layer — no risk of a faulty where clause leaking rows across tenants

Production deployment

The two production topologies are embedded persistent (single process owns the directory) and client-server (the chroma run HTTP server fronts a shared volume). Pick early — the on-disk format is the same but the failure modes are not.

Topology checklist:

ConcernEmbedded PersistentClientClient-server (chroma run)
Concurrencyone writer only — file lock contention if you fork workersmany readers + writers via HTTP
Container imagefull chromadb (~150 MB)apps use chromadb-client (~5 MB), server image runs separately
Backupssnapshot the directory while idlesnapshot the volume; coordinate with server quiesce
Authnone — process-localstatic API key (via env var) or proxy fronting the server
Telemetrysends usage pings unless disabledsame — set CHROMA_TELEMETRY=False
Failure mode if disk fullhangs on SQLite writeserver returns 500, client retries

Pinning client and server. When you run the server in Docker (chromadb/chroma:0.5.x), pin the application's chromadb-client to the same minor. Cross-minor combinations silently fail on newer query parameters (include=["embeddings"] was added late in 0.4.x; older clients ignore it).

Telemetry opt-out — Chroma sends anonymous PostHog pings on startup. Disable per-process with CHROMA_TELEMETRY=False, or in code via Settings(anonymized_telemetry=False). Required for many compliance reviews.

Backups. The persistent directory is roughly safe to tar while the process is idle. For zero-downtime backups, run chroma run with a filesystem that supports atomic snapshots (ZFS, btrfs, LVM, EBS snapshot) and snapshot the volume rather than copying the live directory.

Multi-tenancy strategy. Two paths exist; both work in production today:

  1. Collection-per-tenant — strong isolation; tenant deletion is a single delete_collection. Limit: ~thousands of collections per server before metadata lookups slow down.
  2. Filter-per-tenant — one shared collection with tenant_id in metadata, queried via where={"tenant_id": "..."}. Cheaper at scale, but a missing where clause leaks rows. Add an assertion in your query wrapper.

The newer tenants/databases primitive (client.create_tenant(...)) is still maturing — test isolation explicitly before relying on it for regulated workloads.

Index tuning & retrieval quality

Chroma uses HNSW (Hierarchical Navigable Small World) under the hood. The three parameters worth knowing are hnsw:space (distance metric), hnsw:M (graph connectivity), and hnsw:construction_ef (build-time candidate pool). All three are set at collection creation time and cannot be changed without rebuilding.

python
col = client.create_collection(
    name="tuned_kb",
    embedding_function=emb,
    metadata={
        "hnsw:space": "cosine",         # cosine | l2 | ip
        "hnsw:M": 32,                   # default 16; higher = better recall, more RAM
        "hnsw:construction_ef": 200,    # default 100; higher = better index, slower build
        "hnsw:search_ef": 100,          # default 10; raise at query time for higher recall
    },
)

Output: a collection whose HNSW index is built with a larger candidate pool and higher graph degree — measurably better recall at p95 latency cost

Trade-off table:

ParameterLow valueHigh valueWhen to raise
M8–1648–64corpora over ~1M vectors; recall plateau
construction_ef100400+quality matters more than build time
search_ef10100–500tuned per-query; p95 latency target

Distance metric choice. For most modern sentence embeddings (MiniLM, BGE, OpenAI text-embedding-3-*), cosine is the right default — the embeddings are already unit-normalised in spirit. Use ip (inner product) only with explicitly unnormalised embeddings, and l2 only when the embedding model recommends it.

Hybrid filter + vector. Chroma applies metadata where= and document where_document= filters as prefilters — the ANN search runs over the filtered subset. This is fast when filters are selective (the search index is restricted to a small candidate set) and slow when filters match most of the corpus (the prefilter scan dominates).

Reranking pattern. Chroma does not ship a built-in reranker. The standard pattern is to over-retrieve (n_results=50), then rerank in your application with a cross-encoder (sentence-transformers/ms-marco-MiniLM-L-6-v2 or Cohere Rerank), and keep the top 5–10.

Version migration guide

The 0.4.x → 0.5.x boundary is the largest on-disk change to date. Lesser bumps within 0.5.x also rename APIs.

0.4.x → 0.5.x checklist:

  • On-disk persistent-store schema changed. Existing chroma.sqlite3 stores need the maintainer-provided migration script. Snapshot the directory first.
  • Client(persist_directory=...) removed. Use PersistentClient(path=...) for embedded persistence, or HttpClient(host=..., port=...) for the server.
  • Client.persist() removed. Persistent clients write through automatically; the no-op call was misleading and is gone.
  • Settings(chroma_db_impl=...) removed. Backend is implicit from which client class you instantiate.
  • Telemetry env var is CHROMA_TELEMETRY=False; older ANONYMIZED_TELEMETRY no longer applies.

0.5.x minor-to-minor:

  • Tenants/databases primitive evolved across 0.5 minors — client.create_tenant(...) and client.create_database(...) signatures shifted. Pin if you depend on the new isolation model.
  • get_or_create_collection metadata validation tightened — invalid hnsw:* keys now raise instead of being silently dropped.
  • include= parameter on query() added options ("embeddings", "distances", "metadatas", "documents", "data"); older clients send unrecognised values and 0.5 servers may reject them.

Client/server pinning. Always run the same minor on both sides. The Docker image tag (chromadb/chroma:0.5.x) and the chromadb-client minor must match — cross-minor combinations sometimes work for add()/get() and silently fail on newer query parameters.

The roadmap targets a 1.0 once tenancy and the on-disk format stabilise; treat every pre-1.0 minor as a potential breaking release in CI.

Troubleshooting common errors

The error catalogue below is what trips up new users most often. Most are environmental rather than code bugs.

  • DuplicateIDError on add(...) — the ID already exists in the collection. Switch to upsert(...), which inserts or updates.
  • InvalidCollectionException — collection name doesn't exist or was deleted. Use get_or_create_collection for idempotent code.
  • ValueError: Expected metadata to be a non-empty dict — Chroma rejects empty metadatas dicts. Either omit metadatas= entirely or pass at least one key per row.
  • ConnectionError against chroma run — the server is not listening on the expected port, or the firewall is blocking it. curl http://chroma:8000/api/v1/heartbeat should return {"nanosecond heartbeat": ...}.
  • Could not connect to tenant with HttpClient — the server is on an older minor that doesn't know about tenants. Either upgrade the server or instantiate without tenant=.
  • Default embedder downloads ONNX model on first run — air-gapped environments need the model pre-cached, or use a custom embedding_function=. Symptom: hang on first add() while a CDN times out.
  • OperationalError: database is locked — two PersistentClient instances opened the same directory. Embedded mode is single-writer; move to chroma run or coordinate access.
  • Cross-version protocol errorchromadb-client from 2024 hitting a 2026 server (or vice versa) fails with opaque 400 responses on newer query parameters. Match minors.

Performance tuning

Chroma's performance levers are about telling the engine what to skip more than telling it to go faster. The HNSW index handles vector search; everything else (filters, payload returns, embedding-function calls) is in your control.

LeverMechanismWhen it helps
HttpClient for shared workloadsserver-process serialises writesmulti-worker apps
Higher hnsw:M and hnsw:construction_efbetter indexrecall-bound queries
hnsw:search_ef per queryruntime quality knoblatency-budget tuning
Pre-filter with where=shrink the search setselective metadata predicates
include= query parameterdrop unused fieldsreduce network payload
Batch add/upsertamortise round-tripsbulk ingestion
Custom embedding_functionbypass default ONNX downloadair-gapped / faster embedders
chromadb-client slim packageskip server deps in app imagessmaller worker images

Bulk ingestion pattern. Batch documents into upsert(...) calls of a few hundred at a time. Smaller batches are network-overhead-bound; larger batches occupy the server's index build budget and stall reads.

python
def chunks(iterable, n):
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf

for batch in chunks(iter_rows(), 500):
    col.upsert(
        ids=[r["id"] for r in batch],
        documents=[r["text"] for r in batch],
        metadatas=[r["meta"] for r in batch],
    )

Output: streams documents into the collection in 500-row batches; the server returns once each batch is persisted

Query-time search_ef. Set per collection at creation, or tune per query if your version supports it. Higher search_ef improves recall linearly with query latency — production deployments set this from a p95 latency budget rather than a fixed default.

Avoid the default embedder in production. The bundled MiniLM-via-ONNX function downloads a model on first call and re-loads it per process. Pass an explicit embedding_function= that uses a long-lived embedding service (OpenAI, Cohere, local sentence-transformers) — both faster and easier to upgrade.

Embeddings & chunking strategy

Chroma stores vectors and metadata; it does not produce embeddings (the bundled MiniLM ONNX function is a default for convenience, not a recommendation). The embedding choice dominates retrieval quality far more than HNSW tuning, so it deserves explicit attention.

Embedding-model choice. A short-list current at writing:

ModelDimensionsCostStrengths
all-MiniLM-L6-v2384local CPU/GPUsmall, fast, the Chroma default
BAAI/bge-small-en-v1.5384localbetter on MTEB than MiniLM at same dim
BAAI/bge-large-en-v1.51024local GPUstrong open-weight; needs GPU at scale
text-embedding-3-small1536 (or shorter via Matryoshka)OpenAI APIfast, accurate, hosted
text-embedding-3-large3072OpenAI APIbest-in-class accuracy, dimensions matter for storage
voyage-3-large1024+Voyage AI APIstrong on retrieval benchmarks
embed-multilingual-v31024Cohere APInon-English content

Higher-dimensional embeddings improve recall but cost storage and RAM linearly. For corpora over ~1M chunks, prefer dim ≤ 1024 with quantization, or a Matryoshka model trimmed to a shorter dimension.

Chunking decisions live upstream. Chroma stores whatever chunks you give it; bad chunking can't be fixed at retrieval time. The standard heuristics:

  • 300–500 character chunks for narrow factual questions.
  • 800–1500 character chunks for general RAG with modern long-context LMs.
  • ~100 character overlap to avoid edge-case context truncation.
  • Respect document structure (titles, sections) — unstructured's chunk_by_title is a sensible default.

HyDE / query rewriting. When user queries are short and noisy, embedding them directly gives weak retrieval. The Hypothetical Document Embeddings pattern (HyDE) asks an LM to generate a plausible answer, embeds that, and queries Chroma with the synthetic embedding — often better than embedding the raw query.

Parent-document retrieval. Embed small chunks for precision; return the parent document (or a wider window) to the LM for context. Store parent_id in metadata and look it up after retrieval.

Security considerations

Chroma's default setup has minimal security — appropriate for embedded use, dangerous for a network-exposed server.

  • No auth by default on chroma run. Place behind an authenticated reverse proxy (nginx with basic auth, an API gateway, or a VPC) before exposing to a network. Recent versions support a static API key via CHROMA_SERVER_AUTHN_PROVIDER/CHROMA_SERVER_AUTHN_CREDENTIALS; verify the version's docs.
  • TLS not built in — the HTTP server speaks plaintext. Terminate TLS at a proxy.
  • Telemetry phones home unless disabled with CHROMA_TELEMETRY=False. Required for many compliance reviews.
  • Multi-tenant leakage risk under filter-per-tenant. A missing where={"tenant_id": ...} clause returns rows across tenants. Either wrap every query in code that enforces the filter, or use collection-per-tenant for hard isolation.
  • Prompt injection via retrieved content. Documents in Chroma are returned verbatim to the LM. A malicious uploaded document can contain instructions the LM follows ("ignore previous instructions and..."). Validate upload provenance; consider a sanitisation pass on retrieved content before prompt assembly.
  • PII in metadata. Metadata is returned in every query response; don't store anything you wouldn't want in logs.
  • Backup security. The persistent directory is plaintext SQLite + parquet. Encrypt at rest at the volume layer (LUKS, EBS encryption, etc.).

When NOT to use this

Chroma is the easiest vector DB to start with; it stops being the right tool at a clear scale boundary.

  • Corpora over ~10 million vectors with strict p95 latency. Qdrant or Milvus have more mature distributed stories. Chroma works but tuning gets painful.
  • Hybrid (BM25 + vector) is a core requirement. Weaviate ships hybrid out of the box; Chroma needs application-side BM25 (e.g. rank-bm25 + post-merge), which is more code.
  • Strict multi-tenant isolation with thousands of tenants. Collection-per-tenant slows down past low thousands; filter-per-tenant is leaky if a query forgets the where clause. Postgres-with-pgvector or a hosted service may fit better.
  • You want a fully-hosted, managed service. Pinecone, Weaviate Cloud, and Qdrant Cloud all front their own engines. Chroma Cloud exists but is younger.
  • Sparse-vector workloads (e.g. SPLADE). Qdrant and Weaviate have first-class sparse support; Chroma is dense-only at this writing.

See also

  • AI: chromadb — collections, queries, embedding functions, framework integration
  • Concept: RAG — retrieval-augmented generation patterns
  • Concept: API — REST design fundamentals