cheat sheet

chromadb

Package-level reference for chromadb on PyPI — install variants, server/client split, embedding-function extras, and alternative vector stores.

updated 05-31-2026

What it is

chromadb is the Python distribution of Chroma, an open-source vector database for AI applications. The package ships the embedded in-process engine, the persistent SQLite-backed store (DuckDB+Parquet in 0.3.x and earlier), an HTTP client/server, and a small library of pluggable embedding functions in a single import. The same client API works whether you are running in-memory for a notebook prototype or pointing at a remote Chroma server.

Reach for chromadb when you want zero-infrastructure RAG storage in Python and are happy to scale up later. Reach for qdrant-client, weaviate-client, or hosted services like Pinecone when you need strict multi-tenant isolation, advanced filtering, or production-grade clustering from day one.

Install

bash

pip install chromadb

Output: (none — exits 0 on success)

bash

uv add chromadb

Output: dependency resolved + added to pyproject.toml

bash

poetry add chromadb

Output: updated lockfile + virtualenv install

bash

pip install chromadb-client      # thin HTTP-only client (no server, no embedding deps)

Output: installs the slim client that talks to a remote chroma run server

Versioning & Python support

Chroma's 0.x releases moved quickly. The 0.3.x → 0.4.x jump in 2023 replaced the DuckDB+Parquet persistent store with SQLite and required a migration tool (chroma-migrate). Always read the changelog before bumping.
Recent versions support Python 3.8+ on Linux, macOS, and Windows. Wheels are published for common architectures; building from source needs a C++ toolchain because of the HNSW index code.
Client/server version skew matters. If you run the Chroma server in Docker, the chromadb (or chromadb-client) version in your application should track the server's minor version. Cross-minor combinations sometimes work and sometimes fail with opaque protocol errors — pinning both to the same minor is the safe path.
A 1.x line has since been published on PyPI. While you stay on 0.x, treat every minor as a potential breaking release in CI.
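
The same-minor rule above can be enforced at application start-up. The sketch below is one way to do it, using only the standard library; the helper names (major_minor, same_minor, assert_compatible) are hypothetical, and how you obtain the two version strings is left to the caller.

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.5.20'."""
    match = re.match(r"^\s*v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_minor(client_version: str, server_version: str) -> bool:
    """True when both sides share the same major.minor pair."""
    return major_minor(client_version) == major_minor(server_version)


def assert_compatible(client_version: str, server_version: str) -> None:
    """Fail fast instead of discovering skew through an opaque protocol error."""
    if not same_minor(client_version, server_version):
        raise RuntimeError(
            f"Chroma version skew: client {client_version} vs server {server_version}"
        )


# Assumption: in a real app the strings might come from chromadb.__version__
# and the server's reported version; literals are used here for illustration.
assert_compatible("0.5.3", "0.5.20")
```

Patch-level differences pass the check; only a major or minor mismatch raises.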

Package metadata

Maintainer: Chroma (the company, formerly Chroma Inc.) and community contributors
Project home: github.com/chroma-core/chroma
Docs: docs.trychroma.com
PyPI: pypi.org/project/chromadb
License: Apache-2.0
Governance: company-led with open contributions; commercial Chroma Cloud offering tracks the open core
First released: 2022
Downloads: several million per month on PyPI; the default vector store in many LangChain and LlamaIndex tutorials

Optional dependencies & extras

chromadb ships as one PyPI package that bundles the server, the local persistent store, and the in-process engine. There are no published feature extras in the usual chromadb[xxx] form — instead, optional functionality lives in companion packages and in the small built-in chromadb.utils.embedding_functions module, which loads its own extras lazily.

Common companions to install alongside:

chromadb-client — slim HTTP-only client (no server, no embedding deps). Use it in lightweight containers that only talk to a remote chroma run server.
sentence-transformers — local CPU/GPU embeddings via the built-in SentenceTransformerEmbeddingFunction.
openai — required if you use the built-in OpenAIEmbeddingFunction.
cohere, google-generativeai, voyageai — each backs the corresponding built-in embedding function.
onnxruntime — used by the bundled default MiniLM embedding model.
tiktoken — token counting when you mix Chroma with OpenAI chat models.
langchain-chroma or llama-index-vector-stores-chroma — framework adapters in the LangChain / LlamaIndex ecosystems.

Alternatives

Package	Trade-off
`qdrant-client`	Rust-backed Qdrant server with rich payload filtering and gRPC support. Use when you want a stronger production story than embedded Chroma.
`weaviate-client`	Schema-first, GraphQL-style queries, hybrid search out of the box. Use for hybrid (vector + BM25) workloads.
`pymilvus`	Milvus client. Use when you need very large-scale clustered vector storage.
`pinecone` (formerly `pinecone-client`)	Fully-hosted SaaS, no self-hosting required. Use when you want to outsource ops.
`lancedb`	Embedded columnar vector DB on Lance/Arrow. Use when your data is already columnar and you want zero-copy queries.
`faiss-cpu` / `faiss-gpu`	Library, not a database — raw ANN indexes. Use when you only need similarity search, not metadata storage.

Common gotchas

0.3.x → 0.4.x persistent-store migration. The on-disk format changed from DuckDB+Parquet to SQLite; stores written by 0.3.x need the maintainer-provided chroma-migrate tool. Snapshot the directory before upgrading.
Client-API churn. Client(Settings(persist_directory=...)) paired with Client.persist(), and later PersistentClient(path=...), have each been the recommended entrypoint at different times. Pin a version and copy from that version's docs, not Stack Overflow.
Default embedding function downloads a model on first use. The bundled MiniLM ONNX model is fetched from a CDN — in air-gapped environments, pass an explicit embedding_function= or pre-cache the model.
HNSW index parameters are set at collection creation. hnsw:space, hnsw:M, and hnsw:construction_ef cannot be retroactively changed without rebuilding the collection. The default space is l2, so decide on cosine vs L2 distance up front.
Tenancy model is recent and still maturing. Tenants and databases inside a single Chroma server are usable but underdocumented; production multi-tenant designs should test isolation carefully.
Client/server version mismatch is silent. A chromadb-client from 2024 talking to a 2026 server may appear to work for add but fail on a newer query parameter. Match minor versions.
In-memory Client() does not persist. Calling Client() with no path gives you an ephemeral store that vanishes on process exit. Use PersistentClient(path=...) or run the HTTP server for durability.

Real-world recipes

The recipes below are package-level vignettes — they focus on the install footprint and the client/server topology each pattern requires, rather than re-teaching collection methods (the companion sections/ai/chromadb covers the API surface).

Persistent local store for a single-process app: the smallest possible Chroma deployment. PersistentClient writes to a directory holding chroma.sqlite3 plus a subdirectory of HNSW index files per collection; restarting the process re-opens the same data.

python

import chromadb
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

client = chromadb.PersistentClient(path=".chroma")
emb = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
docs = client.get_or_create_collection("kb", embedding_function=emb)
docs.upsert(
    ids=["a", "b"],
    documents=["Chroma is an embedded vector DB.", "Pinecone is a hosted SaaS."],
    metadatas=[{"source": "intro"}, {"source": "intro"}],
)
print(docs.query(query_texts=["embedded database"], n_results=1))

Output: a result dict with the closest match's id, distance, document, and metadata; the .chroma/ directory holds chroma.sqlite3 plus per-collection HNSW index directories

Client-server split for multi-process serving — run chroma run --path .chroma --host 0.0.0.0 --port 8000 in one container, then point every app at it with the slim client. This is the only safe way to share a Chroma store across workers.

python

import chromadb

client = chromadb.HttpClient(host="chroma.internal", port=8000)
docs = client.get_collection("kb")
print(docs.count())

Output: count of documents in the remote collection; the worker image only needs chromadb-client (~5 MB) — no ONNX, no sentence-transformers, no server dependencies

Custom embedding function for an in-house model — Chroma's embedding-function interface is a duck-typed callable. Implement __call__(self, input: list[str]) -> list[list[float]] and pass it as embedding_function=.

python

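import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer

class MyEmbedder(EmbeddingFunction):
    """Wraps any model object exposing encode(list[str]) -> array of vectors."""
    def __init__(self, model):
        self.model = model
    def __call__(self, input: Documents) -> Embeddings:
        # Chroma passes a list of strings and expects one vector per string.
        return self.model.encode(input).tolist()

# Stand-in for the in-house model; swap in any object with an encode() method.
my_local_model = SentenceTransformer("all-MiniLM-L6-v2")

client = chromadb.PersistentClient(path=".chroma")
col = client.get_or_create_collection(
    "docs",
    embedding_function=MyEmbedder(my_local_model),
    metadata={"hnsw:space": "cosine"},  # distance metric is fixed at creation
)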

Output: the collection materialises with a custom embedder; metadata pins the distance function at creation time (cannot be changed retroactively)

Hybrid filter + vector query — Chroma supports a structured where= filter over metadata and a where_document= substring/regex filter, applied as a prefilter to the ANN search.

python

results = col.query(
    query_texts=["how does HNSW work"],
    n_results=5,
    where={"section": {"$in": ["intro", "tuning"]}},
    where_document={"$contains": "HNSW"},
)

Output: the top-5 hits restricted to documents whose metadata section is intro or tuning AND whose body literally contains the substring HNSW

Multi-tenant via collection-per-tenant — Chroma's tenants/databases primitive is still maturing; a robust pattern today is one collection per tenant with a name prefix and shared embedding function.

python

def get_tenant_collection(tenant_id: str):
    return client.get_or_create_collection(
        name=f"tenant_{tenant_id}",
        embedding_function=emb,
    )

Output: every tenant's data lives in its own collection, so a faulty where clause cannot leak rows across tenants. The collections still share one server and one SQLite file, so this is logical isolation rather than separate storage.
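
Raw tenant IDs are not always valid collection names. The sketch below derives a safe name from an arbitrary tenant ID. It assumes the naming rules in Chroma's documentation at the time of writing (3 to 63 characters, alphanumeric at both ends, only letters, digits, underscores and hyphens in between); check your version's docs before relying on it. The helper name tenant_collection_name is hypothetical.

```python
import hashlib
import re

# Assumed naming rule: 3-63 chars, alphanumeric first and last character.
_VALID_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9_-]{1,61}[A-Za-z0-9]$")


def tenant_collection_name(tenant_id: str, prefix: str = "tenant_") -> str:
    """Map an arbitrary tenant ID to a deterministic, valid collection name."""
    # Replace disallowed characters, then trim separators from both ends.
    slug = re.sub(r"[^A-Za-z0-9_-]", "-", tenant_id).strip("-_")
    # A short hash keeps names unique when two IDs sanitise to the same slug.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:8]
    budget = 63 - len(prefix) - len(digest) - 1
    name = f"{prefix}{slug[:budget]}-{digest}" if slug else f"{prefix}{digest}"
    if not _VALID_NAME.match(name):
        raise ValueError(f"could not build a valid collection name from {tenant_id!r}")
    return name


print(tenant_collection_name("acme corp!"))
```

The hash suffix means "a b" and "a-b" map to different collections even though both sanitise to the same slug.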

Production deployment

The two production topologies are embedded persistent (single process owns the directory) and client-server (the chroma run HTTP server fronts a shared volume). Pick early — the on-disk format is the same but the failure modes are not.

Topology checklist:

Concern	Embedded `PersistentClient`	Client-server (`chroma run`)
Concurrency	one writer only — file lock contention if you fork workers	many readers + writers via HTTP
Container image	full `chromadb` (~150 MB)	apps use `chromadb-client` (~5 MB), server image runs separately
Backups	snapshot the directory while idle	snapshot the volume; coordinate with server quiesce
Auth	none — process-local	static API key (via env var) or proxy fronting the server
Telemetry	sends usage pings unless disabled	same; set `ANONYMIZED_TELEMETRY=False`
Failure mode if disk full	hangs on SQLite write	server returns 500, client retries

Pinning client and server. When you run the server in Docker (chromadb/chroma:0.5.x), pin the application's chromadb-client to the same minor. Cross-minor combinations can fail on request parameters or response fields that only one side understands, often with an opaque error rather than a clear version message.

Telemetry opt-out. Chroma sends anonymous usage events through PostHog. Disable per-process with the ANONYMIZED_TELEMETRY=False environment variable, or in code via Settings(anonymized_telemetry=False). Required for many compliance reviews.

Backups. The persistent directory is roughly safe to tar while the process is idle. For zero-downtime backups, run chroma run with a filesystem that supports atomic snapshots (ZFS, btrfs, LVM, EBS snapshot) and snapshot the volume rather than copying the live directory.

Multi-tenancy strategy. Two paths exist; both work in production today:

Collection-per-tenant — strong isolation; tenant deletion is a single delete_collection. Limit: ~thousands of collections per server before metadata lookups slow down.
Filter-per-tenant — one shared collection with tenant_id in metadata, queried via where={"tenant_id": "..."}. Cheaper at scale, but a missing where clause leaks rows. Add an assertion in your query wrapper.

The newer tenants/databases primitive (created through AdminClient.create_tenant(...) and create_database(...)) is still maturing. Test isolation explicitly before relying on it for regulated workloads.
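
For the filter-per-tenant path, the "assertion in your query wrapper" can be sketched as below. This is a minimal illustration, not a hardened implementation: tenant_where and tenant_query are hypothetical helper names, and collection is assumed to be any object with a Chroma-style query(query_texts=..., n_results=..., where=...) method.

```python
from typing import Optional


def tenant_where(tenant_id: str, extra: Optional[dict] = None) -> dict:
    """Build a where filter that always pins the tenant."""
    if not tenant_id:
        raise ValueError("tenant_id is required for every query")
    tenant_clause = {"tenant_id": {"$eq": tenant_id}}
    if not extra:
        return tenant_clause
    # Multiple conditions are combined under a single top-level $and.
    return {"$and": [tenant_clause, extra]}


def tenant_query(collection, tenant_id: str, query_texts: list,
                 n_results: int = 5, extra_where: Optional[dict] = None):
    """Only entry point the application should use for a shared collection."""
    return collection.query(
        query_texts=query_texts,
        n_results=n_results,
        where=tenant_where(tenant_id, extra_where),
    )
```

Routing every read through one function makes a missing tenant filter a code-review question rather than a runtime surprise.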

Index tuning & retrieval quality

Chroma uses HNSW (Hierarchical Navigable Small World) under the hood. The three parameters worth knowing are hnsw:space (distance metric), hnsw:M (graph connectivity), and hnsw:construction_ef (build-time candidate pool). All three are set at collection creation time and cannot be changed without rebuilding.

python

col = client.create_collection(
    name="tuned_kb",
    embedding_function=emb,
    metadata={
        "hnsw:space": "cosine",         # cosine | l2 | ip
        "hnsw:M": 32,                   # default 16; higher = better recall, more RAM
        "hnsw:construction_ef": 200,    # default 100; higher = better index, slower build
        "hnsw:search_ef": 100,          # default 10; search-time candidate pool, higher = better recall, slower queries
    },
)

Output: a collection whose HNSW index is built with a larger candidate pool and higher graph degree; expect better recall in exchange for more RAM and higher p95 query latency

Trade-off table:

Parameter	Low value	High value	When to raise
`M`	8–16	48–64	corpora over ~1M vectors; recall plateau
`construction_ef`	100	400+	quality matters more than build time
`search_ef`	10	100–500	recall falls short at the default; set per collection against a p95 latency target

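Distance metric choice. For most modern sentence embeddings (MiniLM, BGE, OpenAI text-embedding-3-*), cosine is the right default. When vectors are unit-normalised, cosine, ip, and l2 all produce the same ranking, so cosine is the option that stays correct if a model's output turns out not to be normalised. Use ip (inner product) only when the model is trained for dot-product scoring on unnormalised vectors, and l2 only when the embedding model recommends it.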
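
The relationship between the three metrics can be checked by hand. The sketch below uses made-up two-dimensional vectors and plain Python; the distance definitions (1 minus cosine similarity, 1 minus dot product, squared Euclidean) are an assumption chosen to mirror how Chroma's docs describe cosine, ip, and l2.

```python
import math


def normalise(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def ip_distance(a, b):
    return 1.0 - dot(a, b)


def l2_squared(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ranking(query, docs, dist):
    """Document indices ordered from nearest to farthest."""
    return sorted(range(len(docs)), key=lambda i: dist(query, docs[i]))


# Toy data: deliberately different vector lengths.
raw_docs = [[3.0, 0.5], [0.2, 4.0], [1.0, 1.0]]
raw_query = [1.0, 0.2]

unit_docs = [normalise(d) for d in raw_docs]
unit_query = normalise(raw_query)

# On unit vectors the three metrics agree on the order.
unit_rankings = [ranking(unit_query, unit_docs, d)
                 for d in (cosine_distance, ip_distance, l2_squared)]

# On raw vectors l2 is swayed by magnitude and can disagree with cosine.
raw_cosine = ranking(raw_query, raw_docs, cosine_distance)
raw_l2 = ranking(raw_query, raw_docs, l2_squared)
print(unit_rankings, raw_cosine, raw_l2)
```

If your embedder does not normalise its output, either normalise before insert or pick the metric the model card names.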

Hybrid filter + vector. Chroma applies metadata where= and document where_document= filters as prefilters — the ANN search runs over the filtered subset. This is fast when filters are selective (the search index is restricted to a small candidate set) and slow when filters match most of the corpus (the prefilter scan dominates).

Reranking pattern. Chroma does not ship a built-in reranker. The standard pattern is to over-retrieve (n_results=50), then rerank in your application with a cross-encoder (cross-encoder/ms-marco-MiniLM-L-6-v2 or Cohere Rerank), and keep the top 5–10.
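
The over-retrieve-then-rerank step can be sketched independently of any particular model. In the example below, rerank takes a scoring callable; token_overlap is a deliberately crude stand-in so the snippet runs without model downloads, and in practice you would pass a function that wraps your cross-encoder. Both names are hypothetical.

```python
from typing import Callable, Sequence


def rerank(query: str, candidates: Sequence[str],
           score_fn: Callable[[str, str], float], keep: int = 5) -> list:
    """Score every candidate against the query and keep the best few."""
    scored = [(score_fn(query, doc), i, doc) for i, doc in enumerate(candidates)]
    # Highest score first; the original retrieval position breaks ties.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc for _, _, doc in scored[:keep]]


def token_overlap(query: str, doc: str) -> float:
    """Toy scorer: fraction of query tokens that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


# Pretend these came back from col.query(..., n_results=50)["documents"][0].
candidates = [
    "pinecone is hosted",
    "hnsw builds a layered graph",
    "how hnsw search works in a graph",
]
top = rerank("how does hnsw work", candidates, token_overlap, keep=2)
print(top)
```

Keeping the scorer injectable lets you swap a local cross-encoder for a hosted rerank API without touching the retrieval code.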

Version migration guide

The 0.3.x → 0.4.x boundary is the largest on-disk change to date. Later minors also rename APIs.

0.3.x → 0.4.x checklist:

On-disk persistent-store format changed from DuckDB+Parquet to SQLite. Stores written by 0.3.x need the maintainer-provided chroma-migrate tool. Snapshot the directory first.
Client(Settings(persist_directory=...)) is no longer the persistence entrypoint. Use PersistentClient(path=...) for embedded persistence, or HttpClient(host=..., port=...) for the server.
Client.persist() removed. Persistent clients write through automatically, so the explicit call is gone.
Settings(chroma_db_impl=...) removed. Backend is implicit from which client class you instantiate.
Telemetry opt-out is unchanged: set ANONYMIZED_TELEMETRY=False or pass Settings(anonymized_telemetry=False).

0.5.x minor-to-minor:

Tenants/databases primitive evolved across minors; the AdminClient create_tenant(...) and create_database(...) signatures shifted. Pin if you depend on the new isolation model.
get_or_create_collection metadata validation tightened — invalid hnsw:* keys now raise instead of being silently dropped.
include= parameter on query() added options ("embeddings", "distances", "metadatas", "documents", "data"); older clients send unrecognised values and 0.5 servers may reject them.

Client/server pinning. Always run the same minor on both sides. The Docker image tag (chromadb/chroma:0.5.x) and the chromadb-client minor must match — cross-minor combinations sometimes work for add()/get() and silently fail on newer query parameters.

A 1.x line has since been published; while you stay on 0.x, treat every minor as a potential breaking release in CI.

Troubleshooting common errors

The error catalogue below is what trips up new users most often. Most are environmental rather than code bugs.

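DuplicateIDError on add(...): the same ID appears more than once in the call. IDs that already exist in the collection may instead be skipped with a warning, depending on version. Deduplicate the batch, and use upsert(...) when you intend to overwrite existing rows.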
InvalidCollectionException — collection name doesn't exist or was deleted. Use get_or_create_collection for idempotent code.
ValueError: Expected metadata to be a non-empty dict — Chroma rejects empty metadatas dicts. Either omit metadatas= entirely or pass at least one key per row.
ConnectionError against chroma run — the server is not listening on the expected port, or the firewall is blocking it. curl http://chroma:8000/api/v1/heartbeat should return {"nanosecond heartbeat": ...}.
Could not connect to tenant with HttpClient — the server is on an older minor that doesn't know about tenants. Either upgrade the server or instantiate without tenant=.
Default embedder downloads ONNX model on first run — air-gapped environments need the model pre-cached, or use a custom embedding_function=. Symptom: hang on first add() while a CDN times out.
OperationalError: database is locked — two PersistentClient instances opened the same directory. Embedded mode is single-writer; move to chroma run or coordinate access.
Cross-version protocol error — chromadb-client from 2024 hitting a 2026 server (or vice versa) fails with opaque 400 responses on newer query parameters. Match minors.
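
The heartbeat check from the ConnectionError entry can be scripted for container health checks. This is a sketch using only the standard library; it assumes the /api/v1/heartbeat path and the "nanosecond heartbeat" key quoted above, which may differ on other server versions. The function name heartbeat_ok is hypothetical.

```python
import json
import urllib.error
import urllib.request


def heartbeat_ok(base_url: str, timeout: float = 2.0) -> bool:
    """Return True when the server answers the heartbeat route with the expected key."""
    url = base_url.rstrip("/") + "/api/v1/heartbeat"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers refused connections, timeouts, HTTP errors, malformed URLs,
        # and non-JSON bodies.
        return False
    return isinstance(payload, dict) and "nanosecond heartbeat" in payload


if __name__ == "__main__":
    # Hostname and port are placeholders taken from the example above.
    print(heartbeat_ok("http://chroma:8000"))
```

A False result only says the route did not answer as expected; check the port mapping and firewall before suspecting the client.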

Performance tuning

Chroma's performance levers are about telling the engine what to skip more than telling it to go faster. The HNSW index handles vector search; everything else (filters, payload returns, embedding-function calls) is in your control.

Lever	Mechanism	When it helps
`HttpClient` for shared workloads	server-process serialises writes	multi-worker apps
Higher `hnsw:M` and `hnsw:construction_ef`	better index	recall-bound queries
`hnsw:search_ef` per collection	runtime quality knob	latency-budget tuning
Pre-filter with `where=`	shrink the search set	selective metadata predicates
`include=` query parameter	drop unused fields	reduce network payload
Batch `add`/`upsert`	amortise round-trips	bulk ingestion
Custom `embedding_function`	bypass default ONNX download	air-gapped / faster embedders
`chromadb-client` slim package	skip server deps in app images	smaller worker images

Bulk ingestion pattern. Batch documents into upsert(...) calls of a few hundred at a time. Smaller batches are network-overhead-bound; larger batches occupy the server's index build budget and stall reads.

python

def chunks(iterable, n):
    buf = []
    for item in iterable:
        buf.append(item)
        if len(buf) >= n:
            yield buf
            buf = []
    if buf:
        yield buf

for batch in chunks(iter_rows(), 500):
    col.upsert(
        ids=[r["id"] for r in batch],
        documents=[r["text"] for r in batch],
        metadatas=[r["meta"] for r in batch],
    )

Output: streams documents into the collection in 500-row batches; the server returns once each batch is persisted

Query-time search_ef. hnsw:search_ef is configured on the collection; check your version's docs before assuming query() accepts a per-call override. Higher search_ef raises recall with diminishing returns while query latency keeps growing, so production deployments set it from a p95 latency budget rather than a fixed default.

Avoid the default embedder in production. The bundled MiniLM-via-ONNX function downloads a model on first call and re-loads it per process. Pass an explicit embedding_function= that uses a long-lived embedding service (OpenAI, Cohere, local sentence-transformers) — both faster and easier to upgrade.

Embeddings & chunking strategy

Chroma stores vectors and metadata; it does not produce embeddings (the bundled MiniLM ONNX function is a default for convenience, not a recommendation). The embedding choice dominates retrieval quality far more than HNSW tuning, so it deserves explicit attention.

Embedding-model choice. A short-list current at writing:

Model	Dimensions	Cost	Strengths
`all-MiniLM-L6-v2`	384	local CPU/GPU	small, fast, the Chroma default
`BAAI/bge-small-en-v1.5`	384	local	better on MTEB than MiniLM at same dim
`BAAI/bge-large-en-v1.5`	1024	local GPU	strong open-weight; needs GPU at scale
`text-embedding-3-small`	1536 (or shorter via Matryoshka)	OpenAI API	fast, accurate, hosted
`text-embedding-3-large`	3072	OpenAI API	best-in-class accuracy, dimensions matter for storage
`voyage-3-large`	1024+	Voyage AI API	strong on retrieval benchmarks
`embed-multilingual-v3.0`	1024	Cohere API	non-English content

Higher-dimensional embeddings improve recall but cost storage and RAM linearly. For corpora over ~1M chunks, prefer dim ≤ 1024 with quantization, or a Matryoshka model trimmed to a shorter dimension.
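
The linear cost is easy to estimate before choosing a model. The sketch below assumes vectors are held as 32-bit floats and counts only the raw vector payload; HNSW graph links, metadata, and documents add to the total. The function name raw_vector_bytes is hypothetical.

```python
def raw_vector_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw payload size: one float32 (4 bytes, assumed) per dimension per vector."""
    return num_vectors * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    return num_bytes / 2**30


# Compare the dimensions from the table above for a one-million-chunk corpus.
for dim in (384, 1024, 1536, 3072):
    size = raw_vector_bytes(1_000_000, dim)
    print(f"dim={dim:>4}: {to_gib(size):.2f} GiB of raw vectors")
```

Doubling the dimension doubles this figure, which is why trimming a Matryoshka model pays off quickly at scale.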

Chunking decisions live upstream. Chroma stores whatever chunks you give it; bad chunking can't be fixed at retrieval time. The standard heuristics:

300–500 character chunks for narrow factual questions.
800–1500 character chunks for general RAG with modern long-context LMs.
~100 character overlap to avoid edge-case context truncation.
Respect document structure (titles, sections) — unstructured's chunk_by_title is a sensible default.
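
A fixed-size character chunker with overlap, matching the first three heuristics, can be sketched in a few lines. It ignores document structure, so treat it as a baseline rather than a replacement for a title-aware chunker; chunk_text is a hypothetical name.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    chunks = []
    step = size - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        # Stop once the current window already reaches the end of the text.
        if start + size >= len(text):
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))
```

Each chunk repeats the last `overlap` characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.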

HyDE / query rewriting. When user queries are short and noisy, embedding them directly gives weak retrieval. The Hypothetical Document Embeddings pattern (HyDE) asks an LM to generate a plausible answer, embeds that, and queries Chroma with the synthetic embedding — often better than embedding the raw query.
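
A HyDE query can be sketched as a thin wrapper around the collection. In the example below, generate is any callable that turns a prompt into text (your LM client), and collection is assumed to expose a Chroma-style query(query_texts=..., n_results=...) method, so the collection's own embedding function embeds the hypothetical answer. The prompt wording and the name hyde_query are illustrative assumptions.

```python
from typing import Callable


def hyde_query(collection, question: str,
               generate: Callable[[str], str], n_results: int = 5):
    """Retrieve with the embedding of a generated answer instead of the raw question."""
    hypothetical = generate(f"Write a short passage that answers: {question}")
    # The passage is embedded by the collection's configured embedding function.
    return collection.query(query_texts=[hypothetical], n_results=n_results)
```

The generated passage does not need to be correct; it only needs to resemble the documents you hope to retrieve.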

Parent-document retrieval. Embed small chunks for precision; return the parent document (or a wider window) to the LM for context. Store parent_id in metadata and look it up after retrieval.
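
The lookup step after retrieval can be sketched as below. It assumes the usual query result shape (one inner list per query text under "metadatas") and a parent_id metadata key as described above; parent_store is a hypothetical stand-in for wherever full documents live.

```python
def parent_ids_in_rank_order(result: dict, key: str = "parent_id") -> list:
    """Collect unique parent IDs from the first query's hits, best hit first."""
    seen = set()
    ordered = []
    for meta in (result.get("metadatas") or [[]])[0]:
        parent = (meta or {}).get(key)
        if parent is not None and parent not in seen:
            seen.add(parent)
            ordered.append(parent)
    return ordered


# Example result in the shape returned for a single query text.
result = {
    "ids": [["c1", "c2", "c3"]],
    "metadatas": [[{"parent_id": "p1"}, {"parent_id": "p2"}, {"parent_id": "p1"}]],
}
parent_store = {"p1": "Full text of document 1", "p2": "Full text of document 2"}
context = [parent_store[pid] for pid in parent_ids_in_rank_order(result)]
print(context)
```

Deduplicating by parent keeps the prompt from repeating the same document when several of its chunks match.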

Security considerations

Chroma's default setup has minimal security — appropriate for embedded use, dangerous for a network-exposed server.

No auth by default on chroma run. Place behind an authenticated reverse proxy (nginx with basic auth, an API gateway, or a VPC) before exposing to a network. Recent versions support a static API key via CHROMA_SERVER_AUTHN_PROVIDER/CHROMA_SERVER_AUTHN_CREDENTIALS; verify the version's docs.
TLS not built in — the HTTP server speaks plaintext. Terminate TLS at a proxy.
Telemetry phones home unless disabled with ANONYMIZED_TELEMETRY=False. Required for many compliance reviews.
Multi-tenant leakage risk under filter-per-tenant. A missing where={"tenant_id": ...} clause returns rows across tenants. Either wrap every query in code that enforces the filter, or use collection-per-tenant for hard isolation.
Prompt injection via retrieved content. Documents in Chroma are returned verbatim to the LM. A malicious uploaded document can contain instructions the LM follows ("ignore previous instructions and..."). Validate upload provenance; consider a sanitisation pass on retrieved content before prompt assembly.
PII in metadata. Metadata is returned in every query response; don't store anything you wouldn't want in logs.
Backup security. The persistent directory is plaintext SQLite plus HNSW index files. Encrypt at rest at the volume layer (LUKS, EBS encryption, etc.).

When NOT to use this

Chroma is the easiest vector DB to start with; it stops being the right tool at a clear scale boundary.

Corpora over ~10 million vectors with strict p95 latency. Qdrant or Milvus have more mature distributed stories. Chroma works but tuning gets painful.
Hybrid (BM25 + vector) is a core requirement. Weaviate ships hybrid out of the box; Chroma needs application-side BM25 (e.g. rank-bm25 + post-merge), which is more code.
Strict multi-tenant isolation with thousands of tenants. Collection-per-tenant slows down past low thousands; filter-per-tenant is leaky if a query forgets the where clause. Postgres-with-pgvector or a hosted service may fit better.
You want a fully-hosted, managed service. Pinecone, Weaviate Cloud, and Qdrant Cloud all front their own engines. Chroma Cloud exists but is younger.
Sparse-vector workloads (e.g. SPLADE). Qdrant and Weaviate have first-class sparse support; Chroma is dense-only at this writing.
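
For the application-side hybrid case, the post-merge step is often done with reciprocal rank fusion (RRF). The sketch below is one common choice, not something Chroma provides or the only way to merge; it takes ranked ID lists from any sources (for example a Chroma query and a BM25 scorer) and the constant k=60 is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list, k: int = 60) -> list:
    """Merge ranked ID lists: each list contributes 1 / (k + rank) per document."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; the ID breaks ties deterministically.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # e.g. IDs from a vector query
bm25_hits = ["b", "c", "a"]     # e.g. IDs from a keyword scorer
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
```

RRF needs only ranks, so it sidesteps the problem of putting BM25 scores and vector distances on a common scale.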
