cheat sheet
weaviate-client
Package-level reference for weaviate-client on PyPI — install variants, the v3 → v4 API split, gRPC, and alternative vector stores.
weaviate-client
What it is
weaviate-client is the official Python SDK for Weaviate, an open-source vector database that combines dense-vector similarity search with BM25 keyword search, generative modules, and a strongly-typed schema. The client talks to a Weaviate server over both REST and gRPC, with gRPC used by default in the modern v4 API for batch operations.
Reach for weaviate-client when you want hybrid search (vector + keyword) out of the box, schema-enforced collections with strong typing, and built-in generative modules that call OpenAI / Cohere / HuggingFace from the server side. Reach for qdrant-client if you prefer a leaner Rust-backed server, or chromadb if you want embedded-only.
Install
pip install weaviate-client
Output: (none — exits 0 on success)
uv add weaviate-client
Output: dependency resolved + added to pyproject.toml
poetry add weaviate-client
Output: updated lockfile + virtualenv install
pip install "weaviate-client<4" # legacy v3 API if you cannot migrate yet
Output: installs the previous-generation client (deprecated)
Versioning & Python support
- The package had a hard v3 → v4 rewrite in 2024.
weaviate-client>=4is the current, supported line;weaviate-client<4(the3.xseries) is in maintenance mode and will not receive new server features. - The
v4client matches Weaviate server1.23+. Older servers do not support the gRPC endpoints the v4 client uses by default; pin the server version to match. - Recent versions support Python 3.9+. Pure-Python with
grpcioandhttpxas binary/extension dependencies. - Server-feature exposure roughly matches the server: when Weaviate ships a new generative or reranker module, the client gets typed helpers in the next minor.
Package metadata
- Maintainer: Weaviate (the company, formerly SeMI Technologies) and community contributors
- Project home: github.com/weaviate/weaviate-python-client
- Server repo: github.com/weaviate/weaviate
- Docs: weaviate.io/developers/weaviate/client-libraries/python
- PyPI: pypi.org/project/weaviate-client
- License: BSD-3-Clause
- Governance: company-led with open contributions; Weaviate Cloud Services is the hosted commercial offering
- First released: 2019
- Downloads: millions per month
Optional dependencies & extras
The weaviate-client package does not expose typical PyPI extras (weaviate-client[...]). gRPC, REST, and authentication helpers are all in the base install.
Common companion packages:
openai,cohere,anthropic,mistralai— client-side usage when you mix Weaviate retrieval with externally-driven generation. (Note: many of these are also available as server-side modules in Weaviate itself, where the API key sits on the server.)sentence-transformers— for client-side embeddings when you don't want to enable a server module.langchain-weaviateandllama-index-vector-stores-weaviate— framework adapters.authlib/requests-oauthlib— sometimes pulled in for OIDC auth flows against Weaviate Cloud.
Alternatives
| Package | Trade-off |
|---|---|
qdrant-client | Rust-backed Qdrant server, rich payload filtering. Use when filter performance matters most. |
chromadb | Embedded, zero-infrastructure. Use for prototypes. |
pymilvus | Milvus client. Use at very large scale. |
pinecone-client | Hosted-only SaaS. Use when you do not want to run a server. |
lancedb | Embedded columnar Lance/Arrow store. Use when your data is already columnar. |
elasticsearch / opensearch-py | Mature BM25 plus newer kNN. Use when you already run Elastic and only need vector as an add-on. |
Common gotchas
- v3 → v4 is a full rewrite, not a refactor. v3 used
weaviate.Client(url=...)with stringly-typed schema dicts; v4 usesweaviate.connect_to_local(),weaviate.connect_to_wcs(...), orweaviate.connect_to_custom(...)with strongly-typedConfigure.*builders. Old tutorials and Stack Overflow answers are mostly incompatible. - Auth methods restructured. v3's
auth_client_secret=AuthApiKey(...)becomesauth_credentials=AuthApiKey(...)in v4, and OIDC has its own helper. Token-style and API-key-style auth share a constructor argument now. - Always
client.close()(or use it as a context manager). The v4 client opens persistent gRPC connections; leaking clients in a long-running process exhausts file descriptors. - Server-side modules vs client-side embeddings. Generative and vectorizer modules (
text2vec-openai,generative-cohere, …) run on the server and need the server to hold the API key. The client API key is unrelated. Wiring these together is a frequent source of "module not enabled" errors. - Cross-namespace
CollectionvsClass. The v4 SDK renamed Weaviate "Classes" to "Collections" in Python; the server still uses "Class" in REST URLs. Don't be surprised when the two terms refer to the same thing. - Batch import defaults to dynamic batching.
with collection.batch.dynamic() as batch:is the v4 idiom; oversizing the batch leads to timeouts under load. Usebatch.fixed_size(...)when you need predictable throughput. - Cloud SDK auth uses Weaviate Cloud Services API keys, not your OpenAI/Cohere keys — a common mistake when copy-pasting from generative-module tutorials.
Real-world recipes
The recipes below highlight the v4 API surface for install-and-topology choices — collection definition, hybrid search, and multi-tenancy. The sections/ai/weaviate companion covers the broader API.
Local server connection — connect_to_local() is the v4 helper for the default localhost:8080 / localhost:50051 (REST/gRPC) setup. Always use the client as a context manager so the gRPC channel closes cleanly.
import weaviate
from weaviate.classes.init import Auth
from weaviate.classes.config import Configure, Property, DataType
with weaviate.connect_to_local() as client:
client.collections.create(
"Article",
vectorizer_config=Configure.Vectorizer.text2vec_openai(),
generative_config=Configure.Generative.openai(),
properties=[
Property(name="title", data_type=DataType.TEXT),
Property(name="body", data_type=DataType.TEXT),
],
)
Output: the collection is created server-side; the server-side text2vec-openai module needs an OPENAI_APIKEY env var set on the Weaviate server, not the client
Connect to Weaviate Cloud Services — connect_to_wcs (older) and connect_to_weaviate_cloud (current name) take the cluster URL and an API key.
import os
import weaviate
from weaviate.classes.init import Auth
with weaviate.connect_to_weaviate_cloud(
cluster_url=os.environ["WCD_URL"],
auth_credentials=Auth.api_key(os.environ["WCD_API_KEY"]),
headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
) as client:
print(client.is_ready())
Output: True if the cluster is up; the X-OpenAI-Api-Key header forwards the OpenAI key to the server-side vectorizer/generator modules
Hybrid search (BM25 + vector) — the v4 collection API exposes hybrid() as a first-class query mode. alpha=0.5 weights BM25 and vector equally; alpha=0 is pure BM25, alpha=1 is pure vector.
articles = client.collections.get("Article")
hits = articles.query.hybrid(
query="how does HNSW work",
alpha=0.5,
limit=5,
return_metadata=["score", "explain_score"],
)
for o in hits.objects:
print(o.metadata.score, o.properties["title"])
Output: ranked objects with both BM25 and vector contributions explained in explain_score; the fusion happens server-side, no client-side RRF stitching needed
Generative search (server-side RAG) — Weaviate's generative modules embed the LLM call inside the query. The retrieved objects are passed to the configured generator with a templated prompt; the response includes both objects and the generated text.
articles = client.collections.get("Article")
result = articles.generate.near_text(
query="summarise HNSW",
limit=3,
grouped_task="Write a one-paragraph summary of these articles.",
)
print(result.generated)
Output: a single generated paragraph synthesising the top-3 hits; the OpenAI call happens server-side using the key supplied via X-OpenAI-Api-Key
Multi-tenant collection — Weaviate supports first-class multi-tenancy: one collection holds many tenants, each isolated at the shard level.
from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant
client.collections.create(
"Doc",
multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
docs = client.collections.get("Doc")
docs.tenants.create([Tenant(name="acme"), Tenant(name="globex")])
# All subsequent operations must target a tenant
acme = docs.with_tenant("acme")
acme.data.insert({"title": "Acme internal"})
Output: each tenant lives on its own shard with hard isolation; tenant deletion drops the shard, much faster than per-row deletion
Batch import with the dynamic batcher — the v4 client tunes its batch size on the fly. Use batch.dynamic() for unknown-throughput workloads and batch.fixed_size(...) for predictable load.
with articles.batch.dynamic() as batch:
for row in iter_rows():
batch.add_object(properties=row, uuid=row["id"])
Output: the batch context manager flushes on exit; failures during the run are visible via batch.failed_objects after the with block
Production deployment
The v4 client opens persistent gRPC connections by default. Long-running services must close clients explicitly (or use the context manager pattern) — leaked clients exhaust file descriptors.
Topology checklist:
| Concern | Self-hosted (Docker / k8s) | Weaviate Cloud Services |
|---|---|---|
| Server modules | enable via env (ENABLE_MODULES=text2vec-openai,generative-openai) | managed; toggled in cluster config |
| Vectorizer keys | set on server (OPENAI_APIKEY=...) | forwarded via X-*-Api-Key headers |
| Backup | weaviate-backup module with S3/GCS/Azure/filesystem backend | managed snapshots |
| Replication | per-class replicationFactor | managed; HA tiers |
| Auth | OIDC, API keys, or anonymous | API keys + OIDC |
| Transport | 8080 (REST) + 50051 (gRPC) | TLS-only |
Server-side modules vs client-side embeddings. Weaviate's distinguishing feature is its server-side module ecosystem — the server holds the OpenAI / Cohere / HuggingFace keys and computes embeddings (and optionally generations) on import and query. The client only ships text. This simplifies multi-app deployments (one API key location) but means the server must have outbound network access to the embedding provider.
Alternative: disable server modules (vectorizer_config=None), compute embeddings in the application, and data.insert(vector=...) directly. The client then needs the OpenAI key, not the server.
Backup module. Configure once in BACKUP_FILESYSTEM_PATH=/var/lib/weaviate/backups (or S3/GCS/Azure equivalents), then trigger via the client:
client.backup.create(
backup_id=f"daily-{datetime.utcnow().date()}",
backend="filesystem",
include_collections=["Article"],
)
Output: a backup tarball on the configured backend; restore with client.backup.restore(...)
Multi-tenancy at scale. Tenant-per-collection (Multi-Tenancy enabled) is the recommended pattern past a few hundred tenants. Each tenant maps to a shard; Weaviate handles activation/deactivation to keep cold tenants out of RAM. Configure autoTenantActivation for on-demand activation.
Index tuning & retrieval quality
The v4 collection config exposes HNSW parameters under Configure.VectorIndex.hnsw(...). The same m / efConstruction / ef levers as Chroma/Qdrant, plus Weaviate-specific tuning around dynamic ef and PQ.
from weaviate.classes.config import Configure, VectorDistances
client.collections.create(
"Tuned",
vectorizer_config=Configure.Vectorizer.text2vec_openai(),
vector_index_config=Configure.VectorIndex.hnsw(
distance_metric=VectorDistances.COSINE,
ef_construction=256,
max_connections=32,
dynamic_ef_factor=8,
dynamic_ef_min=100,
dynamic_ef_max=500,
quantizer=Configure.VectorIndex.Quantizer.pq(segments=96),
),
)
Output: the collection is created with a tuned HNSW + product-quantized index; dynamic ef adapts per-query based on the limit
Hybrid alpha tuning. The alpha parameter in hybrid() ranges from 0 (pure BM25) to 1 (pure vector). Common settings:
alpha=0.25— keyword-heavy; product-search-style queriesalpha=0.5— balanced; general RAGalpha=0.75— semantic-heavy; conversational query rewriting
Reranker modules. Weaviate ships reranker modules (reranker-cohere, reranker-transformers, reranker-voyageai) that rerank the candidate set server-side after the initial hybrid retrieval. Enable the module and add rerank=Rerank(prop="title", query=...) to the query.
Version migration guide
The v3 → v4 boundary is the biggest in the package's history. The two clients are not source-compatible; v3 code does not run on v4 with import shimming alone.
v3 → v4 checklist:
- Construction:
weaviate.Client(url=...)→weaviate.connect_to_local(),weaviate.connect_to_weaviate_cloud(...), orweaviate.connect_to_custom(...). - Lifecycle: v4 clients hold persistent gRPC channels — always use
with client:context managers or callclient.close()explicitly. - Schema: v3 stringly-typed dict schemas → v4 typed
Configure.*andProperty(...)builders. - Auth:
auth_client_secret=AuthApiKey(...)→auth_credentials=Auth.api_key(...). OIDC has its own helper. - Queries: v3
client.query.get("Class").with_*()GraphQL builder → v4collection.query.fetch_objects(...),collection.query.near_text(...), etc., returning typed Python objects. - Batch: v3
client.batch.configure(...)+batch.add_data_object(...)→ v4with collection.batch.dynamic() as batch: batch.add_object(...). - Naming: Weaviate "Classes" became "Collections" in the Python SDK; the server REST API still calls them Classes.
v3 maintenance. weaviate-client<4 still installs and works against older Weaviate servers (< 1.23), but receives no new feature work. If you cannot migrate to v4 yet, pin weaviate-client<4 strictly — pip install weaviate-client now installs v4.
v4 minor-to-minor. Releases land roughly monthly. Breaking changes are rare within the v4 line but new modules (rerankers, new vectorizers, RBAC features) require matching server versions. Pin server and client minors together.
Server compatibility. v4 client requires Weaviate server 1.23+. Pre-1.23 servers do not support the gRPC endpoints the v4 client uses by default. Either upgrade the server or pin to weaviate-client<4.
Troubleshooting common errors
WeaviateGrpcUnavailableon connect — gRPC port (50051by default) not exposed. Many Docker setups only publish8080. Either expose50051or useconnect_to_custom(..., grpc_port=...)with the right port."module not enabled"on generative/vectorizer use — the Weaviate server does not have the module loaded. SetENABLE_MODULES=text2vec-openai,generative-openai,...in the server env and restart.UnauthorizedErroragainst WCS — wrong API key, or you used the OpenAI key where the Weaviate key was expected. WCS keys come from the cluster console; provider keys go inX-*-Api-Keyheaders.ResourceWarning: Unclosed gRPC channel— client wasn't closed. Use the context manager pattern; in long-running services, callclient.close()in shutdown hooks.- v3 tutorial code raises
AttributeError— you installed v4. Either migrate orpip install "weaviate-client<4". - Batch
failed_objectsnon-empty — server rejected some rows. Inspectbatch.failed_objectsafter thewithblock exits; common causes are duplicate UUIDs and schema mismatches. ClassvsCollectionconfusion in errors — the server returns "Class" in REST URLs and error messages; the SDK uses "Collection". Same concept, different surface.- Connection refused after a few minutes of idle — load balancers may close idle gRPC channels. Set TCP keepalive or use
additional_config=AdditionalConfig(timeout=...).
Performance tuning
| Lever | Mechanism | When it helps |
|---|---|---|
| gRPC over REST | connect_to_local() uses gRPC by default | batch import, query throughput |
batch.dynamic() | auto-tuned batch size | mixed-throughput ingestion |
batch.fixed_size(...) | predictable batches | steady-state ingestion under load |
| PQ quantization | smaller index | RAM-bound large collections |
replicationFactor > 1 | read scaling | read-heavy workloads |
| Multi-tenancy with autoTenantActivation | cold tenants off-heap | many tenants, sparse access |
| Server-side reranker | cross-encoder after hybrid | quality over latency |
Async client. v4 has an async variant — weaviate.use_async_with_local(), use_async_with_weaviate_cloud(), etc. Required for high-concurrency FastAPI services.
import asyncio, weaviate
async def main():
async with weaviate.use_async_with_local() as client:
articles = client.collections.get("Article")
hits = await articles.query.hybrid("HNSW", limit=5)
print(hits)
asyncio.run(main())
Output: the async client mirrors the sync surface with await semantics; the context manager handles channel cleanup
Batch sizing. The dynamic batcher targets a few hundred objects per batch by default. For predictable steady-state ingestion, batch.fixed_size(batch_size=200, concurrent_requests=4) is a good starting point. Larger batches risk gRPC deadline timeouts.
Security considerations
Weaviate has more built-in security than Chroma but still needs deliberate configuration for production.
- Auth. Three modes: anonymous (default — never expose to a network), static API keys (set via env), or OIDC. Production deployments should use OIDC (Auth0, Azure AD, Keycloak) with RBAC.
- RBAC (v1.25+) — role-based access control with read/write/admin scopes per collection. Required for multi-team clusters.
- TLS. Configure
ENABLE_TLS=trueand provide cert files, or terminate TLS at an ingress controller. - Server-side API keys. When using vectorizer/generator modules, the keys (OpenAI, Cohere, etc.) live on the server. Use secrets managers — never bake into images.
X-*-Api-Keyheader forwarding. Cloud customers forward provider keys per-request via headers — keys live on the client side. Different security model from server-side keys.- Multi-tenant isolation. Multi-tenancy enabled gives shard-level isolation; tenants cannot see each other's data even with a crafted query.
- Prompt injection via generative modules. Server-side generators take retrieved content + a prompt template. A document containing prompt-injection content can hijack the generator. Validate document provenance and consider sanitisation passes.
- Backups. Encrypted at the backup-backend layer (S3 SSE, GCS encryption, etc.); the backup module does not encrypt itself.
Embeddings & chunking strategy
Weaviate is unusual among vector DBs in that the server can produce its own embeddings via vectorizer modules. The choice between server-side and client-side embedding shapes the whole architecture.
Server-side vectorizer (the Weaviate-native path). Configure vectorizer_config=Configure.Vectorizer.text2vec_openai() (or text2vec_cohere, text2vec_voyageai, text2vec_huggingface, text2vec_transformers, text2vec_ollama, …). The server embeds on import and query. Pros: one place for keys, no client embedding code. Cons: server needs outbound network for hosted modules.
Client-side embedding. vectorizer_config=Configure.Vectorizer.none() and pass vector=[...] on insert and near_vector=[...] on query. Pros: keys live in the app, model choice is fully flexible. Cons: more code, more places to maintain.
Vectorizer-module choice (server-side). Roughly:
| Module | When |
|---|---|
text2vec_openai | hosted ease, strong English embeddings |
text2vec_cohere | multilingual; strong on retrieval benchmarks |
text2vec_voyageai | Voyage's strong open-eval results |
text2vec_transformers | local model on server (CPU/GPU) — no external API |
text2vec_ollama | local server-side via Ollama |
text2vec_huggingface | HuggingFace Inference API |
multi2vec_clip | multi-modal (text + image) |
Chunking is upstream. Weaviate stores text properties; chunking happens in the app before insert. Standard heuristics (300–1500 character chunks with ~100 char overlap, respect document structure) apply. For RAG-quality, pair chunk_by_title from unstructured with the server-side vectorizer for a one-pass ingestion pipeline.
Hybrid alpha tuning is part of chunking. Smaller chunks favour BM25 (more keyword density); larger chunks favour vector (more semantic context). Tune alpha per workload.
When NOT to use this
Weaviate's superpower is server-side hybrid + generative modules. The trade-offs below are where another tool is a better fit.
- Notebook prototype with no infra.
chromadbis onepip install; Weaviate needs a running server. - You want vectors but not a database.
faissis a library; Weaviate is a database with schema, RBAC, replication, and modules. - Strict separation: keys in app, not in server. If compliance forbids forwarding API keys to a third-party server, disable server modules and compute embeddings client-side — at that point a leaner client like
qdrant-clientmay be a better fit. - You're already on Postgres.
pgvectorlets you keep one operational store; Weaviate is more capable but more moving parts. - Pure keyword search. Elasticsearch / OpenSearch are more battle-tested for pure BM25 with a deep aggregation ecosystem.
See also
- AI: weaviate — collections, hybrid search, generative modules
- Concept: RAG — retrieval-augmented generation patterns
- Concept: API — REST design fundamentals