cheat sheet

weaviate-client

Package-level reference for weaviate-client on PyPI — install variants, the v3 → v4 API split, gRPC, and alternative vector stores.

updated 05-31-2026

What it is

weaviate-client is the official Python SDK for Weaviate, an open-source vector database that combines dense-vector similarity search with BM25 keyword search, generative modules, and a strongly-typed schema. The client talks to a Weaviate server over both REST and gRPC; the v4 API uses gRPC by default for queries and batch imports.

Reach for weaviate-client when you want hybrid search (vector + keyword) out of the box, schema-enforced collections with strong typing, and built-in generative modules that call OpenAI / Cohere / HuggingFace from the server side. Reach for qdrant-client if you prefer a leaner Rust-backed server, or chromadb if you want embedded-only.

Install

bash

pip install weaviate-client

Output: pip's install log, ending with a "Successfully installed ..." line (exit code 0)

bash

uv add weaviate-client

Output: dependency resolved + added to pyproject.toml

bash

poetry add weaviate-client

Output: updated lockfile + virtualenv install

bash

pip install "weaviate-client<4"      # legacy v3 API if you cannot migrate yet

Output: installs the previous-generation client (deprecated)

Versioning & Python support

The package had a hard v3 → v4 rewrite in 2024. weaviate-client>=4 is the current, supported line; weaviate-client<4 (the 3.x series) is in maintenance mode and will not receive new server features.
The v4 client matches Weaviate server 1.23+. Older servers do not support the gRPC endpoints the v4 client uses by default; pin the server version to match.
Recent versions support Python 3.9+. The client itself is pure Python; of its main dependencies, httpx is pure Python and grpcio ships compiled extension wheels.
Server-feature exposure roughly matches the server: when Weaviate ships a new generative or reranker module, the client gets typed helpers in the next minor.

Package metadata

Maintainer: Weaviate (the company, formerly SeMI Technologies) and community contributors
Project home: github.com/weaviate/weaviate-python-client
Server repo: github.com/weaviate/weaviate
Docs: weaviate.io/developers/weaviate/client-libraries/python
PyPI: pypi.org/project/weaviate-client
License: BSD-3-Clause
Governance: company-led with open contributions; Weaviate Cloud Services is the hosted commercial offering
First released: 2019
Downloads: millions per month

Optional dependencies & extras

The weaviate-client package exposes few PyPI extras: gRPC, REST, and authentication helpers are all in the base install. The main exception is weaviate-client[agents], which adds the Weaviate Agents client.

Common companion packages:

openai, cohere, anthropic, mistralai — client-side usage when you mix Weaviate retrieval with externally-driven generation. (Note: many of these are also available as server-side modules in Weaviate itself, where the API key sits on the server.)
sentence-transformers — for client-side embeddings when you don't want to enable a server module.
langchain-weaviate and llama-index-vector-stores-weaviate — framework adapters.
authlib / requests-oauthlib — sometimes pulled in for OIDC auth flows against Weaviate Cloud.

Alternatives

Package	Trade-off
`qdrant-client`	Rust-backed Qdrant server, rich payload filtering. Use when filter performance matters most.
`chromadb`	Embedded, zero-infrastructure. Use for prototypes.
`pymilvus`	Milvus client. Use at very large scale.
`pinecone-client`	Hosted-only SaaS (the package is now published as `pinecone`). Use when you do not want to run a server.
`lancedb`	Embedded columnar Lance/Arrow store. Use when your data is already columnar.
`elasticsearch` / `opensearch-py`	Mature BM25 plus newer kNN. Use when you already run Elastic and only need vector as an add-on.

Common gotchas

v3 → v4 is a full rewrite, not a refactor. v3 used weaviate.Client(url=...) with stringly-typed schema dicts; v4 uses weaviate.connect_to_local(), weaviate.connect_to_weaviate_cloud(...) (formerly connect_to_wcs), or weaviate.connect_to_custom(...) with strongly-typed Configure.* builders. Most old tutorials and Stack Overflow answers target v3 and will not run on v4.
Auth methods restructured. v3's auth_client_secret=AuthApiKey(...) becomes auth_credentials=Auth.api_key(...) in v4, and OIDC has its own helpers. Token-style and API-key-style auth now share one constructor argument.
Always call client.close() (or use the client as a context manager). The v4 client opens persistent gRPC connections; leaking clients in a long-running process exhausts file descriptors.
Server-side modules vs client-side embeddings. Generative and vectorizer modules (text2vec-openai, generative-cohere, …) run on the server and need the server to hold the API key. The client API key is unrelated. Wiring these together is a frequent source of "module not enabled" errors.
Collection vs Class naming. The v4 SDK renamed Weaviate "Classes" to "Collections" in Python; the server still uses "Class" in REST URLs. Don't be surprised when the two terms refer to the same thing.
Batch import defaults to dynamic batching. with collection.batch.dynamic() as batch: is the v4 idiom; oversizing the batch leads to timeouts under load. Use batch.fixed_size(...) when you need predictable throughput.
Cloud SDK auth uses Weaviate Cloud Services API keys, not your OpenAI/Cohere keys — a common mistake when copy-pasting from generative-module tutorials.
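
The split between the Weaviate Cloud key and provider keys can be made explicit in code. The sketch below is a hypothetical helper, not part of weaviate-client: split_keys and the environment-variable names are assumptions chosen for illustration, while the X-OpenAI-Api-Key / X-Cohere-Api-Key header names follow the X-*-Api-Key pattern described above.

```python
from typing import Mapping

# Environment variable -> forwarding header. The variable names are
# illustrative assumptions; adjust them to your own deployment.
PROVIDER_HEADERS = {
    "OPENAI_API_KEY": "X-OpenAI-Api-Key",
    "COHERE_API_KEY": "X-Cohere-Api-Key",
}


def split_keys(env: Mapping[str, str]) -> tuple[str, dict[str, str]]:
    """Return (weaviate_api_key, provider_headers) from an environment mapping."""
    weaviate_key = env["WCD_API_KEY"]  # raises KeyError if the cluster key is missing
    headers = {
        header: env[var]
        for var, header in PROVIDER_HEADERS.items()
        if var in env
    }
    return weaviate_key, headers
```

The first value would go into auth_credentials=Auth.api_key(...), the second into headers=...; keeping them in separate variables makes it harder to pass a provider key where the cluster key belongs.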

Real-world recipes

The recipes below highlight the v4 API surface for install-and-topology choices — collection definition, hybrid search, and multi-tenancy. The sections/ai/weaviate companion covers the broader API.

Local server connection — connect_to_local() is the v4 helper for the default localhost:8080 / localhost:50051 (REST/gRPC) setup. Always use the client as a context manager so the gRPC channel closes cleanly.

python

import weaviate
from weaviate.classes.config import Configure, Property, DataType

with weaviate.connect_to_local() as client:
    client.collections.create(
        "Article",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(),
        generative_config=Configure.Generative.openai(),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ],
    )

Output: the collection is created server-side; the server-side text2vec-openai module needs an OPENAI_APIKEY env var set on the Weaviate server, not the client

Connect to Weaviate Cloud Services — connect_to_wcs (older) and connect_to_weaviate_cloud (current name) take the cluster URL and an API key.

python

import os
import weaviate
from weaviate.classes.init import Auth

with weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WCD_URL"],
    auth_credentials=Auth.api_key(os.environ["WCD_API_KEY"]),
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
) as client:
    print(client.is_ready())

Output: True if the cluster is up; the X-OpenAI-Api-Key header forwards the OpenAI key to the server-side vectorizer/generator modules

Hybrid search (BM25 + vector) — the v4 collection API exposes hybrid() as a first-class query mode. alpha=0.5 weights BM25 and vector equally; alpha=0 is pure BM25, alpha=1 is pure vector.

python

articles = client.collections.get("Article")
hits = articles.query.hybrid(
    query="how does HNSW work",
    alpha=0.5,
    limit=5,
    return_metadata=["score", "explain_score"],
)
for o in hits.objects:
    print(o.metadata.score, o.properties["title"])

Output: ranked objects with both BM25 and vector contributions explained in explain_score; the fusion happens server-side, no client-side RRF stitching needed

Generative search (server-side RAG) — Weaviate's generative modules embed the LLM call inside the query. The retrieved objects are passed to the configured generator with a templated prompt; the response includes both objects and the generated text.

python

articles = client.collections.get("Article")
result = articles.generate.near_text(
    query="summarise HNSW",
    limit=3,
    grouped_task="Write a one-paragraph summary of these articles.",
)
print(result.generated)

Output: a single generated paragraph synthesising the top-3 hits; the OpenAI call happens server-side using the key supplied via X-OpenAI-Api-Key

Multi-tenant collection — Weaviate supports first-class multi-tenancy: one collection holds many tenants, each isolated at the shard level.

python

from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant

client.collections.create(
    "Doc",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
docs = client.collections.get("Doc")
docs.tenants.create([Tenant(name="acme"), Tenant(name="globex")])

# All subsequent operations must target a tenant
acme = docs.with_tenant("acme")
acme.data.insert({"title": "Acme internal"})

Output: each tenant lives on its own shard with hard isolation; tenant deletion drops the shard, much faster than per-row deletion

Batch import with the dynamic batcher — the v4 client tunes its batch size on the fly. Use batch.dynamic() for unknown-throughput workloads and batch.fixed_size(...) for predictable load.

python

with articles.batch.dynamic() as batch:
    for row in iter_rows():
        batch.add_object(properties=row, uuid=row["id"])

Output: the batch context manager flushes on exit; objects that failed during the run are listed in articles.batch.failed_objects after the with block
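
A long failure list is easier to act on when grouped by error message. The helper below is a minimal sketch: summarize_failures is a hypothetical name, and it assumes each failed entry exposes a message attribute (falling back to str() otherwise).

```python
from collections import Counter
from typing import Iterable


def summarize_failures(failed_objects: Iterable) -> Counter:
    """Count failed batch objects by error message.

    Assumes each entry has a `message` attribute; anything else is
    stringified so the helper never raises on unexpected entries.
    """
    return Counter(getattr(err, "message", str(err)) for err in failed_objects)
```

Typical use would be summarize_failures(articles.batch.failed_objects).most_common(5) right after the with block, to see whether one root cause dominates.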

Production deployment

The v4 client opens persistent gRPC connections by default. Long-running services must close clients explicitly (or use the context manager pattern) — leaked clients exhaust file descriptors.
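
For a long-lived process that cannot wrap its whole lifetime in a with block, one option is a lazily created shared client whose close() is registered as a shutdown hook. This is a sketch under assumptions: get_client is a hypothetical helper, and connect is any zero-argument callable that returns an object with a close() method (for example weaviate.connect_to_local).

```python
import atexit
from typing import Callable

_client = None  # module-level singleton, created on first use


def get_client(connect: Callable[[], object]):
    """Create the client once and register its close() to run at interpreter exit."""
    global _client
    if _client is None:
        _client = connect()
        # Ensure the connection is released even if no caller closes it.
        atexit.register(_client.close)
    return _client
```

Frameworks with their own shutdown hooks are usually a better place for the close() call than atexit; the point is only that exactly one client is created and exactly one close is scheduled.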

Topology checklist:

Concern	Self-hosted (Docker / k8s)	Weaviate Cloud Services
Server modules	enable via env (`ENABLE_MODULES=text2vec-openai,generative-openai`)	managed; toggled in cluster config
Vectorizer keys	set on server (`OPENAI_APIKEY=...`)	forwarded via `X-*-Api-Key` headers
Backup	`backup-s3`, `backup-gcs`, `backup-azure`, or `backup-filesystem` module	managed snapshots
Replication	per-collection replication factor (`replicationConfig.factor`)	managed; HA tiers
Auth	OIDC, API keys, or anonymous	API keys + OIDC
Transport	8080 (REST) + 50051 (gRPC)	TLS-only

Server-side modules vs client-side embeddings. Weaviate's distinguishing feature is its server-side module ecosystem — the server holds the OpenAI / Cohere / HuggingFace keys and computes embeddings (and optionally generations) on import and query. The client only ships text. This simplifies multi-app deployments (one API key location) but means the server must have outbound network access to the embedding provider.

Alternative: disable the server vectorizer (vectorizer_config=Configure.Vectorizer.none()), compute embeddings in the application, and pass them directly with data.insert(properties=..., vector=...). The client then needs the OpenAI key, not the server.
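
The client-side path can be sketched as a small wrapper. insert_with_vector and the embed callable are hypothetical names for illustration; collection is assumed to be a v4 collection handle, and the body property name is taken from the Article example earlier in this sheet.

```python
from typing import Callable, Sequence


def insert_with_vector(
    collection,
    text: str,
    embed: Callable[[str], Sequence[float]],
):
    """Embed text in the application, then store object and vector together."""
    vector = list(embed(text))  # embedding is computed client-side
    return collection.data.insert(properties={"body": text}, vector=vector)
```

Any embedding function works here (a hosted API call, a local sentence-transformers model); the server only sees the finished vector.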

Backup module. Enable a backup module on the server once (for example backup-filesystem with BACKUP_FILESYSTEM_PATH=/var/lib/weaviate/backups, or the S3/GCS/Azure equivalents), then trigger backups via the client:

python

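from datetime import datetime, timezone

# A UTC date in the ID gives one distinct backup per day.
client.backup.create(
    backup_id=f"daily-{datetime.now(timezone.utc).date()}",
    backend="filesystem",
    include_collections=["Article"],
)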

Output: a backup tarball on the configured backend; restore with client.backup.restore(...)

Multi-tenancy at scale. A single multi-tenant collection, rather than one collection per tenant, is the recommended pattern past a few hundred tenants. Each tenant maps to a shard; Weaviate handles activation/deactivation to keep cold tenants out of RAM. Configure autoTenantActivation for on-demand activation.

Index tuning & retrieval quality

The v4 collection config exposes HNSW parameters under Configure.VectorIndex.hnsw(...). The same m / efConstruction / ef levers as Chroma/Qdrant, plus Weaviate-specific tuning around dynamic ef and PQ.

python

from weaviate.classes.config import Configure, VectorDistances

client.collections.create(
    "Tuned",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE,
        ef_construction=256,
        max_connections=32,
        dynamic_ef_factor=8,
        dynamic_ef_min=100,
        dynamic_ef_max=500,
        quantizer=Configure.VectorIndex.Quantizer.pq(segments=96),
    ),
)

Output: the collection is created with a tuned HNSW + product-quantized index; dynamic ef adapts per-query based on the limit
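
A back-of-the-envelope calculation shows why PQ matters for RAM-bound collections. The numbers are assumptions for illustration: 1536-dimensional float32 embeddings, one byte per PQ segment (that is, 256 centroids per segment), and no allowance for HNSW graph links or codebooks.

```python
def vector_bytes(dimensions: int, bytes_per_float: int = 4) -> int:
    """Size of one uncompressed vector (float32 by default)."""
    return dimensions * bytes_per_float


def pq_code_bytes(segments: int, bytes_per_code: int = 1) -> int:
    """Size of one PQ-encoded vector: one centroid id per segment."""
    return segments * bytes_per_code


raw = vector_bytes(1536)        # 6144 bytes per uncompressed vector
compressed = pq_code_bytes(96)  # 96 bytes per vector with segments=96
ratio = raw / compressed        # 64.0
```

Under these assumptions the vector payload shrinks 64-fold; the real saving is smaller once graph overhead is counted, and recall should be re-measured after enabling compression.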

Hybrid alpha tuning. The alpha parameter in hybrid() ranges from 0 (pure BM25) to 1 (pure vector). Common settings:

alpha=0.25 — keyword-heavy; product-search-style queries
alpha=0.5 — balanced; general RAG
alpha=0.75 — semantic-heavy; conversational query rewriting
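
To build intuition for what alpha does, the sketch below fuses two score lists with min-max normalisation and a weighted sum. It is a simplified illustration, not the server's implementation: Weaviate's own fusion algorithms differ in detail, and min_max, fuse, and the sample scores are made up for the example.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1]; a constant list maps to all 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def fuse(bm25: dict[str, float], vector: dict[str, float], alpha: float) -> dict[str, float]:
    """alpha=0 keeps only the keyword score, alpha=1 only the vector score."""
    b, v = min_max(bm25), min_max(vector)
    return {
        doc_id: alpha * v.get(doc_id, 0.0) + (1 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(b) | set(v)
    }


bm25_scores = {"a": 12.0, "b": 3.0}    # keyword search prefers "a"
vector_scores = {"a": 0.2, "b": 0.9}   # similarity (higher is better) prefers "b"
balanced = fuse(bm25_scores, vector_scores, alpha=0.5)
```

With these sample scores the two documents tie at alpha=0.5, "a" wins as alpha moves toward 0, and "b" wins as it moves toward 1.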

Reranker modules. Weaviate ships reranker modules (reranker-cohere, reranker-transformers, reranker-voyageai) that rerank the candidate set server-side after the initial hybrid retrieval. Enable the module and add rerank=Rerank(prop="title", query=...) to the query.

Version migration guide

The v3 → v4 boundary is the biggest in the package's history. The two clients are not source-compatible; v3 code does not run on v4 with import shimming alone.

v3 → v4 checklist:

Construction: weaviate.Client(url=...) → weaviate.connect_to_local(), weaviate.connect_to_weaviate_cloud(...), or weaviate.connect_to_custom(...).
Lifecycle: v4 clients hold persistent gRPC channels — always use with client: context managers or call client.close() explicitly.
Schema: v3 stringly-typed dict schemas → v4 typed Configure.* and Property(...) builders.
Auth: auth_client_secret=AuthApiKey(...) → auth_credentials=Auth.api_key(...). OIDC has its own helper.
Queries: v3 client.query.get("Class").with_*() GraphQL builder → v4 collection.query.fetch_objects(...), collection.query.near_text(...), etc., returning typed Python objects.
Batch: v3 client.batch.configure(...) + batch.add_data_object(...) → v4 with collection.batch.dynamic() as batch: batch.add_object(...).
Naming: Weaviate "Classes" became "Collections" in the Python SDK; the server REST API still calls them Classes.

v3 maintenance. weaviate-client<4 still installs and works against older Weaviate servers (< 1.23), but receives no new feature work. If you cannot migrate to v4 yet, pin weaviate-client<4 strictly — pip install weaviate-client now installs v4.

v4 minor-to-minor. Releases land roughly monthly. Breaking changes are rare within the v4 line but new modules (rerankers, new vectorizers, RBAC features) require matching server versions. Pin server and client minors together.

Server compatibility. v4 client requires Weaviate server 1.23+. Pre-1.23 servers do not support the gRPC endpoints the v4 client uses by default. Either upgrade the server or pin to weaviate-client<4.
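
A deployment script can check the server version before choosing a client line. This is a minimal sketch: parse_version and supports_v4_client are hypothetical helpers, the (1, 23) minimum is taken from the text above, and the version string is assumed to come from wherever you read it (for example the server's meta endpoint).

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.23.7' or 'v1.25.0-rc.1' into a tuple of integers."""
    core = version.lstrip("v").split("-")[0]  # drop leading 'v' and any pre-release tag
    return tuple(int(part) for part in core.split("."))


def supports_v4_client(server_version: str, minimum: tuple[int, ...] = (1, 23)) -> bool:
    """True if the server is at or above the minimum the v4 client expects."""
    return parse_version(server_version) >= minimum
```

Tuple comparison handles multi-digit components correctly, which plain string comparison does not ("1.9" sorts after "1.23" as a string).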

Troubleshooting common errors

WeaviateGrpcUnavailable on connect — gRPC port (50051 by default) not exposed. Many Docker setups only publish 8080. Either expose 50051 or use connect_to_custom(..., grpc_port=...) with the right port.
"module not enabled" on generative/vectorizer use — the Weaviate server does not have the module loaded. Set ENABLE_MODULES=text2vec-openai,generative-openai,... in the server env and restart.
UnauthorizedError against WCS — wrong API key, or you used the OpenAI key where the Weaviate key was expected. WCS keys come from the cluster console; provider keys go in X-*-Api-Key headers.
ResourceWarning: Unclosed gRPC channel — client wasn't closed. Use the context manager pattern; in long-running services, call client.close() in shutdown hooks.
v3 tutorial code raises AttributeError — you installed v4. Either migrate or pip install "weaviate-client<4".
Batch failed_objects non-empty: the server rejected some rows. Inspect collection.batch.failed_objects after the with block exits; common causes are property type mismatches and vectorizer errors such as provider rate limits.
Class vs Collection confusion in errors — the server returns "Class" in REST URLs and error messages; the SDK uses "Collection". Same concept, different surface.
Connection refused after a few minutes of idle — load balancers may close idle gRPC channels. Set TCP keepalive or use additional_config=AdditionalConfig(timeout=...).
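
Several of the errors above come down to a port that is not reachable. A quick pre-flight check with the standard library can tell "gRPC port not published" apart from "server down". The helper names are hypothetical; the default ports are the 8080 / 50051 pair mentioned above.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_weaviate_ports(
    host: str = "localhost", rest_port: int = 8080, grpc_port: int = 50051
) -> dict[str, bool]:
    """Report reachability of the REST and gRPC ports separately."""
    return {"rest": port_open(host, rest_port), "grpc": port_open(host, grpc_port)}
```

A result of {"rest": True, "grpc": False} points at the Docker port mapping rather than at the client code. The check only proves that something accepts TCP connections; it does not verify that the listener is Weaviate.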

Performance tuning

Lever	Mechanism	When it helps
gRPC over REST	`connect_to_local()` uses gRPC by default	batch import, query throughput
`batch.dynamic()`	auto-tuned batch size	mixed-throughput ingestion
`batch.fixed_size(...)`	predictable batches	steady-state ingestion under load
PQ quantization	smaller index	RAM-bound large collections
`replicationConfig.factor` > 1	read scaling	read-heavy workloads
Multi-tenancy with autoTenantActivation	cold tenants off-heap	many tenants, sparse access
Server-side reranker	cross-encoder after hybrid	quality over latency

Async client. v4 has an async variant: weaviate.use_async_with_local(), use_async_with_weaviate_cloud(), etc. It is the natural fit for high-concurrency asyncio services such as FastAPI apps, where blocking calls would stall the event loop.

python

import asyncio

import weaviate

async def main():
    async with weaviate.use_async_with_local() as client:
        articles = client.collections.get("Article")
        hits = await articles.query.hybrid("HNSW", limit=5)
        print(hits)

asyncio.run(main())

Output: the async client mirrors the sync surface with await semantics; the context manager handles channel cleanup

Batch sizing. The dynamic batcher adjusts batch size to server load instead of using a fixed count. For predictable steady-state ingestion, batch.fixed_size(batch_size=200, concurrent_requests=4) is a reasonable starting point. Larger batches risk gRPC deadline timeouts.

Security considerations

Weaviate has more built-in security than Chroma but still needs deliberate configuration for production.

Auth. Three modes: anonymous (default — never expose to a network), static API keys (set via env), or OIDC. Production deployments should use OIDC (Auth0, Azure AD, Keycloak) with RBAC.
RBAC (preview in server v1.28, generally available from v1.29): role-based access control with read/write/admin scopes per collection. Required for multi-team clusters.
TLS. Configure ENABLE_TLS=true and provide cert files, or terminate TLS at an ingress controller.
Server-side API keys. When using vectorizer/generator modules, the keys (OpenAI, Cohere, etc.) live on the server. Use secrets managers — never bake into images.
X-*-Api-Key header forwarding. Cloud customers forward provider keys per-request via headers — keys live on the client side. Different security model from server-side keys.
Multi-tenant isolation. Multi-tenancy enabled gives shard-level isolation; tenants cannot see each other's data even with a crafted query.
Prompt injection via generative modules. Server-side generators take retrieved content + a prompt template. A document containing prompt-injection content can hijack the generator. Validate document provenance and consider sanitisation passes.
Backups. Encrypted at the backup-backend layer (S3 SSE, GCS encryption, etc.); the backup module does not encrypt itself.

Embeddings & chunking strategy

Weaviate is unusual among vector DBs in that the server can produce its own embeddings via vectorizer modules. The choice between server-side and client-side embedding shapes the whole architecture.

Server-side vectorizer (the Weaviate-native path). Configure vectorizer_config=Configure.Vectorizer.text2vec_openai() (or text2vec_cohere, text2vec_voyageai, text2vec_huggingface, text2vec_transformers, text2vec_ollama, …). The server embeds on import and query. Pros: one place for keys, no client embedding code. Cons: server needs outbound network for hosted modules.

Client-side embedding. vectorizer_config=Configure.Vectorizer.none() and pass vector=[...] on insert and near_vector=[...] on query. Pros: keys live in the app, model choice is fully flexible. Cons: more code, more places to maintain.

Vectorizer-module choice (server-side). Roughly:

Module	When
`text2vec_openai`	hosted ease, strong English embeddings
`text2vec_cohere`	multilingual; strong on retrieval benchmarks
`text2vec_voyageai`	Voyage AI hosted models; strong results on public retrieval evaluations
`text2vec_transformers`	local model on server (CPU/GPU) — no external API
`text2vec_ollama`	local server-side via Ollama
`text2vec_huggingface`	HuggingFace Inference API
`multi2vec_clip`	multi-modal (text + image)

Chunking is upstream. Weaviate stores text properties; chunking happens in the app before insert. Standard heuristics (300–1500 character chunks with ~100 char overlap, respect document structure) apply. For RAG-quality, pair chunk_by_title from unstructured with the server-side vectorizer for a one-pass ingestion pipeline.
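
The character-window heuristic can be sketched in a few lines. chunk_text is a hypothetical helper, and it deliberately ignores document structure: it is the naive baseline that structure-aware chunkers improve on.

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap  # how far each window advances
    # Stop once the remaining tail is already covered by the previous window.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


sample = "abcdefghijklmnopqrstuvwxy"  # 25 characters
chunks = chunk_text(sample, size=10, overlap=2)
# chunks == ["abcdefghij", "ijklmnopqr", "qrstuvwxy"]
```

Each chunk would then be inserted as its own object, so the server-side vectorizer embeds chunks rather than whole documents.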

Hybrid alpha tuning interacts with chunking. Smaller chunks favour BM25 (more keyword density); larger chunks favour vector (more semantic context). Tune alpha per workload.

When NOT to use this

Weaviate's superpower is server-side hybrid + generative modules. The trade-offs below are where another tool is a better fit.

Notebook prototype with no infra. chromadb is one pip install; Weaviate needs a running server.
You want vectors but not a database. faiss is a library; Weaviate is a database with schema, RBAC, replication, and modules.
Strict separation: keys in app, not in server. If compliance forbids forwarding API keys to a third-party server, disable server modules and compute embeddings client-side — at that point a leaner client like qdrant-client may be a better fit.
You're already on Postgres. pgvector lets you keep one operational store; Weaviate is more capable but more moving parts.
Pure keyword search. Elasticsearch / OpenSearch are more battle-tested for pure BM25 with a deep aggregation ecosystem.
