cheat sheet

weaviate-client

Package-level reference for weaviate-client on PyPI — install variants, the v3 → v4 API split, gRPC, and alternative vector stores.

What it is

weaviate-client is the official Python SDK for Weaviate, an open-source vector database that combines dense-vector similarity search with BM25 keyword search, generative modules, and a strongly-typed schema. The client talks to a Weaviate server over both REST and gRPC; the modern v4 API uses gRPC by default for queries and batch operations.

Reach for weaviate-client when you want hybrid search (vector + keyword) out of the box, schema-enforced collections with strong typing, and built-in generative modules that call OpenAI / Cohere / HuggingFace from the server side. Reach for qdrant-client if you prefer a leaner Rust-backed server, or chromadb if you want embedded-only.

Install

bash
pip install weaviate-client

Output: pip's download and install log, ending with a "Successfully installed ..." line; exits 0 on success

bash
uv add weaviate-client

Output: dependency resolved + added to pyproject.toml

bash
poetry add weaviate-client

Output: updated lockfile + virtualenv install

bash
pip install "weaviate-client<4"      # legacy v3 API if you cannot migrate yet

Output: installs the previous-generation client (deprecated)

Versioning & Python support

  • The package had a hard v3 → v4 rewrite in 2024. weaviate-client>=4 is the current, supported line; weaviate-client<4 (the 3.x series) is in maintenance mode and will not receive new server features.
  • The v4 client matches Weaviate server 1.23+. Older servers do not support the gRPC endpoints the v4 client uses by default; pin the server version to match.
  • Recent versions support Python 3.9+. The client itself is pure Python; it depends on grpcio (which ships compiled extensions) for gRPC and on httpx for REST.
  • Server-feature exposure roughly matches the server: when Weaviate ships a new generative or reranker module, the client gets typed helpers in the next minor.
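
Because the two API lines are not source-compatible, it can help to confirm which one is installed before running a tutorial. The sketch below reads the distribution metadata using only the standard library; the helper names client_major and installed_client_major are illustrative and not part of weaviate-client.

```python
from importlib import metadata
from typing import Optional


def client_major(version: str) -> int:
    """Return the major component of a version string such as '4.9.3'."""
    return int(version.split(".")[0])


def installed_client_major() -> Optional[int]:
    """Major version of the installed weaviate-client, or None if absent."""
    try:
        return client_major(metadata.version("weaviate-client"))
    except metadata.PackageNotFoundError:
        return None


major = installed_client_major()
if major is None:
    print("weaviate-client is not installed")
elif major < 4:
    print("legacy v3 API: weaviate.Client(...)")
else:
    print("v4 API: weaviate.connect_to_*()")
```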

Package metadata

  • Maintainer: Weaviate (the company, formerly SeMI Technologies) and community contributors
  • Project home: github.com/weaviate/weaviate-python-client
  • Server repo: github.com/weaviate/weaviate
  • Docs: weaviate.io/developers/weaviate/client-libraries/python
  • PyPI: pypi.org/project/weaviate-client
  • License: BSD-3-Clause
  • Governance: company-led with open contributions; Weaviate Cloud Services is the hosted commercial offering
  • First released: 2019
  • Downloads: millions per month

Optional dependencies & extras

The weaviate-client package does not expose typical PyPI extras (weaviate-client[...]). gRPC, REST, and authentication helpers are all in the base install.

Common companion packages:

  • openai, cohere, anthropic, mistralai — client-side usage when you mix Weaviate retrieval with externally-driven generation. (Note: many of these are also available as server-side modules in Weaviate itself, where the API key sits on the server.)
  • sentence-transformers — for client-side embeddings when you don't want to enable a server module.
  • langchain-weaviate and llama-index-vector-stores-weaviate — framework adapters.
  • authlib / requests-oauthlib — sometimes pulled in for OIDC auth flows against Weaviate Cloud.

Alternatives

Package | Trade-off
qdrant-client | Rust-backed Qdrant server, rich payload filtering. Use when filter performance matters most.
chromadb | Embedded, zero-infrastructure. Use for prototypes.
pymilvus | Milvus client. Use at very large scale.
pinecone-client | Hosted-only SaaS. Use when you do not want to run a server.
lancedb | Embedded columnar Lance/Arrow store. Use when your data is already columnar.
elasticsearch / opensearch-py | Mature BM25 plus newer kNN. Use when you already run Elastic and only need vector as an add-on.

Common gotchas

  1. v3 → v4 is a full rewrite, not a refactor. v3 used weaviate.Client(url=...) with stringly-typed schema dicts; v4 uses weaviate.connect_to_local(), weaviate.connect_to_weaviate_cloud(...) (formerly connect_to_wcs(...)), or weaviate.connect_to_custom(...) with strongly-typed Configure.* builders. Old tutorials and Stack Overflow answers are mostly incompatible.
  2. Auth methods restructured. v3's auth_client_secret=AuthApiKey(...) becomes auth_credentials=Auth.api_key(...) in v4, and OIDC has its own helper. Token-style and API-key-style auth now share one constructor argument.
  3. Always client.close() (or use it as a context manager). The v4 client opens persistent gRPC connections; leaking clients in a long-running process exhausts file descriptors.
  4. Server-side modules vs client-side embeddings. Generative and vectorizer modules (text2vec-openai, generative-cohere, …) run on the server and need the server to hold the API key. The client API key is unrelated. Wiring these together is a frequent source of "module not enabled" errors.
  5. Collection vs Class naming. The v4 Python SDK renamed Weaviate "Classes" to "Collections"; the server still uses "Class" in REST URLs. Both terms refer to the same thing.
  6. Batch import defaults to dynamic batching. with collection.batch.dynamic() as batch: is the v4 idiom; oversizing the batch leads to timeouts under load. Use batch.fixed_size(...) when you need predictable throughput.
  7. Cloud SDK auth uses Weaviate Cloud Services API keys, not your OpenAI/Cohere keys — a common mistake when copy-pasting from generative-module tutorials.
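
Gotchas 4 and 7 both come down to keeping two kinds of key apart: the Weaviate key authenticates the client, while provider keys are passed along for the modules. A minimal sketch of that separation follows. The environment variable names and both helper functions are assumptions made for illustration; X-Cohere-Api-Key is assumed to follow the same X-*-Api-Key pattern as the OpenAI header.

```python
import os
from typing import Dict, Mapping

# Environment variable (assumed name) -> header the provider key is sent in.
PROVIDER_HEADERS = {
    "OPENAI_API_KEY": "X-OpenAI-Api-Key",
    "COHERE_API_KEY": "X-Cohere-Api-Key",
}


def provider_headers(env: Mapping[str, str]) -> Dict[str, str]:
    """Build provider headers from whichever provider keys are set."""
    return {
        header: env[var]
        for var, header in PROVIDER_HEADERS.items()
        if env.get(var)
    }


def weaviate_api_key(env: Mapping[str, str]) -> str:
    """Return the Weaviate Cloud key; never fall back to a provider key."""
    key = env.get("WCD_API_KEY")
    if not key:
        raise KeyError("WCD_API_KEY is not set (provider keys are not a substitute)")
    return key


headers = provider_headers(os.environ)
print(sorted(headers))  # header names only; never print key values
```

With these helpers, a cloud connection would pass auth_credentials=Auth.api_key(weaviate_api_key(os.environ)) and headers=provider_headers(os.environ), so the two kinds of key cannot be swapped by accident.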

Real-world recipes

The recipes below highlight the v4 API surface for install-and-topology choices — collection definition, hybrid search, and multi-tenancy. The sections/ai/weaviate companion covers the broader API.

Local server connection. connect_to_local() is the v4 helper for the default localhost:8080 / localhost:50051 (REST/gRPC) setup. Always use the client as a context manager so the gRPC channel closes cleanly.

python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

# The context manager closes the REST and gRPC connections on exit.
with weaviate.connect_to_local() as client:
    client.collections.create(
        "Article",
        # Both modules run on the server and must be enabled there.
        vectorizer_config=Configure.Vectorizer.text2vec_openai(),
        generative_config=Configure.Generative.openai(),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="body", data_type=DataType.TEXT),
        ],
    )

Output: the collection is created server-side; the server-side text2vec-openai module needs an OPENAI_APIKEY env var set on the Weaviate server, not the client

Connect to Weaviate Cloud Services. connect_to_weaviate_cloud (the current name; connect_to_wcs is the older one) takes the cluster URL and an API key.

python
import os
import weaviate
from weaviate.classes.init import Auth

with weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ["WCD_URL"],
    auth_credentials=Auth.api_key(os.environ["WCD_API_KEY"]),
    headers={"X-OpenAI-Api-Key": os.environ["OPENAI_API_KEY"]},
) as client:
    print(client.is_ready())

Output: True if the cluster is up; the X-OpenAI-Api-Key header forwards the OpenAI key to the server-side vectorizer/generator modules

Hybrid search (BM25 + vector) — the v4 collection API exposes hybrid() as a first-class query mode. alpha=0.5 weights BM25 and vector equally; alpha=0 is pure BM25, alpha=1 is pure vector.

python
# Assumes `client` is an open v4 client, as in the connection recipes above.
articles = client.collections.get("Article")
hits = articles.query.hybrid(
    query="how does HNSW work",
    alpha=0.5,
    limit=5,
    return_metadata=["score", "explain_score"],
)
for o in hits.objects:
    print(o.metadata.score, o.properties["title"])

Output: ranked objects with both BM25 and vector contributions explained in explain_score; the fusion happens server-side, no client-side RRF stitching needed

Generative search (server-side RAG) — Weaviate's generative modules embed the LLM call inside the query. The retrieved objects are passed to the configured generator with a templated prompt; the response includes both objects and the generated text.

python
articles = client.collections.get("Article")
result = articles.generate.near_text(
    query="summarise HNSW",
    limit=3,
    grouped_task="Write a one-paragraph summary of these articles.",
)
print(result.generated)

Output: a single generated paragraph synthesising the top-3 hits; the OpenAI call happens server-side using the key supplied via X-OpenAI-Api-Key

Multi-tenant collection — Weaviate supports first-class multi-tenancy: one collection holds many tenants, each isolated at the shard level.

python
from weaviate.classes.config import Configure
from weaviate.classes.tenants import Tenant

client.collections.create(
    "Doc",
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
docs = client.collections.get("Doc")
docs.tenants.create([Tenant(name="acme"), Tenant(name="globex")])

# All subsequent operations must target a tenant
acme = docs.with_tenant("acme")
acme.data.insert({"title": "Acme internal"})

Output: each tenant lives on its own shard with hard isolation; tenant deletion drops the shard, much faster than per-row deletion

Batch import with the dynamic batcher — the v4 client tunes its batch size on the fly. Use batch.dynamic() for unknown-throughput workloads and batch.fixed_size(...) for predictable load.

python
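with articles.batch.dynamic() as batch:
    for row in iter_rows():  # each row: a dict with a UUID under "id"
        # "id" is a reserved property name, so pass it only as the UUID.
        props = {k: v for k, v in row.items() if k != "id"}
        batch.add_object(properties=props, uuid=row["id"])
failed = articles.batch.failed_objects  # populated after the block exits

Output: the batch context manager flushes on exit; failures during the run are visible via articles.batch.failed_objects after the with block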

Production deployment

The v4 client opens persistent gRPC connections by default. Long-running services must close clients explicitly (or use the context manager pattern) — leaked clients exhaust file descriptors.
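
Where a with block is awkward, for example when the connect function is chosen at runtime, the same guarantee can be written with try/finally. This is a minimal sketch; run_with_client is a hypothetical helper, and it assumes only that the client object exposes close(), as the v4 client does.

```python
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def run_with_client(connect: Callable[[], Any], work: Callable[[Any], T]) -> T:
    """Open a client, run `work`, and close the client even if `work` raises."""
    client = connect()
    try:
        return work(client)
    finally:
        client.close()


# With weaviate-client v4 installed and a local server running:
#   import weaviate
#   ready = run_with_client(weaviate.connect_to_local, lambda c: c.is_ready())
```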

Topology checklist:

Concern | Self-hosted (Docker / k8s) | Weaviate Cloud Services
Server modules | enable via env (ENABLE_MODULES=text2vec-openai,generative-openai) | managed; toggled in cluster config
Vectorizer keys | set on server (OPENAI_APIKEY=...) | forwarded via X-*-Api-Key headers
Backup | backup-* modules (backup-s3, backup-gcs, backup-azure, backup-filesystem) | managed snapshots
Replication | per-class replicationFactor | managed; HA tiers
Auth | OIDC, API keys, or anonymous | API keys + OIDC
Transport | 8080 (REST) + 50051 (gRPC) | TLS-only

Server-side modules vs client-side embeddings. Weaviate's distinguishing feature is its server-side module ecosystem — the server holds the OpenAI / Cohere / HuggingFace keys and computes embeddings (and optionally generations) on import and query. The client only ships text. This simplifies multi-app deployments (one API key location) but means the server must have outbound network access to the embedding provider.

Alternative: disable server modules (vectorizer_config=None), compute embeddings in the application, and data.insert(vector=...) directly. The client then needs the OpenAI key, not the server.

Backup module. Enable a backup module on the server (for example backup-filesystem in ENABLE_MODULES) and set BACKUP_FILESYSTEM_PATH=/var/lib/weaviate/backups (or the S3/GCS/Azure equivalents), then trigger via the client:

python
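from datetime import datetime, timezone

client.backup.create(
    # datetime.utcnow() is deprecated; use an explicit UTC timezone instead.
    backup_id=f"daily-{datetime.now(timezone.utc).date()}",
    backend="filesystem",
    include_collections=["Article"],
)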

Output: a backup tarball on the configured backend; restore with client.backup.restore(...)
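
A backup ID built from a date is easy to get subtly wrong (upper-case prefixes, spaces, local time versus UTC). The sketch below builds and validates the ID before it reaches the server. It assumes backup IDs are restricted to lowercase letters, digits, underscores, and hyphens; daily_backup_id is a hypothetical helper, not a client API.

```python
import re
from datetime import date, datetime, timezone
from typing import Optional

# Assumption: backup IDs may contain only lowercase letters, digits, "_" and "-".
_BACKUP_ID = re.compile(r"[a-z0-9_-]+")


def daily_backup_id(prefix: str = "daily", day: Optional[date] = None) -> str:
    """Return an ID such as 'daily-2024-05-01', validated before use."""
    day = day or datetime.now(timezone.utc).date()
    backup_id = f"{prefix}-{day.isoformat()}"
    if not _BACKUP_ID.fullmatch(backup_id):
        raise ValueError(f"invalid backup id: {backup_id!r}")
    return backup_id


print(daily_backup_id(day=date(2024, 5, 1)))  # daily-2024-05-01
```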

Multi-tenancy at scale. A single multi-tenant collection (multi-tenancy enabled), rather than one collection per tenant, is the recommended pattern past a few hundred tenants. Each tenant maps to a shard; Weaviate handles activation/deactivation to keep cold tenants out of RAM. Configure autoTenantActivation for on-demand activation.

Index tuning & retrieval quality

The v4 collection config exposes HNSW parameters under Configure.VectorIndex.hnsw(...). These are the same m / efConstruction / ef levers found in Chroma and Qdrant, plus Weaviate-specific tuning around dynamic ef and PQ.

python
from weaviate.classes.config import Configure, VectorDistances

client.collections.create(
    "Tuned",
    vectorizer_config=Configure.Vectorizer.text2vec_openai(),
    vector_index_config=Configure.VectorIndex.hnsw(
        distance_metric=VectorDistances.COSINE,
        ef_construction=256,
        max_connections=32,
        dynamic_ef_factor=8,
        dynamic_ef_min=100,
        dynamic_ef_max=500,
        quantizer=Configure.VectorIndex.Quantizer.pq(segments=96),
    ),
)

Output: the collection is created with a tuned HNSW + product-quantized index; dynamic ef adapts per-query based on the limit
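
To make the numbers in that config concrete, the arithmetic can be sketched in plain Python. Two assumptions are baked in: that dynamic ef is computed as limit × dynamic_ef_factor clamped between dynamic_ef_min and dynamic_ef_max, and that PQ stores one byte per segment (256 centroids). The 1536-dimension figure is an assumed embedding size, not something this page states, and the ratio ignores graph and codebook overhead.

```python
def dynamic_ef(limit: int, factor: int = 8, ef_min: int = 100, ef_max: int = 500) -> int:
    """Search-time ef under the assumed rule: limit * factor, clamped to [ef_min, ef_max]."""
    return max(ef_min, min(ef_max, limit * factor))


def pq_compression_ratio(dimensions: int, segments: int) -> float:
    """Raw float32 bytes per vector divided by PQ code bytes (1 byte per segment)."""
    raw_bytes = dimensions * 4
    return raw_bytes / segments


for limit in (5, 20, 100):
    print(limit, dynamic_ef(limit))    # 5 -> 100, 20 -> 160, 100 -> 500
print(pq_compression_ratio(1536, 96))  # 64.0
```

Under these assumptions a limit of 5 is lifted to the minimum of 100, while a limit of 100 is capped at 500, so the min and max matter more than the factor at the extremes.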

Hybrid alpha tuning. The alpha parameter in hybrid() ranges from 0 (pure BM25) to 1 (pure vector). Common settings:

  • alpha=0.25 — keyword-heavy; product-search-style queries
  • alpha=0.5 — balanced; general RAG
  • alpha=0.75 — semantic-heavy; conversational query rewriting
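
The effect of alpha is easiest to see on a toy example. The sketch below blends min-max-normalised keyword and vector scores. It illustrates the weighting idea only and is not Weaviate's server-side fusion, which may normalise or rank differently. The document IDs and scores are made up.

```python
from typing import Dict, List, Tuple


def _normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores to [0, 1]; a constant set maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(bm25: Dict[str, float], vector: Dict[str, float], alpha: float) -> List[Tuple[str, float]]:
    """Blend keyword and vector scores: alpha=0 is pure BM25, alpha=1 is pure vector."""
    kw, vec = _normalize(bm25), _normalize(vector)
    docs = set(kw) | set(vec)
    fused = {d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: "a" is a strong keyword match, "b" and "c" are semantic matches.
bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}
vector = {"a": 0.20, "b": 0.90, "c": 0.85}
for alpha in (0.25, 0.5, 0.75):
    print(alpha, [doc for doc, _ in fuse(bm25, vector, alpha)])
```

In this toy data the keyword-heavy setting ranks "a" first, and the ranking flips towards "b" and "c" as alpha grows, which is the behaviour to look for when tuning against real queries.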

Reranker modules. Weaviate ships reranker modules (reranker-cohere, reranker-transformers, reranker-voyageai) that rerank the candidate set server-side after the initial hybrid retrieval. Enable the module and add rerank=Rerank(prop="title", query=...) to the query.

Version migration guide

The v3 → v4 boundary is the biggest in the package's history. The two clients are not source-compatible; v3 code does not run on v4 with import shimming alone.

v3 → v4 checklist:

  • Construction: weaviate.Client(url=...) → weaviate.connect_to_local(), weaviate.connect_to_weaviate_cloud(...), or weaviate.connect_to_custom(...).
  • Lifecycle: v4 clients hold persistent gRPC channels — always use with client: context managers or call client.close() explicitly.
  • Schema: v3 stringly-typed dict schemas → v4 typed Configure.* and Property(...) builders.
  • Auth: auth_client_secret=AuthApiKey(...) → auth_credentials=Auth.api_key(...). OIDC has its own helper.
  • Queries: v3 client.query.get("Class").with_*() GraphQL builder → v4 collection.query.fetch_objects(...), collection.query.near_text(...), etc., returning typed Python objects.
  • Batch: v3 client.batch.configure(...) + batch.add_data_object(...) → v4 with collection.batch.dynamic() as batch: batch.add_object(...).
  • Naming: Weaviate "Classes" became "Collections" in the Python SDK; the server REST API still calls them Classes.
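
One way to size a migration is to scan existing code for the v3 idioms in the checklist. The sketch below is a rough text search, not a parser: it will miss aliased imports and can flag unrelated code that happens to match. find_v3_idioms and the pattern table are illustrative only.

```python
import re
from typing import Dict, List

# v3 idioms from the checklist, mapped to the v4 replacement to suggest.
V3_PATTERNS: Dict[str, str] = {
    r"weaviate\.Client\(": "weaviate.connect_to_local() / connect_to_weaviate_cloud() / connect_to_custom()",
    r"auth_client_secret\s*=": "auth_credentials=Auth.api_key(...)",
    r"\.query\.get\(": "collection.query.fetch_objects(...) / near_text(...)",
    r"\.batch\.configure\(": "with collection.batch.dynamic() as batch:",
    r"add_data_object\(": "batch.add_object(...)",
}


def find_v3_idioms(source: str) -> List[str]:
    """Return one 'line N: use ...' hint per v3 idiom found in the source text."""
    findings = []
    for number, line in enumerate(source.splitlines(), start=1):
        for pattern, replacement in V3_PATTERNS.items():
            if re.search(pattern, line):
                findings.append(f"line {number}: use {replacement}")
    return findings


legacy = '''
client = weaviate.Client(url="http://localhost:8080")
result = client.query.get("Article", ["title"]).do()
'''
for finding in find_v3_idioms(legacy):
    print(finding)
```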

v3 maintenance. weaviate-client<4 still installs and works against older Weaviate servers (< 1.23), but receives no new feature work. If you cannot migrate to v4 yet, pin weaviate-client<4 strictly — pip install weaviate-client now installs v4.

v4 minor-to-minor. Releases land roughly monthly. Breaking changes are rare within the v4 line but new modules (rerankers, new vectorizers, RBAC features) require matching server versions. Pin server and client minors together.

Server compatibility. v4 client requires Weaviate server 1.23+. Pre-1.23 servers do not support the gRPC endpoints the v4 client uses by default. Either upgrade the server or pin to weaviate-client<4.
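
The compatibility rule can be checked up front rather than discovered through a failed connection. The sketch below compares a server version string against the 1.23 minimum. Where the string comes from is left open: the server's meta endpoint reports a version, and this sketch assumes a plain 'major.minor.patch' form, optionally prefixed with 'v'.

```python
import re
from typing import Tuple

MIN_SERVER = (1, 23)  # minimum server version stated above for the v4 client


def parse_version(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.25.3' or 'v1.23.0-rc.1'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def server_supports_v4(version: str) -> bool:
    """True when the server version is at least MIN_SERVER."""
    return parse_version(version) >= MIN_SERVER


for v in ("1.22.5", "1.23.0", "1.25.3"):
    print(v, server_supports_v4(v))
```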

Troubleshooting common errors

  • WeaviateGRPCUnavailableError on connect: the gRPC port (50051 by default) is not exposed. Many Docker setups only publish 8080. Either expose 50051 or use connect_to_custom(..., grpc_port=...) with the right port.
  • "module not enabled" on generative/vectorizer use — the Weaviate server does not have the module loaded. Set ENABLE_MODULES=text2vec-openai,generative-openai,... in the server env and restart.
  • UnauthorizedError against WCS — wrong API key, or you used the OpenAI key where the Weaviate key was expected. WCS keys come from the cluster console; provider keys go in X-*-Api-Key headers.
  • ResourceWarning: Unclosed gRPC channel — client wasn't closed. Use the context manager pattern; in long-running services, call client.close() in shutdown hooks.
  • v3 tutorial code raises AttributeError — you installed v4. Either migrate or pip install "weaviate-client<4".
  • Batch failed_objects non-empty: the server rejected some rows. Inspect collection.batch.failed_objects after the with block exits; common causes are duplicate UUIDs and schema mismatches.
  • Class vs Collection confusion in errors — the server returns "Class" in REST URLs and error messages; the SDK uses "Collection". Same concept, different surface.
  • Connection refused after a few minutes of idle — load balancers may close idle gRPC channels. Set TCP keepalive or use additional_config=AdditionalConfig(timeout=...).
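
Several of the errors above reduce to "is the port actually reachable?". A quick standard-library probe answers that before any client code is involved. This is a plain TCP connect test: it shows that something is listening, not that it is Weaviate or that gRPC is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 8080 (REST) and 50051 (gRPC) are the default local ports named above.
for name, port in (("REST", 8080), ("gRPC", 50051)):
    state = "open" if port_open("localhost", port) else "closed"
    print(f"{name} port {port}: {state}")
```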

Performance tuning

Lever | Mechanism | When it helps
gRPC over REST | connect_to_local() uses gRPC by default | batch import, query throughput
batch.dynamic() | auto-tuned batch size | mixed-throughput ingestion
batch.fixed_size(...) | predictable batches | steady-state ingestion under load
PQ quantization | smaller index | RAM-bound large collections
replicationFactor > 1 | read scaling | read-heavy workloads
Multi-tenancy with autoTenantActivation | cold tenants off-heap | many tenants, sparse access
Server-side reranker | cross-encoder after hybrid | quality over latency

Async client. v4 has an async variant: weaviate.use_async_with_local(), use_async_with_weaviate_cloud(), etc. It suits high-concurrency asyncio services such as FastAPI, where blocking calls would stall the event loop.

python
import asyncio

import weaviate

async def main():
    async with weaviate.use_async_with_local() as client:
        articles = client.collections.get("Article")
        hits = await articles.query.hybrid("HNSW", limit=5)
        print(hits)

asyncio.run(main())

Output: the async client mirrors the sync surface with await semantics; the context manager handles channel cleanup

Batch sizing. The dynamic batcher targets a few hundred objects per batch by default. For predictable steady-state ingestion, batch.fixed_size(batch_size=200, concurrent_requests=4) is a good starting point. Larger batches risk gRPC deadline timeouts.
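
The fixed-size settings above can be wrapped in a small import function with an error budget, so a schema mismatch stops the run early instead of failing every row. This is a sketch under stated assumptions: collection is a v4 collection handle (for example client.collections.get("Article")), each row carries its UUID under "id", and import_fixed and max_errors are hypothetical names.

```python
from typing import Any, Dict, Iterable, List


def import_fixed(
    collection: Any,
    rows: Iterable[Dict[str, Any]],
    batch_size: int = 200,
    concurrent_requests: int = 4,
    max_errors: int = 10,
) -> List[Any]:
    """Import rows in fixed-size batches and return the objects that failed."""
    with collection.batch.fixed_size(
        batch_size=batch_size, concurrent_requests=concurrent_requests
    ) as batch:
        for row in rows:
            # "id" is a reserved property name, so it travels as the UUID only.
            properties = {k: v for k, v in row.items() if k != "id"}
            batch.add_object(properties=properties, uuid=row["id"])
            # Give up early rather than stream a whole dataset into errors.
            if batch.number_errors > max_errors:
                break
    # After the block exits, failures are exposed on the collection's batch wrapper.
    return list(collection.batch.failed_objects)
```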

Security considerations

Weaviate has more built-in security than Chroma but still needs deliberate configuration for production.

  • Auth. Three modes: anonymous (default — never expose to a network), static API keys (set via env), or OIDC. Production deployments should use OIDC (Auth0, Azure AD, Keycloak) with RBAC.
  • RBAC (generally available from server v1.29; preview in v1.28). Role-based access control with read/write/admin scopes per collection. Required for multi-team clusters.
  • TLS. Configure ENABLE_TLS=true and provide cert files, or terminate TLS at an ingress controller.
  • Server-side API keys. When using vectorizer/generator modules, the keys (OpenAI, Cohere, etc.) live on the server. Use secrets managers — never bake into images.
  • X-*-Api-Key header forwarding. Cloud customers forward provider keys per-request via headers — keys live on the client side. Different security model from server-side keys.
  • Multi-tenant isolation. Multi-tenancy enabled gives shard-level isolation; tenants cannot see each other's data even with a crafted query.
  • Prompt injection via generative modules. Server-side generators take retrieved content + a prompt template. A document containing prompt-injection content can hijack the generator. Validate document provenance and consider sanitisation passes.
  • Backups. Encrypted at the backup-backend layer (S3 SSE, GCS encryption, etc.); the backup module does not encrypt itself.

Embeddings & chunking strategy

Weaviate is unusual among vector DBs in that the server can produce its own embeddings via vectorizer modules. The choice between server-side and client-side embedding shapes the whole architecture.

Server-side vectorizer (the Weaviate-native path). Configure vectorizer_config=Configure.Vectorizer.text2vec_openai() (or text2vec_cohere, text2vec_voyageai, text2vec_huggingface, text2vec_transformers, text2vec_ollama, …). The server embeds on import and query. Pros: one place for keys, no client embedding code. Cons: server needs outbound network for hosted modules.

Client-side embedding. vectorizer_config=Configure.Vectorizer.none() and pass vector=[...] on insert and near_vector=[...] on query. Pros: keys live in the app, model choice is fully flexible. Cons: more code, more places to maintain.
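
The client-side path can be sketched as two thin wrappers around a collection handle. The embed callable stands in for whatever model the application uses and is an assumption, as are the helper names and the default "body" field. The collection is expected to be a v4 handle created with the vectorizer disabled.

```python
from typing import Any, Callable, Dict, Sequence

Embedder = Callable[[str], Sequence[float]]


def insert_with_vector(
    collection: Any,
    properties: Dict[str, Any],
    embed: Embedder,
    text_field: str = "body",
) -> Any:
    """Embed one text property in the application and insert it with its vector."""
    vector = list(embed(properties[text_field]))
    return collection.data.insert(properties=properties, vector=vector)


def search_with_vector(collection: Any, query: str, embed: Embedder, limit: int = 5) -> Any:
    """Embed the query with the same model and run a near-vector search."""
    return collection.query.near_vector(near_vector=list(embed(query)), limit=limit)
```

The same embed function must be used for inserts and queries; mixing models produces vectors that are not comparable.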

Vectorizer-module choice (server-side). Roughly:

Module | When
text2vec_openai | hosted ease, strong English embeddings
text2vec_cohere | multilingual; strong on retrieval benchmarks
text2vec_voyageai | Voyage's strong open-eval results
text2vec_transformers | local model on server (CPU/GPU), no external API
text2vec_ollama | local server-side via Ollama
text2vec_huggingface | HuggingFace Inference API
multi2vec_clip | multi-modal (text + image)

Chunking is upstream. Weaviate stores text properties; chunking happens in the app before insert. Standard heuristics (300–1500 character chunks with ~100 char overlap, respect document structure) apply. For RAG-quality, pair chunk_by_title from unstructured with the server-side vectorizer for a one-pass ingestion pipeline.
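
The character-window heuristic is simple enough to sketch directly. The version below is deliberately naive: it cuts at fixed offsets and does not respect sentence or heading boundaries, which a structure-aware chunker would. chunk_text and its defaults are illustrative.

```python
from typing import List


def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into chunks of at most `size` characters, each sharing
    `overlap` characters with the previous chunk."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "x" * 2500
print([len(c) for c in chunk_text(sample)])  # [1000, 1000, 700]
```

Each chunk would then be inserted as its own object, typically with a property pointing back to the source document.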

Hybrid alpha tuning is part of chunking. Smaller chunks favour BM25 (more keyword density); larger chunks favour vector (more semantic context). Tune alpha per workload.

When NOT to use this

Weaviate's superpower is server-side hybrid + generative modules. The trade-offs below are where another tool is a better fit.

  • Notebook prototype with no infra. chromadb is one pip install; Weaviate needs a running server.
  • You want vectors but not a database. faiss is a library; Weaviate is a database with schema, RBAC, replication, and modules.
  • Strict separation: keys in app, not in server. If compliance forbids forwarding API keys to a third-party server, disable server modules and compute embeddings client-side — at that point a leaner client like qdrant-client may be a better fit.
  • You're already on Postgres. pgvector lets you keep one operational store; Weaviate is more capable but more moving parts.
  • Pure keyword search. Elasticsearch / OpenSearch are more battle-tested for pure BM25 with a deep aggregation ecosystem.

See also