cheat sheet
qdrant-client
Store and search vector embeddings with the Qdrant Python client. Covers collections, CRUD, filtered vector search, payload indexing, batch upsert, sparse/dense hybrid search, and integrations.
qdrant-client — High-Performance Vector Database
What it is
Qdrant is an open-source vector database and similarity search engine written in Rust, with a Python client (qdrant-client) that provides both a REST and a gRPC interface. It is designed for high-throughput production workloads and offers fine-grained payload filtering, named vectors (store multiple embeddings per point), sparse vectors for hybrid search, on-disk HNSW indexing, and built-in quantisation for memory efficiency. The Python client can connect to a remote Qdrant server or run an in-memory/local-file instance without a separate process.
Install
pip install qdrant-client
pip install "qdrant-client[fastembed]" # adds local embedding generation via FastEmbed
Output: (none — exits 0 on success)
Quick example
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
client = QdrantClient(":memory:")  # in-memory, no server needed
# Create a collection
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)
# Upsert points (id + vector + payload)
client.upsert(
    collection_name="articles",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.9, 0.7], payload={"title": "Attention Is All You Need"}),
        PointStruct(id=2, vector=[0.3, 0.8, 0.1, 0.2], payload={"title": "BERT Pre-training"}),
        PointStruct(id=3, vector=[0.9, 0.1, 0.3, 0.5], payload={"title": "GPT-3 Language Model"}),
    ],
)
# Search by vector similarity
results = client.search(
    collection_name="articles",
    query_vector=[0.1, 0.2, 0.8, 0.7],
    limit=2,
)
for r in results:
    print(f"[{r.score:.3f}] {r.payload['title']}")
Output:
[0.998] Attention Is All You Need
[0.598] GPT-3 Language Model
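With Distance.COSINE, the reported score is the cosine similarity between the query and each stored vector, so the ranking can be checked by hand. The following is a minimal sketch in plain Python that reuses the vectors from the example; the `cosine` helper is illustrative and not part of qdrant-client.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = [0.1, 0.2, 0.8, 0.7]
stored = {
    "Attention Is All You Need": [0.1, 0.2, 0.9, 0.7],
    "BERT Pre-training": [0.3, 0.8, 0.1, 0.2],
    "GPT-3 Language Model": [0.9, 0.1, 0.3, 0.5],
}

# Score every stored vector, then keep the two best matches (limit=2).
ranked = sorted(((cosine(query, vec), title) for title, vec in stored.items()), reverse=True)
for score, title in ranked[:2]:
    print(f"[{score:.3f}] {title}")
```

Qdrant computes scores in 32-bit floats, so the last digits can differ slightly from a double-precision check like this one.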
When / why to use it
- High-throughput RAG pipelines where query latency matters — Qdrant benchmarks among the fastest open-source vector databases on the ANN (approximate nearest neighbour) benchmarks.
- Production workloads that need payload filtering with low latency — Qdrant filters are applied during the HNSW graph traversal, not as a post-processing step.
- Memory-constrained deployments — built-in scalar, product, and binary quantisation reduce RAM usage by 4–32×.
- Multi-vector search — store dense and sparse vectors per point and perform hybrid search in one query.
- When you want a self-hosted solution with a Python-first API and no external JVM/Node dependencies.
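The 4–32× range for quantisation follows from the bytes stored per vector component. The sketch below is back-of-the-envelope arithmetic only: it counts raw vector storage and ignores the HNSW graph, payloads, and any per-segment overhead, and the collection size is a made-up example.

```python
def vector_ram_bytes(num_vectors, dim, bytes_per_component):
    """Raw storage for the vectors alone (no index, no payload)."""
    return int(num_vectors * dim * bytes_per_component)

n, dim = 1_000_000, 1536  # hypothetical collection

float32_bytes = vector_ram_bytes(n, dim, 4)      # unquantised float32
int8_bytes = vector_ram_bytes(n, dim, 1)         # scalar quantisation: 1 byte per component
binary_bytes = vector_ram_bytes(n, dim, 1 / 8)   # binary quantisation: 1 bit per component

for label, size in [("float32", float32_bytes), ("int8", int8_bytes), ("binary", binary_bytes)]:
    print(f"{label:>8}: {size / 2**30:.2f} GiB ({float32_bytes / size:.0f}x smaller than float32)")
```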
Common pitfalls
- Point IDs must be unsigned integers or UUIDs. Qdrant rejects string IDs that are not valid UUIDs; use str(uuid.uuid4()) or an integer counter. Passing arbitrary strings raises a validation error.
- Collection vector size is fixed at creation. You cannot change VectorParams.size after creating a collection; if your embedding model changes to a different dimension, you must recreate the collection and re-index all points.
- The in-memory client does not persist. QdrantClient(":memory:") is ideal for testing, but data is lost when the process exits. Use QdrantClient(path="./qdrant_storage") for local persistence, or connect to a running Qdrant server for production.
- Use client.upload_points() for bulk imports. It sends points in configurable batches and is typically faster than calling upsert() in a loop.
- Enable payload indexing for fields you filter on frequently. create_payload_index() builds an index for the field (keyword, integer, and other schema types), which can make filtered queries much faster on large collections.
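When your documents already have string keys (slugs, file paths, external IDs), one way to satisfy the ID rule is to derive a deterministic UUID from each key, so that re-ingesting the same document overwrites its point rather than creating a duplicate. The sketch below uses only the standard library; the namespace URL and the `point_id_for` helper are hypothetical names chosen for illustration.

```python
import uuid

# Hypothetical application namespace; any fixed UUID works as long as it never changes.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/articles")

def point_id_for(external_key: str) -> str:
    """Map an arbitrary string key to a stable UUID string usable as a point ID."""
    return str(uuid.uuid5(APP_NAMESPACE, external_key))

first = point_id_for("docs/attention-is-all-you-need.pdf")
second = point_id_for("docs/attention-is-all-you-need.pdf")
other = point_id_for("docs/bert.pdf")
print(first)
print(first == second, first == other)
```

The resulting string can be passed as PointStruct(id=...) in the same way as str(uuid.uuid4()).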
Connecting to Qdrant
from qdrant_client import QdrantClient
# In-memory (testing only — data lost on exit)
client = QdrantClient(":memory:")
# Local file persistence (no server needed)
client = QdrantClient(path="./qdrant_storage")
# Remote Qdrant server (Docker or Qdrant Cloud)
client = QdrantClient(
host="localhost",
port=6333, # REST; gRPC default is 6334
prefer_grpc=True, # faster for large payloads
timeout=10.0,
)
# Qdrant Cloud
client = QdrantClient(
url="https://your-cluster.qdrant.tech",
api_key="your-api-key",
)
print(client.get_collections())
Output (for an instance that has no collections yet):
CollectionsResponse(collections=[])
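The three connection modes differ only in the keyword arguments passed to QdrantClient, so the choice can be driven by configuration. The sketch below builds those arguments from environment variables; the variable names (QDRANT_URL, QDRANT_API_KEY, QDRANT_PATH) and the helper itself are assumptions made for this example, not conventions defined by qdrant-client.

```python
import os

def qdrant_client_kwargs(env=os.environ):
    """Pick connection arguments: remote URL first, then local path, else in-memory."""
    url = env.get("QDRANT_URL")
    if url:
        kwargs = {"url": url}
        api_key = env.get("QDRANT_API_KEY")
        if api_key:
            kwargs["api_key"] = api_key
        return kwargs
    path = env.get("QDRANT_PATH")
    if path:
        return {"path": path}
    return {"location": ":memory:"}

# Intended use (requires qdrant-client): client = QdrantClient(**qdrant_client_kwargs())
print(qdrant_client_kwargs({}))
print(qdrant_client_kwargs({"QDRANT_PATH": "./qdrant_storage"}))
```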
Creating collections
A collection defines the vector dimensions, distance metric, and optional index settings.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff, OptimizersConfigDiff,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,  # must match your embedding model
        distance=Distance.COSINE,  # COSINE | EUCLID | DOT | MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,  # edges per node in the HNSW graph
        ef_construct=100,  # higher = more accurate but slower build
        full_scan_threshold=10_000,  # in KB of vector data; below this a full scan is preferred over HNSW
    ),
    # The scalar settings must be wrapped in ScalarQuantization(scalar=...)
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,  # 4× memory reduction, ~1% accuracy loss
            quantile=0.99,
            always_ram=True,
        ),
    ),
    optimizers_config=OptimizersConfigDiff(
        default_segment_number=5,
        memmap_threshold=20_000,  # in KB; segments larger than this are memory-mapped from disk
    ),
)
info = client.get_collection("documents")
print(f"Status: {info.status.value}, points_count: {info.points_count}")
Output:
Status: green, points_count: 0
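To make the INT8 setting more concrete, the sketch below shows the general idea of scalar quantisation: each float component is clipped to a range and mapped to one of 256 integer codes, which is where the 4× saving over float32 comes from. This is an illustration of the technique only, not Qdrant's internal implementation; here the clipping bounds are fixed by hand, whereas in Qdrant the quantile setting is used to derive them from the data.

```python
def quantize(vec, lo, hi):
    """Map floats in [lo, hi] to integer codes 0..255 (values outside are clipped)."""
    scale = (hi - lo) / 255
    codes = [round((min(max(x, lo), hi) - lo) / scale) for x in vec]
    return codes, scale

def dequantize(codes, lo, scale):
    """Approximate reconstruction of the original floats."""
    return [lo + c * scale for c in codes]

vec = [0.12, -0.53, 0.98, 0.0, -1.0, 0.4]
codes, scale = quantize(vec, lo=-1.0, hi=1.0)
restored = dequantize(codes, lo=-1.0, scale=scale)

# The reconstruction error per component is bounded by half a quantisation step.
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(codes)
print(f"max error: {max_err:.5f} (half step: {scale / 2:.5f})")
```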
CRUD operations
import uuid
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointIdsList, PointStruct, VectorParams
client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))
# Insert (upsert — insert or update by ID)
client.upsert(
collection_name="docs",
points=[
PointStruct(
id=str(uuid.uuid4()),
vector=[0.1, 0.2, 0.9, 0.7],
payload={"title": "Transformers", "year": 2017, "topic": "nlp"},
),
PointStruct(
id=1, # integer IDs are also valid
vector=[0.3, 0.8, 0.1, 0.2],
payload={"title": "BERT", "year": 2018, "topic": "nlp"},
),
],
)
# Retrieve by ID
points = client.retrieve(collection_name="docs", ids=[1], with_payload=True, with_vectors=True)
print(points[0].payload)
# Update payload (partial — only listed fields are changed)
client.set_payload(
collection_name="docs",
payload={"year": 2019},
points=PointIdsList(points=[1]),
)
# Delete points
client.delete(
collection_name="docs",
points_selector=PointIdsList(points=[1]),
)
print("After delete:", client.count("docs").count)
Output:
{'title': 'BERT', 'year': 2018, 'topic': 'nlp'}
After delete: 1
Batch upsert
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import uuid
import numpy as np
client = QdrantClient(path="./qdrant_storage")
# recreate_collection() is deprecated in recent client releases; drop and create explicitly
if client.collection_exists("embeddings"):
    client.delete_collection("embeddings")
client.create_collection(
    collection_name="embeddings",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
n = 10_000
vectors = np.random.rand(n, 384).tolist()
payloads = [{"doc_id": i, "chunk": i % 20} for i in range(n)]
ids = [str(uuid.uuid4()) for _ in range(n)]
client.upload_points(
    collection_name="embeddings",
    points=[
        PointStruct(id=ids[i], vector=vectors[i], payload=payloads[i])
        for i in range(n)
    ],
    batch_size=256,
    parallel=4,  # number of parallel upload workers (separate processes; relevant for remote servers)
)
print(f"Total: {client.count('embeddings').count}")
Output:
Total: 10000
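Conceptually, batch_size splits the stream of points into fixed-size chunks, each of which is sent as one request. The helper below is a hypothetical stand-in that illustrates the chunking only; upload_points() does its own batching internally, so nothing like this is needed in real code.

```python
from itertools import islice

def batched(iterable, batch_size):
    """Yield lists of at most batch_size items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, batch_size)):
        yield batch

# 1,000 fake point IDs split the way batch_size=256 would split them.
sizes = [len(chunk) for chunk in batched(range(1000), 256)]
print(sizes)
```

Because the helper consumes a plain iterator, the same pattern works with a generator of points, which avoids building the whole list in memory first.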
Vector search
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, ScoredPoint
import numpy as np
client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))
client.upsert("docs", points=[
PointStruct(id=i, vector=np.random.rand(4).tolist(), payload={"title": f"Doc {i}", "category": "nlp" if i % 2 == 0 else "cv"})
for i in range(20)
])
query_vec = np.random.rand(4).tolist()
# Basic search
results: list[ScoredPoint] = client.search(
collection_name="docs",
query_vector=query_vec,
limit=5,
with_payload=True,
)
for r in results:
    print(f"[{r.score:.4f}] id={r.id} | {r.payload['title']}")
Output (IDs and scores vary between runs because the vectors are random):
[0.9912] id=7 | Doc 7
[0.9845] id=14 | Doc 14
[0.9801] id=3 | Doc 3
[0.9734] id=11 | Doc 11
[0.9621] id=0 | Doc 0
Filtered search
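Qdrant applies payload filters during graph traversal rather than post-filtering the results, so a filtered query still returns up to limit matching points and usually stays close to unfiltered latency, particularly when the filtered fields have payload indexes.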
from qdrant_client.models import Filter, FieldCondition, HasIdCondition, MatchValue, Range
# Point IDs are not payload fields, so give every point a numeric "rank" field to filter on
for i in range(20):
    client.set_payload(collection_name="docs", payload={"rank": i}, points=[i])
# Exact match filter
nlp_results = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
    ),
    limit=3,
)
# Range filter on a numeric payload field
range_results = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="rank", range=Range(gte=5, lt=15))],
    ),
    limit=3,
)
# Combined AND / NOT (add should=[...] for OR); HasIdCondition matches point IDs
combined = client.search(
    collection_name="docs",
    query_vector=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
        must_not=[HasIdCondition(has_id=[7])],
    ),
    limit=3,
)
for r in combined:
    print(f"[{r.score:.4f}] {r.payload['title']}")
Output (varies between runs because the vectors are random):
[0.9845] Doc 14
[0.9621] Doc 0
[0.9412] Doc 2
Payload indexing
Creating a payload index speeds up filtered queries on frequently-used fields.
from qdrant_client.models import PayloadSchemaType
# Keyword index — for exact-match filters
client.create_payload_index(
collection_name="docs",
field_name="category",
field_schema=PayloadSchemaType.KEYWORD,
)
# Integer index — for range filters
client.create_payload_index(
collection_name="docs",
field_name="year",
field_schema=PayloadSchemaType.INTEGER,
)
print("Indexes created")
Named vectors — multiple embeddings per point
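Named vectors let you store more than one embedding per point (for example, title and content embeddings, or embeddings from different models) and query each independently. Sparse vectors are also addressed by name, but they are declared separately through sparse_vectors_config.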
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, NamedVector
client = QdrantClient(":memory:")
client.create_collection(
collection_name="multi_vec",
vectors_config={
"title_emb": VectorParams(size=384, distance=Distance.COSINE),
"content_emb": VectorParams(size=384, distance=Distance.COSINE),
},
)
import numpy as np
client.upsert(
"multi_vec",
points=[
PointStruct(
id=1,
vector={
"title_emb": np.random.rand(384).tolist(),
"content_emb": np.random.rand(384).tolist(),
},
payload={"title": "Attention Is All You Need"},
),
],
)
# Search using a specific named vector
results = client.search(
collection_name="multi_vec",
query_vector=NamedVector(name="content_emb", vector=np.random.rand(384).tolist()),
limit=1,
)
print(results[0].payload["title"])
Output:
Attention Is All You Need
Hybrid search with sparse vectors
Sparse vectors (like BM25 or SPLADE) can be combined with dense vectors for hybrid search. Requires Qdrant 1.10+.
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, SparseVectorParams, PointStruct,
    SparseVector, Prefetch, FusionQuery, Fusion,
)
client = QdrantClient(":memory:")
client.create_collection(
    collection_name="hybrid",
    vectors_config={"dense": VectorParams(size=4, distance=Distance.COSINE)},
    sparse_vectors_config={"sparse": SparseVectorParams()},
)
client.upsert(
    "hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense": [0.1, 0.2, 0.9, 0.7],
                "sparse": SparseVector(indices=[10, 42, 789], values=[0.8, 0.4, 0.6]),
            },
            payload={"title": "Attention Is All You Need"},
        ),
    ],
)
# Hybrid query: each Prefetch runs one sub-search; the results are then fused with RRF
results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=[0.1, 0.2, 0.8, 0.7], using="dense", limit=10),
        Prefetch(
            query=SparseVector(indices=[10, 42], values=[0.9, 0.3]),
            using="sparse",
            limit=10,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=3,
)
# RRF scores are derived from ranks, not similarities, so only the ordering is printed
for r in results.points:
    print(f"id={r.id} | {r.payload['title']}")
Output:
id=1 | Attention Is All You Need
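Reciprocal rank fusion (RRF) ignores the raw dense and sparse scores and combines the result lists by rank position alone. The sketch below is a plain-Python illustration of that idea; the `rrf` helper, the example rankings, and the constant k=2 are assumptions for this example, and the constant Qdrant actually uses may differ between versions.

```python
def rrf(rankings, k=2):
    """Fuse several ranked ID lists; each hit contributes 1 / (k + zero-based position)."""
    scores = {}
    for ranking in rankings:
        for position, point_id in enumerate(ranking):
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + position)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

dense_ranking = [1, 3, 2]   # point IDs ordered by dense similarity
sparse_ranking = [2, 1, 4]  # point IDs ordered by sparse similarity

fused = rrf([dense_ranking, sparse_ranking])
for point_id, score in fused:
    print(f"id={point_id} score={score:.4f}")
```

Points that appear near the top of both lists (IDs 1 and 2 here) end up ahead of points that appear in only one list, which is the behaviour hybrid search relies on.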
LangChain integration
import os
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore
embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])
# location=":memory:" makes the vector store create its own in-memory client
vectorstore = QdrantVectorStore.from_texts(
    texts=[
        "Transformers use self-attention to process sequences in parallel.",
        "BERT is pre-trained with masked language modelling.",
    ],
    embedding=embeddings,
    location=":memory:",
    collection_name="langchain_demo",
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("What is self-attention?")
for doc in docs:
    print(doc.page_content)
Output:
Transformers use self-attention to process sequences in parallel.
BERT is pre-trained with masked language modelling.
LlamaIndex integration
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import VectorStoreIndex, StorageContext, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from qdrant_client import QdrantClient
import os
Settings.embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])
client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="llama_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
from llama_index.core import Document
index = VectorStoreIndex.from_documents(
[Document(text="Transformers use self-attention to process sequences in parallel.")],
storage_context=storage_context,
)
query_engine = index.as_query_engine()
response = query_engine.query("What mechanism do transformers use?")
print(response)
Output (generated by the LLM, so the exact wording varies):
Transformers use self-attention to process sequences in parallel.
Quick reference
| Task | Code |
|---|---|
| In-memory client | QdrantClient(":memory:") |
| Persistent client | QdrantClient(path="./qdrant_storage") |
| Remote client | QdrantClient(url=..., api_key=...) |
| Create collection | client.create_collection("name", vectors_config=VectorParams(size=n, distance=Distance.COSINE)) |
| Upsert points | client.upsert("name", points=[PointStruct(id=..., vector=..., payload=...)]) |
| Bulk import | client.upload_points("name", points=[...], batch_size=256, parallel=4) |
| Search | client.search("name", query_vector=[...], limit=k) |
| Filtered search | client.search(..., query_filter=Filter(must=[FieldCondition(...)])) |
| Exact match filter | FieldCondition(key="field", match=MatchValue(value="val")) |
| Range filter | FieldCondition(key="num", range=Range(gte=0, lt=100)) |
| Payload index | client.create_payload_index("name", "field", PayloadSchemaType.KEYWORD) |
| Retrieve by ID | client.retrieve("name", ids=[1, 2], with_payload=True) |
| Update payload | client.set_payload("name", payload={...}, points=PointIdsList(points=[id])) |
| Delete points | client.delete("name", points_selector=PointIdsList(points=[id])) |
| Point count | client.count("name").count |
| Collection info | client.get_collection("name") |
| Named vectors | vectors_config={"title_emb": VectorParams(...), "content_emb": VectorParams(...)} |
| Sparse vectors | sparse_vectors_config={"sparse": SparseVectorParams()} |