cheat sheet

qdrant-client

Store and search vector embeddings with the Qdrant Python client. Covers collections, CRUD, filtered vector search, payload indexing, batch upsert, sparse/dense hybrid search, and integrations.

updated 04-27-2026

qdrant-client — High-Performance Vector Database

What it is

Qdrant is an open-source vector database and similarity search engine written in Rust, with a Python client (qdrant-client) that provides both a REST and a gRPC interface. It is designed for high-throughput production workloads and offers fine-grained payload filtering, named vectors (store multiple embeddings per point), sparse vectors for hybrid search, on-disk HNSW indexing, and built-in quantisation for memory efficiency. The Python client can connect to a remote Qdrant server or run an in-memory/local-file instance without a separate process; this local mode is intended for prototyping and testing.

Install

bash

pip install qdrant-client
pip install "qdrant-client[fastembed]"   # adds local embedding generation via FastEmbed

Output: pip's installation log; the command exits 0 on success

Quick example

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")   # in-memory, no server needed

# Create a collection
client.create_collection(
    collection_name="articles",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),
)

# Upsert points (id + vector + payload)
client.upsert(
    collection_name="articles",
    points=[
        PointStruct(id=1, vector=[0.1, 0.2, 0.9, 0.7], payload={"title": "Attention Is All You Need"}),
        PointStruct(id=2, vector=[0.3, 0.8, 0.1, 0.2], payload={"title": "BERT Pre-training"}),
        PointStruct(id=3, vector=[0.9, 0.1, 0.3, 0.5], payload={"title": "GPT-3 Language Model"}),
    ],
)

# Search by vector similarity (query_points() supersedes the deprecated search())
results = client.query_points(
    collection_name="articles",
    query=[0.1, 0.2, 0.8, 0.7],
    limit=2,
).points
for r in results:
    print(f"[{r.score:.3f}] {r.payload['title']}")

Output:

text

[0.998] Attention Is All You Need
[0.598] GPT-3 Language Model
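The scores above are cosine similarities. A standard-library check (a verification sketch, not the client's code path) reproduces them from the raw vectors and shows the full ranking:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query = [0.1, 0.2, 0.8, 0.7]
points = {
    "Attention Is All You Need": [0.1, 0.2, 0.9, 0.7],
    "BERT Pre-training": [0.3, 0.8, 0.1, 0.2],
    "GPT-3 Language Model": [0.9, 0.1, 0.3, 0.5],
}

# Rank titles by similarity to the query, highest first
ranked = sorted(points, key=lambda t: cosine(query, points[t]), reverse=True)
for title in ranked:
    print(f"[{cosine(query, points[title]):.3f}] {title}")
# [0.998] Attention Is All You Need
# [0.598] GPT-3 Language Model
# [0.427] BERT Pre-training
```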

When / why to use it

High-throughput RAG pipelines where query latency matters: Qdrant's published benchmarks place it among the fastest open-source vector databases for approximate nearest neighbour (ANN) search.
Production workloads that need payload filtering with low latency — Qdrant filters are applied during the HNSW graph traversal, not as a post-processing step.
Memory-constrained deployments: built-in quantisation cuts vector memory by about 4× (scalar int8) or 32× (binary), and product quantisation can compress further (up to 64×) at a larger accuracy cost.
Multi-vector search — store dense and sparse vectors per point and perform hybrid search in one query.
When you want a self-hosted solution with a Python-first API and no external JVM/Node dependencies.
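To make the quantisation ratios in the list above concrete, the arithmetic below estimates raw vector storage for a hypothetical collection of 1 million 1536-dimensional embeddings. It counts only vector bytes (float32 = 4 bytes per dimension, int8 = 1 byte, binary = 1 bit) and ignores the HNSW graph, payloads, and the original vectors Qdrant keeps for rescoring, so treat the numbers as a rough lower bound. Product quantisation is left out because its ratio depends on how it is configured.

```python
# Rough RAM estimate for vector storage only (hypothetical collection size)
NUM_VECTORS = 1_000_000
DIM = 1536

BYTES_PER_DIM = {
    "float32 (no quantisation)": 4.0,
    "scalar int8": 1.0,
    "binary (1 bit)": 1 / 8,
}


def vector_bytes(num_vectors: int, dim: int, bytes_per_dim: float) -> float:
    """Total bytes needed to store num_vectors vectors of dim dimensions."""
    return num_vectors * dim * bytes_per_dim


baseline = vector_bytes(NUM_VECTORS, DIM, BYTES_PER_DIM["float32 (no quantisation)"])
for name, bpd in BYTES_PER_DIM.items():
    size = vector_bytes(NUM_VECTORS, DIM, bpd)
    print(f"{name:<28} {size / 1e9:6.2f} GB  ({baseline / size:.0f}x reduction)")
# float32: 6.14 GB (1x), int8: 1.54 GB (4x), binary: 0.19 GB (32x)
```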

Common pitfalls

Point IDs must be unsigned integers or UUIDs — Qdrant rejects string IDs that are not valid UUIDs. Use str(uuid.uuid4()) or an integer counter. Passing arbitrary strings raises a validation error.
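When records already have string keys (file paths, URLs, database IDs), a common pattern is to derive a deterministic UUID from each key with uuid.uuid5, so re-ingesting the same record overwrites its point instead of creating a duplicate. A minimal standard-library sketch; the helper name to_point_id and the choice of namespace are illustrative assumptions:

```python
import uuid

# Hypothetical namespace for this application. Any fixed UUID works, but it
# must never change, or every key will map to a different point ID.
ID_NAMESPACE = uuid.NAMESPACE_URL


def to_point_id(key: str) -> str:
    """Map an arbitrary string key to a stable, UUID-formatted point ID."""
    return str(uuid.uuid5(ID_NAMESPACE, key))


a = to_point_id("docs/attention.pdf#chunk-3")
b = to_point_id("docs/attention.pdf#chunk-3")
c = to_point_id("docs/bert.pdf#chunk-0")
print(a == b, a == c)  # True False: same key -> same ID, different key -> different ID
```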

Collection vector size is fixed at creation — you cannot change VectorParams.size after creating a collection. If your embedding model changes to a different dimension, you must recreate the collection and re-index all points.
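Because the size is fixed, it can pay to validate embedding dimensions on the client before upserting, so a model swap fails loudly with a clear message. A standard-library sketch; check_dims is a hypothetical helper, and the expected sizes would come from your own collection config (for example client.get_collection(name).config.params.vectors):

```python
from collections.abc import Mapping, Sequence
from typing import Union

Vector = Sequence[float]


def check_dims(vector: Union[Vector, Mapping[str, Vector]],
               expected: Union[int, Mapping[str, int]]) -> None:
    """Raise ValueError if a plain or named vector does not match the expected size(s)."""
    if isinstance(expected, int):
        # Collection with a single unnamed vector
        if isinstance(vector, Mapping):
            raise ValueError("got named vectors, but the collection has one unnamed vector")
        if len(vector) != expected:
            raise ValueError(f"expected {expected} dims, got {len(vector)}")
        return
    # Collection with named vectors: each supplied name must exist and match its size
    if not isinstance(vector, Mapping):
        raise ValueError("collection uses named vectors; pass a dict of name -> vector")
    for name, vec in vector.items():
        if name not in expected:
            raise ValueError(f"unknown vector name {name!r}")
        if len(vec) != expected[name]:
            raise ValueError(f"{name!r}: expected {expected[name]} dims, got {len(vec)}")


check_dims([0.1] * 384, 384)  # OK
check_dims({"title_emb": [0.0] * 384}, {"title_emb": 384, "content_emb": 384})  # OK
try:
    check_dims([0.1] * 1536, 384)  # e.g. after switching to a larger embedding model
except ValueError as err:
    print(err)  # expected 384 dims, got 1536
```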

In-memory client does not persist — QdrantClient(":memory:") is ideal for testing but data is lost when the process exits. Use QdrantClient(path="./qdrant_storage") for local persistence or connect to a running Qdrant server for production.

Use client.upload_points() for bulk imports — it streams points to the server in configurable batches and is significantly faster than calling upsert() in a loop.

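Enable payload indexing for fields you filter on frequently: create_payload_index() builds a typed index (keyword, integer, float, datetime, and other schemas) that can make filtered queries on large collections much faster. Creating indexes before bulk-loading data lets Qdrant take them into account when it builds the HNSW graph.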

Connecting to Qdrant

python

from qdrant_client import QdrantClient

# In-memory (testing only — data lost on exit)
client = QdrantClient(":memory:")

# Local file persistence (no server needed)
client = QdrantClient(path="./qdrant_storage")

# Remote Qdrant server (Docker or Qdrant Cloud)
client = QdrantClient(
    host="localhost",
    port=6333,               # REST; gRPC default is 6334
    prefer_grpc=True,        # faster for large payloads
    timeout=10.0,
)

# Qdrant Cloud
client = QdrantClient(
    url="https://<your-cluster>.cloud.qdrant.io:6333",  # copy from the Qdrant Cloud console
    api_key="your-api-key",
)

print(client.get_collections())

Output:

text

collections=[]
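A common way to switch between these modes without editing code is to read the target from environment variables. A minimal sketch; the variable names QDRANT_URL and QDRANT_API_KEY and the helper client_kwargs are choices made for this example:

```python
import os
from collections.abc import Mapping
from typing import Any, Optional


def client_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Build keyword arguments for QdrantClient(...) from environment variables.

    QDRANT_URL unset -> in-memory mode (testing only)
    QDRANT_URL set   -> remote server, plus QDRANT_API_KEY if it is set
    """
    env = os.environ if env is None else env
    url = env.get("QDRANT_URL")
    if not url:
        return {"location": ":memory:"}
    kwargs: dict[str, Any] = {"url": url}
    api_key = env.get("QDRANT_API_KEY")
    if api_key:
        kwargs["api_key"] = api_key
    return kwargs


# Usage with qdrant-client installed:
#   from qdrant_client import QdrantClient
#   client = QdrantClient(**client_kwargs())
print(client_kwargs({}))                                       # {'location': ':memory:'}
print(client_kwargs({"QDRANT_URL": "http://localhost:6333"}))  # {'url': 'http://localhost:6333'}
```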

Creating collections

A collection defines the vector dimensions, distance metric, and optional index settings.

python

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, HnswConfigDiff, OptimizersConfigDiff,
    ScalarQuantization, ScalarQuantizationConfig, ScalarType,
)

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=1536,                  # must match your embedding model
        distance=Distance.COSINE,   # COSINE | EUCLID | DOT | MANHATTAN
    ),
    hnsw_config=HnswConfigDiff(
        m=16,                       # number of edges per node in the graph
        ef_construct=100,           # higher = more accurate but slower build
        full_scan_threshold=10_000, # in KB of vectors: smaller segments use a full scan
    ),
    quantization_config=ScalarQuantization(
        scalar=ScalarQuantizationConfig(
            type=ScalarType.INT8,   # 4× memory reduction, ~1% accuracy loss
            quantile=0.99,
            always_ram=True,        # keep the quantised vectors in RAM
        ),
    ),
    optimizers_config=OptimizersConfigDiff(
        default_segment_number=5,
        memmap_threshold=20_000,    # in KB per segment: larger segments are memory-mapped
    ),
)

info = client.get_collection("documents")
print(f"Status: {info.status}, points_count: {info.points_count}")

Output:

text

Status: CollectionStatus.GREEN, points_count: 0
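Both full_scan_threshold and memmap_threshold are measured in kilobytes of vector data, not in points. A quick conversion for the 1536-dimension float32 vectors configured above shows roughly how many points each threshold corresponds to; the sketch assumes 1 KB = 1024 bytes and counts raw vector bytes only:

```python
BYTES_PER_FLOAT32 = 4
KB = 1024  # assumption: 1 KB = 1024 bytes


def kb_to_points(threshold_kb: int, dim: int) -> int:
    """Approximate number of float32 vectors with `dim` dimensions that fit in threshold_kb."""
    return (threshold_kb * KB) // (dim * BYTES_PER_FLOAT32)


print(kb_to_points(10_000, 1536))  # full_scan_threshold=10_000 -> 1666 points
print(kb_to_points(20_000, 1536))  # memmap_threshold=20_000   -> 3333 points per segment
```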

CRUD operations

python

import uuid

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, PointIdsList

client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))

# Insert (upsert — insert or update by ID)
client.upsert(
    collection_name="docs",
    points=[
        PointStruct(
            id=str(uuid.uuid4()),
            vector=[0.1, 0.2, 0.9, 0.7],
            payload={"title": "Transformers", "year": 2017, "topic": "nlp"},
        ),
        PointStruct(
            id=1,                    # integer IDs are also valid
            vector=[0.3, 0.8, 0.1, 0.2],
            payload={"title": "BERT", "year": 2018, "topic": "nlp"},
        ),
    ],
)

# Retrieve by ID
points = client.retrieve(collection_name="docs", ids=[1], with_payload=True, with_vectors=True)
print(points[0].payload)

# Update payload (partial — only listed fields are changed)
client.set_payload(
    collection_name="docs",
    payload={"year": 2019},
    points=PointIdsList(points=[1]),
)

# Delete points
client.delete(
    collection_name="docs",
    points_selector=PointIdsList(points=[1]),
)

print("After delete:", client.count("docs").count)

Output:

text

{'title': 'BERT', 'year': 2018, 'topic': 'nlp'}
After delete: 1
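set_payload() merges the given keys into a point's existing payload, while the client's overwrite_payload() replaces the whole payload. The dict operations below model those two semantics locally; this is a pure-Python illustration, not client code:

```python
# Local model of the two payload-update semantics (illustration only)
def set_payload(existing: dict, update: dict) -> dict:
    """Merge: keys in `update` are added or replaced; all other keys are kept."""
    return {**existing, **update}


def overwrite_payload(_existing: dict, update: dict) -> dict:
    """Replace: the new payload is exactly `update`."""
    return dict(update)


stored = {"title": "BERT", "year": 2018, "topic": "nlp"}
print(set_payload(stored, {"year": 2019}))        # {'title': 'BERT', 'year': 2019, 'topic': 'nlp'}
print(overwrite_payload(stored, {"year": 2019}))  # {'year': 2019}
```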

Batch upsert

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import numpy as np, uuid

client = QdrantClient(path="./qdrant_storage")
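# recreate_collection() is deprecated: drop the collection if it exists, then create it
if client.collection_exists("embeddings"):
    client.delete_collection("embeddings")
client.create_collection(
    collection_name="embeddings",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)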

n = 10_000
vectors  = np.random.rand(n, 384).tolist()
payloads = [{"doc_id": i, "chunk": i % 20} for i in range(n)]
ids      = [str(uuid.uuid4()) for _ in range(n)]

client.upload_points(
    collection_name="embeddings",
    points=[
        PointStruct(id=ids[i], vector=vectors[i], payload=payloads[i])
        for i in range(n)
    ],
    batch_size=256,
    parallel=4,        # number of parallel upload processes
)

print(f"Total: {client.count('embeddings').count}")

Output:

text

Total: 10000
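upload_points() accepts any iterable and splits it into batch_size chunks before sending. The generator below sketches that chunking in plain Python; it illustrates the idea and is not the client's internal code. Passing a generator of points rather than a pre-built list also keeps memory flat for very large imports.

```python
from collections.abc import Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield successive lists of at most batch_size items (the last one may be shorter)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(items)
    while batch := list(islice(it, batch_size)):
        yield batch


sizes = [len(b) for b in batched(range(10_000), 256)]
print(len(sizes), sizes[0], sizes[-1])  # 40 256 16: 39 full batches, then 16 left over
```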

Vector search

python

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, ScoredPoint
import numpy as np

client = QdrantClient(":memory:")
client.create_collection("docs", vectors_config=VectorParams(size=4, distance=Distance.COSINE))
client.upsert("docs", points=[
    PointStruct(
        id=i,
        vector=np.random.rand(4).tolist(),
        payload={"title": f"Doc {i}", "category": "nlp" if i % 2 == 0 else "cv", "year": 2000 + i},
    )
    for i in range(20)
])

query_vec = np.random.rand(4).tolist()

# Basic search (query_points returns a QueryResponse; the hits are in .points)
results: list[ScoredPoint] = client.query_points(
    collection_name="docs",
    query=query_vec,
    limit=5,
    with_payload=True,
).points
for r in results:
    print(f"[{r.score:.4f}] id={r.id} | {r.payload['title']}")

Output (illustrative: the vectors are random, so IDs and scores differ on every run):

text

[0.9912] id=7 | Doc 7
[0.9845] id=14 | Doc 14
[0.9801] id=3 | Doc 3
[0.9734] id=11 | Doc 11
[0.9621] id=0 | Doc 0
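HNSW search is approximate. To measure recall on your own data, compare Qdrant's hits with an exact brute-force search over the same vectors. A NumPy sketch; the helper names and random data are illustrative:

```python
import numpy as np


def exact_top_k(vectors: np.ndarray, query: np.ndarray, k: int) -> list[int]:
    """Row indices of the k vectors most cosine-similar to query (exact brute force)."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-length rows
    q = query / np.linalg.norm(query)
    scores = v @ q                                                # cosine similarities
    return np.argsort(-scores)[:k].tolist()


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = np.random.default_rng(0)
data = rng.random((20, 4))
query = rng.random(4)
truth = exact_top_k(data, query, k=5)

# With Qdrant, approx_ids would be the IDs of the returned points, assuming
# (as in the example above) that point IDs equal the row indices of `data`.
approx_ids = truth  # stand-in so the sketch runs without a server
print(recall_at_k(approx_ids, truth))  # 1.0
```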

Filtered search

Qdrant applies payload filters during HNSW graph traversal rather than post-filtering the hits, so filtered queries stay fast and still return the requested number of matches when enough exist.

python

from qdrant_client.models import Filter, FieldCondition, HasIdCondition, MatchValue, Range

# Exact match filter
nlp_results = client.query_points(
    collection_name="docs",
    query=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
    ),
    limit=3,
).points

# Range filter on a numeric payload field (point IDs are not payload fields)
range_results = client.query_points(
    collection_name="docs",
    query=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="year", range=Range(gte=2005, lt=2015))],
    ),
    limit=3,
).points

# Combined conditions: must (AND) + must_not (NOT); use should=[...] for OR
combined = client.query_points(
    collection_name="docs",
    query=query_vec,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="nlp"))],
        must_not=[HasIdCondition(has_id=[7])],   # exclude point ID 7
    ),
    limit=3,
).points
for r in combined:
    print(f"[{r.score:.4f}] {r.payload['title']}")

Output (illustrative: depends on the random vectors):

text

[0.9845] Doc 14
[0.9621] Doc 0
[0.9412] Doc 2
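The clauses combine as follows: every must condition has to hold, no must_not condition may hold, and if should is non-empty at least one of its conditions has to hold. The toy evaluator below spells that logic out in plain Python for exact-match and range conditions only; the tuple condition format is made up for this sketch and is not Qdrant's:

```python
from typing import Any

# Hypothetical condition format for this sketch:
#   ("match", key, value)                  -> payload[key] == value
#   ("range", key, {"gte": ..., "lt": ...}) -> all bounds satisfied
Condition = tuple

RANGE_OPS = {
    "gt": lambda v, b: v > b, "gte": lambda v, b: v >= b,
    "lt": lambda v, b: v < b, "lte": lambda v, b: v <= b,
}


def holds(cond: Condition, payload: dict[str, Any]) -> bool:
    """True if a single condition is satisfied; a missing key never satisfies it."""
    kind, key, arg = cond
    if key not in payload:
        return False
    value = payload[key]
    if kind == "match":
        return value == arg
    if kind == "range":
        return all(RANGE_OPS[op](value, bound) for op, bound in arg.items())
    raise ValueError(f"unknown condition kind {kind!r}")


def matches(payload, must=(), should=(), must_not=()) -> bool:
    """AND over must, OR over should (if any), NOT over must_not."""
    return (
        all(holds(c, payload) for c in must)
        and (not should or any(holds(c, payload) for c in should))
        and not any(holds(c, payload) for c in must_not)
    )


doc = {"category": "nlp", "year": 2012}
print(matches(doc, must=[("match", "category", "nlp")],
              must_not=[("range", "year", {"gte": 2015})]))  # True
```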

Payload indexing

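Creating a payload index speeds up filtered queries on frequently used fields. Indexes take effect only on a Qdrant server; local mode (":memory:" or path=...) accepts the call but ignores the index.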

python

from qdrant_client.models import PayloadSchemaType

# Keyword index — for exact-match filters
client.create_payload_index(
    collection_name="docs",
    field_name="category",
    field_schema=PayloadSchemaType.KEYWORD,
)

# Integer index — for range filters
client.create_payload_index(
    collection_name="docs",
    field_name="year",
    field_schema=PayloadSchemaType.INTEGER,
)

print("Indexes created")
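Conceptually, a keyword index is an inverted map from each value to the set of points that carry it, so an exact-match filter becomes a lookup instead of a scan over every payload. Qdrant's real index structures are more involved; this plain-Python sketch only illustrates the idea:

```python
from collections import defaultdict

payloads = {
    0: {"category": "nlp"}, 1: {"category": "cv"},
    2: {"category": "nlp"}, 3: {"category": "cv"}, 4: {"category": "nlp"},
}

# Build: value -> set of point IDs (one pass over all payloads)
keyword_index: dict[str, set[int]] = defaultdict(set)
for point_id, payload in payloads.items():
    if "category" in payload:
        keyword_index[payload["category"]].add(point_id)

# Query: an exact-match filter is now a dictionary lookup
nlp_ids = keyword_index.get("nlp", set())
print(sorted(nlp_ids))  # [0, 2, 4]
```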

Named vectors — multiple embeddings per point

Named vectors let you store more than one dense embedding per point (e.g. a title embedding and a content embedding, or embeddings from different models) and query each independently. Sparse vectors are declared separately, via sparse_vectors_config.

python

import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="multi_vec",
    vectors_config={
        "title_emb":   VectorParams(size=384, distance=Distance.COSINE),
        "content_emb": VectorParams(size=384, distance=Distance.COSINE),
    },
)

client.upsert(
    "multi_vec",
    points=[
        PointStruct(
            id=1,
            vector={
                "title_emb":   np.random.rand(384).tolist(),
                "content_emb": np.random.rand(384).tolist(),
            },
            payload={"title": "Attention Is All You Need"},
        ),
    ],
)

# Search using a specific named vector: pass its name via using=
results = client.query_points(
    collection_name="multi_vec",
    query=np.random.rand(384).tolist(),
    using="content_emb",
    limit=1,
).points
print(results[0].payload["title"])

Output:

text

Attention Is All You Need

Hybrid search with sparse vectors

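Sparse vectors (such as BM25 or SPLADE term weights) can be combined with dense vectors for hybrid search. Sparse vectors need Qdrant 1.7+; the query_points() API with prefetch and fusion used here needs Qdrant server and client 1.10+.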

python

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, SparseVectorParams, PointStruct,
    SparseVector, Prefetch, FusionQuery, Fusion,
)

client = QdrantClient(":memory:")

client.create_collection(
    collection_name="hybrid",
    vectors_config={"dense": VectorParams(size=4, distance=Distance.COSINE)},
    sparse_vectors_config={"sparse": SparseVectorParams()},
)

client.upsert(
    "hybrid",
    points=[
        PointStruct(
            id=1,
            vector={
                "dense":  [0.1, 0.2, 0.9, 0.7],
                "sparse": SparseVector(indices=[10, 42, 789], values=[0.8, 0.4, 0.6]),
            },
            payload={"title": "Attention Is All You Need"},
        ),
    ],
)

# Hybrid query: run a dense and a sparse prefetch, then fuse the two ranked lists with RRF
results = client.query_points(
    collection_name="hybrid",
    prefetch=[
        Prefetch(query=[0.1, 0.2, 0.8, 0.7], using="dense", limit=20),
        Prefetch(
            query=SparseVector(indices=[10, 42], values=[0.9, 0.3]),
            using="sparse",
            limit=20,
        ),
    ],
    query=FusionQuery(fusion=Fusion.RRF),
    limit=3,
)
for r in results.points:
    print(f"[{r.score:.4f}] {r.payload['title']}")

Output:

text

[1.0000] Attention Is All You Need
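Two pieces of arithmetic sit behind this query. A sparse vector's score is a dot product over the indices both vectors share, and RRF then fuses the ranked lists using only each point's position: score = Σ 1 / (k + rank). The plain-Python sketch below reproduces both; it assumes 0-based ranks and k = 2, which we believe matches Qdrant's default RRF constant, but check this against your Qdrant version.

```python
def sparse_dot(a_idx, a_val, b_idx, b_val) -> float:
    """Dot product of two sparse vectors given as parallel index/value lists."""
    b = dict(zip(b_idx, b_val))
    return sum(v * b[i] for i, v in zip(a_idx, a_val) if i in b)


def rrf(ranked_lists: list[list[int]], k: float = 2.0) -> dict[int, float]:
    """Reciprocal rank fusion: sum of 1 / (k + rank) over every list a point appears in."""
    scores: dict[int, float] = {}
    for ranking in ranked_lists:
        for rank, point_id in enumerate(ranking):  # rank is 0-based here
            scores[point_id] = scores.get(point_id, 0.0) + 1.0 / (k + rank)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Sparse score of the stored point against the query above: 0.8*0.9 + 0.4*0.3
print(round(sparse_dot([10, 42, 789], [0.8, 0.4, 0.6], [10, 42], [0.9, 0.3]), 2))  # 0.84

# Point 1 ranks first in both prefetch lists; point 2 appears in only one list
print(rrf([[1, 2], [1]]))  # {1: 1.0, 2: 0.3333333333333333}
```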

LangChain integration

python

from langchain_qdrant import QdrantVectorStore
from langchain_openai import OpenAIEmbeddings
import os

embeddings = OpenAIEmbeddings(api_key=os.environ["OPENAI_API_KEY"])

vectorstore = QdrantVectorStore.from_texts(
    texts=[
        "Transformers use self-attention to process sequences in parallel.",
        "BERT is pre-trained with masked language modelling.",
    ],
    embedding=embeddings,
    location=":memory:",   # in-process Qdrant; pass url=... to use a server
    collection_name="langchain_demo",
)

retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("What is self-attention?")
for doc in docs:
    print(doc.page_content)

Output:

text

Transformers use self-attention to process sequences in parallel.
BERT is pre-trained with masked language modelling.

LlamaIndex integration

python

from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import Document, VectorStoreIndex, StorageContext, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from qdrant_client import QdrantClient
import os

Settings.embed_model = OpenAIEmbedding(api_key=os.environ["OPENAI_API_KEY"])

client = QdrantClient(":memory:")
vector_store = QdrantVectorStore(client=client, collection_name="llama_demo")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

index = VectorStoreIndex.from_documents(
    [Document(text="Transformers use self-attention to process sequences in parallel.")],
    storage_context=storage_context,
)

query_engine = index.as_query_engine()
response = query_engine.query("What mechanism do transformers use?")
print(response)

Output (the answer is generated by the LLM, so its wording varies):

text

Transformers use self-attention to process sequences in parallel.

Quick reference

Task	Code
In-memory client	`QdrantClient(":memory:")`
Persistent client	`QdrantClient(path="./qdrant_storage")`
Remote client	`QdrantClient(url=..., api_key=...)`
Create collection	`client.create_collection("name", vectors_config=VectorParams(size=n, distance=Distance.COSINE))`
Upsert points	`client.upsert("name", points=[PointStruct(id=..., vector=..., payload=...)])`
Bulk import	`client.upload_points("name", points=[...], batch_size=256, parallel=4)`
Search	`client.query_points("name", query=[...], limit=k).points`
Filtered search	`client.query_points(..., query_filter=Filter(must=[FieldCondition(...)]))`
Exact match filter	`FieldCondition(key="field", match=MatchValue(value="val"))`
Range filter	`FieldCondition(key="num", range=Range(gte=0, lt=100))`
Payload index	`client.create_payload_index("name", "field", PayloadSchemaType.KEYWORD)`
Retrieve by ID	`client.retrieve("name", ids=[1, 2], with_payload=True)`
Update payload	`client.set_payload("name", payload={...}, points=PointIdsList(points=[id]))`
Delete points	`client.delete("name", points_selector=PointIdsList(points=[id]))`
Point count	`client.count("name").count`
Collection info	`client.get_collection("name")`
Named vectors	`vectors_config={"title": VectorParams(...), "body": VectorParams(...)}`; sparse vectors go in `sparse_vectors_config`