Qdrant Vector Database: Setup, Search, Production

Qdrant is an open source vector database written in Rust that has quietly become the default choice for teams running retrieval at scale outside Postgres. It powers semantic search, recommendation systems, image search, and the retrieval half of most modern RAG stacks. If you have ever stored an embedding and wished you could filter on it without slowing the search to a crawl, Qdrant is the project worth knowing.

This guide is the starting point of our Qdrant series on Computing for Geeks. It explains what Qdrant actually is, how its internals are laid out, when you should pick it over pgvector or Milvus, how the security and quantization stories work in 2026, and where to go next once you want to run it for real. Every linked sibling article ships with companion code in the c4geeks/qdrant repository, tested on Proxmox VMs and, where relevant, on vast.ai GPU instances. For the canonical reference, the official Qdrant documentation is the source of truth, and the qdrant/qdrant repository hosts the source and releases.

Why teams pick Qdrant in 2026

Qdrant earned its current position on three properties that matter once you take a vector workload past the proof of concept stage. Search stays fast even when you stack filters on top of vector similarity, quantization shrinks the vector memory footprint by 4x to 64x depending on the mode, and the operational surface is small enough that a single engineer can keep it healthy in production.

The project shipped its first release in 2021 and is licensed under Apache 2.0, which means you can self host it without surprise license clauses. The official Docker image, Helm chart, and managed cloud all share the same binary, so the path from a laptop demo to a Kubernetes cluster does not involve learning a second product.

Filtered HNSW search. The HNSW index walk is filter aware, so a query like “vectors similar to X, also matching category Y and price under Z” stays in single digit milliseconds on millions of points.
Quantization that actually ships. Scalar quantization cuts memory by 4x with a small recall loss, product quantization goes 8x to 64x, and binary quantization reaches 32x and is up to 40x faster on the search path thanks to SIMD popcount.
Hybrid dense plus sparse. Named vectors and the sparse vector API let you combine semantic similarity with BM25 style keyword retrieval in a single query.
Raft based clustering. A three node cluster gives you sharding, replication, and automatic failover to the surviving replicas after a node failure.
Honest defaults. The default Docker run is open and unauthenticated and the documentation says so loudly. Production posture is opt in, not buried behind a config flag the docs forget to mention.
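
The filtered query from the first bullet could be expressed roughly as the request body below. This is a minimal sketch, not a tested client: the category and price payload fields, the tiny four dimension vector, and the build_filtered_query helper are all hypothetical, and the body targets the POST /collections/<name>/points/query endpoint that recent Qdrant releases expose.

```python
import json


def build_filtered_query(vector, category, max_price, limit=10):
    """Build a JSON body for POST /collections/<name>/points/query.

    The payload keys "category" and "price" are illustrative assumptions.
    """
    return {
        "query": vector,  # the dense query embedding
        "filter": {
            "must": [
                {"key": "category", "match": {"value": category}},
                {"key": "price", "range": {"lt": max_price}},
            ]
        },
        "limit": limit,
        "with_payload": True,  # return the stored metadata with each hit
    }


body = build_filtered_query([0.12, -0.03, 0.88, 0.41], "shoes", 100.0)
print(json.dumps(body, indent=2))
```

Both conditions sit under must, so a point has to satisfy the category match and the price range to be considered at all.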

What is a vector database, in one paragraph

A vector database stores high dimensional numerical arrays called embeddings and answers nearest neighbour queries against them. An embedding is what a model like CLIP, BGE, or OpenAI text-embedding-3 produces when you hand it a piece of text, an image, or audio. Two embeddings sit close in vector space when their inputs are semantically similar. The database does not understand meaning, but it understands distance, and that is enough to power semantic search, recommendations, deduplication, anomaly detection, and the retrieval step in a RAG pipeline. For a deeper primer, our vector search vs traditional search explainer walks through why dense retrieval beats keyword matching for fuzzy intent queries.
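
To make "it understands distance" concrete, here is a toy brute force nearest neighbour search in plain Python. The three dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and a real vector database replaces the linear scan with an index such as HNSW.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, embeddings, k=2):
    """Return the names of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in embeddings.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings: two related concepts and one unrelated one.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.2, 0.05],
    "invoice": [0.0, 0.1, 0.95],
}

print(nearest([1.0, 0.1, 0.0], embeddings))  # ['cat', 'kitten']
```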

Core Qdrant concepts at a glance

Five concepts will appear in every article in this series. Knowing them saves you a lot of doc skimming later.

Concept	What it is
Collection	The top level namespace. Holds points, an HNSW index, and the payload schema. Roughly analogous to a table in SQL.
Point	One record. Carries an ID, one or more vectors, and a JSON payload of metadata.
Vector	The float array the search engine compares. A collection can hold multiple named vectors per point (for example, a dense and a sparse vector side by side).
Payload	Arbitrary JSON metadata attached to a point. Filterable, indexable, and returned with search results.
Shard and replica	In a cluster, each collection is split into shards, and each shard can have multiple replicas distributed across nodes for fault tolerance.
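
The first four concepts map directly onto two REST request bodies. The sketch below builds them as plain dictionaries. The vector names dense and sparse, the 384 dimension size, and the payload fields are assumptions chosen for illustration, and the helper functions are hypothetical, not part of any Qdrant client.

```python
import json


def create_collection_body(dense_size):
    """Body for PUT /collections/<name>: one named dense and one named sparse vector."""
    return {
        "vectors": {"dense": {"size": dense_size, "distance": "Cosine"}},
        "sparse_vectors": {"sparse": {}},
    }


def upsert_body(point_id, dense, sparse_indices, sparse_values, payload):
    """Body for PUT /collections/<name>/points carrying a single point."""
    return {
        "points": [
            {
                "id": point_id,  # unsigned integer or UUID string
                "vector": {
                    "dense": dense,
                    "sparse": {"indices": sparse_indices, "values": sparse_values},
                },
                "payload": payload,  # arbitrary JSON metadata
            }
        ]
    }


collection = create_collection_body(384)
point = upsert_body(1, [0.0] * 384, [7, 42], [0.5, 1.2], {"lang": "en", "year": 2026})
print(json.dumps(collection, indent=2))
```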

How Qdrant works under the hood

A search request enters Qdrant on port 6333 (REST) or 6334 (gRPC). The collection layer fans the query out to its segments, and each segment walks its HNSW index to find candidate neighbours. Payload filters are evaluated during that walk, so points that fail the metadata predicates are skipped instead of being pruned afterwards. The per segment results are then merged, and the top scoring points are returned with their payload attached.

Writes follow a similar path. Each upsert hits the write ahead log first for durability, then lands in an appendable segment that is not yet fully indexed. A background optimizer later rebuilds such segments into indexed ones, building the HNSW graph and applying quantization if the collection asks for it. In a cluster, writes are forwarded to the replicas of the target shard, and the request is acknowledged once enough replicas have confirmed it.

HNSW index. Hierarchical Navigable Small World graph. Builds a layered proximity graph that supports approximate nearest neighbour search in roughly logarithmic time. Qdrant ships a filter aware variant that does not blow up when you stack must, should, and must_not clauses on the search.
Segments and the optimizer. A collection is stored as a set of segments. The optimizer merges small segments, builds indexes on freshly written segments, and applies quantization in the background.
WAL. The write ahead log makes upserts crash safe and is the only thing that has to be fsync’d on the hot write path.
Snapshots. Both per collection and full storage snapshots produce a tar archive that can be restored by upload, URL fetch, or at startup. Since v1.10, snapshots can write directly to S3 compatible object storage.
Raft consensus. In cluster mode, raft commits cluster topology and collection metadata. Point writes themselves bypass raft and replicate directly between shard replicas for throughput.
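
Because a collection is stored as several segments, hits found in different segments have to be combined into one top-k list. The toy model below illustrates that merge step only. A brute force dot product stands in for the HNSW walk, the segment contents are invented, and none of this mirrors Qdrant's actual Rust internals.

```python
import heapq


def search_segment(segment, query, k):
    """Return the k best (score, point_id) pairs from one segment.

    Brute force scoring is a stand-in for the per-segment HNSW walk.
    """
    scored = ((sum(q * v for q, v in zip(query, vec)), pid) for pid, vec in segment.items())
    return heapq.nlargest(k, scored)


def search_collection(segments, query, k):
    """Search every segment, then merge the partial results into a global top-k."""
    candidates = []
    for segment in segments:
        candidates.extend(search_segment(segment, query, k))
    return heapq.nlargest(k, candidates)


# Two hypothetical segments, each mapping point id -> vector.
segments = [
    {1: [1.0, 0.0], 2: [0.0, 1.0]},
    {3: [0.9, 0.1], 4: [0.5, 0.5]},
]

top = search_collection(segments, [1.0, 0.0], k=2)
print([pid for _, pid in top])  # [1, 3]
```

Each segment only needs to return its own k best candidates, since no point outside a segment's local top-k can appear in the global top-k.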

Try Qdrant in 60 seconds

If you have Docker installed, you can have Qdrant running with its Web UI open in under a minute. This is not the production setup, just the fastest way to see what the system looks like before reading the rest of the series.

docker run -d --name qdrant \
  -p 6333:6333 -p 6334:6334 \
  -v "$(pwd)/qdrant_storage:/qdrant/storage:z" \
  qdrant/qdrant

Open http://localhost:6333/dashboard in a browser and you are looking at the built in Web UI with the Console, Collections panel, and the interactive tutorial. The REST API is on the same port, gRPC is on 6334, and the storage volume keeps your data between container restarts.
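
If you prefer a script over the browser, the REST API can be poked with nothing but the Python standard library. The sketch below assumes the container from the docker run command is listening on localhost:6333 with no API key. The helper names are hypothetical, and the response shape (a result object holding a collections list) reflects our understanding of the GET /collections endpoint.

```python
import json
import urllib.request

BASE_URL = "http://localhost:6333"  # assumption: the default port mapping above


def list_collections_request(base_url=BASE_URL):
    """Build (but do not send) the request that lists all collections."""
    return urllib.request.Request(f"{base_url}/collections", method="GET")


def list_collections(base_url=BASE_URL, timeout=5):
    """Send the request and return the collection names."""
    with urllib.request.urlopen(list_collections_request(base_url), timeout=timeout) as resp:
        body = json.load(resp)
    return [item["name"] for item in body["result"]["collections"]]


# With the container running, uncomment to try it:
# print(list_collections())
```

On a fresh container the list should simply be empty, which is still a useful signal that the API is reachable.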

For real workloads, start with our install Qdrant on Ubuntu walkthrough which covers Docker Compose, the native .deb path, UFW rules, and systemd. The same series has dedicated install guides for Rocky Linux 10 and Debian 13.

Deployment options compared

Qdrant ships in five recognisable shapes. Pick the one that matches both your operational appetite and your data residency rules.

Option	Best for	Effort
Docker	Single node demos, dev laptops, small production servers	Minutes
Native .deb / .rpm	Hardened bare metal with systemd, SELinux, and no Docker	Tens of minutes
Helm on Kubernetes	Production self host with HA, ingress, and GitOps	Hours, once
Qdrant Cloud (managed)	Teams that do not want to operate the database themselves	None
GPU image	Indexing heavy workloads at tens of millions of vectors and up	Same as Docker, plus drivers

The Docker image and the native packages share the same binary, the same config schema, and the same ports. Switching between them later is a matter of moving the storage volume across and restarting. The Helm chart wraps a StatefulSet plus a PVC per pod. Those volumes must be block storage with POSIX semantics, so you cannot back Qdrant with NFS or object storage at the file system level.

Qdrant Cloud vs self host: how to decide

The honest answer is that small teams without a platform engineer should usually start on Qdrant Cloud, and teams with existing Kubernetes practice should usually self host. The break even point sits roughly where your monthly managed bill crosses the cost of one engineer day per quarter spent on Qdrant operations.

Dimension	Qdrant Cloud	Self host
Operational cost	Predictable monthly bill, no oncall	VM or cluster cost plus engineer time
Data residency	AWS / GCP / Azure regions, no on premises	Anywhere you can run Linux
Upgrades	Managed, with maintenance windows	Your responsibility, but reproducible
Backups	Built in, no S3 plumbing	Snapshots to your own S3 compatible storage
SLA	Contractual, with paid tiers	Whatever you write in your runbook
Customisation	Limited to config knobs	Full access to config.yaml, storage, networking

Hybrid Cloud sits between the two. The control plane runs in Qdrant Cloud and the data plane runs in your Kubernetes cluster, which keeps your vectors inside your network while outsourcing the operations.

Security model: API key, JWT, and TLS

Out of the box Qdrant binds to all interfaces with no authentication. That is the right default for a laptop, and a disaster on a public VPS. The security story for production has three layers, each of which gets its own dedicated article in this series.

Static API key. Set service.api_key in config.yaml or pass QDRANT__SERVICE__API_KEY as an environment variable. Every request must then carry an api-key header. Pair it with TLS so the key never crosses the wire in plaintext.
JWT and RBAC. Enable service.jwt_rbac on top of the static key. Tokens are signed with the API key as the HMAC secret and can be scoped to read only or read and write access, either globally or for specific collections, and further restricted to specific payload values. This is how you give a dashboard a read only token without leaking your master key.
TLS. Either terminate at Nginx or Caddy in front of Qdrant, or configure tls.cert and tls.key inside Qdrant itself. Production grade gRPC needs HTTP/2 plus TLS, which is a small but real Nginx configuration job.
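
Since the tokens are plain HS256 JWTs signed with the API key, one can be minted with the standard library alone. The sketch below is a minimal illustration under assumptions: the key is a placeholder, the helper names are hypothetical, and the access claim with the value r for read only follows our reading of the Qdrant JWT documentation, so check it against the version you run.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_read_only_token(api_key: str, ttl_seconds: int = 3600) -> str:
    """Create an HS256 JWT whose HMAC secret is the static API key."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"access": "r", "exp": int(time.time()) + ttl_seconds}  # assumed claim names
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        api_key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def auth_headers(token_or_key: str) -> dict:
    """Header Qdrant expects on every authenticated request."""
    return {"api-key": token_or_key}


token = make_read_only_token("example-master-key")  # placeholder key
print(auth_headers(token))
```

In practice a maintained JWT library is the safer choice. The point here is only that nothing beyond the API key is needed to issue a scoped token.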

The full hardening walkthrough lives in our secure Qdrant with API key, TLS, and Nginx article, and the granular access story has its own deep dive in Qdrant JWT and RBAC.

Performance and quantization in plain numbers

The single biggest knob you have in Qdrant is quantization. It controls how much memory the search engine uses and, in the binary case, how fast distance comparisons run. There are three flavours, and choosing between them is a recall versus cost trade off you can make per collection.

Mode	Memory saving	Speed effect	Recall impact
None (float32)	1x baseline	Baseline	Reference
Scalar (int8)	~4x	Slightly faster (int8 maths is cheaper)	Around 1% recall loss
Product	8x to 64x, configurable	Comparable to baseline, depends on subvector count	Tunable, larger savings cost more recall
Binary	~32x	Up to 40x faster (XOR + popcount)	5% to 10% loss, mitigated by rescoring
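
The memory column is plain arithmetic on bits per dimension. The back of the envelope calculation below assumes a hypothetical collection of 1 million vectors with 768 dimensions, and counts vector storage only. It ignores the HNSW graph, payloads, and the original vectors that are usually kept around for rescoring. Product quantization is left out because its ratio depends on the configured compression.

```python
def vector_memory_bytes(num_vectors, dims, mode="float32"):
    """Raw vector storage for a given quantization mode, in bytes."""
    bits_per_dim = {"float32": 32, "scalar_int8": 8, "binary": 1}[mode]
    return num_vectors * dims * bits_per_dim // 8


# Hypothetical workload: 1 million vectors of 768 dimensions.
for mode in ("float32", "scalar_int8", "binary"):
    size = vector_memory_bytes(1_000_000, 768, mode)
    print(f"{mode:12s} {size / 1e6:8.1f} MB")
```

For this workload that is 3072 MB at float32, 768 MB with scalar quantization, and 96 MB with binary quantization, which is where the 4x and 32x figures come from.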

A common production pattern is asymmetric quantization. Stored vectors get binary quantized for maximum memory savings and speed, while the query vector stays at full float precision. The first stage returns a coarse top-k, then a rescoring pass over the full vectors restores most of the lost recall. Our performance tuning article runs all three modes against the same 1 million vector dataset and reports the actual numbers.
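
The two stage idea can be shown in a few lines of plain Python. This is a toy sketch of asymmetric scoring with rescoring, not Qdrant's implementation: stored vectors are reduced to one sign bit per dimension, the float query is scored against those signs, and the oversampled shortlist is re-ranked with the full vectors. The vectors, the oversampling factor, and all function names are invented for illustration.

```python
def binarize(vec):
    """Keep only the sign of each dimension: True for positive, False otherwise."""
    return tuple(x > 0 for x in vec)


def coarse_score(query, code):
    """Asymmetric score: float query against a binary code read as +1 / -1."""
    return sum(q if bit else -q for q, bit in zip(query, code))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def search(query, vectors, k=1, oversampling=2):
    """Stage 1: shortlist by coarse score. Stage 2: rescore with full vectors."""
    codes = {pid: binarize(vec) for pid, vec in vectors.items()}
    shortlist = sorted(codes, key=lambda pid: coarse_score(query, codes[pid]), reverse=True)
    shortlist = shortlist[: k * oversampling]
    return sorted(shortlist, key=lambda pid: dot(query, vectors[pid]), reverse=True)[:k]


# "a" and "b" share the same sign pattern, so only rescoring can tell them apart.
vectors = {
    "a": [0.9, 0.8, -0.1, -0.2],
    "b": [0.1, 0.2, -0.9, -0.8],
    "c": [-0.5, 0.4, 0.3, -0.6],
}

print(search([1.0, 0.9, -0.2, -0.1], vectors))  # ['a']
```

The oversampling factor is the recall lever: a larger shortlist costs more full precision comparisons but gives the rescoring pass more chances to recover the true neighbours.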

HNSW itself has three tunable knobs worth knowing. m controls graph connectivity, ef_construct controls build time accuracy, and ef (exposed as the hnsw_ef search parameter) controls per query accuracy. Defaults are sensible for most workloads. The time to revisit them is when you have a clear recall target and a measured baseline to compare against.
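
The three knobs live in two different places: two are set on the collection, one travels with each query. The sketch below builds both request bodies as dictionaries. The helper names are hypothetical, the values 16 and 100 mirror what we understand to be Qdrant's defaults, and 128 is an arbitrary example rather than a recommendation.

```python
def collection_with_hnsw(size, m=16, ef_construct=100):
    """Body for PUT /collections/<name> with explicit index build settings."""
    return {
        "vectors": {"size": size, "distance": "Cosine"},
        "hnsw_config": {"m": m, "ef_construct": ef_construct},
    }


def query_with_ef(vector, hnsw_ef=128, limit=10):
    """Body for POST /collections/<name>/points/query with a per query ef."""
    return {"query": vector, "params": {"hnsw_ef": hnsw_ef}, "limit": limit}


build_time = collection_with_hnsw(768, m=32, ef_construct=200)
query_time = query_with_ef([0.1] * 768, hnsw_ef=256)
print(build_time["hnsw_config"], query_time["params"])
```

Raising m or ef_construct makes index builds slower and the graph larger, while raising hnsw_ef only makes individual queries slower, so the query side knob is the cheaper one to experiment with.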

When to pick Qdrant vs the alternatives

There is no universally best vector database. There is the one that fits your stack, your scale, and your team. Here is the short decision tree we use when starting a new project.

You already run Postgres and have under 10 million vectors. Use pgvector. One less moving part, joins against your existing tables work natively, and pgvector 0.9 has closed most of the speed gap. Our install pgvector on PostgreSQL 17 guide is the place to start.
You need filtered search at speed, want to self host, and have outgrown pgvector. Use Qdrant. Filtered HNSW is genuinely best in class and the operational surface is small.
You are working at billion vector scale. Look at Milvus first. It was designed for that tier and its sharding and partitioning story is the most mature.
You want batteries included modules (text, image, generative) out of the box. Consider Weaviate. It bundles more, at the cost of being a larger thing to operate.
You do not want to operate a vector database at all. Qdrant Cloud or Pinecone. Pick on price and on which regions you need.

The full benchmark and decision matrix with measured latency, RPS, and recall numbers lives in our Qdrant vs pgvector vs Milvus vs Weaviate comparison.

Operations: snapshots, monitoring, scaling

The day two story is where vector databases either earn their keep or quietly cost you an outage. Qdrant gets a few things right by default and a few more right if you remember to turn them on.

Snapshots. Per collection and full storage snapshots are a single REST call. Since v1.10 they can write straight to S3 compatible object storage, which makes daily systemd timers trivial. The one rough edge is that restoring from S3 still goes through an HTTP upload step, which we cover in the snapshots and S3 backup article.
Monitoring. Qdrant exposes Prometheus and OpenMetrics on /metrics and a system metrics endpoint on /sys_metrics. There are at least three good Grafana dashboards you can import. We publish the prometheus.yml plus a vetted dashboard JSON in the Qdrant Prometheus and Grafana article.
Scaling vertically. RAM is the resource that runs out first because the HNSW graph plus the unquantized vectors sit in memory by default. Switching on quantization or moving vectors to disk with on_disk: true is usually cheaper than buying a bigger box.
Scaling horizontally. A three node raft cluster is the smallest sensible unit. You get sharding, replication, and, as long as the collection's replication factor is at least 2, the ability to lose a node without losing a write. The distributed cluster article walks through the bootstrap, the failure drill, and shard transfer.
GPU acceleration. Since v1.13, Qdrant ships a GPU enabled image that accelerates HNSW index build. It is not a query accelerator, so it only pays back on large indexing jobs. The GPU acceleration on vast.ai article benchmarks the cost of an RTX 4090 hour against the equivalent CPU build time.
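
Two of the items above come down to small REST payloads. The sketch below builds the snapshot request and a collection body that keeps vectors on disk and replicates each shard. It only constructs the request objects, it does not send them. The collection name, the API key, the shard and replica counts, and the helper names are placeholders chosen for illustration.

```python
import urllib.request


def snapshot_request(collection, base_url="http://localhost:6333", api_key=None):
    """Build (but do not send) the POST that creates a collection snapshot."""
    headers = {"api-key": api_key} if api_key else {}
    url = f"{base_url}/collections/{collection}/snapshots"
    return urllib.request.Request(url, method="POST", headers=headers)


def lean_collection_body(size, shards=3, replicas=2):
    """Body for PUT /collections/<name>: vectors on disk, shards replicated."""
    return {
        "vectors": {"size": size, "distance": "Cosine", "on_disk": True},
        "shard_number": shards,
        "replication_factor": replicas,
    }


req = snapshot_request("articles", api_key="example-key")  # placeholder values
print(req.get_method(), req.full_url)
```

Passing req to urllib.request.urlopen would trigger the snapshot, which is the single call a daily timer needs to wrap.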

Your roadmap to learning Qdrant

The rest of the series is laid out in the order most teams actually adopt the database. Skim the foundations, settle on the install path that matches your OS, then jump straight to the operational article that matches the problem in front of you.

Install Qdrant on your OS

Install Qdrant on Ubuntu 26.04 / 24.04 LTS, with both Docker and native .deb paths
Install Qdrant on Rocky Linux 10 / AlmaLinux 10, including SELinux contexts and firewalld rules
Install Qdrant on Debian 13 / 12, with Docker Compose and journald log forwarding

Use Qdrant: core operations

Qdrant Web UI tour: Dashboard, Console, Graph view, and the interactive tutorial
Create, configure, and manage collections, including named vectors and sparse vectors
REST and gRPC APIs in practice, with curl, Python, and Go examples
Filters and payload indexes, including hybrid dense plus sparse search

Run Qdrant in production

Secure Qdrant with API key, TLS, and Nginx reverse proxy
Qdrant JWT and RBAC for granular access
Snapshots, backup, and restore with S3 storage
Monitor Qdrant with Prometheus and Grafana

Scale and optimise

Deploy a 3 node Qdrant distributed cluster
Deploy Qdrant on Kubernetes with Helm
Performance tuning: quantization, HNSW, indexing
Run Qdrant with GPU acceleration on vast.ai

Build something with Qdrant

Local RAG with Qdrant, Ollama, and LangChain. Pairs naturally with our existing self hosted RAG with Ollama and pgvector article for a side by side view of both backends
PDF question answering with Qdrant and LlamaIndex
Visual image search with Qdrant and CLIP
Semantic search API with Qdrant and FastAPI
Qdrant and n8n no code vector workflows

Compare and migrate

Qdrant vs pgvector vs Milvus vs Weaviate: benchmark and decision guide
Migrate from Pinecone to Qdrant with verification
Qdrant commands and API cheat sheet

Sibling articles ship over the next several weeks. The pillar is updated each time a new one lands, so bookmark this page if you want a single index that always points to the latest set.

Wrapping up

Qdrant has matured into the obvious default for self hosted vector search outside Postgres. The architecture is small enough to learn in an afternoon, the operational surface fits in one engineer’s head, and the quantization story makes the memory bill bearable even at tens of millions of vectors. If you are evaluating it for the first time, run the 60 second Docker command from earlier on this page, take the Web UI tour, then work through the install guide for your OS and the security article. Everything else in this series builds on those two.

All companion code is open and tracked in the c4geeks/qdrant repository. If a command in any of the linked articles does not reproduce on a clean VM, open an issue there and we will fix the article.