pgvector turns PostgreSQL into a vector database. If you already run PostgreSQL for your applications, you can store and query vector embeddings without adding another service to your stack. That means semantic search, recommendation engines, and RAG pipelines all running inside the database you already manage and back up.
This guide covers installing pgvector on PostgreSQL 17 for both Rocky Linux 10 and Ubuntu 24.04, creating vector columns, building indexes (HNSW and IVFFlat), running similarity searches, and practical demos you can adapt for your own projects.
Tested March 2026 on Rocky Linux 10.1 (SELinux enforcing) and Ubuntu 24.04 LTS with PostgreSQL 17.9, pgvector 0.8.2
What You Need
- Rocky Linux 10 or Ubuntu 24.04 with PostgreSQL 17 installed
- Root or sudo access
- PGDG repository configured (the PostgreSQL install guide covers this)
pgvector itself is lightweight. It adds no background processes and uses memory only when you query or build indexes.
Install pgvector
pgvector is available as a prebuilt package from the official PostgreSQL repository (PGDG) on both OS families. No compilation needed.
Rocky Linux 10 / AlmaLinux 10 / RHEL 10
Install pgvector from the PGDG repository:
sudo dnf install -y pgvector_17
The package name follows the pattern pgvector_<pg-version>. For PostgreSQL 16, that would be pgvector_16.
Ubuntu 24.04 / Debian 13
On Debian-based systems, install the pgvector package:
sudo apt install -y postgresql-17-pgvector
On Debian/Ubuntu the pattern is postgresql-<pg-version>-pgvector.
Enable the Extension
After installing the package, enable the extension inside the database where you need vectors:
sudo -u postgres psql -c "CREATE EXTENSION vector;"
Verify the extension is active and check the version:
sudo -u postgres psql -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
The output confirms pgvector is loaded:
extname | extversion
---------+------------
vector | 0.8.2
(1 row)
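If CREATE EXTENSION fails, check that the packaged extension files are actually installed and which version the server would load. This query (against the standard pg_available_extensions catalog view) shows both:

```sql
-- Confirm the extension files are installed and see the packaged version
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';
```

An empty result means the OS package did not install for this PostgreSQL major version; a row with a NULL installed_version means the files are present but CREATE EXTENSION has not been run in this database yet.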
Vector Data Types
pgvector 0.8.2 supports four data types for different use cases:
| Type | Storage | Max Dimensions | Use Case |
|---|---|---|---|
| vector | 32-bit float | 16,000 | Standard embeddings (most common) |
| halfvec | 16-bit float | 16,000 | Half the memory, slight precision loss |
| sparsevec | Sparse format | 16,000 | Mostly-zero vectors (BM25, TF-IDF) |
| bit | 1-bit per dim | 64,000 | Binary embeddings, Hamming distance |
For most AI embedding models (OpenAI, Ollama, Cohere), use vector. Switch to halfvec when memory is tight and you can tolerate minor precision differences.
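As a quick sketch (table and column names here are illustrative), a halfvec column is declared the same way as vector, and an existing vector value casts to halfvec directly:

```sql
-- Same dimensions, half the storage per element
CREATE TABLE docs_half (
    id SERIAL PRIMARY KEY,
    embedding halfvec(384)
);

-- A vector literal casts to halfvec
SELECT '[0.1, 0.2, 0.8]'::vector(3)::halfvec(3);
```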
Create Tables with Vector Columns
A vector column stores fixed-length arrays of floats. Specify the number of dimensions when creating the column. Common embedding dimensions are 384 (all-MiniLM-L6-v2), 768 (BERT, nomic-embed-text), 1536 (OpenAI text-embedding-ada-002), and 3072 (OpenAI text-embedding-3-large).
Connect to PostgreSQL:
sudo -u postgres psql
Create a table with a 384-dimension vector column (matches the all-MiniLM-L6-v2 sentence-transformer model):
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT,
embedding vector(384)
);
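To sanity-check the new column, you can insert a literal vector (written as a bracketed list of floats) and read its dimension count back. The values below are placeholders generated with array_fill, not real embeddings:

```sql
-- Insert a dummy 384-dimension vector (all 0.1) and verify its size
INSERT INTO documents (title, content, embedding)
VALUES ('Test doc', 'Hello pgvector',
        ('[' || array_to_string(array_fill(0.1, ARRAY[384]), ',') || ']')::vector(384));

SELECT id, title, vector_dims(embedding) FROM documents;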
Distance Operators
pgvector provides operators for different distance metrics. The choice depends on how your embedding model was trained:
| Operator | Distance Metric | Index Ops Class | When to Use |
|---|---|---|---|
| <=> | Cosine distance | vector_cosine_ops | Most embedding models (default choice) |
| <-> | L2 (Euclidean) | vector_l2_ops | When magnitude matters (image features) |
| <#> | Negative inner product | vector_ip_ops | Normalized vectors, maximum inner product search |
Cosine distance (<=>) is the safest default. Most text embedding models are optimized for cosine similarity. Here is a quick demo:
CREATE TABLE items (
id SERIAL PRIMARY KEY,
name TEXT,
embedding vector(3)
);
INSERT INTO items (name, embedding) VALUES
('cat', '[0.1, 0.2, 0.8]'),
('dog', '[0.15, 0.25, 0.75]'),
('fish', '[0.9, 0.1, 0.05]'),
('bird', '[0.3, 0.7, 0.4]'),
('snake', '[0.85, 0.05, 0.1]');
Find items most similar to “cat” using cosine distance:
SELECT name, 1 - (embedding <=> '[0.1, 0.2, 0.8]') AS similarity
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.8]'
LIMIT 3;
“Dog” ranks highest because its vector is closest to “cat” in direction:
name | similarity
------+--------------------
cat | 1
dog | 0.9948949708177414
bird | 0.6857339969219595
(3 rows)
All three distance operators side by side on the same data:
SELECT name,
embedding <-> '[0.1, 0.2, 0.8]' AS l2_distance,
embedding <=> '[0.1, 0.2, 0.8]' AS cosine_distance,
embedding <#> '[0.1, 0.2, 0.8]' AS neg_inner_product
FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.8]';
The results show how each metric measures distance differently. L2 uses absolute magnitude, cosine ignores magnitude and compares direction only, and inner product combines both:
name | l2_distance | cosine_distance | neg_inner_product
-------+-----------------+-----------------+----------------------
cat | 0 | 0 | -0.690000057220459
dog | 0.0866025474761 | 0.005105029182 | -0.6650000214576721
bird | 0.6708204287913 | 0.314266003078 | -0.49000000953674316
snake | 1.0368220331737 | 0.754263412560 | -0.17500001192092896
fish | 1.101135712321 | 0.800887375587 | -0.15000000596046448
(5 rows)
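To make those numbers concrete, here is a small pure-Python sketch that recomputes each metric for the "dog" row against the "cat" query vector, using nothing but the textbook definitions (no pgvector involved):

```python
import math

def l2(a, b):
    # Euclidean distance: what pgvector's <-> operator computes
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # Cosine distance: <=> (1 minus cosine similarity)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def neg_inner_product(a, b):
    # Negative inner product: <#>
    return -sum(x * y for x, y in zip(a, b))

cat = [0.1, 0.2, 0.8]
dog = [0.15, 0.25, 0.75]

print(l2(cat, dog))                 # ~0.0866, matches the l2_distance column
print(cosine_distance(cat, dog))    # ~0.00511, matches cosine_distance
print(neg_inner_product(cat, dog))  # -0.665, matches neg_inner_product
```

The tiny differences from the psql output come from pgvector storing 32-bit floats while Python computes in 64-bit.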
Build Vector Indexes
Without an index, pgvector performs exact (brute-force) nearest neighbor search by scanning every row. This is accurate but slow on large datasets. Indexes give you approximate nearest neighbor (ANN) search that is dramatically faster at the cost of slightly less perfect recall.
HNSW Index (Recommended)
HNSW (Hierarchical Navigable Small World) is the preferred index type for most workloads. It builds a multi-layer graph and delivers consistently fast queries without the training step that IVFFlat requires.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
Key parameters:
| Parameter | Default | Effect |
|---|---|---|
m | 16 | Connections per node. Higher = better recall, more memory, slower builds |
ef_construction | 64 | Build-time search depth. Higher = better index quality, slower builds |
At query time, control the search accuracy vs speed tradeoff with hnsw.ef_search:
SET hnsw.ef_search = 100; -- higher = better recall, slower
IVFFlat Index
IVFFlat divides vectors into clusters (lists) and searches only the nearest clusters at query time. Unlike HNSW, which can be built on an empty table, IVFFlat must be built after the table contains data, because it trains its cluster centroids on the existing rows. Choose IVFFlat when memory is tight: its index footprint is smaller than HNSW's.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
The lists parameter controls the number of clusters. The pgvector project suggests rows / 1000 for up to 1 million rows and sqrt(rows) beyond that; for exactly 1 million rows, both rules give lists = 1000. At query time, ivfflat.probes controls how many clusters to search:
SET ivfflat.probes = 10; -- search 10 clusters (default is 1)
Which Index to Choose?
| Criteria | HNSW | IVFFlat |
|---|---|---|
| Build speed | Slower | Faster |
| Query speed | Faster | Slightly slower |
| Memory usage | Higher | Lower |
| Recall at default settings | Better | Lower (needs tuning) |
| Requires data before build? | No | Yes |
| Best for | Most workloads | Memory-constrained, batch scenarios |
Start with HNSW unless you have a specific reason to use IVFFlat.
Performance Tuning
pgvector benefits from the same PostgreSQL tuning you would apply for any analytics workload, plus a few vector-specific settings.
Edit the PostgreSQL configuration:
sudo vi /var/lib/pgsql/17/data/postgresql.conf
On Ubuntu, the path is /etc/postgresql/17/main/postgresql.conf.
Key settings to adjust:
# Memory for sorting and index operations
work_mem = '256MB'
maintenance_work_mem = '1GB'
# Parallel workers for index building (speeds up HNSW/IVFFlat creation)
max_parallel_maintenance_workers = 4
max_parallel_workers_per_gather = 4
# Shared memory (25% of total RAM is a good starting point)
shared_buffers = '1GB'
effective_cache_size = '3GB'
Restart PostgreSQL after changes:
sudo systemctl restart postgresql-17
On Ubuntu, the service name is postgresql instead of postgresql-17.
Practical Demo: Semantic Search with 1,000 Documents
This demo creates a table with 1,000 documents, builds an HNSW index, and runs similarity queries to show real query plans and timing.
sudo -u postgres psql
Create and populate the table with random 384-dimension vectors (in production, these come from an embedding model like Ollama’s nomic-embed-text):
CREATE TABLE docs (
id SERIAL PRIMARY KEY,
title TEXT,
embedding vector(384)
);
INSERT INTO docs (title, embedding)
SELECT
'Document ' || i,
('[' || string_agg((random() * 2 - 1)::text, ',') || ']')::vector(384)
FROM generate_series(1, 1000) AS i,
generate_series(1, 384) AS j
GROUP BY i;
Build the HNSW index:
CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
Check the index size:
SELECT indexname, pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes WHERE tablename = 'docs';
The HNSW index on 1,000 rows with 384 dimensions takes about 2 MB:
indexname | size
-------------------+---------
docs_pkey | 40 kB
docs_embedding_idx | 2008 kB
(2 rows)
Run a similarity search with EXPLAIN ANALYZE to see the query plan:
SET hnsw.ef_search = 40;
EXPLAIN ANALYZE
SELECT id, title, 1 - (embedding <=> (SELECT embedding FROM docs WHERE id = 1)) AS similarity
FROM docs
ORDER BY embedding <=> (SELECT embedding FROM docs WHERE id = 1)
LIMIT 5;
The query completes in under 1ms on 1,000 rows:
Planning Time: 0.466 ms
Execution Time: 0.551 ms
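To gauge recall on your own data, a common sanity check is to rerun the same query with index scans disabled inside a transaction, forcing an exact sequential scan, and compare the two result sets. SET LOCAL keeps the change scoped to the transaction:

```sql
BEGIN;
SET LOCAL enable_indexscan = off;  -- force exact (brute-force) search
EXPLAIN ANALYZE
SELECT id, title
FROM docs
ORDER BY embedding <=> (SELECT embedding FROM docs WHERE id = 1)
LIMIT 5;
ROLLBACK;
```

If the ANN results match the exact results for your typical queries, your ef_search setting is high enough; if rows go missing, raise it.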
Generate Embeddings with Ollama
To store real embeddings (not random vectors), you need an embedding model. Ollama runs embedding models locally. The nomic-embed-text model produces 768-dimension vectors and is one of the most popular choices.
Pull the embedding model (assumes Ollama is installed):
ollama pull nomic-embed-text
Generate an embedding via the API:
curl -s http://localhost:11434/api/embed \
-d '{"model":"nomic-embed-text","input":"PostgreSQL is a powerful open-source database"}' \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(f'Dimensions: {len(d[\"embeddings\"][0])}')"
This returns Dimensions: 768. To insert these embeddings into PostgreSQL, use a script that calls the Ollama API and inserts the result. Here is a Python example using psycopg2:
import psycopg2
import requests
import json
conn = psycopg2.connect("dbname=postgres user=postgres")
cur = conn.cursor()
# Create table for real embeddings
cur.execute("""
CREATE TABLE IF NOT EXISTS knowledge_base (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(768)
)
""")
# Documents to embed
docs = [
"PostgreSQL supports advanced indexing including GiST and GIN",
"Linux kernel 6.12 introduced real-time scheduling improvements",
"Nginx can be configured as a reverse proxy with load balancing",
"Docker containers share the host kernel unlike virtual machines",
"SELinux enforcing mode prevents unauthorized process access"
]
for doc in docs:
# Get embedding from Ollama
resp = requests.post("http://localhost:11434/api/embed",
json={"model": "nomic-embed-text", "input": doc})
embedding = resp.json()["embeddings"][0]
# Insert into PostgreSQL
cur.execute(
"INSERT INTO knowledge_base (content, embedding) VALUES (%s, %s)",
(doc, json.dumps(embedding))
)
conn.commit()
cur.close()
conn.close()
print("Inserted", len(docs), "documents with embeddings")
This pattern (embed with Ollama, store in pgvector) is the foundation of a self-hosted RAG pipeline. No data leaves your network, no API costs, and your existing PostgreSQL backup strategy covers the vectors too.
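The retrieval half of that pipeline is a single query: embed the user's question with the same model, then order by cosine distance. A sketch of the query your script would run, where %s is the psycopg2 placeholder for the query embedding:

```sql
SELECT content,
       1 - (embedding <=> %s) AS similarity
FROM knowledge_base
ORDER BY embedding <=> %s
LIMIT 3;
```

The top rows become the context you feed to a chat model alongside the original question.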
Embedding Model Dimensions Reference
When creating your vector column, the dimension must match the model you use. Here are the common embedding models and their output dimensions:
| Model | Provider | Dimensions | Notes |
|---|---|---|---|
| nomic-embed-text | Ollama (local) | 768 | Open source, no API cost |
| mxbai-embed-large | Ollama (local) | 1024 | Higher quality, larger |
| all-MiniLM-L6-v2 | Sentence Transformers | 384 | Fast, lightweight |
| text-embedding-ada-002 | OpenAI | 1536 | Legacy model |
| text-embedding-3-small | OpenAI | 1536 | Current recommended |
| text-embedding-3-large | OpenAI | 3072 | Highest quality (OpenAI) |
| embed-english-v3.0 | Cohere | 1024 | Multilingual available |
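A mismatch fails loudly at insert time rather than corrupting data, which is worth seeing once. With an illustrative 3-dimension column:

```sql
CREATE TABLE dim_test (embedding vector(3));
INSERT INTO dim_test VALUES ('[0.1, 0.2]');
-- fails with an error along the lines of: expected 3 dimensions, not 2
```

If you switch embedding models, you must recreate the column (or add a new one) with the new dimension and re-embed your data.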
Rocky Linux 10 vs Ubuntu 24.04
The pgvector extension works identically on both platforms. The differences are in package names, config paths, and service management:
| Item | Rocky Linux 10 | Ubuntu 24.04 |
|---|---|---|
| pgvector package | pgvector_17 | postgresql-17-pgvector |
| PostgreSQL config | /var/lib/pgsql/17/data/postgresql.conf | /etc/postgresql/17/main/postgresql.conf |
| Service name | postgresql-17 | postgresql |
| Data directory | /var/lib/pgsql/17/data/ | /var/lib/postgresql/17/main/ |
| SELinux/AppArmor | SELinux enforcing (no pgvector issues) | AppArmor (no action needed) |
Production Considerations
A few things to keep in mind when running pgvector in production:
- Backup: pgvector data is stored in regular PostgreSQL tables. Your existing pg_dump or streaming replication setup covers vectors automatically
- Index rebuilds: HNSW indexes can be built with parallel workers. Set max_parallel_maintenance_workers to match your CPU cores for faster builds on large datasets
- Memory: HNSW indexes are memory-mapped. For best performance, ensure effective_cache_size is large enough to keep the index in the OS page cache
- Dimension limit: pgvector supports up to 16,000 dimensions for vector and halfvec types. Use halfvec to cut memory usage in half when you can tolerate minor precision differences
- Vacuuming: Run VACUUM after large bulk deletes to reclaim space in HNSW indexes
Going Further
With pgvector installed, you have the building blocks for several AI-powered workflows:
- Self-hosted RAG: Combine pgvector with Ollama to build a document Q&A system that runs entirely on your infrastructure
- Semantic search: Replace keyword-based search in your application with vector similarity for more relevant results
- Recommendation engines: Store user preference vectors and find similar items or users
- Image similarity: Store CLIP embeddings and find visually similar images
- Anomaly detection: Flag items that are distant from their expected cluster centroid