pgvector vs Pinecone vs Weaviate vs Qdrant: Where the Index Sits Decides Everything

Read the feature pages and these four vector stores are nearly interchangeable: ANN, metadata filters, hybrid search, multi-tenancy — all four claim every box. The thing that decides which one survives the agentic-RAG stack at scale is invisible there: where the index sits relative to the rows it indexes. pgvector lives inside the Postgres that already holds your primary data; Pinecone is a fully managed external service; Weaviate ships a knowledge-graph schema layer that wants to become a second source of truth; Qdrant is an open-source engine whose whole design point is filtered ANN at scale. Pick on that axis first; everything else follows.

At a glance

Four vector stores, four answers to the same question — what does the index sit next to, and what do you have to operate to keep it there. The table lists the basics; the matrix below it shows where each one leans hardest across the axes that actually differ.

Project	Released / maintainer	Primary niche	Where it runs
pgvector	2021, open source (community)	Vector search as a Postgres extension	Wherever your Postgres runs
Pinecone	2019, Pinecone Systems Inc.	Fully managed serverless vector DB	Pinecone cloud (managed-only)
Weaviate	2019, Weaviate B.V. (open source)	Vector engine + knowledge-graph schema	Self-host or Weaviate Cloud
Qdrant	2021, Qdrant Solutions (open source)	Rust engine optimized for filtered ANN at scale	Self-host or Qdrant Cloud

Snapshot: 2026-06-01. Vector-store features move quickly; verify against current docs.

Where each store leans hardest. The axes converge everywhere except where the index sits and what it ships on top.

pgvector — deep dive

pgvector adds a vector type and ANN indexes to Postgres; the index sits on the same rows as the SQL data, under the same planner and the same transaction.

Storage and index model

pgvector is a Postgres extension. You add a column of type vector(d) to an existing table, and the embeddings live on the same row as the rest of your data — same primary key, same foreign keys, same transaction log. The ANN index is one of two shapes: HNSW, a navigable small-world graph kept in shared buffers, or IVFFlat, which clusters the vectors into lists and scans a configurable number of nearest lists at query time. Either way, the index file is a normal Postgres relation; backups, replicas, and point-in-time recovery work without anything new. This is the same vector-search machinery covered in the chunking and vector search primer, but located inside the database that already authoritatively answers "what is this row?".
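A minimal sketch of that setup, written as a Python helper that only builds the SQL statements and does not connect to anything. The table name `docs`, the column name `embedding`, and the helper `pgvector_ddl` are illustrative assumptions; the parameter values shown (`m = 16`, `ef_construction = 64`, `lists = 100`) are commonly used starting points, not recommendations from this article.

```python
def pgvector_ddl(table: str, dim: int, index: str = "hnsw") -> list[str]:
    """Return SQL statements that add an embedding column and one ANN index.

    Illustration only: identifiers are interpolated directly, so never feed
    this untrusted input.
    """
    if index == "hnsw":
        # Graph index; m and ef_construction are HNSW build parameters.
        index_sql = (
            f"CREATE INDEX ON {table} USING hnsw (embedding vector_cosine_ops) "
            "WITH (m = 16, ef_construction = 64);"
        )
    elif index == "ivfflat":
        # Clustered index; lists is the number of clusters built at index time.
        index_sql = (
            f"CREATE INDEX ON {table} USING ivfflat (embedding vector_cosine_ops) "
            "WITH (lists = 100);"
        )
    else:
        raise ValueError(f"unknown index type: {index}")
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"ALTER TABLE {table} ADD COLUMN embedding vector({dim});",
        index_sql,
    ]


statements = pgvector_ddl("docs", 384)
for statement in statements:
    print(statement)
```

The point of the sketch is how little there is: one extension, one column on a table that already exists, one index.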

Query path (vector + filter + hybrid)

Queries are plain SQL. ORDER BY embedding <=> $1 LIMIT k runs the ANN index (<=> is pgvector's cosine-distance operator); a WHERE clause on scalar columns goes through the existing B-tree or GIN index; and the Postgres planner decides whether to filter first and sort the survivors by exact distance, or walk the ANN index and filter the candidates it returns. Hybrid search composes from existing parts: Postgres full-text search (tsvector + ts_rank) joined to the same row that holds the vector, with rank fusion done in SQL. The pattern is the same one in hybrid search and reranking, but written as one query instead of two services. Because the join target is the row itself, agentic queries that need a vector match and the actual business fields ("title, body, author name, last updated") return in one round-trip.
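The text does not say which rank-fusion formula to use. One common choice is reciprocal rank fusion (RRF), sketched here in Python so the arithmetic is easy to follow; the same sum can be written in SQL over two ranked subqueries. The function name, the constant `k = 60`, and the sample IDs are assumptions for illustration.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists; each list contributes 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]  # hypothetical ANN ranking
text_hits = ["b", "d", "a"]    # hypothetical full-text ranking
fused = reciprocal_rank_fusion([vector_hits, text_hits])
fused_ids = [doc_id for doc_id, _ in fused]
print(fused_ids)
```

RRF uses only ranks, not raw scores, which is convenient here because cosine distance and ts_rank are not on comparable scales.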

What the runtime makes hard

Two things. First, filtered ANN at very large scale is a known sharp edge: the planner can pick a strategy that defeats the HNSW index's pruning, and at billions of vectors with high-cardinality filters you may have to coach it with planner settings, partial indexes, or partitioned tables (core Postgres has no index hints). Second, write amplification: HNSW index builds and incremental index updates tax the same Postgres instance that serves your transactional workload, so heavy ingestion competes for I/O and CPU with the OLTP path. The pragmatic move is to put pgvector on a read replica (or a dedicated logical-replicated copy) when ingest volume gets serious, which is more or less the same playbook you would already run for any read-heavy Postgres workload.

Pinecone — deep dive

Pinecone is fully managed: vectors and metadata live in Pinecone's cloud, and your agent reaches the index over HTTPS while joining back to your primary DB by ID.

Storage and index model

Pinecone is a vector database as a service. You create an index with a name, a dimension, a distance metric, and a region; you partition it into namespaces for tenant or collection isolation; you upsert { id, values, metadata } records over HTTPS. The current serverless tier separates hot compute from a blob-storage cold tier, so the storage you pay for is decoupled from query throughput — there are no shards to size, no replicas to provision, no HNSW parameters to tune. The internal index and ANN algorithm are not part of the contract; Pinecone owns the implementation, and you trade transparency for never having to think about it.

Query path (vector + filter + hybrid)

Queries are an HTTPS POST: vector, top-k, optional metadata filter, optional namespace. Metadata filters are applied during traversal rather than after, so a high-cardinality filter still returns roughly the right top-k. Pinecone supports sparse-dense hybrid by accepting a sparse vector alongside the dense one and fusing the scores; if you want BM25-style lexical search you compute the sparse vector yourself (BM25, SPLADE, or similar) and pass both. The query returns IDs and scores plus the metadata you stored at upsert time — but the full row of business data still lives in your primary DB, so the agent then issues a second query to your Postgres or Mongo to enrich the results.
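The second hop can be sketched as follows. This is not Pinecone client code: `fake_vector_query` is a hypothetical stand-in that returns IDs and scores in the shape described above, and an in-memory SQLite table plays the role of the primary database. The table and column names are assumptions.

```python
import sqlite3


def fake_vector_query(top_k: int) -> list[dict]:
    """Hypothetical stand-in for the vector service: IDs and scores only."""
    matches = [
        {"id": "doc-2", "score": 0.91},
        {"id": "doc-9", "score": 0.87},  # vector with no source row (stale)
        {"id": "doc-1", "score": 0.80},
    ]
    return matches[:top_k]


def enrich(conn: sqlite3.Connection, matches: list[dict]) -> list[dict]:
    """Join vector matches back to primary rows by ID, keeping score order."""
    ids = [m["id"] for m in matches]
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, author FROM docs WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    enriched = []
    for m in matches:
        row = by_id.get(m["id"])
        if row is None:
            continue  # the index knows this ID but the primary DB does not
        enriched.append(
            {"id": row[0], "title": row[1], "author": row[2], "score": m["score"]}
        )
    return enriched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id TEXT PRIMARY KEY, title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [("doc-1", "Invoices", "Ana"), ("doc-2", "Refunds", "Bo")],
)
results = enrich(conn, fake_vector_query(top_k=3))
print(results)
```

Note that the enrichment step is also where a stale vector becomes visible: `doc-9` scores well but has no row to join to, so the caller gets fewer results than it asked for.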

What the runtime makes hard

Three things. First, two systems of record: every row written to your primary DB has to be embedded, upserted, and kept consistent with Pinecone, so a stale-vector or missing-vector bug is a class of problem you did not have before. Second, vendor lock-in by definition — the service is not self-hostable, so the cheapest hedge is a clean abstraction layer over reads and writes. Third, data boundary: vectors and their attached metadata leave your infra, so regulated content needs an enterprise-region or BYOC plan, and an outage in Pinecone is an outage in your retrieval path. The service is fast to start with and easy to operate; the operating cost is the discipline to treat it as a remote dependency rather than a database you own.
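One way to keep the two systems honest is a periodic reconciliation pass. The sketch below assumes you store a content hash in each vector's metadata at upsert time, which is a design choice of this example rather than anything the service requires; `reconcile` and the sample data are hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Fingerprint of the source text that was embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def reconcile(primary: dict[str, str], indexed: dict[str, str]) -> dict[str, list[str]]:
    """Compare {id: hash} from the primary DB against {id: hash} from the index."""
    missing = sorted(doc_id for doc_id in primary if doc_id not in indexed)
    orphaned = sorted(doc_id for doc_id in indexed if doc_id not in primary)
    stale = sorted(
        doc_id
        for doc_id in primary
        if doc_id in indexed and primary[doc_id] != indexed[doc_id]
    )
    return {"missing": missing, "stale": stale, "orphaned": orphaned}


primary = {
    "doc-1": content_hash("alpha"),
    "doc-2": content_hash("beta v2"),  # edited after it was embedded
    "doc-3": content_hash("gamma"),    # never embedded
}
indexed = {
    "doc-1": content_hash("alpha"),
    "doc-2": content_hash("beta v1"),
    "doc-4": content_hash("delta"),    # source row since deleted
}
report = reconcile(primary, indexed)
print(report)
```

Missing and stale IDs go back through the embedding pipeline; orphaned IDs are deleted from the index.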

Weaviate — deep dive

Weaviate ships a knowledge-graph schema layer on top: classes have typed properties and cross-references, modules embed and rerank inline, and a GraphQL query returns vector matches plus graph traversals together.

Storage and index model

Weaviate is an open-source vector engine with a schema layer baked into it. You declare classes with typed properties, including cross-references between classes, so Article references Author and Topic the way a graph database does, and each class gets its own HNSW index over its objects' embeddings. Vectorization happens server-side via modules: text2vec-openai, text2vec-cohere, text2vec-transformers, plus rerankers and a generative module that can run RAG-in-DB. The architecture is the closest of the four to a small graph RAG system in a box: vectors and typed relationships in one place.

Query path (vector + filter + hybrid)

The default query language is GraphQL. A Get { Article(nearVector: …, where: …) { _additional { distance } author { name } } } returns the vector match, the score, and joined cross-referenced objects in one round-trip. Hybrid search is a first-class operator: hybrid(query, alpha) blends BM25 with vector results, with the alpha parameter controlling fusion weight, and the reranker module can apply a cross-encoder pass at query time. Multi-tenancy is per-class — each tenant gets an isolated HNSW graph, which keeps tenants from polluting each other's recall at the cost of more graphs to maintain.
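To make the role of alpha concrete, here is a simplified sketch of score-based fusion: each score set is min-max normalized, then blended so that alpha = 1 is pure vector and alpha = 0 is pure BM25. This is an illustration of the idea, not Weaviate's actual implementation; the function names and sample scores are assumptions.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def alpha_fusion(
    vector_scores: dict[str, float], bm25_scores: dict[str, float], alpha: float
) -> list[tuple[str, float]]:
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights BM25."""
    v = min_max(vector_scores)
    b = min_max(bm25_scores)
    fused = {
        doc_id: alpha * v.get(doc_id, 0.0) + (1.0 - alpha) * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}  # similarity, higher is better
bm25_scores = {"b": 12.0, "c": 8.0, "d": 4.0}   # BM25, higher is better

pure_vector = alpha_fusion(vector_scores, bm25_scores, alpha=1.0)
pure_keyword = alpha_fusion(vector_scores, bm25_scores, alpha=0.0)
balanced = alpha_fusion(vector_scores, bm25_scores, alpha=0.5)
print(balanced)
```

In this toy data, document `b` wins the balanced blend because it is the only one that scores well on both signals.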

What the runtime makes hard

Schema evolution is the sharp edge. The schema is a contract: adding a property is fine, but changing a vectorizer, retyping a property, or restructuring cross-references usually means a re-vectorize and a migration, which is exactly the work pgvector lets you skip by piggy-backing on whatever the application schema already is. Weaviate is also another stateful service in your topology, a separate failure domain with its own backup story, version upgrades, and module configuration. The trade is real: you get RAG-relevant primitives (vectorize, rerank, graph traversal) inside the database, but you operate one more system whose schema you have to keep honest.

Qdrant — deep dive

Qdrant is a Rust vector engine designed around filtered ANN: payload indexes feed a filter-aware HNSW, quantization shrinks vectors, and the cluster shards and replicates by point ID.

Storage and index model

Qdrant is an open-source vector engine written in Rust. Data is organized into collections, each of which has a vector size, a distance metric, and optional named multi-vector slots (for example one slot for a dense embedding, one for ColBERT-style late-interaction vectors). Each item is a point: an ID, one or more vectors, and a payload of arbitrary structured JSON, with per-field payload indexes that act like an inverted index on the metadata. The index of interest is HNSW, but Qdrant also offers quantization (scalar, product, or binary) that shrinks the in-memory vector footprint by 4× to 32× and rescores the survivors with the full vectors, which is what lets a single node hold a lot of points.
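The quantize-then-rescore idea can be sketched in plain Python. This is a toy version of scalar quantization (one byte per dimension instead of a four-byte float32, hence the 4× figure), with a brute-force scan standing in for the HNSW index; it is not Qdrant's implementation, and the value range, oversampling factor, and sample vectors are assumptions.

```python
def quantize(vec: list[float], lo: float, hi: float) -> bytes:
    """Map each float in [lo, hi] to a one-byte code in 0..255."""
    scale = 255.0 / (hi - lo)
    return bytes(min(255, max(0, round((x - lo) * scale))) for x in vec)


def approx_sq_dist(a: bytes, b: bytes) -> int:
    """Cheap squared distance computed on the one-byte codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_sq_dist(a: list[float], b: list[float]) -> float:
    """Squared distance on the original full-precision vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(query, vectors, codes, lo, hi, k, oversample=3):
    """Stage 1: shortlist on quantized codes. Stage 2: rescore with full vectors."""
    q_code = quantize(query, lo, hi)
    shortlist = sorted(
        range(len(codes)), key=lambda i: approx_sq_dist(q_code, codes[i])
    )[: k * oversample]
    return sorted(shortlist, key=lambda i: exact_sq_dist(query, vectors[i]))[:k]


vectors = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [0.5, 0.5], [0.11, 0.1]]
codes = [quantize(v, 0.0, 1.0) for v in vectors]
top = search([0.1, 0.12], vectors, codes, 0.0, 1.0, k=1)
print(top, len(codes[0]), "bytes per vector")
```

The oversampling factor is the knob the "What the runtime makes hard" discussion refers to as rescore depth: a deeper shortlist recovers more of the accuracy lost to quantization at the cost of more full-precision comparisons.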

Query path (vector + filter + hybrid)

The signature feature is filter-aware HNSW: the graph traversal evaluates payload filters as it walks, rather than ANN-searching first and discarding non-matches afterwards. The win is recall under heavy filters — the failure mode where a post-filter ANN query returns "I found nothing matching tenant_id=42 in the nearest 200, sorry" mostly does not happen here, because the traversal does not waste its budget on non-matching neighbors. Hybrid search is supported through sparse vectors plus dense (BM25-style sparse vectors fused with dense at query time), and the API exposes recommend, batch search, and scroll alongside plain search, which composes well into agentic retrieval loops where the agent issues repeated, scoped queries.
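The failure mode is easy to reproduce on toy data. In this sketch an exact one-dimensional scan stands in for the ANN index, so it shows only the difference between filtering after the search and filtering during it, not how HNSW traversal works. The candidate budget of 200, the tenant layout, and all function names are assumptions.

```python
def nearest(points, query, limit, predicate=None):
    """Exact nearest-neighbour scan; a stand-in for an ANN index."""
    pool = [p for p in points if predicate is None or predicate(p)]
    return sorted(pool, key=lambda p: abs(p["x"] - query))[:limit]


def post_filter_search(points, query, k, predicate, candidates=200):
    """Search first with a fixed candidate budget, then discard non-matches."""
    hits = nearest(points, query, candidates)
    return [p for p in hits if predicate(p)][:k]


def filter_aware_search(points, query, k, predicate):
    """Apply the filter while searching, so the budget goes to matching points."""
    return nearest(points, query, k, predicate)


def is_tenant_42(point) -> bool:
    return point["tenant_id"] == 42


# 1000 points on a line; tenant 42 owns only the 100 points farthest from the query.
points = [
    {"id": i, "x": float(i), "tenant_id": 42 if i >= 900 else 1}
    for i in range(1000)
]

post = post_filter_search(points, 0.0, k=5, predicate=is_tenant_42)
aware = filter_aware_search(points, 0.0, k=5, predicate=is_tenant_42)
print(len(post), [p["id"] for p in aware])
```

The post-filter search spends its entire budget on the 200 nearest points, none of which belong to tenant 42, and returns nothing. The filter-aware search returns the five nearest matching points.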

What the runtime makes hard

You self-host (or use Qdrant Cloud), which means you run another stateful service with its own sharding, replication factor, WAL, and snapshots. Quantization is powerful but not free — you choose the quantization level, rescore depth, and shard count yourself, so the operating overhead is "another distributed database to operate," not "an extension on the database you already have." The payload model is JSON rather than a schema, so there is no equivalent of Weaviate's typed cross-references; relationships are something the application reconstitutes. The honest position: Qdrant gives you the strongest filtered-ANN engine in this set, and the price is a separate stateful system you operate.

Cross-cutting comparison

Where the index sits relative to your primary data

The headline axis. Four homes for the index, four different consistency and joinability stories.

This is the axis the feature lists hide, and it is the one that decides which store survives contact with a production agent. pgvector puts the index on the same rows as your primary data, under the same planner and transaction log, so a single SQL query can join "nearest vector" to "actual business fields" without a second hop. Pinecone moves the index to a managed cloud, which makes operations easy but turns every retrieval into a cross-system join: vectors there, source rows here, and the application is responsible for keeping the two consistent. Weaviate is something stranger — its schema layer wants to be a source of truth (classes, cross-references, server-side vectorization), which is wonderful when the application happens to fit Weaviate's shape, and exhausting when the application's domain schema and Weaviate's schema start to drift. Qdrant takes the most modest position: it is a vector index next door, the payload is opaque JSON, and you join in the application — but the engine is built for that exact shape and does it well.

Filtered ANN behavior at scale

The axis where the four pull furthest apart. Filtered ANN is what an agentic-RAG workload does most.

Agentic RAG queries are almost never "give me the nearest k vectors with no filter" — they are "the nearest k vectors that belong to this tenant, this document type, this date range." pgvector defers the strategy to the Postgres planner: at modest scale it picks the right combination of B-tree and HNSW, but at high cardinality you can land on a plan that defeats the ANN index's pruning, and the fix is index tuning or partitioning. Pinecone and Weaviate both filter during traversal, which mostly works; Pinecone hides the algorithm and asks you to trust it, Weaviate exposes inverted indexes per class and gives you a flat-search escape hatch when recall matters more than latency. Qdrant takes filter-aware ANN as its design point: the HNSW traversal evaluates payload conditions as it walks, so the recall-under-heavy-filters failure mode that hurts the other three is, by construction, the case Qdrant tunes for. If your agent's filters are wide and shallow, all four are fine; if they are narrow and selective ("only this customer's docs from the last 30 days, only invoice category"), Qdrant's behavior is the most predictable.

Operations and deployment story

The state question collapses into an operations question — who runs the index, and who pays for the second system of record.

The four split cleanly. pgvector adds zero new operational surface: your existing Postgres backups, replicas, monitoring, RLS, and cloud-DB managed services (RDS, Neon, Supabase, Aurora) cover it, which is the single biggest reason teams that start with pgvector mostly stay there. Pinecone moves the entire operational surface to the vendor: no servers, no shard sizing, no HNSW tuning, in exchange for a remote dependency in your retrieval critical path and a separate billing line. Weaviate is the heaviest of the four to operate: it is its own stateful service, with version upgrades, module configuration, and schema migrations that have no analog in your application database. Qdrant is lighter than Weaviate but still a separate stateful service: a single Rust binary scales from one Docker container to a sharded cluster, but the sharding, replication factor, and quantization choices are yours. Counterintuitively, the question "should we operate fewer systems or more capable ones?" answers itself for most teams: fewer systems wins until a query pattern (massive scale, very heavy filters, multi-region) actively forces a more capable one.

When to pick which

Use case	Pick pgvector if…	Pick Pinecone if…	Pick Weaviate if…	Pick Qdrant if…
You already run Postgres	Yes — co-located index, no new system, joins to existing rows for free.	Only if Pinecone's scale or zero-ops story outweighs adding a second source of truth.	Only if the knowledge-graph schema is the actual application model.	Only if filtered-ANN behavior at your scale is the binding constraint.
You do NOT want to operate the index	Use managed Postgres (RDS, Neon, Supabase) — pgvector rides along.	Yes — managed-only is the whole product; no servers, no tuning.	Weaviate Cloud exists, but you still own the schema and modules.	Qdrant Cloud exists; self-hosting is the default expectation.
Heavy filtered ANN at huge scale	Possible with HNSW + partitioning, but tuning gets sharp.	Managed metadata filtering scales without your tuning.	Per-class HNSW + inverted indexes work; flat-search fallback available.	Yes — filter-aware HNSW is the design point.
RAG over typed entities with relationships	Model the relationships in SQL — works, but no native graph traversal.	Namespaces are flat; relationships live in your primary DB.	Yes — typed classes + cross-references are first-class.	JSON payloads only; the app reconstitutes relationships.
Regulated data, hard data-boundary	Yes — same boundary as your Postgres.	Requires enterprise region or BYOC.	Self-host inside your boundary.	Self-host inside your boundary.

FAQ

Is pgvector "good enough" for production RAG?

For most teams, yes, and the threshold for "no" is higher than the vendor pitches suggest. pgvector handles tens of millions of vectors comfortably on a normally-sized Postgres instance with HNSW; the corners where it gets sharp are very heavy filtered ANN at high cardinality, write throughput colliding with OLTP load, and indexes large enough that build time becomes operational pain. Before reaching for a dedicated vector DB, the cheap moves are: a read replica for vectors, partitioning by tenant or date, and pushing the embedding pipeline off the OLTP node. The choosing a vector database deep-dive walks through the decision in more detail.

What is the actual difference between Pinecone, Weaviate, and Qdrant?

All three are dedicated vector engines, but they sit in different places. Pinecone is fully managed and proprietary — no self-host, no transparency into the index, but no operational work either. Weaviate is open source with a knowledge-graph schema layer baked in: typed classes, cross-references, inline vectorization modules, and a GraphQL API; it is the heaviest to operate and the most opinionated about how you model your data. Qdrant is open source and minimal — vectors plus JSON payloads, with filter-aware HNSW as its design point, optimized for filtered ANN at scale; it is lighter to operate than Weaviate and asks less of your schema.

How does filtered ANN actually behave differently between these stores?

The failure mode to watch is "the filter is narrow enough that the ANN graph traversal returns mostly non-matches, recall collapses, and the agent gets nothing useful." pgvector and Pinecone are fine for wide filters; pgvector starts to struggle when the Postgres planner picks the wrong leading index and you have to tune planner settings or partition; Weaviate's per-class HNSW plus inverted indexes works well within a class. Qdrant is the one that explicitly treats filter-aware HNSW as a primary design constraint: the graph traversal evaluates payload filters as it walks, so the "narrow filter, low recall" failure mostly does not happen. If your agent's queries are very selective, this is the dimension that matters most.

Can I use embeddings from one store with another?

Yes — embeddings are just floating-point arrays, so the same model (OpenAI text-embedding-3-large, Cohere, BGE, voyage-3, whatever) produces vectors any of the four can index. The store does not own the embedding; the embedding model does. This is why migrating between vector stores is usually "re-upsert the same vectors with the same IDs", not "re-embed everything from scratch" — assuming you kept the source text and the model name pinned. See embeddings: meaning as geometry for the underlying intuition.
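A migration along those lines can be sketched against a small interface. `VectorStore`, `InMemoryStore`, and `migrate` are hypothetical names for this example; a real adapter would wrap each store's own client behind the same two methods.

```python
from typing import Iterable, Iterator, Protocol

Record = tuple[str, list[float], dict]  # (id, vector, metadata)


class VectorStore(Protocol):
    """The thin read/write surface a migration needs."""

    def upsert(self, records: Iterable[Record]) -> None: ...

    def scan(self) -> Iterator[Record]: ...


class InMemoryStore:
    """Toy store used here in place of a real client adapter."""

    def __init__(self) -> None:
        self._rows: dict[str, tuple[list[float], dict]] = {}

    def upsert(self, records: Iterable[Record]) -> None:
        for record_id, values, metadata in records:
            self._rows[record_id] = (list(values), dict(metadata))

    def scan(self) -> Iterator[Record]:
        for record_id, (values, metadata) in self._rows.items():
            yield record_id, values, metadata


def migrate(source: VectorStore, target: VectorStore, batch_size: int = 100) -> int:
    """Copy every record, same IDs and same vectors, in fixed-size batches."""
    batch: list[Record] = []
    moved = 0
    for record in source.scan():
        batch.append(record)
        if len(batch) == batch_size:
            target.upsert(batch)
            moved += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        target.upsert(batch)
        moved += len(batch)
    return moved


old_store, new_store = InMemoryStore(), InMemoryStore()
old_store.upsert(
    (f"doc-{i}", [float(i), float(i) + 0.5], {"model": "example-model"})
    for i in range(5)
)
moved = migrate(old_store, new_store, batch_size=2)
print(moved)
```

Recording the embedding model name in metadata, as the sample records do, is what makes it safe to reuse the vectors later without re-embedding.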

Does it matter that Pinecone is closed source?

It matters about as much as any managed cloud dependency in your retrieval critical path. The pragmatic question is not "open vs closed" but "what happens if this vendor's pricing, region availability, or SLO no longer fits us in eighteen months?" The answer for Pinecone is "re-upsert your vectors elsewhere," which is cheaper than people think — the embedding model and source text are the assets, the index is replaceable. The hedge that costs nothing is a thin abstraction layer over reads and writes; the hedge that costs something is sticking with self-hostable stores so the exit is trivial.

Where does this sit in the broader RAG stack?

The vector store is one layer inside a retrieval pipeline that also includes chunking, embedding, hybrid search, reranking, and query rewriting. Picking the right store does not save a bad chunking strategy and does not invent recall the embeddings did not have. The honest sequence is: get RAG working end-to-end with whatever store is closest at hand (very often pgvector), measure where retrieval is actually failing using the patterns in evaluating RAG, and only then move the index if the data tells you to.
