AI Blog

pgvector vs Pinecone vs Weaviate vs Qdrant: Where the Index Sits Decides Everything

Four vector stores, four nearly identical feature lists — ANN, filters, hybrid search, all of it. The thing that actually decides which one survives the agentic-RAG stack at scale is invisible there: where the index sits relative to your primary data.

By Agentic AI Wiki 24 min read

Read the feature pages and these four vector stores are nearly interchangeable: ANN, metadata filters, hybrid search, multi-tenancy — all four claim every box. The thing that decides which one survives the agentic-RAG stack at scale is invisible there: where the index sits relative to the rows it indexes. pgvector lives inside the Postgres that already holds your primary data; Pinecone is a fully managed external service; Weaviate ships a knowledge-graph schema layer that wants to become a second source of truth; Qdrant is an open-source engine whose whole design point is filtered ANN at scale. Pick on that axis first; everything else follows.

At a glance

Four vector stores, four answers to the same question — what does the index sit next to, and what do you have to operate to keep it there. The table lists the basics; the matrix below it shows where each one leans hardest across the axes that actually differ.

Project | Released / maintainer | Primary niche | Where it runs
pgvector | 2021, open source (community) | Vector search as a Postgres extension | Wherever your Postgres runs
Pinecone | 2019, Pinecone Systems Inc. | Fully managed serverless vector DB | Pinecone cloud (managed-only)
Weaviate | 2019, Weaviate B.V. (open source) | Vector engine + knowledge-graph schema | Self-host or Weaviate Cloud
Qdrant | 2021, Qdrant Solutions (open source) | Rust engine optimized for filtered ANN at scale | Self-host or Qdrant Cloud

Snapshot: 2026-06-01. Vector-store features move quickly; verify against current docs.

[Figure: Vector store feature matrix. Heatmap of pgvector, Pinecone, Weaviate, and Qdrant across six axes, shaded from light (weak) to dark orange-red (strong).]
Store | Co-located with data | Filtered ANN at scale | Hybrid search | Schema / graph model | Self-hostable | Managed option
pgvector | Native (in DB) | Planner-driven | tsvector + vec | Relational | Yes | RDS / Neon / Supabase
Pinecone | External only | Metadata filter | Sparse-dense vectors | Namespaces | No | Managed-only (serverless)
Weaviate | External | Inverted indexes | BM25 + vector | Classes + cross-refs | Yes | Weaviate Cloud
Qdrant | External | Filter-aware HNSW | Sparse + dense | JSON payloads | Yes (Rust) | Qdrant Cloud
Where each store leans hardest. The axes converge everywhere except where the index sits and what it ships on top.

pgvector — deep dive

[Figure: pgvector architecture.]
pgvector adds a vector type and ANN indexes to Postgres; the index sits on the same rows as the SQL data, under the same planner and the same transaction.

Storage and index model

pgvector is a Postgres extension. You add a column of type vector(d) to an existing table, and the embeddings live on the same row as the rest of your data: same primary key, same foreign keys, same transaction log. The ANN index comes in one of two shapes: HNSW, a hierarchical navigable small-world graph that performs best when its pages stay cached in shared buffers, or IVFFlat, which clusters the vectors into lists and, at query time, scans only the lists nearest the query (how many is set by ivfflat.probes). Either way, the index is a normal Postgres relation, so backups, replicas, and point-in-time recovery work without anything new. This is the same vector-search machinery covered in the chunking and vector search primer, but located inside the database that already authoritatively answers "what is this row?".
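To make the IVFFlat "lists and probes" idea concrete, here is a minimal pure-Python sketch. It is not pgvector's implementation: centroids are a random sample of the data (pgvector trains them with k-means when the index is built), and every function name is hypothetical. In pgvector, the analogous knobs are the index's lists parameter and the ivfflat.probes setting.

```python
import math
import random

def l2(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, n_lists, seed=0):
    """Assign every vector to the list of its nearest centroid.

    Simplification: centroids are sampled from the data instead of
    being trained with k-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, v in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, k, probes):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in lists[c]]
    # Exact distances, but only over members of the probed lists
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]

def exact_search(query, vectors, k):
    # Brute-force baseline: every vector is scored
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]
```

With probes equal to the number of lists the search degenerates into an exact scan; lowering probes trades recall for speed, which is the same trade ivfflat.probes exposes.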

Query path (vector + filter + hybrid)

Queries are plain SQL. ORDER BY embedding <=> $1 LIMIT k uses the ANN index; a WHERE clause on scalar columns can use an existing B-tree or GIN index; and the Postgres planner decides whether to filter first and compute exact distances over the surviving rows, or to walk the ANN index and discard rows that fail the filter. Hybrid search composes from existing parts: Postgres full-text search (tsvector + ts_rank) on the same row that holds the vector, with rank fusion done in SQL. It is the same pattern as in hybrid search and reranking, but written as one query instead of two services. Because the join target is the row itself, agentic queries that need a vector match and the actual business fields (title, body, author name, last updated) return in one round-trip.
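A sketch of that single-query hybrid pattern, assuming a hypothetical documents table (id, tenant_id, body, embedding) and psycopg-style named parameters with the pgvector type adapter registered. It uses reciprocal rank fusion with the commonly used smoothing constant 60. The SQL is kept as a string for reference and is not executed here; rrf_fuse is a hypothetical pure-Python mirror of the same arithmetic, so the fusion can be checked without a database.

```python
# Illustrative only: table, columns, and parameter names are assumptions.
HYBRID_SQL = """
WITH semantic AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s) AS rank
    FROM documents
    WHERE tenant_id = %(tenant)s
    ORDER BY embedding <=> %(qvec)s
    LIMIT 20
),
keyword AS (
    SELECT id, RANK() OVER (ORDER BY ts_rank_cd(to_tsvector('english', body), q) DESC) AS rank
    FROM documents, plainto_tsquery('english', %(qtext)s) AS q
    WHERE tenant_id = %(tenant)s AND to_tsvector('english', body) @@ q
    ORDER BY ts_rank_cd(to_tsvector('english', body), q) DESC
    LIMIT 20
)
SELECT COALESCE(s.id, k.id) AS id,
       COALESCE(1.0 / (60 + s.rank), 0.0) + COALESCE(1.0 / (60 + k.rank), 0.0) AS score
FROM semantic s
FULL OUTER JOIN keyword k ON s.id = k.id
ORDER BY score DESC
LIMIT 5
"""

def rrf_fuse(*rankings, k=60, limit=5):
    """Reciprocal rank fusion over ranked id lists (best first, 1-based ranks).

    Mirrors the score expression in HYBRID_SQL: an id missing from a
    ranking simply contributes nothing for that ranking.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)[:limit]
```

A document that ranks well in both lists beats one that tops only one of them, which is why fusion needs no score normalization between ts_rank_cd and cosine distance.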

What the runtime makes hard

Two things. First, filtered ANN at very large scale is a known sharp edge. With an HNSW index, pgvector applies the WHERE clause after the index scan, so a selective filter can return fewer than k rows, and the planner can pick a plan that defeats the index's pruning; at billions of vectors with high-cardinality filters you may have to coach it with planner settings, partial indexes, or partitioned tables. Second, write amplification: HNSW index builds, and the graph updates every insert triggers, tax the same Postgres instance that serves your transactional workload, so heavy ingestion competes for I/O and CPU with the OLTP path. The pragmatic move is to serve vector queries from a read replica and, when ingest volume gets serious, move the embeddings and their index to a dedicated logically replicated copy so index maintenance stays off the OLTP primary. That is more or less the playbook you would already run for any read-heavy Postgres workload.
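The post-filter effect is easy to estimate. pgvector's README notes that with approximate indexes the filter is applied after the index is scanned, and that hnsw.ef_search defaults to 40, so a filter matching 10% of rows leaves about 4 rows on average. The Monte Carlo sketch below reproduces that arithmetic under one loud assumption: whether a row passes the filter is independent of its distance to the query. Function names are hypothetical.

```python
import random

def expected_matches(ef_search, selectivity):
    # Mean number of index candidates that survive a post-scan WHERE clause
    return ef_search * selectivity

def post_filter_survivors(selectivity, ef_search=40, k=10, trials=2000, seed=0):
    """Average rows returned when the HNSW scan yields ef_search candidates
    and the filter is applied afterwards.

    Assumption: each candidate passes the filter independently with
    probability `selectivity`.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        passed = sum(1 for _ in range(ef_search) if rng.random() < selectivity)
        total += min(passed, k)  # LIMIT k caps what the query can return
    return total / trials
```

For LIMIT 10 with a 10% filter both functions land near 4 rows. Raising ef_search (for example SET hnsw.ef_search = 200) raises the expected survivors to about 20 at a latency cost, which is why partial indexes or partitioning are the more durable fixes for very selective filters.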

Pinecone — deep dive

[Figure: Pinecone architecture.]
Pinecone is fully managed: vectors and metadata live in Pinecone's cloud, and your agent reaches the index over HTTPS while joining back to your primary DB by ID.

Storage and index model

Pinecone is a vector database as a service. You create an index with a name, a dimension, a distance metric, and a region; you partition it into namespaces for tenant or collection isolation; you upsert { id, values, metadata } records over HTTPS. The current serverless tier separates hot compute from a blob-storage cold tier, so the storage you pay for is decoupled from query throughput — there are no shards to size, no replicas to provision, no HNSW parameters to tune. The internal index and ANN algorithm are not part of the contract; Pinecone owns the implementation, and you trade transparency for never having to think about it.
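A minimal sketch of preparing { id, values, metadata } records client-side before upserting. Assumptions: the allowed metadata value types (string, number, boolean, list of strings) should be checked against current Pinecone docs; batch_size=100 is an arbitrary illustrative choice; the record class and helper names are hypothetical. Each yielded batch is what you would hand to the SDK's upsert call with a namespace.

```python
from dataclasses import dataclass, field

@dataclass
class VectorRecord:
    id: str
    values: list
    metadata: dict = field(default_factory=dict)

def _valid_metadata_value(value):
    # Assumed allowed metadata types: string, number, boolean, or list of strings
    if isinstance(value, (str, int, float, bool)):
        return True
    return isinstance(value, list) and all(isinstance(x, str) for x in value)

def upsert_batches(records, dimension, batch_size=100):
    """Validate records and yield batches of {id, values, metadata} dicts.

    Each batch would be sent with something like
    index.upsert(vectors=batch, namespace="tenant-a").
    """
    batch = []
    for rec in records:
        if len(rec.values) != dimension:
            raise ValueError(f"{rec.id}: expected {dimension} dims, got {len(rec.values)}")
        bad = [k for k, v in rec.metadata.items() if not _valid_metadata_value(v)]
        if bad:
            raise ValueError(f"{rec.id}: unsupported metadata types for {bad}")
        batch.append({"id": rec.id, "values": rec.values, "metadata": rec.metadata})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Catching a dimension mismatch before the HTTPS call matters because the index's dimension is fixed at creation time.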

Query path (vector + filter + hybrid)

Queries are an HTTPS POST: vector, top-k, optional metadata filter, optional namespace. Metadata filters are applied during traversal rather than after, so a high-cardinality filter still returns roughly the right top-k. Pinecone supports sparse-dense hybrid by accepting a sparse vector alongside the dense one and fusing the scores; if you want BM25-style lexical search you compute the sparse vector yourself (BM25, SPLADE, or similar) and pass both. The query returns IDs and scores plus the metadata you stored at upsert time — but the full row of business data still lives in your primary DB, so the agent then issues a second query to your Postgres or Mongo to enrich the results.
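The second hop looks roughly like this. A minimal sketch: sqlite3 stands in for your primary Postgres or Mongo, the documents table is hypothetical, and matches imitates the id-plus-score list a remote vector query returns. Ids the primary DB no longer has (deleted rows, stale vectors) are dropped rather than surfaced to the agent.

```python
import sqlite3

def enrich(matches, conn):
    """Join vector-store matches back to the primary DB by ID.

    `matches` is a list like [{"id": ..., "score": ...}], best first.
    The vector store's ranking order is preserved.
    """
    if not matches:
        return []
    ids = [m["id"] for m in matches]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, author FROM documents WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: {"title": row[1], "author": row[2]} for row in rows}
    return [{**m, **by_id[m["id"]]} for m in matches if m["id"] in by_id]
```

The silent drop is a design choice: it hides stale vectors from the agent, but you would also want to log those ids, since each one is a consistency bug between the two systems.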

What the runtime makes hard

Three things. First, two systems of record: every row written to your primary DB has to be embedded, upserted, and kept consistent with Pinecone, so a stale-vector or missing-vector bug is a class of problem you did not have before. Second, vendor lock-in by definition — the service is not self-hostable, so the cheapest hedge is a clean abstraction layer over reads and writes. Third, data boundary: vectors and their attached metadata leave your infra, so regulated content needs an enterprise-region or BYOC plan, and an outage in Pinecone is an outage in your retrieval path. The service is fast to start with and easy to operate; the operating cost is the discipline to treat it as a remote dependency rather than a database you own.
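One way to keep the stale-vector and missing-vector class of bugs in check is a periodic reconciliation job. A minimal sketch, assuming you store a hash of the embedded source text in each vector's metadata at upsert time and can list those hashes back; both dict shapes and the function names are hypothetical.

```python
import hashlib

def content_hash(text):
    # Fingerprint of the exact text that was embedded
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def reconcile(primary_rows, vector_index):
    """Diff the system of record against the vector store.

    primary_rows: {id: source_text} from the primary DB.
    vector_index: {id: hash stored in the vector's metadata at upsert time}.
    Returns (ids to re-embed and upsert, ids to delete from the vector store).
    """
    to_upsert = sorted(
        row_id for row_id, text in primary_rows.items()
        if vector_index.get(row_id) != content_hash(text)  # missing or stale
    )
    to_delete = sorted(set(vector_index) - set(primary_rows))  # orphaned vectors
    return to_upsert, to_delete
```

Comparing hashes rather than timestamps catches edits that bypassed the ingest pipeline, at the cost of reading every row's text on each run.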

Weaviate — deep dive

[Figure: Weaviate architecture.]
Weaviate ships a knowledge-graph schema layer on top: classes have typed properties and cross-references, modules embed and rerank inline, and a GraphQL query returns vector matches plus graph traversals together.

Storage and index model

Weaviate is an open-source vector engine with a schema layer baked into it. You declare classes with typed properties, including cross-references between classes, so Article references Author and Topic the way it would in a graph database, and each class gets its own HNSW index over its objects' embeddings. Vectorization happens server-side via modules (text2vec-openai, text2vec-cohere, text2vec-transformers), alongside reranker modules and a generative module that can run RAG inside the database. Of the four, this architecture is the closest to a small graph RAG system in a box: vectors and typed relationships in one place.
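A sketch of the Article/Author/Topic model as a Python dict in the shape of Weaviate's class-based REST schema, where a property whose dataType names another class is a cross-reference. Assumptions: field names should be verified against current Weaviate docs (newer clients call classes "collections"), PRIMITIVES is a partial list, and dangling_references is a hypothetical pre-flight check, not a Weaviate API.

```python
ARTICLE_SCHEMA = {
    "classes": [
        {"class": "Author", "vectorizer": "text2vec-openai",
         "properties": [{"name": "name", "dataType": ["text"]},
                        {"name": "bio", "dataType": ["text"]}]},
        {"class": "Topic", "vectorizer": "text2vec-openai",
         "properties": [{"name": "label", "dataType": ["text"]}]},
        {"class": "Article", "vectorizer": "text2vec-openai",
         "properties": [{"name": "title", "dataType": ["text"]},
                        {"name": "body", "dataType": ["text"]},
                        # Cross-references: the dataType names another class
                        {"name": "author", "dataType": ["Author"]},
                        {"name": "topics", "dataType": ["Topic"]}]},
    ]
}

# Partial list of primitive types, enough for this example
PRIMITIVES = {"text", "text[]", "int", "number", "boolean", "date", "uuid"}

def dangling_references(schema):
    """Return (class, property, target) triples whose target class is undeclared."""
    declared = {c["class"] for c in schema["classes"]}
    problems = []
    for cls in schema["classes"]:
        for prop in cls.get("properties", []):
            for dt in prop["dataType"]:
                if dt not in PRIMITIVES and dt not in declared:
                    problems.append((cls["class"], prop["name"], dt))
    return problems
```

Because the cross-references are part of the contract, a check like this belongs in CI: renaming or removing a class breaks every reference to it.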

Query path (vector + filter + hybrid)

The default query language is GraphQL. A Get { Article(nearVector: …, where: …) { _additional { distance } author { ... on Author { name } } } } returns the vector match, its distance, and the cross-referenced objects in one round-trip. Hybrid search is a first-class operator: hybrid(query, alpha) blends BM25 and vector results, with alpha setting the fusion weight (1 is pure vector, 0 is pure BM25), and a reranker module can apply a cross-encoder pass at query time. Multi-tenancy is enabled per class, and each tenant gets its own isolated HNSW graph, which keeps tenants from polluting each other's recall at the cost of more graphs to maintain.
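A minimal sketch of alpha-weighted fusion in the style of Weaviate's relative score fusion: each result set is min-max scaled to [0, 1], then combined as alpha × vector + (1 − alpha) × BM25, with a missing score counting as 0. Assumptions: vector scores here are similarities (higher is better, not distances), the all-equal-scores case mapping to 1.0 and the id tie-break are my choices, and the function names are hypothetical.

```python
def _min_max(scores):
    """Scale a {id: score} map to [0, 1]; higher is better."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def relative_score_fusion(vector_scores, bm25_scores, alpha=0.5):
    """alpha=1.0 ranks purely by vector similarity, alpha=0.0 purely by BM25."""
    vec, kw = _min_max(vector_scores), _min_max(bm25_scores)
    ids = set(vec) | set(kw)
    fused = {i: alpha * vec.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    # Highest fused score first; ties broken by id for determinism
    return sorted(fused, key=lambda i: (-fused[i], i))
```

Scaling before mixing is what makes alpha meaningful: raw BM25 scores are unbounded while cosine similarities are not, so an unscaled blend would let BM25 dominate.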

What the runtime makes hard

Schema evolution is the sharp edge. The schema is a contract: adding a property is fine, but changing a vectorizer, retyping a property, or restructuring cross-references usually means a re-vectorize and a migration, which is exactly the work pgvector lets you skip by piggybacking on whatever the application schema already is. Weaviate is also another stateful service in your topology, a separate failure domain with its own backup story, version upgrades, and module configuration. The trade is real: you get RAG-relevant primitives (vectorize, rerank, graph traversal) inside the database, but you operate one more system whose schema you have to keep honest.

Qdrant — deep dive

[Figure: Qdrant architecture.]
Qdrant is a Rust vector engine designed around filtered ANN: payload indexes feed a filter-aware HNSW, quantization shrinks vectors, and the cluster shards and replicates by point ID.

Storage and index model

Qdrant is an open-source vector engine written in Rust. Data is organized into collections, each with a vector size, a distance metric, and optional named vector slots (for example, one slot for a dense embedding and one for ColBERT-style late-interaction vectors). Each item is a point: an ID, one or more vectors, and a payload of arbitrary structured JSON, with per-field payload indexes that act like an inverted index on the metadata. The ANN index is HNSW, and Qdrant layers quantization on top: scalar, product, or binary quantization shrinks the in-memory vector footprint by roughly 4× to 32×, and the top candidates are then rescored with the full vectors. That combination is what lets a single node keep far more points in RAM than full-precision vectors would allow.
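A pure-Python sketch of scalar quantization with rescoring, plus the footprint arithmetic behind the 4× and 32× figures. Simplifications: the byte range comes from the global min and max (Qdrant's scalar quantization can use quantile bounds instead), scores are dot products, and oversample plays the role Qdrant's oversampling search parameter describes. All names are hypothetical.

```python
import random

def demo_vectors(n, dim, seed=0):
    # Random float vectors for experimentation
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n)]

def quantize(vectors):
    """Scalar quantization: map every float to one byte (0..255) over a shared range."""
    lo = min(x for v in vectors for x in v)
    hi = max(x for v in vectors for x in v)
    scale = (hi - lo) / 255 or 1.0
    codes = [bytes(round((x - lo) / scale) for x in v) for v in vectors]
    return codes, lo, scale

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_quantized(query, vectors, codes, lo, scale, k, oversample=4):
    # Pass 1: approximate scores from the 1-byte codes (the compact in-RAM copy)
    approx = sorted(
        range(len(codes)),
        key=lambda i: -dot(query, [lo + c * scale for c in codes[i]]),
    )[: k * oversample]
    # Pass 2: rescore only the survivors with the full float vectors
    return sorted(approx, key=lambda i: -dot(query, vectors[i]))[:k]

def footprint_bytes(dim, kind):
    """Per-vector storage: float32 = 4 bytes/dim, int8 = 1 byte/dim, binary = 1 bit/dim."""
    return {"float32": dim * 4, "int8": dim, "binary": dim // 8}[kind]
```

For a 1536-dimension embedding that is 6144 bytes as float32, 1536 as int8, and 192 as binary: the 4× and 32× ratios in the text.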

Query path (vector + filter + hybrid)

The signature feature is filter-aware HNSW: the graph traversal evaluates payload filters as it walks, rather than ANN-searching first and discarding non-matches afterwards. The win is recall under heavy filters. The failure mode where a post-filter ANN query returns "I found nothing matching tenant_id=42 in the nearest 200, sorry" mostly does not happen here, because the traversal does not waste its budget on non-matching neighbors. Hybrid search uses sparse vectors alongside dense ones, with BM25-style sparse scores fused with dense results at query time, and the API exposes recommend, batch search, and scroll alongside plain search, which composes well into agentic retrieval loops where the agent issues repeated, scoped queries.
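A sketch of what a narrow, agent-style filter ("this customer's invoices from the last 30 days") looks like as a Qdrant search request body. Assumptions: the REST search endpoint shape (POST /collections/{name}/points/search with vector, filter, limit, with_payload) should be checked against current Qdrant docs; the payload keys tenant_id, category, and created_at are hypothetical, with created_at stored as a Unix timestamp so a numeric range applies; payload_matches is a hypothetical local evaluator for a small subset of the filter language, useful for unit-testing filter construction.

```python
def build_search_body(query_vector, tenant_id, category, since_ts, limit=10):
    """Request body for a filtered vector search; all `must` conditions must hold."""
    return {
        "vector": query_vector,
        "filter": {
            "must": [
                {"key": "tenant_id", "match": {"value": tenant_id}},
                {"key": "category", "match": {"value": category}},
                {"key": "created_at", "range": {"gte": since_ts}},
            ]
        },
        "limit": limit,
        "with_payload": True,
    }

def payload_matches(payload, filt):
    """Local check of `must` conditions with match.value and range gte/lte only."""
    for cond in filt.get("must", []):
        value = payload.get(cond["key"])
        if "match" in cond and value != cond["match"]["value"]:
            return False
        if "range" in cond:
            r = cond["range"]
            if value is None:
                return False
            if "gte" in r and value < r["gte"]:
                return False
            if "lte" in r and value > r["lte"]:
                return False
    return True
```

For the traversal to evaluate these conditions cheaply, each filtered field should also have a payload index created on the collection.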

What the runtime makes hard

You self-host (or use Qdrant Cloud), which means you run another stateful service with its own sharding, replication factor, WAL, and snapshots. Quantization is powerful but not free — you choose the quantization level, rescore depth, and shard count yourself, so the operating overhead is "another distributed database to operate," not "an extension on the database you already have." The payload model is JSON rather than a schema, so there is no equivalent of Weaviate's typed cross-references; relationships are something the application reconstitutes. The honest position: Qdrant gives you the strongest filtered-ANN engine in this set, and the price is a separate stateful system you operate.

Cross-cutting comparison

Where the index sits relative to your primary data

[Figure: Where the index sits relative to your primary data.]
pgvector: inside Postgres, on the same rows; one transaction.
Pinecone: external managed service over HTTPS; IDs join back.
Weaviate: own schema layer; becomes a parallel source of truth.
Qdrant: separate engine you self-host alongside the primary DB.
The headline axis. Four homes for the index, four different consistency and joinability stories.

This is the axis the feature lists hide, and it is the one that decides which store survives contact with a production agent. pgvector puts the index on the same rows as your primary data, under the same planner and transaction log, so a single SQL query can join "nearest vector" to "actual business fields" without a second hop. Pinecone moves the index to a managed cloud, which makes operations easy but turns every retrieval into a cross-system join: vectors there, source rows here, and the application is responsible for keeping the two consistent. Weaviate is something stranger — its schema layer wants to be a source of truth (classes, cross-references, server-side vectorization), which is wonderful when the application happens to fit Weaviate's shape, and exhausting when the application's domain schema and Weaviate's schema start to drift. Qdrant takes the most modest position: it is a vector index next door, the payload is opaque JSON, and you join in the application — but the engine is built for that exact shape and does it well.

Filtered ANN behavior at scale

[Figure: Filtered ANN behavior at scale.]
pgvector: the Postgres planner picks B-tree first or HNSW first; careful at high cardinality.
Pinecone: metadata filter during traversal; managed at scale, hidden internals.
Weaviate: inverted indexes alongside HNSW; per-class graphs; flat-search escape hatch.
Qdrant: filter-aware HNSW; the graph respects filters, so there is no two-stage post-filter and recall holds under heavy filters.
The axis where the four pull furthest apart. Filtered ANN is what an agentic-RAG workload does most.

Agentic RAG queries are almost never "give me the nearest k vectors with no filter" — they are "the nearest k vectors that belong to this tenant, this document type, this date range." pgvector defers the strategy to the Postgres planner: at modest scale it picks the right combination of B-tree and HNSW, but at high cardinality you can land on a plan that defeats the ANN index's pruning, and the fix is index tuning or partitioning. Pinecone and Weaviate both filter during traversal, which mostly works; Pinecone hides the algorithm and asks you to trust it, Weaviate exposes inverted indexes per class and gives you a flat-search escape hatch when recall matters more than latency. Qdrant takes filter-aware ANN as its design point: the HNSW traversal evaluates payload conditions as it walks, so the recall-under-heavy-filters failure mode that hurts the other three is, by construction, the case Qdrant tunes for. If your agent's filters are wide and shallow, all four are fine; if they are narrow and selective ("only this customer's docs from the last 30 days, only invoice category"), Qdrant's behavior is the most predictable.

Operations and deployment story

[Figure: Operations / deployment story.]
pgvector: reuses your existing Postgres backups, replicas, and monitoring; no new system.
Pinecone: fully managed, with zero servers, zero tuning, and serverless billing; vendor-locked.
Weaviate: another stateful service to run, with schema migrations to maintain.
Qdrant: single Rust binary, from Docker to a K8s cluster; you shard, replicate, and quantize yourself.
The state question collapses into an operations question — who runs the index, and who pays for the second system of record.

The four split cleanly. pgvector adds zero new operational surface: your existing Postgres backups, replicas, monitoring, RLS, and cloud-DB managed services (RDS, Neon, Supabase, Aurora) cover it, which is the single biggest reason teams that start with pgvector mostly stay there. Pinecone moves the entire operational surface to the vendor: no servers, no shard sizing, no HNSW tuning, in exchange for a remote dependency in your retrieval critical path and a separate billing line. Weaviate is the heaviest of the four to operate, because it is its own stateful service with version upgrades, module configuration, and schema migrations that have no analog in your application database. Qdrant is lighter than Weaviate but still a separate stateful service: a single Rust binary scales from one Docker container to a sharded cluster, but the sharding, replication factor, and quantization choices are yours. Counterintuitively, the question "should we operate fewer systems or more capable ones?" answers itself for most teams: fewer systems wins until a query pattern (massive scale, very heavy filters, multi-region) actively forces a more capable one.

When to pick which

Use case | Pick pgvector if… | Pick Pinecone if… | Pick Weaviate if… | Pick Qdrant if…
You already run Postgres | Yes: co-located index, no new system, joins to existing rows for free. | Only if Pinecone's scale or zero-ops story outweighs adding a second source of truth. | Only if the knowledge-graph schema is the actual application model. | Only if filtered-ANN behavior at your scale is the binding constraint.
You do NOT want to operate the index | Use managed Postgres (RDS, Neon, Supabase); pgvector rides along. | Yes: managed-only is the whole product; no servers, no tuning. | Weaviate Cloud exists, but you still own the schema and modules. | Qdrant Cloud exists; self-hosting is the default expectation.
Heavy filtered ANN at huge scale | Possible with HNSW + partitioning, but tuning gets sharp. | Managed metadata filtering scales without your tuning. | Per-class HNSW + inverted indexes work; flat-search fallback available. | Yes: filter-aware HNSW is the design point.
RAG over typed entities with relationships | Model the relationships in SQL; works, but no native graph traversal. | Namespaces are flat; relationships live in your primary DB. | Yes: typed classes + cross-references are first-class. | JSON payloads only; the app reconstitutes relationships.
Regulated data, hard data boundary | Yes: same boundary as your Postgres. | Requires enterprise region or BYOC. | Self-host inside your boundary. | Self-host inside your boundary.

FAQ

Is pgvector "good enough" for production RAG?

For most teams, yes, and the threshold for "no" is higher than the vendor pitches suggest. pgvector handles tens of millions of vectors comfortably on a normally sized Postgres instance with HNSW; the corners where it gets sharp are very heavy filtered ANN at high cardinality, write throughput colliding with OLTP load, and indexes large enough that build time becomes operational pain. Before reaching for a dedicated vector DB, the cheap moves are: a read replica for vectors, partitioning by tenant or date, and pushing the embedding pipeline off the OLTP node. The wiki's "choosing a vector database" deep-dive walks through the decision in more detail.

What is the actual difference between Pinecone, Weaviate, and Qdrant?

All three are dedicated vector engines, but they sit in different places. Pinecone is fully managed and proprietary: no self-host, no transparency into the index, but no operational work either. Weaviate is open source with a knowledge-graph schema layer baked in: typed classes, cross-references, inline vectorization modules, and a GraphQL API; it is the heaviest to operate and the most opinionated about how you model your data. Qdrant is open source and minimal: vectors plus JSON payloads, built around filter-aware HNSW for filtered ANN at scale; it is lighter to operate than Weaviate and asks less of your schema.

How does filtered ANN actually behave differently between these stores?

The failure mode to watch is "the filter is narrow enough that the ANN graph traversal returns mostly non-matches, recall collapses, and the agent gets nothing useful." pgvector and Pinecone are fine for wide filters; pgvector starts to struggle when the Postgres planner picks the wrong leading index and you have to hint or partition; Weaviate's per-class HNSW plus inverted indexes works well within a class. Qdrant is the one that explicitly treats filter-aware HNSW as a primary design constraint: the graph traversal evaluates payload filters as it walks, so the "narrow filter, low recall" failure mostly does not happen. If your agent's queries are very selective, this is the dimension that matters most.

Can I use embeddings from one store with another?

Yes — embeddings are just floating-point arrays, so the same model (OpenAI text-embedding-3-large, Cohere, BGE, voyage-3, whatever) produces vectors any of the four can index. The store does not own the embedding; the embedding model does. This is why migrating between vector stores is usually "re-upsert the same vectors with the same IDs", not "re-embed everything from scratch" — assuming you kept the source text and the model name pinned. See embeddings: meaning as geometry for the underlying intuition.
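The "re-upsert, don't re-embed" shortcut holds only for vectors produced by the model you have pinned. A minimal sketch of that check, assuming each exported vector carries the embedding model name recorded at write time; the record type, function name, and model names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StoredVector:
    id: str
    values: tuple
    model: str  # embedding model name recorded at write time

def plan_migration(exported, pinned_model, dimension):
    """Split exported vectors into ones safe to re-upsert as-is and ids to re-embed.

    A vector is reusable only if it came from the pinned model and has the
    target dimension; anything else must be re-embedded from source text.
    """
    reusable, reembed = [], []
    for v in exported:
        if v.model == pinned_model and len(v.values) == dimension:
            reusable.append(v)
        else:
            reembed.append(v.id)
    return reusable, reembed
```

Mixing vectors from two models in one index silently degrades similarity scores, so refusing them at migration time is cheaper than debugging recall later.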

Does it matter that Pinecone is closed source?

It matters about as much as any managed cloud dependency in your retrieval critical path. The pragmatic question is not "open vs closed" but "what happens if this vendor's pricing, region availability, or SLO no longer fits us in eighteen months?" The answer for Pinecone is "re-upsert your vectors elsewhere," which is cheaper than people think — the embedding model and source text are the assets, the index is replaceable. The hedge that costs nothing is a thin abstraction layer over reads and writes; the hedge that costs something is sticking with self-hostable stores so the exit is trivial.
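A minimal sketch of that thin abstraction layer: application code depends on a small Protocol, and each backend gets an adapter. Only an in-memory reference adapter is shown; pgvector, Pinecone, or Qdrant adapters would implement the same two methods and are not shown. The interface, class names, and cosine scoring are assumptions for illustration.

```python
from __future__ import annotations

import math
from typing import Protocol

class VectorStore(Protocol):
    """The seam the application codes against; one adapter per backend."""
    def upsert(self, namespace: str, items: dict[str, list[float]]) -> None: ...
    def query(self, namespace: str, vector: list[float], k: int) -> list[tuple[str, float]]: ...

class InMemoryStore:
    """Reference adapter for tests; namespaces are isolated dicts."""
    def __init__(self) -> None:
        self._data: dict[str, dict[str, list[float]]] = {}

    def upsert(self, namespace: str, items: dict[str, list[float]]) -> None:
        self._data.setdefault(namespace, {}).update(items)

    def query(self, namespace: str, vector: list[float], k: int) -> list[tuple[str, float]]:
        def cosine(a: list[float], b: list[float]) -> float:
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return sum(x * y for x, y in zip(a, b)) / (na * nb)
        scored = [(i, cosine(vector, v)) for i, v in self._data.get(namespace, {}).items()]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

def retrieve(store: VectorStore, tenant: str, vector: list[float], k: int = 3) -> list[str]:
    # Depends only on the Protocol, never on a vendor SDK
    return [doc_id for doc_id, _ in store.query(tenant, vector, k)]
```

Keeping the interface this narrow (upsert and query by namespace) is what makes the exit cheap; every vendor-specific feature you let leak through it widens the migration later.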

Where does this sit in the broader RAG stack?

The vector store is one layer inside a retrieval pipeline that also includes chunking, embedding, hybrid search, reranking, and query rewriting. Picking the right store does not save a bad chunking strategy and does not invent recall the embeddings did not have. The honest sequence is: get RAG working end-to-end with whatever store is closest at hand (very often pgvector), measure where retrieval is actually failing using the patterns in evaluating RAG, and only then move the index if the data tells you to.

Further reading


Project sources:

  • pgvector repo — the extension, with HNSW and IVFFlat index docs.
  • Pinecone docs — serverless indexes, namespaces, sparse-dense hybrid, metadata filters.
  • Weaviate docs — schema, modules, GraphQL query API, hybrid search.
  • Qdrant docs — collections, points, payload indexes, filter-aware HNSW, quantization.