Limits of vector search: a new GDM paper shows that embeddings can't represent combinations of concepts well, e.g. "Dave likes blue trucks AND Ford trucks". Even k=2 sub-predicates make SOTA embedding models fall apart www.alphaxiv.org/pdf/2508.21038
btw even adding a reranker won't help if you've already dropped the relevant results in the first-stage embedding retrieval. Agentic search DOES work, but now you're relying on an expensive LLM to resolve simple boolean logic
multi-vector (late interaction) search like ColBERT also works, because it handles the predicate logic in cheaper latent space, but storage costs are a lot higher because, well, it's multi-vector (fwiw Qdrant and a few other vector DBs support multi-vectors) huggingface.co/jinaai/jina-...
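the late-interaction scoring that makes this work is basically MaxSim: every query token picks its best-matching doc token, then you sum. a minimal numpy sketch (toy unit vectors standing in for real token embeddings):

```python
# Minimal sketch of ColBERT-style late-interaction (MaxSim) scoring.
# Assumes per-token embeddings are already computed and L2-normalized;
# the vectors below are toy basis vectors, not real model output.
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (q_tokens, dim), doc_vecs: (d_tokens, dim).
    For each query token, take its best-matching doc token, then sum."""
    sims = query_vecs @ doc_vecs.T          # (q_tokens, d_tokens) cosine sims
    return float(sims.max(axis=1).sum())    # best doc token per query token

# toy example: a doc matching both predicates beats one matching only one
q = np.eye(3)[:2]                # "blue" and "Ford" as unit vectors
doc_both = np.eye(3)[:2]         # doc covers both concepts
doc_one = np.eye(3)[[0, 2]]      # doc covers only "blue"
assert maxsim_score(q, doc_both) > maxsim_score(q, doc_one)
```

because each sub-predicate gets its own query token (and its own max), the AND doesn't get squashed into one averaged vector, which is exactly where single-vector embeddings fall apart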
you really need to capture the query and decompose it into multiple sub-queries, e.g. maybe get a 1B-3B LLM to rewrite the query into a DSL (e.g. a JSON breakdown of the various components and concepts in the query) and then push that logic into the database engine itself
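concretely, the DSL idea could look something like this — the JSON schema and the `to_sql_where` compiler here are made-up illustrations, not any real system's format:

```python
# Hedged sketch of query decomposition: a small LLM would emit a JSON DSL
# like the one below (hand-written here; the schema is a toy illustration),
# which we then compile into a boolean filter the database engine can run.
import json

llm_output = json.dumps({          # pretend this came from a 1B-3B rewriter model
    "op": "AND",
    "clauses": [
        {"field": "color", "match": "blue"},
        {"field": "brand", "match": "Ford"},
    ],
})

def to_sql_where(dsl: dict) -> str:
    """Compile the toy DSL into a SQL-ish WHERE clause (illustrative only)."""
    if "clauses" in dsl:
        joiner = f" {dsl['op']} "
        return "(" + joiner.join(to_sql_where(c) for c in dsl["clauses"]) + ")"
    return f"{dsl['field']} = '{dsl['match']}'"

print(to_sql_where(json.loads(llm_output)))
# (color = 'blue' AND brand = 'Ford')
```

the point is the boolean logic gets resolved by the engine, not by the embedding model (which can't) or a big LLM (which is overkill)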
Are you talking about rewriting the query and producing something you could pre-filter metadata on?
yeah, rewriting it into multiple queries with parseable relationships between them
With AND, what we usually want is intersection. So either directly use a DSL, or have a small LLM parse it out. From these we can seek intersections: simple and easy is to matrix-multiply on unit vectors and filter, or use SVD (more complex but much more flexible). Geometric mean of the Hadamard product as a fallback
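a rough numpy sketch of the matrix-multiply-and-filter version, with the geometric-mean fallback as a soft score (threshold and toy vectors are my own assumptions, not from the thread):

```python
# Sketch of the intersection idea: embed each sub-predicate, score every doc
# against each one (matrix multiply on unit vectors), then require all scores
# to pass a threshold (hard AND). Geometric mean of the element-wise
# (Hadamard) product of score columns gives a soft combined ranking.
# Vectors here are toy unit vectors, not real embeddings; thresh is a guess.
import numpy as np

def intersect_scores(doc_vecs, pred_vecs, thresh=0.5):
    """doc_vecs: (n_docs, dim), pred_vecs: (k_preds, dim), all unit-norm."""
    scores = doc_vecs @ pred_vecs.T             # (n_docs, k) cosine per predicate
    keep = (scores > thresh).all(axis=1)        # hard AND: every predicate hits
    soft = np.prod(np.clip(scores, 1e-9, None), axis=1) ** (1 / scores.shape[1])
    return keep, soft                           # boolean mask + geometric mean

# toy check: a doc near both predicates survives, a one-predicate doc doesn't
docs = np.array([[2**-0.5, 2**-0.5, 0.0],   # overlaps "blue" AND "Ford"
                 [1.0, 0.0, 0.0]])          # overlaps only "blue"
preds = np.eye(3)[:2]
keep, soft = intersect_scores(docs, preds)
assert keep.tolist() == [True, False] and soft[0] > soft[1]
```

the geometric mean punishes a doc that aces one predicate but misses another, which is what you want from an AND (an arithmetic mean would happily let one strong match paper over a miss)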
alternatively, sparse approaches like SPLADE do this in latent space but use inverted indices (regular full-text search, exact matches) arxiv.org/abs/2107.057...
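the inverted-index mechanics are easy to show with hand-made term weights (SPLADE learns and expands these; the weights below are invented for illustration):

```python
# Toy illustration of the SPLADE-style setup: docs and queries become sparse
# term -> weight maps (SPLADE would learn/expand these weights; the ones here
# are hand-made), and scoring is a sparse dot product over a plain inverted
# index, so exact-match intersections come cheap.
from collections import defaultdict

docs = {
    "d1": {"blue": 1.2, "ford": 0.9, "truck": 1.5},
    "d2": {"red": 1.1, "ford": 1.0, "truck": 1.4},
}

# build inverted index: term -> [(doc_id, weight)]
index = defaultdict(list)
for doc_id, terms in docs.items():
    for term, w in terms.items():
        index[term].append((doc_id, w))

def search(query):
    """Sparse dot product of query and doc vectors via the inverted index."""
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return dict(scores)

results = search({"blue": 1.0, "ford": 1.0})   # d1 hits both terms, d2 only one
```

since matches are exact terms, an AND over sub-predicates is just an intersection of posting lists — the thing inverted indices have done efficiently for decades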
imo if search is done perfectly, you effectively drive your LLM context to infinity. but it's very much not a solved problem. to illustrate how underdeveloped this space is: research from 5 years ago still seems like the best ideas (contrast that with LLMs)
I've wondered about this same thing - whether the limitations of vector databases could be improved upon by using a very small tool-calling model specifically to handle the backend database calls for each identified subquery.
An odd thing is that DeepMind came out with Muvera last year (arxiv.org/html/2405.19...), which takes multi-vector representations and encodes them back into a single vector with pretty decent results. It would have been great for that to be included here. (BM25 rules the world around us still)
that’s really cool, i’ll check it out