Limits of vector search: a new GDM paper shows that embeddings can't represent combinations of concepts well, e.g. "Dave likes blue trucks AND Ford trucks". Even k=2 sub-predicates make SOTA embedding models fall apart www.alphaxiv.org/pdf/2508.21038
btw even adding a reranker won't help if you've already dropped the relevant results in the first-stage embedding retrieval. Agentic search DOES work, but now you're relying on an expensive LLM to resolve simple boolean logic
multi-vector (late interaction) search like ColBERT also works, because it handles the predicate logic in cheaper latent space, but storage costs are a lot higher because, well, it's multi-vector (fwiw Qdrant and a few other vector DBs support multi-vectors) huggingface.co/jinaai/jina-...
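the late-interaction scoring that makes this work is basically MaxSim: every query token picks its best-matching doc token, then you sum. a minimal numpy sketch (toy unit vectors standing in for real token embeddings):

```python
# Minimal sketch of ColBERT-style late-interaction (MaxSim) scoring.
# Assumes per-token embeddings are already computed and L2-normalized;
# the vectors below are toy basis vectors, not real model output.
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """query_vecs: (q_tokens, dim), doc_vecs: (d_tokens, dim).
    For each query token, take its best-matching doc token, then sum."""
    sims = query_vecs @ doc_vecs.T          # (q_tokens, d_tokens) cosine sims
    return float(sims.max(axis=1).sum())    # best doc token per query token

# toy example: a doc matching both predicates beats one matching only one
q = np.eye(3)[:2]                # "blue" and "Ford" as unit vectors
doc_both = np.eye(3)[:2]         # doc covers both concepts
doc_one = np.eye(3)[[0, 2]]      # doc covers only "blue"
assert maxsim_score(q, doc_both) > maxsim_score(q, doc_one)
```

because each sub-predicate gets its own query token (and its own max), the AND doesn't get squashed into one averaged vector, which is exactly where single-vector embeddings fall apart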
you really need to capture the query and decompose it into multiple sub-queries, e.g. maybe get a 1B-3B LLM to rewrite the query into a DSL (e.g. a JSON breakdown of the various components and concepts in the query) and then push that logic into the database engine itself
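concretely, the DSL idea could look something like this — the JSON schema and the `to_sql_where` compiler here are made-up illustrations, not any real system's format:

```python
# Hedged sketch of query decomposition: a small LLM would emit a JSON DSL
# like the one below (hand-written here; the schema is a toy illustration),
# which we then compile into a boolean filter the database engine can run.
import json

llm_output = json.dumps({          # pretend this came from a 1B-3B rewriter model
    "op": "AND",
    "clauses": [
        {"field": "color", "match": "blue"},
        {"field": "brand", "match": "Ford"},
    ],
})

def to_sql_where(dsl: dict) -> str:
    """Compile the toy DSL into a SQL-ish WHERE clause (illustrative only)."""
    if "clauses" in dsl:
        joiner = f" {dsl['op']} "
        return "(" + joiner.join(to_sql_where(c) for c in dsl["clauses"]) + ")"
    return f"{dsl['field']} = '{dsl['match']}'"

print(to_sql_where(json.loads(llm_output)))
# (color = 'blue' AND brand = 'Ford')
```

the point is the boolean logic gets resolved by the engine, not by the embedding model (which can't) or a big LLM (which is overkill)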
Are you talking about rewriting the query and producing something you could pre-filter metadata on?
yeah, rewriting it into multiple queries with parseable relationships between them
With AND, what we usually want is intersection. So either directly use a DSL, or have a small LLM parse it out. From these we can seek intersections: simple and easy is to matrix-multiply on unit vectors and filter, or use SVD (more complex but much more flexible). Geometric mean of the Hadamard product as a fallback
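a rough numpy sketch of the matrix-multiply-and-filter version, with the geometric-mean fallback as a soft score (threshold and toy vectors are my own assumptions, not from the thread):

```python
# Sketch of the intersection idea: embed each sub-predicate, score every doc
# against each one (matrix multiply on unit vectors), then require all scores
# to pass a threshold (hard AND). Geometric mean of the element-wise
# (Hadamard) product of score columns gives a soft combined ranking.
# Vectors here are toy unit vectors, not real embeddings; thresh is a guess.
import numpy as np

def intersect_scores(doc_vecs, pred_vecs, thresh=0.5):
    """doc_vecs: (n_docs, dim), pred_vecs: (k_preds, dim), all unit-norm."""
    scores = doc_vecs @ pred_vecs.T             # (n_docs, k) cosine per predicate
    keep = (scores > thresh).all(axis=1)        # hard AND: every predicate hits
    soft = np.prod(np.clip(scores, 1e-9, None), axis=1) ** (1 / scores.shape[1])
    return keep, soft                           # boolean mask + geometric mean

# toy check: a doc near both predicates survives, a one-predicate doc doesn't
docs = np.array([[2**-0.5, 2**-0.5, 0.0],   # overlaps "blue" AND "Ford"
                 [1.0, 0.0, 0.0]])          # overlaps only "blue"
preds = np.eye(3)[:2]
keep, soft = intersect_scores(docs, preds)
assert keep.tolist() == [True, False] and soft[0] > soft[1]
```

the geometric mean punishes a doc that aces one predicate but misses another, which is what you want from an AND (an arithmetic mean would happily let one strong match paper over a miss)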
alternatively, sparse approaches like SPLADE do this in latent space but use inverted indices (regular full-text search, exact matches) arxiv.org/abs/2107.057...
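the inverted-index mechanics are easy to show with hand-made term weights (SPLADE learns and expands these; the weights below are invented for illustration):

```python
# Toy illustration of the SPLADE-style setup: docs and queries become sparse
# term -> weight maps (SPLADE would learn/expand these weights; the ones here
# are hand-made), and scoring is a sparse dot product over a plain inverted
# index, so exact-match intersections come cheap.
from collections import defaultdict

docs = {
    "d1": {"blue": 1.2, "ford": 0.9, "truck": 1.5},
    "d2": {"red": 1.1, "ford": 1.0, "truck": 1.4},
}

# build inverted index: term -> [(doc_id, weight)]
index = defaultdict(list)
for doc_id, terms in docs.items():
    for term, w in terms.items():
        index[term].append((doc_id, w))

def search(query):
    """Sparse dot product of query and doc vectors via the inverted index."""
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            scores[doc_id] += qw * dw
    return dict(scores)

results = search({"blue": 1.0, "ford": 1.0})   # d1 hits both terms, d2 only one
```

since matches are exact terms, an AND over sub-predicates is just an intersection of posting lists — the thing inverted indices have done efficiently for decades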
imo if search is done perfectly, you effectively drive your LLM context to infinity. but it's very much not a solved problem. to illustrate how underdeveloped this space is: research from 5 years ago still seems like the best ideas (contrast that with LLMs)
I've wondered about this same thing - whether the limitations of vector databases could be improved upon by using a very small tool-calling model specifically to handle the backend database calls for each identified subquery.
An odd thing is that DeepMind came out with Muvera last year (arxiv.org/html/2405.19...), which takes multi-vector representations and encodes them back into a single vector with pretty decent results. It would have been great for that to be included here. (BM25 rules the world around us still)
that’s really cool, i’ll check it out