Isn’t the answer here in 2025 to figure out what the embedding vector of your specific topic is and use an LLM to search for that embedding vector in your corpus?
IIUC this is the whole genius of word embedding models:
they're good at catching associations with a topic that a straight classifier approach would miss.
Is there a good out-of-the-box way to do that? I'm pretty handy with things like structural topic models and supervised learning at this point, but that might be a little outside my toolset at the moment.
OpenAI offers the mapping to embeddings as a service; then it's just a matter of finding the centroid of the ones with your word and then (probably) taking everything within some distance of that. The big problem is that I think you'd need a networked API call for each document. Could also do something like a tiny llama model?
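A minimal sketch of the centroid-and-distance idea, assuming you've already obtained embedding vectors for your seed documents and corpus (from OpenAI's embeddings endpoint, a local model, or anything else); the threshold value and the toy 3-d vectors are placeholders for illustration:

```python
import numpy as np

def cosine_sim(a, b):
    # standard cosine similarity between two vectors
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def find_near_centroid(seed_vecs, corpus_vecs, threshold=0.8):
    # centroid of the seed documents' embeddings
    centroid = np.mean(seed_vecs, axis=0)
    # keep indices of corpus docs close enough to that centroid
    return [i for i, v in enumerate(corpus_vecs)
            if cosine_sim(centroid, v) >= threshold]

# toy 3-d "embeddings" standing in for real model output
seeds = np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])
corpus = np.array([[1.0, 0.0, 0.0],   # near the topic
                   [0.0, 1.0, 0.0]])  # far from it
print(find_near_centroid(seeds, corpus))  # -> [0]
```

Cosine similarity (rather than Euclidean distance) is the usual choice here because embedding magnitudes are mostly noise; only the direction carries topical meaning.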
idk what the size of your data or this class is, but I bet you can be more efficient than this, though.
This is, sadly, way outside my current expertise.