Kevin Riggle @kevinriggle.bsky.social

Isn’t the answer here in 2025 to figure out what the embedding vector of your specific topic is and use an LLM to search for that embedding vector in your corpus?

Sep 1, 2025, 7:31 pm

Replies

Kevin Riggle @kevinriggle.bsky.social

IIUC this is the whole genius of word-embedding models

Sep 1, 2025, 7:32 pm
Kevin Riggle @kevinriggle.bsky.social

That they are good at catching associations with topics that wouldn’t be included in a straight classifier approach

Sep 1, 2025, 7:33 pm
William B. Fuckley @opinionhaver.bsky.social

Is there a good out-of-the-box way to do that? I'm pretty handy with things like structural topic models and supervised learning at this point, but that might be a little outside my toolset at the moment.

Sep 1, 2025, 7:46 pm
🚂Cameron🌐🥑🔰🏗️🌇🌮🚝 🦁🚲 @csreid.bsky.social

OpenAI offers the mapping to embeddings as a service; then it's just a matter of finding the centroid of the ones with your word and then (probably) taking everything within some distance of that. The big problem is that I think you'd need a networked API call for each. Could also do something like a tiny Llama model?

Sep 1, 2025, 8:40 pm
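The centroid-and-distance idea described above can be sketched in a few lines. This is a toy illustration, not a working pipeline: the 2-d vectors stand in for real embeddings (which you'd get from the OpenAI embeddings API or a local model), and `topic_matches` and its 0.8 threshold are hypothetical names and values chosen for the example.

```python
# Sketch: find the centroid of some seed embeddings, then keep every
# corpus embedding within a cosine-similarity threshold of it.
# Toy 2-d vectors stand in for real embedding vectors.
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def topic_matches(seed_vecs, corpus_vecs, threshold=0.8):
    """Indices of corpus vectors close (by cosine similarity) to the
    centroid of the seed vectors."""
    centroid = np.mean(seed_vecs, axis=0)
    sims = [cosine_sim(centroid, v) for v in corpus_vecs]
    return [i for i, s in enumerate(sims) if s >= threshold]

# Two seed "documents" pointing roughly along the x-axis.
seeds = np.array([[1.0, 0.1], [1.0, -0.1]])
corpus = np.array([[0.9, 0.0],   # on-topic
                   [0.0, 1.0],   # off-topic
                   [1.0, 0.2]])  # on-topic
print(topic_matches(seeds, corpus))  # → [0, 2]
```

In practice you'd batch the embedding calls (or run a local model) rather than making one networked request per document, which is the cost Cameron flags.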
🚂Cameron🌐🥑🔰🏗️🌇🌮🚝 🦁🚲 @csreid.bsky.social

idk what the size of your data or this class is, but I bet you can be more efficient than this, though.

Sep 1, 2025, 8:42 pm
Kevin Riggle @kevinriggle.bsky.social

this is, sadly, way outside my current expertise

Sep 1, 2025, 7:47 pm