Sherlock Holmes-type guy @tokenize.bsky.social

And isn’t the summarized conversation always fed back into the model, regardless of the model version? No LLM maintains state across requests, as far as I’m aware

aug 22, 2025, 4:45 pm

Replies

Will Ratcliff @wcratcliff.bsky.social

With KV caching, you don't need to recompute the entire conversation history each time. The model caches the computed key-value pairs for all previous tokens in the conversation. If the provider doesn't store the cache, then yeah, it all has to be recomputed, but most will store it for some time.

aug 22, 2025, 4:49 pm
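For illustration, here is a minimal sketch of the mechanism described above, in plain NumPy. The head dimension, weights, and token stream are toy values, not any provider's implementation; the point is that each decoding step appends one row to the key/value cache instead of reprojecting the entire history:

```python
import numpy as np

d = 8  # toy head dimension (real models use much larger dims and many heads)
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single query vector.
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# The KV cache: one row of keys and values per token seen so far.
K_cache, V_cache = np.empty((0, d)), np.empty((0, d))

for step, x in enumerate(rng.standard_normal((5, d))):  # 5 toy "tokens"
    # Only the new token is projected; all prior K/V rows come from the cache,
    # so per-step work stays constant instead of growing with history length.
    K_cache = np.vstack([K_cache, (Wk @ x)[None, :]])
    V_cache = np.vstack([V_cache, (Wv @ x)[None, :]])
    out = attend(Wq @ x, K_cache, V_cache)
    print(f"token {step}: cache holds {len(K_cache)} K/V rows")
```

Without the cache, every new token would require recomputing K and V for the whole conversation prefix, which is the cost being discussed here.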
Sherlock Holmes-type guy @tokenize.bsky.social

And so the argument is that by routing queries to different models within a single conversation/session, the KV cache has to be rebuilt more frequently, if not on every turn?

aug 22, 2025, 4:57 pm
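A toy sketch of that routing concern, under the assumption that a KV cache is only valid for the model that produced it: keying the cache by (model, prefix) means routing the same conversation to a different model is a cache miss that forces a fresh prefill. All names here (kv_store, get_cache, the model IDs) are hypothetical:

```python
from typing import Dict, Tuple

# Hypothetical cache store: (model_id, conversation prefix) -> cached K/V.
kv_store: Dict[Tuple[str, str], object] = {}

def get_cache(model_id: str, prefix: str):
    key = (model_id, prefix)
    if key in kv_store:
        return kv_store[key], True   # hit: prefix K/V can be reused
    # Miss: stand-in for recomputing K/V for the whole prefix ("prefill").
    cache = f"prefill({model_id}, {len(prefix)} chars)"
    kv_store[key] = cache
    return cache, False

_, hit1 = get_cache("model-a", "conversation so far")
_, hit2 = get_cache("model-a", "conversation so far")  # same model: hit
_, hit3 = get_cache("model-b", "conversation so far")  # routed elsewhere: miss
print(hit1, hit2, hit3)  # False True False
```

Each switch to a model that hasn't seen the conversation pays the full prefill cost again, which is the extra work being attributed to routing.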
Will Ratcliff @wcratcliff.bsky.social

yeah, exactly

aug 22, 2025, 5:22 pm