And isn’t the summarized conversation always fed back into the model on every turn, regardless of which model is used? No LLM maintains state between requests, as far as I’m aware.
With KV caching, you don't need to recompute the entire conversation history on each turn. The serving stack caches the computed key-value pairs (attention keys and values) for all previous tokens in the conversation, so only the new tokens need a fresh forward pass. If the provider doesn't keep the cache around, then yeah, everything has to be recomputed, but most will store it for some time.
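Roughly, the serving side looks something like this. This is a toy sketch, not a real inference API: `PrefixKVCache`, `model.prefill`, and `model.extend` are made-up stand-ins for whatever the actual engine does internally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

@dataclass
class PrefixKVCache:
    # maps a token prefix -> the K/V state the model produced for that prefix
    store: Dict[Tuple[int, ...], Any] = field(default_factory=dict)

def run_turn(model, cache: PrefixKVCache, tokens: list):
    """Process one conversation turn, reusing cached K/V where possible."""
    # find the longest cached prefix of the current token sequence
    best: Tuple[int, ...] = ()
    for prefix in cache.store:
        if len(prefix) > len(best) and tokens[:len(prefix)] == list(prefix):
            best = prefix

    if best:
        # cache hit: only the new suffix goes through a forward pass
        state = model.extend(cache.store[best], tokens[len(best):])
    else:
        # cache miss: the whole history has to be recomputed (prefill)
        state = model.prefill(tokens)

    cache.store[tuple(tokens)] = state
    return state
```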
And so the argument is that by routing queries to different models within a single conversation/session, the KV cache has to be rebuilt more often, if not on every turn?
yeah, exactly
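Continuing the sketch above: the cached K/V state is tied to a specific model (its weights, tokenizer, and layer shapes), so a router that switches models mid-conversation can't reuse the state built by the previous model. The `router.pick` call here is hypothetical, just to show where the cache miss comes from.

```python
# one cache per model; a switch means the new model's cache has no matching prefix
caches = {"model-a": PrefixKVCache(), "model-b": PrefixKVCache()}

def routed_turn(router, models, tokens):
    name = router.pick(tokens)  # hypothetical routing decision
    # if `name` differs from the previous turn's model, run_turn falls through
    # to the cache-miss branch and the full history is recomputed from scratch
    return run_turn(models[name], caches[name], tokens)
```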