I think you can definitely sample real world use-case inputs and outputs
No, you can't. There's no way to do it meaningfully: what would make your samples representative? (And that's if you strip away the layer that makes LLMs stochastic. If you leave it on, you need to get into repetition and the frequency of correct answers on every input.)
Certainly not trying to say it would be easy, but I do think you can empirically observe the "performance" under some given set of assumptions. And yes, you will have to get into repetition and frequency counts and so forth.
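Roughly what I mean by repetition and frequency counts, as a minimal sketch. Everything here is an assumption for illustration: `model` is a hypothetical callable standing in for a stochastic LLM, `toy_model` is a made-up stand-in, and "correct" is reduced to exact string match.

```python
import random

def correct_frequency(model, prompt, expected, repeats=20):
    """Fraction of repeated calls on one input that return the expected answer."""
    hits = sum(1 for _ in range(repeats) if model(prompt) == expected)
    return hits / repeats

def evaluate(model, samples, repeats=20):
    """Per-input correct-answer frequency over (prompt, expected) pairs."""
    return {
        prompt: correct_frequency(model, prompt, expected, repeats)
        for prompt, expected in samples
    }

# Hypothetical stand-in for a stochastic model: answers "4" most of the time.
_rng = random.Random(0)

def toy_model(prompt):
    return "4" if _rng.random() < 0.8 else "5"

if __name__ == "__main__":
    print(evaluate(toy_model, [("2+2?", "4")]))
```

This only measures how often the answer comes out right on whatever `samples` you hand it. Whether those samples are representative is a separate question that the code does not answer.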
Nope. If the input language is text, what is a representative input sample?
Strikes me we must be talking at cross purposes 🤷🏻‍♂️