that’s actually pretty good. the “predict next token” part is pretraining; preferences are post-training. data is kind of an ever-present problem throughout the process
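(rough sketch of what “predict next token” means as a training objective — toy PyTorch, all names and shapes here are just illustrative stand-ins, not any real model)

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len, d_model = 1000, 16, 64

# stand-in "language model": embed tokens, project back to vocab logits
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (1, seq_len))  # one toy sequence
hidden = embed(tokens)                               # (1, seq_len, d_model)
logits = lm_head(hidden)                             # (1, seq_len, vocab_size)

# pretraining loss: each position tries to predict the NEXT token
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..n-2
    tokens[:, 1:].reshape(-1),               # targets are the tokens shifted by one
)
print(loss.item())
```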
I knew what post-training meant but not pretraining 😅. on data, I guess model makers mostly reuse what they already collected?
for data, I asked because even models with a cutoff date supposedly in 2025 (like Gemini 2.5 in January) will often default to 2024 or even earlier knowledge. so maybe that’s because most training data is pre-2023?