Tim Kellogg @timkellogg.me

maybe i read too fast, but i saw the part about choosing embedding dimensions based on the number of attention heads. pretty sure that's only relevant for text generation. trouble is, "embeddings" can mean either input or output embeddings, and the post seems to use the term both ways without clarifying. a bit confusing

sep 1, 2025, 6:33 pm

Replies

Vicki @vickiboykis.com

Yep, the post mentions that it focuses on text generation.

sep 1, 2025, 6:36 pm
Tim Kellogg @timkellogg.me

well crap! lol, i guess i was thinking you were deeper into IR for some reason. sorry about that

sep 1, 2025, 6:39 pm