tocharian spongebart eatpants @theophite.bsky.social

like, right? if you have just an absolutely enormous error on a parameter (because you initialized a token to a random value), then v_t is going to be something like 1e-12, making the effective LR = eps2, which is 0.01, even if your actual LR is 1e-7.

aug 28, 2025, 10:58 pm
