like, right? If you have an absolutely enormous error on a parameter (because you initialized a token to a random value), then v_t is going to be something like 1e-12, making the effective LR = eps2, which is 0.01, even if your actual LR is 1e-7.