(it does this, i think, because of amplification of signal between the text encoder and the UNet's x-attention domain, which then saturates at least one VAE channel and produces incorrect colors.)
er, -0.99, but you get my point.
like, right? if you have just an absolutely enormous error on a parameter (because you initialized a token to a random value), then v_t is going to be something like 1e-12, making the effective LR = eps2, which is 0.01, even if your actual LR is 1e-7.
Line search time? If the new point is worse than the old one, back up until it isn't worse
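a minimal sketch of the "back up until it isn't worse" idea (plain backtracking on the step size). it assumes you can evaluate the loss `f` at any candidate point and already have a descent direction, e.g. the negative gradient. `backtracking_step`, `shrink`, and `max_halvings` are hypothetical names for illustration, not from any library:

```python
from typing import Callable, List

Vector = List[float]


def backtracking_step(
    f: Callable[[Vector], float],
    x: Vector,
    direction: Vector,
    step: float = 1.0,
    shrink: float = 0.5,
    max_halvings: int = 30,
) -> Vector:
    """Hypothetical helper: step along `direction`, shrinking the step
    while the new point is worse than the old one.

    Returns the first candidate whose loss is <= f(x), or `x` unchanged
    if none of the `max_halvings` step sizes manages that.
    """
    f_old = f(x)
    for _ in range(max_halvings):
        candidate = [xi + step * di for xi, di in zip(x, direction)]
        if f(candidate) <= f_old:  # not worse than where we started: accept
            return candidate
        step *= shrink  # worse: back up toward x and try again
    return x  # nothing helped; stay put
```

e.g. with f(x) = x², starting at 1.0 with an overshooting direction of -10, step sizes 1, 0.5 and 0.25 all land somewhere worse, and 0.125 lands at -0.25, which gets accepted. the usual stricter variant is the Armijo condition, which requires f to drop by at least a fraction of the decrease predicted by the gradient instead of merely not getting worse.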
Hmm... is there a way to apply line search when you don't have the actual function (or a good approximation) close at hand?
this is a @nsaphra.bsky.social paper
lmao yes