Is that an implementation of the “models should work with Type 1 and Type 2 brains” paper? bsky.app/profile/segy...
no, there’s no inner vs. outer loop. this is literally just a regular MoE where some of the experts are no-ops
the outer loop in the linked paper does that too, so i think it’s probably an influence. later analysis of their approach showed that this was, in some cases, the important detail
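A minimal sketch of "a regular MoE where some experts are no-ops", for illustration only. It assumes top-1 softmax routing and a residual connection, so a token routed to a no-op expert skips all expert computation and passes through unchanged. The function names, router weights and toy experts are hypothetical and do not come from the thread or the linked paper.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# None marks a hypothetical no-op expert: it does no computation at all.
Expert = Optional[Callable[[Vector], Vector]]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def moe_layer(x: Vector, router_weights: List[Vector],
              experts: List[Expert]) -> Tuple[Vector, int]:
    """Top-1 MoE with a residual connection; returns (output, chosen expert index)."""
    probs = softmax([dot(w, x) for w in router_weights])
    k = max(range(len(probs)), key=lambda i: probs[i])  # top-1 routing
    expert = experts[k]
    if expert is None:
        # No-op expert: the token goes through the residual path untouched.
        return list(x), k
    update = expert(x)
    # Real expert: add its gated output to the residual stream.
    return [xi + probs[k] * ui for xi, ui in zip(x, update)], k


# Toy setup: two "real" experts and one no-op expert (all hypothetical).
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
experts: List[Expert] = [
    lambda v: [2.0 * t for t in v],  # expert 0: doubles the input
    lambda v: [-t for t in v],       # expert 1: negates the input
    None,                            # expert 2: no-op
]

out, k = moe_layer([-2.0, -2.0], router_weights, experts)
print(k, out)  # routed to the no-op expert, so the input comes back unchanged
```

The no-op expert is just another routing target, so the router learns when a token can skip computation without any separate inner/outer loop.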