the beastly fido @theophite.bsky.social

we do a lot of things to try and fix this -- layer norms, batch norms (maybe; they work but we don't know why), skip connections -- but once the model starts developing polysemanticity, everything becomes entangled in a way which is very difficult to unstick.

jul 22, 2025, 8:28 pm • 1 0
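[A minimal PyTorch sketch, not from the thread, of where the layer norms and skip connections mentioned above actually sit in a standard pre-norm transformer block; the PreNormBlock name and dimensions are illustrative.]

```python
# Illustrative sketch: a pre-norm residual block with the normalization and
# skip connections the post refers to.
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)   # layer norm before attention
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)   # layer norm before the MLP
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Skip connections: each sublayer adds its output back onto the
        # residual stream, so later layers can route around it if needed.
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        x = x + self.mlp(self.norm2(x))
        return x

x = torch.randn(2, 16, 256)          # (batch, sequence, d_model)
print(PreNormBlock()(x).shape)       # torch.Size([2, 16, 256])
```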

Replies

DJ @eschaton-avoidance.bsky.social

Is this still true in character rather than token models? Trying to figure out whether it's more of a structural thing (the lowest layer is always the least plastic) or whether there's an early-semantic "mess" where layers must start accepting the previous layers' framing of the world.

jul 22, 2025, 8:31 pm • 1 0 • view
SE Gyges @segyges.bsky.social

if this helps: character and token models should look much the same, other than the embedding layer, iff they are well trained and you are looking at the activations after at least one attention pass

jul 22, 2025, 8:38 pm • 3 0 • view
SE Gyges @segyges.bsky.social

like. the character embeddings will be dogshit simple, but to perform well the internal activations will take on the same shape after you've run attention to mix in the information you specifically want

jul 22, 2025, 8:40 pm • 1 0 • view
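[A sketch, not from the thread, of the point in the two posts above: character-level and token-level models differ only in the embedding table, and after one attention pass the activations live in the same d_model-sized space either way. Vocabulary sizes and names here are illustrative.]

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 4
char_vocab, token_vocab = 256, 50_000             # illustrative vocabulary sizes

char_embed = nn.Embedding(char_vocab, d_model)    # very simple lookup table
token_embed = nn.Embedding(token_vocab, d_model)  # bigger table, same output width
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # shared trunk

def first_attention_pass(ids, embed):
    h = embed(ids)                                # (batch, seq, d_model)
    out, _ = attn(h, h, h, need_weights=False)
    return out                                    # same shape regardless of tokenizer

char_ids = torch.randint(0, char_vocab, (1, 64))
token_ids = torch.randint(0, token_vocab, (1, 16))
print(first_attention_pass(char_ids, char_embed).shape)    # torch.Size([1, 64, 256])
print(first_attention_pass(token_ids, token_embed).shape)  # torch.Size([1, 16, 256])
```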
the beastly fido @theophite.bsky.social

this is true even of things which don't deal with words at all, although i have a strong intuition that models which deal with hierarchical signal decomposition (e.g., diffusion models) probably have less of a hard time with it than models where the signal is extremely white. (natural images are pink, not white.)

jul 22, 2025, 8:34 pm • 2 0 • view
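[A quick numpy sketch, not from the thread, of the "images are pink" aside: a radially averaged power spectrum is flat for white noise but falls off with frequency for natural images (roughly 1/f^2 in power). The function and variable names here are mine; `image` can be any grayscale array.]

```python
import numpy as np

def radial_power_spectrum(image, n_bins=40):
    """Radially averaged 2D power spectrum of a grayscale array."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image))) ** 2
    h, w = image.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)        # distance from the DC component
    edges = np.linspace(1, r.max(), n_bins + 1)
    idx = np.digitize(r, edges)
    freqs, spec = [], []
    for i in range(1, n_bins + 1):
        mask = idx == i
        if mask.any():
            freqs.append(edges[i - 1])
            spec.append(power[mask].mean())
    return np.array(freqs), np.array(spec)

# White noise has a flat spectrum (log-log slope near 0); a natural image run
# through the same function decays with frequency, which is the "pink, not
# white" structure the post suggests hierarchical decompositions can exploit.
noise = np.random.randn(256, 256)
freqs, spec = radial_power_spectrum(noise)
slope = np.polyfit(np.log(freqs), np.log(spec), 1)[0]
print(f"white-noise spectral slope: {slope:.2f}")  # close to 0
```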
DJ @eschaton-avoidance.bsky.social

this is neat, thanks!!

jul 22, 2025, 8:39 pm • 1 0 • view