the beastly fido @theophite.bsky.social

interesting. that level of plasticity is very much a surprise -- in silico, there is a steep decline in plasticity as you pass down the model. (neural processing is very much not as hierarchical either, though.)

jul 22, 2025, 8:16 pm

Replies

DJ @eschaton-avoidance.bsky.social

Very much the opposite in humans. Major percepts (vision/audio/touch/pain) are highly stereotyped and organized (although plasticity is used as part of processing in some cases), while conceptual regions have minimal localization with the "concept-holding" circuit changing many times per second

jul 22, 2025, 8:19 pm
DJ @eschaton-avoidance.bsky.social

Memory is where concepts are calcified, so the hippocampus would be the best bet for your idea to maybe work? thinking more

jul 22, 2025, 8:19 pm
DJ @eschaton-avoidance.bsky.social

I had no idea about the in-silico result. Is this a result of backprop (downstream regions have fewer steps to propagate error) or something more fundamental?

jul 22, 2025, 8:21 pm
the beastly fido @theophite.bsky.social

internal covariate shift, essentially. there is very low plasticity near the input of the model because changes in deep representations result in signal which cannot be interpreted by neurons closer to the output layer. this is a big problem in, e.g., RLHF.
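
A minimal sketch of the effect being described, assuming nothing beyond a toy PyTorch MLP (all sizes and names here are illustrative, not from the thread): after brief training, downstream layers have settled on the upstream representations, so an equal-sized random perturbation to an early block moves the loss far more than the same perturbation to a late block.

```python
# Hypothetical toy model: perturbing an early block scrambles the signal that
# every later layer has adapted to, so the loss degrades far more than when an
# equal-sized perturbation hits a late block.
import torch
import torch.nn as nn

torch.manual_seed(0)
blocks = [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(8)]
model = nn.Sequential(*blocks, nn.Linear(64, 10))
x, y = torch.randn(256, 64), torch.randint(0, 10, (256,))
loss_fn = nn.CrossEntropyLoss()

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):  # brief training so later layers "settle" on the signal
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()

base = loss_fn(model(x), y).item()
with torch.no_grad():
    for idx in (0, 7):  # first block vs. last block
        w = model[idx][0].weight
        # random perturbation scaled to ~10% of the layer's weight norm
        noise = 0.1 * w.norm() * torch.randn_like(w) / w.numel() ** 0.5
        w += noise
        print(f"block {idx}: loss {base:.3f} -> {loss_fn(model(x), y).item():.3f}")
        w -= noise  # restore before perturbing the next block
```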

jul 22, 2025, 8:23 pm
DJ @eschaton-avoidance.bsky.social

ooooh, I had down/up mixed. That fits my (post hoc) intuition for hierarchical models in the sense that each layer is passing a code; if the code is changed a lot at the first telephone pass, the ultimate output will be wrong-er

jul 22, 2025, 8:26 pm
the beastly fido @theophite.bsky.social

we do a lot of things to try and fix this -- layer norms, batch norms (maybe; they work but we don't know why), skip connections -- but once the model starts developing polysemanticity, everything becomes entangled in a way which is very difficult to unstick.
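
For reference, a sketch (mine, not theophite's code) of where two of those fixes live in a standard pre-norm residual block: the LayerNorm re-standardizes the incoming signal, and the skip connection hands downstream layers an unmodified copy of the input even while the block's own contribution drifts.

```python
# Sketch of a standard pre-norm residual block, showing where the fixes go.
import torch.nn as nn

class PreNormBlock(nn.Module):
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)  # fixed mean/variance on the way in
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):
        # skip connection: x passes through untouched; the block only adds a
        # residual, so updates to self.mlp shift the representation gradually
        # rather than replacing the signal downstream layers depend on
        return x + self.mlp(self.norm(x))
```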

jul 22, 2025, 8:28 pm
DJ @eschaton-avoidance.bsky.social

Is this still true in character rather than token models? Trying to figure out if it's more of a structural thing (always lowest layer least plastic) or if there's an early-semantic "mess" where layers must start accepting the previous layers' framing of the world

jul 22, 2025, 8:31 pm
SE Gyges @segyges.bsky.social

if this helps: character and token models should look much the same, other than the embedding layer, iff they are well trained and you are looking at the activations after at least one attention pass

jul 22, 2025, 8:38 pm
SE Gyges @segyges.bsky.social

like. the character embeddings will be dogshit simple, but to perform well the internal activations will take on the same shape after you've run attention to mix in the information you specifically want
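
One way this claim could be checked, as a sketch: linear CKA (centered kernel alignment) between the two models' post-attention activations. `char_acts` and `tok_acts` below are hypothetical stand-ins for activations captured with forward hooks; since the two tokenizations give different sequence lengths, the rows have to be aligned, e.g. by mean-pooling over each sequence.

```python
# Hypothetical check: linear CKA between a character model's and a token
# model's activations. If the claim holds, similarity is high from layer 1
# onward even though the raw embedding layers look nothing alike.
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """X: (n, d1), Y: (n, d2) activation matrices, one row per example
    (e.g., mean-pooled over positions so the two models line up)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# usage (char_acts / tok_acts are hypothetical, captured with forward hooks):
# print(linear_cka(char_acts["layer_1"], tok_acts["layer_1"]))
```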

jul 22, 2025, 8:40 pm
the beastly fido @theophite.bsky.social

this is true even of things which don't deal with words at all, although i have a strong intuition that models which deal with hierarchical signal decomposition (e.g., diffusion models) probably have less of a hard time with it than models where the signal is extremely white. (images are pink)
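
"images are pink" here means natural images concentrate their power at low spatial frequencies, falling off roughly as a power law, unlike white noise, whose spectrum is flat. A quick numpy sketch for checking this on any grayscale image array:

```python
# Radially averaged 2D power spectrum: roughly power-law for natural images,
# roughly flat for white noise.
import numpy as np

def radial_power_spectrum(img: np.ndarray, nbins: int = 50):
    f = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    power = np.abs(f) ** 2
    h, w = img.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h / 2, xx - w / 2)  # radial spatial frequency
    bins = np.linspace(1, r.max(), nbins + 1)
    which = np.digitize(r.ravel(), bins)
    spectrum = np.array(
        [power.ravel()[which == i].mean() for i in range(1, nbins + 1)]
    )
    return bins[:-1], spectrum

# white-noise baseline (swap in a real grayscale image to see the falloff):
freqs, spec = radial_power_spectrum(np.random.randn(256, 256))
```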

jul 22, 2025, 8:34 pm
DJ @eschaton-avoidance.bsky.social

this is neat, thanks!!

jul 22, 2025, 8:39 pm
DJ @eschaton-avoidance.bsky.social

We fundamentally don't know anything about what the (necessary) "fixed-points" of concepts are. There must be some latent structure in which the concept is being "passed" around in a way that is de-constructable, but it just happens that location is not it and no competitors are clear frontrunners

jul 22, 2025, 8:28 pm
the beastly fido @theophite.bsky.social

yeah that is one of my biggest problems with neuroscience: addressability in a system which seems substantially nonlocal

jul 22, 2025, 8:32 pm
DJ @eschaton-avoidance.bsky.social

Some competitors/supplements to spatial, just in case you want to read more, are (in any combination):
- Topological (spatial or functional; I think spatial topological models are silly tho)
- Frequency/power spectrum-ish stuff
- Functional connectivity kinds of things
- Whatever Friston is on about

jul 22, 2025, 8:36 pm
the beastly fido @theophite.bsky.social

yeah, i was actually just thinking through how i'd build addressability via cumulative noise, and, like, "well, if the noise spectrum were 1/f, then so long as the dominant frequencies in the characteristic noise were fixed points (and we ran a high-pass filter) then plasticity wouldn't be that bad."
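
A sketch of that idea under stated assumptions (1/f noise synthesized by spectral shaping; a 4th-order Butterworth high-pass at an arbitrary 50 Hz cutoff): because 1/f power piles up at low frequencies, the high-pass removes most of the accumulated noise energy while leaving fixed high-frequency components readable.

```python
# Synthesize 1/f ("pink") noise, then high-pass it: most of the cumulative
# noise power lives at low frequencies and gets filtered away.
import numpy as np
from scipy.signal import butter, filtfilt

fs, n = 1000.0, 10_000  # sample rate (Hz), number of samples -- arbitrary
rng = np.random.default_rng(0)

# pink noise via spectral shaping: amplitude ~ 1/sqrt(f) gives power ~ 1/f
spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n, d=1 / fs)
spec[1:] /= np.sqrt(freqs[1:])
pink = np.fft.irfft(spec, n)

b, a = butter(4, 50.0, btype="highpass", fs=fs)  # 50 Hz cutoff (arbitrary)
filtered = filtfilt(b, a, pink)

print(f"variance before: {pink.var():.4f}, after high-pass: {filtered.var():.4f}")
```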

jul 22, 2025, 8:39 pm
DJ @eschaton-avoidance.bsky.social

I'm pretty sure fixed-ish oscillation frequencies (alpha, beta, whatnot) do (in theory) serve the purpose of addressability in the brain in the frequency band/functional connectivity literature, but I don't have a great source
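
For concreteness (a sketch, not a claim about mechanism): at the signal-processing level, "addressing" by frequency band just means bandpass filtering a recording at the canonical band edges, e.g.:

```python
# Hypothetical illustration: isolating canonical frequency bands from a
# (here, fake) recording is plain bandpass filtering -- the level at which the
# frequency-band / functional-connectivity literature talks about oscillations.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 250.0                                     # assumed sample rate (Hz)
bands = {"alpha": (8, 12), "beta": (13, 30)}   # canonical band edges (Hz)
signal = np.random.default_rng(1).standard_normal(int(10 * fs))  # stand-in EEG

for name, (lo, hi) in bands.items():
    b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
    band = filtfilt(b, a, signal)
    print(f"{name}: power = {np.mean(band ** 2):.4f}")
```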

jul 22, 2025, 8:42 pm
DJ @eschaton-avoidance.bsky.social

For better-understood (though still not well understood) neural solutions to addressability, hippocampal conceptual-spatial maps are neat too (www.jneurosci.org/content/40/3...)

jul 22, 2025, 8:37 pm