slowXtal @slowxtal.bsky.social

I know, RLHF and stuff ... Anyway, I found the paper interesting: [2410.02724] Large Language Models as Markov Chains share.google/c6E5REHiGuWv...

jul 30, 2025, 10:07 pm • 2 0

Replies

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

They're Markov chains to the same degree a human is a Markov chain: if you're a pedant who cares more for terminology than actualities, and if you also ignore the fact that they don't actually meet the Markov criteria (neither humans nor real-world LLMs are actually deterministic from fixed states)

jul 30, 2025, 11:55 pm • 4 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

(LLMs because of flash attention & its ilk, which are near universal in usage.) But if you ignore that both actually fail the Markov criteria, then you can represent the brain as a Markov chain just as well as you can an LLM. And both are an exercise purely for pedants, unrelated to real-world practicality.

jul 30, 2025, 11:55 pm • 2 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

The theoretical Markov chain they're talking about to represent an LLM is astronomical. Managing the state transitions would require an amount of memory beyond all human comprehension. Beyond every unit of Planck space times every unit of Planck time across the entire history of the universe.

jul 31, 2025, 12:07 am • 2 0
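
(A back-of-the-envelope check of that claim - a minimal sketch, assuming a round 100k-token vocabulary and 100k-token context window; the exact figures don't change the conclusion:)

    import math

    # Assumed round numbers, purely illustrative.
    vocab_size = 100_000      # tokens in the vocabulary
    context_length = 100_000  # tokens in the context window

    # A Markov chain whose "state" is the full context needs up to one state
    # per possible token sequence that fits in the window.
    log10_states = context_length * math.log10(vocab_size)   # = 500,000

    # Rough order-of-magnitude cosmology for comparison:
    # ~10^185 Planck volumes in the observable universe, ~10^61 Planck times since the Big Bang.
    print(f"states: ~10^{log10_states:.0f}")
    print(f"Planck volumes x Planck times: ~10^{185 + 61}")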

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

It's an "angels dancing on the head of a pin" navel-gazing exercise. It's not reality. And you can do what they're doing with literally *anything* that's deterministic. Which, as mentioned, LLMs in the real world aren't.

jul 31, 2025, 12:07 am • 2 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

They're trying to use this theoretical, not-actually-doable concept to come up with real-world descriptions, in their representation, of edge-case LLM behavior - which, again, you could do for any deterministic system in the universe. But it's not actually /meaningful/.

jul 31, 2025, 12:07 am • 2 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

In particular, they're focused on the case where, if you have a really wrong temperature value, LLMs can get stuck in a loop. They're describing the loop in terms of Markov states. But that throws away all of the *actual meaning* of what's happening to cause an LLM to get into a loop.

jul 31, 2025, 12:11 am • 2 0
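
(A toy sketch of that loop behavior - not the paper's construction - using a made-up three-word next-token model; at a degenerate temperature of zero, sampling collapses to argmax and the output immediately falls into a cycle:)

    import numpy as np

    # Made-up next-token probabilities over a three-word vocabulary, conditioned
    # only on the previous word (so this toy is itself a tiny Markov chain).
    vocab = ["the", "cat", "sat"]
    probs = {
        "the": np.array([0.05, 0.90, 0.05]),  # after "the", "cat" is most likely
        "cat": np.array([0.05, 0.05, 0.90]),  # after "cat", "sat" is most likely
        "sat": np.array([0.90, 0.05, 0.05]),  # after "sat", "the" is most likely
    }

    def next_token(prev, temperature):
        p = probs[prev]
        if temperature == 0:              # greedy decoding: always take the argmax
            return vocab[int(np.argmax(p))]
        q = p ** (1.0 / temperature)      # temperature-scaled sampling
        return np.random.choice(vocab, p=q / q.sum())

    tok, out = "the", ["the"]
    for _ in range(8):
        tok = next_token(tok, temperature=0)
        out.append(tok)
    print(" ".join(out))  # the cat sat the cat sat ... (a deterministic loop)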

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

I swear, some people have burned out their brains on math and see everything through the lens of statistics. "Wow, look at that fox running - that's some neat statistics right there!"

jul 31, 2025, 12:11 am • 2 0

Singularity's Bounty e/🇺🇦 @catblanketflower.yuwakisa.com

One problem is that a lot of people have exploded the Markov chain into all kinds of bizarre variations, away from states, transitions, a transition probability matrix and, critically, "stateless except the last state". 4o had a fun line: like saying "a piano is a drum if you hit it percussively."

jul 31, 2025, 3:11 pm • 1 0

Singularity's Bounty e/🇺🇦 @catblanketflower.yuwakisa.com

I think if someone makes that claim the first response is to ask "precisely which definition of Markov chain are you using?" Because it takes some clever dancing to roll up LLMs encoding history - a fundamental principle

jul 31, 2025, 3:15 pm • 0 0
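
(For reference, the textbook first-order, discrete-time definition being argued over: a process X_0, X_1, X_2, ... is a Markov chain if

    P(X_{t+1} = x | X_t = x_t, X_{t-1} = x_{t-1}, ..., X_0 = x_0) = P(X_{t+1} = x | X_t = x_t)

i.e. the next state depends only on the current state. The usual way to fit an LLM into this definition is to fold the entire context window into a single "state", which formally satisfies the property while still encoding history - and is exactly what makes the resulting chain astronomically large, as discussed above.)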

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

Maybe a good question would be, "are we describing a Markov chain whose state information you could actually fit within our universe?" Describing everything as state-transition probabilities entirely ignores the "how".

jul 31, 2025, 4:17 pm • 1 0

Arseny Khakhalin @khakhalin.bsky.social

Haha, for a spiking network neuroscientist a human IS a Markov chain, and not in a philosophical sense, but like, in a simple practical sense. Like a table is made of wood, and croissants of butter, so you and I are made of Markov chains :)

jul 31, 2025, 3:55 pm • 2 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

If a term describes everything, then it describes nothing.

jul 31, 2025, 4:13 pm • 1 0

Arseny Khakhalin @khakhalin.bsky.social

language? numbers? I'm not sure I agree :) it is not exactly surprising that a non-deterministic stochastic process on a graph can be used to describe lots of things. But also a pyramid is not a Markov chain. Neither is an empty set. Touché!

jul 31, 2025, 4:25 pm • 0 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

"Language" and "numbers" are two nouns. I'm not sure how they're supposed to be questions. Pyramids and empty sets, absent a transformation, are extremely simple Markov chains, with a 100% probability of transitioning to the same state.

jul 31, 2025, 4:34 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

My big issue with trying to describe everything in terms of Markov chains is that it's the dumbest, least-informative way to represent the world. State transitions are just numbers with no information about the processes underlying why those numbers exist. And since they lack any sort of "process"..

jul 31, 2025, 4:34 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

..they're the least efficient way you can possibly address a complex problem. If I say "I'm going to write a thousand random words then write the word banana", and do just that, it's a *conceptually* trivial problem, but if you build a Markov chain to solve it, the state info couldn't fit in our universe.

jul 31, 2025, 4:34 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

Reducing everything to Markov chains is literally the inverse of the scientific process, where the goal is to understand the order that's actually underlying a complex system. You're taking gold and turning it into lead.

jul 31, 2025, 4:36 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

I'm not going to say it's not useful. If there's an immensely complex system, but it's reducible down to just a handful of states, give or take, then it can be useful to do higher-level modeling based on those states. But that isn't at all applicable to discussions of LLMs.

jul 31, 2025, 4:38 pm • 1 0

Arseny Khakhalin @khakhalin.bsky.social

But also I'm not talking about reducing it descriptively, as if I were a Markov chain that alternates between doomscrolling and complaining. No! The other way around. Each neuron can fire or not (2 states, if we're doing the spherical cow in a vacuum). A pair of neurons is a chain over the cross-product of their states

jul 31, 2025, 4:48 pm • 0 0
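
(A minimal sketch of that construction, with made-up transition probabilities, for a pair of idealized binary neurons:)

    import itertools
    import numpy as np

    # Two idealized neurons, each either silent (0) or firing (1).
    # The joint state space is the cross-product: (0,0), (0,1), (1,0), (1,1).
    states = list(itertools.product([0, 1], repeat=2))

    # Made-up transition probabilities, purely for illustration; each row sums to 1.
    P = np.array([
        [0.70, 0.10, 0.10, 0.10],
        [0.20, 0.50, 0.10, 0.20],
        [0.20, 0.10, 0.50, 0.20],
        [0.05, 0.15, 0.15, 0.65],
    ])

    # n neurons give 2**n joint states: easy to tabulate for a pair,
    # hopeless for a whole brain, which is the disagreement running through this thread.
    print(states)
    print(P.sum(axis=1))  # sanity check: every row is a probability distribution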

Arseny Khakhalin @khakhalin.bsky.social

I mean, of course!! Saying everything is an ODE implies that you hope to solve them, at least numerically. With processes on a graph you have dynamic systems, complexity, chaos, emergence, stable states, the whole happy world. That's where the fun starts.

jul 31, 2025, 4:44 pm • 0 0

Arseny Khakhalin @khakhalin.bsky.social

I mean that language and math CAN describe everything (they are not "terms" though). Saying "X is a Markov chain" is more like restricting yourself to a type of math (everything is an ODE, everything is a vector). It's trivially true. The question is whether it is productive. And for brains, it is!

jul 31, 2025, 4:40 pm • 0 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

For individual neurons or small clusters of neurons I'll absolutely buy that, in terms of simplifying the modeling of larger-scale phenomena (e.g. abstracting away chemistry). But go ahead and show me the state information for an entire brain.

jul 31, 2025, 4:50 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

The utility of representing something as Markov chains depends on whether the complexity is convergent to a simply-described behavior (e.g. the behavior of gas atoms in a balloon is complex, but PV=nRT is not), or divergent to a behavior of ever-increasing complexity with scale.

jul 31, 2025, 4:50 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

The reason Markov chains failed with language for anything more complex than autocomplete, and Transformers succeeded, is that language is a task of describing all the nuances of the human condition within the universe we exist in, and this is highly divergent with scale.

jul 31, 2025, 4:50 pm • 1 0

Arseny Khakhalin @khakhalin.bsky.social

In this context the MC is not an underlying abstraction but a practical attempt at high-level description. Of course they failed! MCs are used in neuro at this level as well: say, to describe the probability of transition between grooming and sleeping in a mouse that is bored. But that's not what I meant.

jul 31, 2025, 4:57 pm • 0 0

slowXtal @slowxtal.bsky.social

Ignoring plasticity, maybe?

jul 31, 2025, 4:12 pm • 1 0

Nafnlaus 🇮🇸 🇺🇦 🇬🇪 @nafnlaus.bsky.social

First, assume a spherical cow.

jul 31, 2025, 4:23 pm • 0 0

Arseny Khakhalin @khakhalin.bsky.social

You're not getting it. Spiking network neuroscientists study how networks spike. Our brain is a spiking network. Transitions in this network form a stochastic temporal process on a finite set of states. Calling it a spherical cow is really stretching it.

jul 31, 2025, 4:29 pm • 0 0

Arseny Khakhalin @khakhalin.bsky.social

The weak aspect of this approximation in particular is that it explains _too much_ and is computationally cumbersome, so it's on the opposite end of the spectrum from a spherical cow, as far as models of brains / cognition go.

jul 31, 2025, 4:29 pm • 0 0

Arseny Khakhalin @khakhalin.bsky.social

Plastic, malleable, creative and loving Markov chain!

jul 31, 2025, 4:23 pm • 0 0