it feels like a logical outcome of entropix: they found that the attention logits were important and useful for sampling, and this instead uses them to increase test-time compute (TTC) without reasoning