The basic design of an LLM is a lossy compression of the internet. The input and the previous output tokens produce a probability distribution over the next output token, mimicking the training data. Then some "spice" is added to fix errors, as if the LLM talks to itself first. What did I miss?
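
To make the middle step concrete, here's a minimal sketch of next-token sampling, assuming a toy vocabulary and made-up logits (`sample_next_token` and the numbers are mine, not any real model's API). Temperature is one common knob for the "spice": it reshapes the probability distribution before a token is drawn.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id from raw model scores (logits).

    temperature < 1 sharpens the distribution (more deterministic);
    temperature > 1 flattens it (more randomness/"spice").
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    # Softmax: turn scores into a probability distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Hypothetical 4-token vocabulary with made-up scores
logits = [2.0, 1.0, 0.5, -1.0]
print(sample_next_token(logits, temperature=0.7))
```

In a real model this loop repeats: each sampled token is appended to the context, and the model produces a fresh distribution for the next position.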