Their intelligence is for sure "jagged," as Karpathy describes it, which means you can't really do that sort of interpolation on their capabilities. Sometimes they can do very complicated things well and still get things that are easy for us wrong.
For example, I used an LLM to help me debug a complicated, subtle issue with an ML model I'd set up and how I was using its results (my cosine similarity was flipping its sign randomly from one training run to the next). Through a bunch of back-and-forth description, it finally helped me figure out the issue.
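For flavor, here's a toy Python sketch of the general class of problem (not my actual model or code): a direction learned by something like power iteration or PCA is only defined up to sign, so anything that compares a freshly trained embedding against one saved from an earlier run can see its cosine similarity flip sign between training runs.

```python
import numpy as np

def top_direction(X, seed, iters=100):
    """Power iteration for the leading principal direction of X.
    The result is only defined up to sign: v and -v are equally valid,
    and which one you get depends on the random initialization."""
    rng = np.random.default_rng(seed)
    C = X.T @ X
    v = rng.normal(size=X.shape[1])
    for _ in range(iters):
        v = C @ v
        v /= np.linalg.norm(v)
    return v

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) @ rng.normal(size=(20, 20))  # toy data, stand-in for real features

reference = top_direction(X, seed=1)       # "embedding direction" cached from an earlier training run
for seed in (2, 3, 4, 5):
    v = top_direction(X, seed=seed)        # same data, new training run
    print(seed, round(cosine(reference, v), 3))  # lands near +1.0 or -1.0 depending on the seed
```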
It's for sure weird that it can do that sort of thing, with deep domain knowledge and a great ability to explore the issue with me and eventually explain what the problem was...and still maybe get something as simple as palindromes wrong! But that just means it's a tool people should use carefully, IMO.
I'm not a "vibe coding" person and I wouldn't use LLM-generated results without reading and understanding it myself. I just think the "they're useless" camp is missing the boat, and should really explore when and where they can be useful. Because I think they really can be!
I've seen generative AI do some really cool things that humans basically can't (Matt Parker did a video about jigsaws with two distinct "correct" solutions), but expecting correct complex answers requiring some creativity is off the table for me. Your debugging example seems to me more #1.
Generating code coverage test cases and flagging discrepancies in responses is the sort of repetitive, detailed work that computers are excellent at; relatively little "intelligence" is needed once the parameters are properly specified. But analyzing obfuscated code? Good luck, and don't get your disk wiped.
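By "flagging discrepancies" I mean something as mechanical as the sketch below (hypothetical names and toy implementations, just to show the shape of it): run the same inputs through a baseline and a candidate and report every place they disagree.

```python
from typing import Any, Callable

def flag_discrepancies(cases: list,
                       baseline: Callable[[Any], Any],
                       candidate: Callable[[Any], Any]) -> list:
    """Return (input, baseline_output, candidate_output) for every case where they differ."""
    mismatches = []
    for case in cases:
        expected, actual = baseline(case), candidate(case)
        if expected != actual:
            mismatches.append((case, expected, actual))
    return mismatches

# Toy usage: the "new" implementation subtly forgets to strip whitespace.
def old_impl(s):
    return s.strip().lower()

def new_impl(s):
    return s.lower()

for case, expected, actual in flag_discrepancies(["Hello", "  Hello  ", "WORLD\n"], old_impl, new_impl):
    print(f"mismatch for {case!r}: {expected!r} != {actual!r}")
```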