No correlation. One LLM scoring higher than another on a benchmark can still get more things wrong outside that benchmark. There's no way to calculate the real-world accuracy of an LLM: that would take exhaustive testing, and for nearly all inputs there is no ground truth to check against.
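To make the point concrete, here is a toy sketch in Python. Everything in it is invented for illustration: `model_a` and `model_b` are hypothetical lists of graded answers, not real models, and the "outside the benchmark" portion is exactly the part that can't be graded in practice. It only shows that a benchmark ranking can flip on the inputs the benchmark doesn't cover.

```python
# Toy illustration with made-up data: 1 = correct answer, 0 = wrong answer.
# The first 10 entries play the role of the benchmark. The remaining 90 stand
# in for "everything else", which in reality cannot be enumerated or graded.
BENCHMARK_SIZE = 10

# Hypothetical model A: 9/10 on the benchmark, 45/90 on everything else.
model_a = [1] * 9 + [0] * 1 + [1] * 45 + [0] * 45
# Hypothetical model B: 7/10 on the benchmark, 81/90 on everything else.
model_b = [1] * 7 + [0] * 3 + [1] * 81 + [0] * 9


def accuracy(results):
    """Fraction of graded answers that were correct."""
    return sum(results) / len(results)


def split_scores(results):
    """Return (benchmark accuracy, accuracy on the rest)."""
    return (
        accuracy(results[:BENCHMARK_SIZE]),
        accuracy(results[BENCHMARK_SIZE:]),
    )


a_bench, a_rest = split_scores(model_a)
b_bench, b_rest = split_scores(model_b)

print(f"A: benchmark={a_bench:.2f} rest={a_rest:.2f}")
print(f"B: benchmark={b_bench:.2f} rest={b_rest:.2f}")
```

Here A wins on the benchmark and B wins everywhere else. With a real LLM you only ever get to see the benchmark column, so the ranking on the rest stays unknown.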