
Haha! Grok always has insane leaked benchmarks!
- 35% on HLE, 45% with reasoning!!
- 87-88% on GPQA
- 72-75% on SWE Bench (Grok 4 Code)

Hope it’s true. 🔥

Even if these are real, we will want to verify on benchmarks like LiveBench AI where you can’t train on test and get a
