The ARC-AGI benchmark, which carries a $1 million prize, has drawn significant attention recently as a problem that large language models (LLMs) struggle to solve. Ryan Greenblatt, using GPT-4o, achieved 71% accuracy on a set of examples where humans typically achieve 85%, a state-of-the-art (SOTA) result. His approach uses a carefully-crafted few-shot prompt to sample many candidate Python programs (~5k guesses) that implement each transformation, selects the best candidates by running them against the provided examples, and applies a debugging step. The widely shared "50% on ARC-AGI with GPT-4o" result illustrates the same point: ARC-AGI yields to a bunch of very clever tricks around existing models plus more search compute. Some commentators argue that solving ARC-AGI would not amount to artificial general intelligence (AGI), but recognize it as a valuable challenge that targets an area where LLMs tend to be weak: cell-based rules like Conway's Game of Life.
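The generate-and-select loop described above can be sketched roughly as follows. This is a hedged illustration, not Greenblatt's actual code: the hand-written `candidate_sources` stand in for the ~5k LLM-sampled programs, and the toy grids stand in for a real ARC task.

```python
from collections import Counter

# A toy ARC-style task: grids are lists of lists of ints (colors).
train_examples = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),   # input -> expected output
    ([[0, 2], [0, 0]], [[2, 0], [0, 0]]),
]
test_input = [[0, 0], [3, 0]]

# Stand-ins for the many LLM-sampled candidate programs.
candidate_sources = [
    "def transform(g):\n    return [row[::-1] for row in g]",     # mirror each row
    "def transform(g):\n    return [row[:] for row in g[::-1]]",  # flip vertically
    "def transform(g):\n    return [row[:] for row in g]",        # identity
]

def run_candidate(src, grid):
    """Compile one candidate program and apply it; None on any failure."""
    ns = {}
    try:
        exec(src, ns)
        return ns["transform"](grid)
    except Exception:
        return None

# Selection: keep only candidates that reproduce every training example.
survivors = [
    src for src in candidate_sources
    if all(run_candidate(src, x) == y for x, y in train_examples)
]

# Vote among the surviving programs' outputs on the test input.
votes = Counter(str(run_candidate(src, test_input)) for src in survivors)
best_output = votes.most_common(1)[0][0]
print(best_output)  # -> [[0, 0], [0, 3]] (only the row-mirror survives)
```

The real system adds the pieces this sketch omits: a rich few-shot prompt to make the samples plausible, and a debugging round in which failing programs are fed back to the model for revision.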
Progress on $1M ARC-AGI benchmark that is very hard for LLMs by carefully-crafted few-shot prompt to generate many possible Python programs to implement the transformations, generating ~5k guesses, selecting the best ones using the examples, and a debugging step. https://t.co/jCfuY1fsps
50% on ARC-AGI with GPT-4o This wonderful blog post brings out another point that I didn't explicitly mention in my blog -- ARC-AGI gets solved with a bunch of very clever tricks around existing models, and more search compute. https://t.co/YvoT4PC3yz https://t.co/CeXqixsbSF
The solution to ARC-AGI will not be considered remotely close to AGI. Going thru samples, this strikes me as a very narrow intelligence problem. But it's a very cool challenge and uses an area LLMs in particular tend to be weak at: cell-based rules (like Game of Life).
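For readers unfamiliar with the cell-based rules the tweet refers to, here is a minimal Conway's Game of Life step. This is an illustrative sketch of the rule class, not anything from the thread: each cell lives or dies based purely on its neighbor count, the kind of local grid rule LLMs reportedly handle poorly.

```python
from collections import Counter

def life_step(live):
    """One Game of Life generation; `live` is a set of (row, col) cells."""
    # Count how many live neighbors each grid cell has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next step if it has 3 neighbors,
    # or 2 neighbors and was already alive.
    return {
        cell for cell, n in neighbor_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }

# A "blinker" oscillates between a horizontal and a vertical bar.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))  # -> [(0, 1), (1, 1), (2, 1)]
```

Rules like this are trivial to state yet force precise, cell-by-cell spatial reasoning, which is why grid-transformation benchmarks probe a genuine weak spot.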
ARC-AGI’s been hyped over the last week as a benchmark that LLMs can’t solve. This claim triggered my dear coworker Ryan Greenblatt so he spent the last week trying to solve it with LLMs. Ryan gets 71% accuracy on a set of examples where humans get 85%; this is SOTA. https://t.co/tqrzcMz9qD