Yann LeCun and Abacus AI Launch LiveBench, a New LLM Benchmark

A team of AI researchers/academics, incl. @ylecun developed a new open LLM benchmark called LiveBench. It evaluates models using contamination-free test data and objective scoring. I spoke w/some of its creators: @micahgoldblum & folks from @abacusai. https://t.co/V75Jp6rNFz

The Daily AI@thedailyAi_

15 d

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring: Yann LeCun and other researchers have developed LiveBench, an open AI benchmark evaluating models using challenging, contamination-free… https://t.co/A7CygWiM8e #AI #AIbenchmarking

VentureBeat@VentureBeat

15 d

LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring https://t.co/pYKQFRAIKx https://t.co/1So8uyes1C

Bindu Reddy@bindureddy

16 d

Announcing LiveBench AI - The WORLD'S FIRST LLM Benchmark That Can't Be Gamed!! We (Abacus AI) partnered with Yann LeCun and his team to create LiveBench AI! LiveBench is a living/breathing benchmark with new challenges that you CAN'T simply memorize. Unlike blind human eval,… https://t.co/w0Xq2d2m5L

khalid@k_saifullaah

16 d

since training on test sets is becoming an increasingly concerning issue, we are excited to introduce LiveBench, a benchmark that is alive and contamination-free! We have also made it super lightweight and easy to run, with only around 200 questions per category. check it out👇 https://t.co/DdSvTEF1lc

Micah Goldblum@micahgoldblum

16 d

🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks @crwhite_ml and @SpamuelDooley for leading the charge! Link: https://t.co/blOR8qLInV Existing LLM benchmarks have serious limitations: 🧵 https://t.co/O1A74cs4R0

Micah Goldblum@micahgoldblum

16 d

🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks @crwhite_ml and @SpamuelDooley for leading the charge on this one! Link: https://t.co/blOR8qLInV Existing LLM benchmarks have serious limitations: 🧵 https://t.co/NCjIOc2A3G
