A team of AI researchers, including Yann LeCun, has announced LiveBench, a new general-purpose live LLM benchmark. LiveBench aims to address the limitations of existing LLM benchmarks, in particular the growing problem of models being trained on test sets, by using contamination-free test data and objective scoring. Developed in collaboration with Abacus AI, it is designed to be lightweight and easy to run, with around 200 questions per category. According to its creators, the benchmark keeps presenting new challenges that models cannot simply memorize, which they say makes it harder to game.
A team of AI researchers/academics, incl. @ylecun developed a new open LLM benchmark called LiveBench. It evaluates models using contamination-free test data and objective scoring. I spoke w/some of its creators: @micahgoldblum & folks from @abacusai. https://t.co/V75Jp6rNFz
LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring: Yann LeCun and other researchers have developed LiveBench, an open AI benchmark evaluating models using challenging, contamination-free… https://t.co/A7CygWiM8e #AI #AIbenchmarking
LiveBench is an open LLM benchmark that uses contamination-free test data and objective scoring https://t.co/pYKQFRAIKx https://t.co/1So8uyes1C
Announcing LiveBench AI - The WORLD'S FIRST LLM Benchmark That Can't Be Gamed!! We (Abacus AI) partnered with Yann LeCun and his team to create LiveBench AI! LiveBench is a living/breathing benchmark with new challenges that you CAN'T simply memorize. Unlike blind human eval,… https://t.co/w0Xq2d2m5L
since training on test sets is becoming an increasingly concerning issue, we are excited to introduce LiveBench, a benchmark that is alive and contamination-free! We have also made it super lightweight and easy to run, with only around 200 questions per category. check it out👇 https://t.co/DdSvTEF1lc
🚨 Announcing LiveBench, a challenging new general-purpose live LLM benchmark! 🚨 Thanks @crwhite_ml and @SpamuelDooley for leading the charge! Link: https://t.co/blOR8qLInV Existing LLM benchmarks have serious limitations: 🧵 https://t.co/O1A74cs4R0