The Open LLM Leaderboard 2 has been released, introducing new benchmarks and features to evaluate large language models (LLMs) more effectively. This update addresses the plateauing scores of LLMs on the previous benchmark suite by incorporating harder evaluations: IFEval, BBH, MATH Lvl 5, GPQA, MUSR, and MMLU-PRO. The leaderboard now includes high-quality datasets, chat templates, and a community voting system to prioritize model evaluations. Qwen 72B Instruct currently leads the leaderboard. The update also emphasizes fairer, more transparent, and reproducible comparisons of LLMs, with new visualizations and published technical details. The leaderboard is available on the Hugging Face Hub, and 300 H100 GPUs were used to re-run the new evaluations.
Fabulous talk today by @BorisMPower of @OpenAI at @Yale @yaledatascience @YINSedge on “ChatGPT and the Future of LLMs.” The developments are mind-blowing. #HNL https://t.co/tnY4c6USj9
Big news! The open llm leaderboard will be hard to game for a couple weeks! Looking forward to checking out Leaderboard 3 but for now, I'm choosing models based on my use case with MyxMatch Find fitness for free: https://t.co/uu8qp62QBB https://t.co/xVITCi3qq6 https://t.co/BJj2MoHlL2
Pumped to announce the brand new open LLM leaderboard. We burned 300 H100s to re-run new evaluations like MMLU-Pro for all major open LLMs! Some learnings: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent…
Very excited to release the new version of the Open LLM Leaderboard, v2 – it's much harder than the previous version, as you can see in some of the v1 <> v2 score comparisons I'm posting below. Update: As open models keep getting better and saturating some of the evaluations, it… https://t.co/zv6dSQCnhJ
Open LLM Leaderboard 2⃣️ (open-source LLM leaderboard) is now available on the @huggingface Hub 🔥🚀🏆 https://t.co/xpcsXRgBi7 ✨ New high-quality datasets for various tests. ✨ Chat templates in @AiEleuther's harness. ✨ Community voting system to prioritize model evaluations.…
Open LLM Leaderboard 2⃣️ is now available on the @huggingface Hub 🔥🚀🏆 https://t.co/xpcsXRgBi7 ✨ New high-quality datasets for various tests. ✨ Chat templates in @AiEleuther's harness. ✨ Community voting system to prioritize model evaluations. ✨"Maintainer's highlight" :…
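Several of the posts above highlight chat-template support in @AiEleuther's harness: instruct models are now prompted in the conversation format they were trained on, rather than with raw concatenated text. As a minimal sketch of the idea, the hypothetical helper below renders messages in a ChatML-style format (the style used by Qwen models, among others); real templates are Jinja strings shipped with each tokenizer and applied via `tokenizer.apply_chat_template` in `transformers`.

```python
def apply_chatml_template(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt.

    Illustrative only: actual chat templates vary per model and are
    bundled with the tokenizer, not hard-coded like this.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        # Cue the model to continue as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)


prompt = apply_chatml_template([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
print(prompt)
```

Evaluating instruct models with and without their template can shift scores noticeably, which is one reason the v2 leaderboard applies templates where appropriate.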
Qwen 72B Instruct wins the Open LLM Leaderboard 2.0, for now ;) https://t.co/4EEHFCo0Rz
🚀 Very big update on the Open LLM Leaderboard! 🔥 New Evals: 📊 IFEval 📚 BBH 🔢 MATH Lvl 5 🤖 GPQA 🤔 MUSR 🔬 MMLU-PRO https://t.co/yTRTFhydq6
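Since the leaderboard runs on EleutherAI's lm-evaluation-harness, the new benchmark suite can in principle be reproduced locally. A hedged sketch of the invocation (the `leaderboard` task group name and the flags shown are assumed from recent harness versions, and the model name is just an example; running a 72B model requires substantial GPU resources):

```shell
pip install lm-eval

lm_eval --model hf \
    --model_args pretrained=Qwen/Qwen2-72B-Instruct \
    --tasks leaderboard \
    --apply_chat_template \
    --batch_size auto
```

Smaller subsets (e.g. a single task instead of the whole group) are a more practical way to sanity-check a setup before committing to a full run.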
To better understand how the Open LLM Leaderboard v2 works, you should look at @ailozovskaya 's thread showcasing some of the cool front end features she added! https://t.co/7nXKEGozWy
Open LLM Leaderboard is now rebuilt to better monitor the performance of LLMs! Many things have changed, starting from the task composition, technical details of the evaluation, and interface, to cool new visualizations! 🤗🚀 Check out our blog: https://t.co/9U1AJGjX1J https://t.co/NSqDgCeTbb https://t.co/wN8uMXlItD
Look at that 👀 Current benchmarks have become too easy for recent models, much like grading high school students on middle school problems makes little sense. So the team worked on a new version of the Open LLM Leaderboard with new benchmarks. Stellar work from @clefourrier… https://t.co/jdmfp8oSM3 https://t.co/qzzCesnp71
Exciting stuff now coming to the new-and-improved Open LLM Leaderboard! @clefourrier and team very hard at work 👀 https://t.co/RDai6LvgeC
Open LLM Leaderboard 2⃣️ is out!!🚀🏆 And Qwen2 is 🔥🔥🔥 https://t.co/P3fGrARlHp https://t.co/rYYy4hAqeQ
🔥I'm super happy to see the new Open LLM Leaderboard 2 in production! It was immense work of the entire team 💖 🔗The link to Open LLM Leaderboard remains the same https://t.co/ecrYahipwt 🗒️And the blog is very informative, be sure to check it out! https://t.co/6zTEhyXDNo https://t.co/fwUH8aATzP https://t.co/yZXURnCioc
Open LLM Leaderboard 2⃣️ is out!!🚀🏆 https://t.co/rYYy4hAqeQ
Open LLM Leaderboard 2 released! Evaluating LLMs is not easy. Finding new ways to compare LLMs fairly, transparently, and reproducibly is important! Benchmarks are not perfect, but they give us a first understanding of how well models perform and where their strengths are. What's… https://t.co/G5nZVNMAj2
LLM performances have been plateauing... so we decided to make the Open LLM Leaderboard steep again 🏔️ 😈 Introducing the Leaderboard 2️⃣ Expect... - new benchmarks - fairer reporting - cool features (did I hear voting and chat template?) 🧵 https://t.co/6uKKuTSFrX
New video! How do we Evaluate LLMs? 👀 The what, why, when and how! https://t.co/Fx8UsjWU1i https://t.co/S21yKjMVvA
How do we Evaluate LLMs? https://t.co/Fx8UsjWU1i https://t.co/gAmk3kPPqw
Most businesses struggle to evaluate their AI models. Few companies can accurately assess the performance of their AI systems. This skill, known as LLM evaluation, is crucial for better decision-making and for improving the results of your genAI systems. Evaluation…