The evaluation of Large Language Models (LLMs) is evolving with the introduction of Scale AI as a new contender in the field. Scale AI offers private evaluations for frontier models, providing a trusted benchmark alongside existing platforms like LMSys Arena. Experts praise Scale AI for its community service and clean evaluation process, highlighting its importance in improving the reliability and performance of LLMs in various applications.
📢 New: Part 2 (of 3) of What We Learned from a Year of Building with LLMs https://t.co/FCSuJFSld1 If you liked Part 1, Part 2 is a banger. We answer the following: Some of my favorite takes: "AI Engineering Is NOT All You Need" and "Look At Your Data". We found that most people are… https://t.co/3cj3bSBdQQ
What we learned from a year of building with LLMs. Great read for anyone into building stuff with large language models. https://t.co/CH8bTuw6SC
What we learned from a year of building with LLMs. Great read if you're into AI. https://t.co/CH8bTuw6SC
"What we learned from a year of building with LLMs." This is the absolute top 10 reads if you're into AI. For legit operators. https://t.co/WHCLK4fBFi
Really enjoyed reading "What We Learned from a Year of Building with LLMs" https://t.co/81inhIy1Ml ⭐️One key part of it about evals that stuck out at me: the importance of pairwise comparisons This doesn't mean scoring two models individually and then comparing the scores.… https://t.co/GDAvjzmER2
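To make the pairwise-comparison point concrete, here is a minimal sketch of what an LLM-as-judge pairwise eval might look like: the judge sees both answers side by side and picks a winner, rather than assigning each answer an absolute score that is compared afterward. The `judge_llm` callable, the prompt wording, and the position-flipping step are illustrative assumptions, not code from the article.

```python
import random

def pairwise_judge(judge_llm, question, answer_a, answer_b):
    """Ask a judge LLM which of two answers is better for the same question.

    The order of the answers is randomized on each call to reduce position bias.
    Returns "A", "B", or "tie".
    """
    flipped = random.random() < 0.5
    first, second = (answer_b, answer_a) if flipped else (answer_a, answer_b)
    prompt = (
        f"Question: {question}\n\n"
        f"Answer 1:\n{first}\n\n"
        f"Answer 2:\n{second}\n\n"
        "Which answer is better? Reply with exactly '1' or '2'."
    )
    verdict = judge_llm(prompt).strip()
    if verdict not in {"1", "2"}:
        return "tie"
    picked_first = verdict == "1"
    # Undo the random flip so the verdict is reported in terms of A and B.
    return "B" if picked_first == flipped else "A"
```

The point of the single side-by-side judgment is that relative preferences tend to be more consistent than two independently assigned scores on an absolute scale.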
Building LLM infrastructure presents a series of tradeoffs that aren't obvious at the outset, even for seasoned teams. This is our journey to high-performance LLMs at scale. https://t.co/i4u8QN0Ppx
What We Learned from a Year of Building with LLMs Building effective AI products is still a complex task. O’Reilly recently shared a report that offers crucial insights and best practices for developing successful applications using LLMs. This report is actually based on the… https://t.co/mcIG2YmQw6
LLMs are all the buzz in language processing, transforming how we communicate online 🗣️ From chatbots to content creation, these models are reshaping digital interactions. Dive into the world of LLMs and uncover their impact on AI language applications! #LLMs #AI #NLP
Is your business using #LLMs for applications like #chatbots, but unsure about their reliability? 👇 Evaluating LLMs is crucial for detecting issues like hallucinations, mitigating bias, enhancing safety, protecting sensitive data, and optimizing performance. Without proper… https://t.co/sFmPeTyxWE
Six AI/ML experts detail what they learned from building real-world applications on top of LLMs over the past year, including common pitfalls around prompting (O'Reilly Media) https://t.co/eTD6WEbfFg 📫 Subscribe: https://t.co/OyWeKSRpIM https://t.co/94V0BmbmCv
Thrilled to see @emollick discussing the complexities of working with LLMs. Rechat AI Advisor @HamelHusain, along with his co-authors, shares invaluable lessons in this insightful article. Learn how Rechat's AI copilot, Lucy, leverages LLMs to innovate real estate. https://t.co/19njXpGFd0
A great complementary benchmark to LMSys Arena - private, clean, trusted third party evaluation of public models. Scale AI is doing an outstanding community service here. 👏 https://t.co/5TUW0oK2h2
As predicted, Scale is entering the LLM eval game, but with private (read: not trainable on) evals for frontier models! This is great, a very trusted resource, in addition to LMSys, Reddit vibes, X shitposting, and broken open evals. Will cover tomorrow on @thursdai_pod ! https://t.co/cp2a4FrHTL
Nice, a serious contender to @lmsysorg in evaluating LLMs has entered the chat. LLM evals are improving, but not so long ago their state was very bleak, with qualitative experience very often disagreeing with quantitative rankings. This is because good evals are very difficult… https://t.co/EEqCegELOl https://t.co/2TecSoIRyt
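For context on how arena-style leaderboards such as LMSys turn noisy pairwise votes into the quantitative rankings mentioned above, here is a rough Elo-style sketch. The K-factor, base rating, model names, and the simple online update are assumptions for illustration only; the actual Arena methodology (and Scale's private evals) differ in the details.

```python
from collections import defaultdict

def elo_ratings(battles, k=32, base=1000):
    """battles: iterable of (model_a, model_b, winner), winner in {"A", "B", "tie"}."""
    ratings = defaultdict(lambda: float(base))
    for a, b, winner in battles:
        # Expected score of model a against model b under the current ratings.
        expected_a = 1 / (1 + 10 ** ((ratings[b] - ratings[a]) / 400))
        score_a = {"A": 1.0, "B": 0.0, "tie": 0.5}[winner]
        ratings[a] += k * (score_a - expected_a)
        ratings[b] += k * ((1 - score_a) - (1 - expected_a))
    return dict(ratings)

# Toy usage with two hypothetical models and two battles.
print(elo_ratings([("model-x", "model-y", "A"), ("model-y", "model-x", "tie")]))
```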