SnorkelAI, with contributions from HoangTranDV and Chris M. Glaze, has achieved state-of-the-art performance on the AlpacaEval 2.0 leaderboard with only a 7B model, ranking just below the judge, GPT-4 Turbo, and above GPT-4, Gemini Pro, Claude 2, Llama 2, and Mixtral. A small 0.4B reward model was key to curating the training data. SnorkelAI has also launched programmatic alignment support in Snorkel Flow for steering LLMs without manual preference annotations, and the model is available for download, in a sandbox, or via API calls. Separately, a correction to AlpacaEval 2.0 has been proposed that normalizes win rates by response verbosity, producing a length-penalized leaderboard intended to give a fairer comparison across models.
Get your hands on the new 7B model that put @SnorkelAI SOTA on AlpacaEval 2.0! This work is foundational to programmatic alignment support in Snorkel Flow for steerable LLMs w/out manual preference annotations. Available for download, sandbox, or API calls at:… https://t.co/SSkPvdS47o
Length-penalized AlpacaEval 2.0, with ranking deltas vs the vanilla leaderboard. https://t.co/Kbf9T5ozJ7
Way to go @HoangTranDV @chris_m_glaze !! State of the art on AlpacaEval 2.0, showing again that smaller model + better data wins! More exciting: pointing the way for our new *programmatic alignment* support in @SnorkelAI for steerable LLMs w/out manual preference annotations https://t.co/uraodhR8A5
Apropos of nothing, I've done the dumbest correction to AlpacaEval 2.0 one could think of: "normalized" by response verbosity. Win Rate * (avg length[GPT-4 Turbo, GPT-3.5] / length) I think the new list (right) makes more sense, but we'd better correct for all biases seriously. https://t.co/YBxT9Y05HU
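The verbosity correction described in the post above can be sketched in a few lines: scale a model's win rate by the ratio of the reference models' average response length to the model's own average length, so verbose models are penalized and terse ones rewarded. The numbers below are illustrative, not actual leaderboard values.

```python
# Minimal sketch of the length-penalized AlpacaEval 2.0 correction:
#   penalized = win_rate * (mean length of [GPT-4 Turbo, GPT-3.5] / model length)
# All values here are made up for illustration.

def length_penalized_win_rate(win_rate, model_avg_len, ref_avg_lens):
    """Scale win_rate by (mean reference response length / model response length)."""
    ref_mean = sum(ref_avg_lens) / len(ref_avg_lens)
    return win_rate * (ref_mean / model_avg_len)

# A model that is 33% more verbose than the reference average sees its
# score shrink proportionally.
penalized = length_penalized_win_rate(
    win_rate=30.0,
    model_avg_len=2400,          # candidate model's average response length (chars)
    ref_avg_lens=[2000, 1600],   # e.g. GPT-4 Turbo and GPT-3.5 averages (assumed)
)
```

Note this is exactly the "dumbest correction one could think of" the author describes: a single multiplicative length term, not a serious debiasing of the judge.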
Fresh result out of @SnorkelAI Research: you can top the AlpacaEval 2.0 LLM leaderboard (under the judge, GPT-4 Turbo, and over GPT-4, Gemini Pro, Claude 2, Llama 2, Mixtral, etc.) with only a 7B model if you align it right! We use a small reward model (0.4B) to curate training…
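The posts above describe using a small (0.4B) reward model to curate training data in place of manual preference annotations. The details of SnorkelAI's actual pipeline are not given here, so the following is only a hedged sketch of the general pattern: sample several candidate responses per prompt, score them with a reward model, and keep the best and worst as a (chosen, rejected) preference pair. `reward_score` is a hypothetical stand-in, not a real reward model.

```python
# Hedged sketch (not SnorkelAI's actual method) of reward-model-based
# preference curation: rank sampled responses by reward and emit the
# top/bottom pair as programmatically labeled preference data.

def reward_score(prompt, response):
    # Placeholder scorer; a real pipeline would call a trained 0.4B
    # reward model here. This toy proxy just counts distinct words.
    return len(set(response.split()))

def curate_preference_pair(prompt, candidates):
    """Rank candidate responses by reward; return (chosen, rejected)."""
    ranked = sorted(candidates,
                    key=lambda r: reward_score(prompt, r),
                    reverse=True)
    return ranked[0], ranked[-1]

chosen, rejected = curate_preference_pair(
    "Explain AlpacaEval 2.0.",
    ["eval eval eval", "a benchmark judged by GPT-4 Turbo"],
)
```

Pairs curated this way can then feed a preference-tuning step (e.g. DPO-style training) without any human annotation, which is the "programmatic alignment" idea the posts point to.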