Recent developments in the AI sector highlight intense competition among large language models (LLMs), particularly between OpenAI's GPT-4, Anthropic's Claude 3 Opus, and Cohere's Command R+. A Princeton University paper found that Claude 3 Opus excels at long-context (book-length) summarization, proving nearly twice as faithful as the second-best model, GPT-4 Turbo. Developers increasingly report that Claude 3 Opus outperforms GPT-4 on most tasks: it handles work without a detailed spec, refuses fewer questions, and is stronger at code generation. Meanwhile, Cohere's Command R+ climbed to match GPT-4-0314 on the Chatbot Arena leaderboard with over 13,000 human votes, making it the best open-weights model there and marking a historic first: an open model beating the original GPT-4 on real-world user preference rather than a narrow benchmark. Users also weigh models along several dimensions: reasoning (where Claude 3 Opus leads), code generation (where GPT-4 remains strong), cost (where Claude Haiku is the most economical), and latency. These developments underscore a dynamic shift in the AI landscape, with open models gaining ground against established players.
Congratulations to @aidangomez and @cohere for this amazing breakthrough! On the side, our Gemma IT team also pushed our model thanks to the feedback from the open community. Great day for open models! https://t.co/FkqFl4RIK3
Congrats @openai on the new GPT-4 Turbo launch🔥 The model is now in the Arena! Come challenge it with your toughest prompts🧩 https://t.co/J9eq5ECGhD https://t.co/58sX5Jflka
Is this the first open weights model to beat GPT4 not on some "narrow unique task" in benchmarks, but on real world user preference? Damn, just a few weeks ago GPT4 was toppled for the first time from the top and now is beaten by open weights models 😮 Go @cohere ! https://t.co/boJK1KSzAf
Hearing more and more that Claude 3 Opus is outperforming GPT-4 for most tasks & that GPT-4 performance has been degrading. Lots of devs switching. Common themes: Claude answers more questions (less "I can't answer that") Claude is better at code-gen https://t.co/8pXU7GrIZo
Exploring robust evaluation methods for RAG workflows, @aparnadhinak looks at the performance of GPT-4, Claude 2.1, and Claude 3.0 Opus, and zooms in on their performance around the generation part of the pipeline. https://t.co/8xSKnYNsvZ
LangSmith Evaluations With the rapid pace of AI, developers are often faced with a paradox of choice: how to choose the right prompt, how to trade-off LLM quality vs cost? Evaluations can accelerate development with structured process for making these decisions. But, we've… https://t.co/FfSNqPUwdV
Our latest study measures how persuasive language models like Claude are compared to humans. We find a general scaling trend: newer models tend to be more persuasive, with Claude 3 Opus generating arguments that don't differ statistically from human-written ones. https://t.co/ftKiWEVe4b
Now that we have about two dozen LLMs in the market, here are the dimensions that matter when it comes to using them. Reasoning - Claude 3 Opus beats everything out Code - GPT-4 is still king here Cost - Claude Haiku is your best bet Latency - Claude or a local open-source…
If GPT-5 is substantially better than, for example, Claude 3, how will one be able to tell? This will get harder and harder for "normal folks" to distinguish.
Cohere beating Meta and Mistral to GPT-4 performance with an open weights model was not in my LLM bingo card - huge congrats to the team for making this tech widely accessible 🔥! https://t.co/yU7t2v0wlF
If you didn’t follow it all, the situation in the LLM arena has changed dramatically recently: - @AnthropicAI’s Claude 3 Opus is now the undefeated winner among closed-source models (just look at this win-rate line!) - @cohere Command R+ is the new super strong leader of… https://t.co/rGkMsnuifw
🤖 Excited that Command R+ is the top open-weights model on Chatbot Arena! 🛠️🗺️ This doesn't even assess RAG, tool use, and multilingual capabilities where ⌘R+ does well. 🛝You can try out ⌘R+ on the playground (https://t.co/zaofovCC1K) or in the🤗 Space… https://t.co/bZ0MXweu8i
This is legitimately historic for AI: We now have an open model that outperforms the original GPT-4, both 0314 and 0613. A phenomenal achievement from @cohere https://t.co/VfmiLkja6w
GPT-4 vs Claude 3 Opus reasoning ability: Without running the prompt below —which model do you think will do better, GPT-4 or Claude Opus? https://t.co/XrOkHLgma7
There are 𝐭𝐰𝐨 benchmarks that I trust for LLMs: - Your own evals 🔍 - Chatbot Arena 🤖 (users do a blind A/B test) Amazing to see that Command R+ from @cohere is the first open-weights model outperforming GPT-4. And this is not yet testing RAG & tool use. https://t.co/aUfHCcUznr
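The blind A/B test the tweet describes feeds into an Elo-style leaderboard: each human vote between two anonymized models nudges their ratings up or down. A minimal sketch of that update rule is below; it is illustrative only, not the official Chatbot Arena implementation, and the model names, starting ratings, and K-factor are assumptions.

```python
# Elo-style rating from blind pairwise votes (illustrative sketch).
# Each vote names a winner; ratings shift by how "surprising" the win was.

def expected_score(r_winner: float, r_loser: float) -> float:
    """Probability the winner beats the loser under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400))

def apply_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Update both ratings in place after one human preference vote."""
    e = expected_score(ratings[winner], ratings[loser])
    delta = k * (1.0 - e)  # upsets (low e) move ratings more
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical models and votes, both starting at the same rating.
ratings = {"model_a": 1000.0, "model_b": 1000.0}
for winner in ["model_a", "model_a", "model_b"]:
    loser = "model_b" if winner == "model_a" else "model_a"
    apply_vote(ratings, winner, loser)

# model_a won 2 of 3 votes, so it ends with the higher rating.
print(ratings["model_a"] > ratings["model_b"])
```

Real leaderboards aggregate many thousands of such votes (13K+ in Command R+'s case), which is why small per-vote updates converge to a stable ranking.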
🚀wow! congrats @cohere!! Command R+ by @cohere has surged to 6th place, matching GPT-4-0314 with over 13K human votes! Huge congrats to @cohere for leading the pack with the best open model https://t.co/f1ZW3vpHg4 #models #llms #gpus #nvidia 4090 https://t.co/jhEVCPYW4Z
Feel like Claude is now better than ChatGPT. Higher-fidelity responses, esp. as a brainstorming partner. Anyone else seeing similar? Or have something you feel is even better?
Did you know that Command R+ is on the Open LLM Leaderboard? It's notably got very good scores on MMLU and GSM8K! Congrats @cohere on the cool model :) https://t.co/BcMLOEjLxn
Congrats to the team for building the best open weights model ❤️ Plus these benchmarks don't even measure RAG & Tools performance -- where Command R/R+ shine. Excited about what's coming next. https://t.co/ekL0dgvyyk
Exciting news - the latest Arena results are out! @cohere's Command R+ has climbed to the 6th spot, matching GPT-4-0314 level with 13K+ human votes! It's undoubtedly the **best** open model on the leaderboard now🔥 Big congrats to @cohere's incredible work & valuable contribution… https://t.co/5PzpPolC9F
Just after a few tests, I get the impression that when it comes to writing creative code, Claude Opus beats GPT-4 by a wide margin without even trying too hard. It's only let down by @AnthropicAI's content filtering policy, which seems to operate on coin flips.
It's unbelievable how bad GPT-4 is right now, holy shit. Just tried something for a client that GPT-4 failed miserably at, and both @AnthropicAI's Claude Haiku and @cohere's R+ worked perfectly for. Also, the Cohere playground is stunning, wow.
Claude 3 Opus feels like the first model I can really hand off tasks to without writing a really detailed spec. GPT-4 was always quite capable but even for simple tasks, you really needed to poke and prod it. Not so with Claude.
"Claude-3-Opus significantly outperforms all closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo." Paper - https://t.co/fTFIP6frYf https://t.co/l3XPB4nrQM
Claude 3 Opus destroys other models at long-context text summarization 🔥 📌 This Princeton University paper found that Opus is almost twice as good as the second-best model (GPT-4 Turbo) at book or long-context summarization. It's much more faithful and… https://t.co/LMnfKy6ZDl
OpenAI really needs to drop a Haiku or Sonnet competitor. The tools API, JSON mode, community libs, and overall dev experience for OpenAI are so much better than Claude atm. But it makes 0 sense to use GPT-4 for anything right now; it’s way too slow and expensive for most use cases.