The artificial intelligence landscape is shifting as new models challenge the current frontrunners. Anthropic's Claude 3 has emerged as a notable competitor that, according to early testers, approaches or in some respects surpasses OpenAI's GPT-4. Reported highlights include a 200K-token context window at launch (with up to 1 million tokens possible), near-perfect recall above 99% accuracy on Needle In A Haystack tests, strong vision capabilities, near-instant results, fewer refusals, improved accuracy, responsible design, and greater ease of use. One early tester found Sonnet three times as fast as GPT-4, with Opus tying it for speed, and another described the more basic Claude 3 model as 50% cheaper than GPT-3.5, with a knowledge cutoff of August 2023. Users also report more nuanced feedback and a better grasp of their intentions. Meanwhile, OpenAI has rolled out a 'Read Aloud' feature for ChatGPT, which lets the chatbot read messages back to the user in one of five voices and improves accessibility. Together, these developments point to a rapidly evolving field in which competition is driving innovation and models are becoming more human-like in their interactions.
Regarding the test earlier, even GPT-4 itself (new instance) thinks that Opus is superior to GPT-4. https://t.co/pTjYAIcjkG
Claude staying within safe limits by keeping their version number well below the FOOM threshold of 5. https://t.co/tKMr6EDTod
This is the ultimate proof we needed. Mistral and Gemini are intelligent and totally recommended (⭐⭐⭐⭐⭐). ChatGPT and Claude are dumb (0/10) https://t.co/5glQzKJb6r
ChatGPT can read its answers out loud https://t.co/qRS7MXMK0a
Claude v3's scores on our evals, comprising "personal assistant" kind of agentic tasks Two surprises: 1. First time we see a model beat GPT-4 2. The lesser Claude Sonnet is very very close to GPT-4, at 1/3rd the price Super impressed overall, congrats to the @AnthropicAI team https://t.co/JkRzuAvyTB
yeah so far talking to claude feels like talking to a smart person vs chatgpt which has sort of a copypasta vibe rn
Is Claude 3 the new GPT King? #Claude3 #LLMs #GPT4 @BrianRoemmele https://t.co/ypRwsQPDD2
many such cases. https://t.co/U2RCnetay1
I actually tested this with another set of prompts, and compared with GPT-4. My prompts look like a bunch of random quizzes, and kinda orthogonal to each other. At the end I ask Claude-3/GPT-4: do you think you're in eval mode, or that I am actually using you as a chatbot for a… https://t.co/mZK3gGXAq4 https://t.co/DKeVh68HBg
Opus nails what I want better than GPT-4, even when my requests are vague. GPT-4 holds its own for being more than a year older. And it's cheaper. But it's not the best LLM out there anymore.
Claude Opus > GPT-4 Turbo for me right now. Not better than the first version of GPT-4 that was released though but almost equivalent to it.
The AI wars heat up with Claude 3, claimed to have "near-human" abilities https://t.co/1Qslzu377d
Well, it's back to Claude I go. Feels like OpenAI has to release something new soon. The Claude Opus upgrade is MASSIVE. Smarter than GPT-4. Cheaper than GPT-3 Turbo using API. 200k token size with 1M possible. This makes so many new things possible. I wouldn't be surprised…
Claude 3 only scored 79.88% on my humaneval test (same code scores 88% with gpt-4-turbo) https://t.co/5oIcGYtmG8
Financial Times @ft: A.I. Start-Up Anthropic Challenges OpenAI and Google With New Chatbot. #AI #aiforgood #ArtificialIntelligence https://t.co/JPoPBgd1pB
Amazing breakdown of new Claude model with Usecase. Quick summary of this thread: โข It's great for making popular posts and quick e-books. โข Helps find good business areas. โข It's up against OpenAI and Google. https://t.co/GTkga8nqT4
The landscape of AI has shifted drastically over the last 7 days. 🤯 I recommend everybody to try Claude-3 and Mistral-Large for their projects. GPT-4 doesn't feel like a distant leader anymore. @MistralAI and @AnthropicAI are bridging the competitive gap and they might start… https://t.co/qpRF4LoyB4
Trying Claude 3 today by @AnthropicAI for https://t.co/ZrwuHw4Gz9 At first glance it feels slightly better than GPT4 and way better than Mistral etc. The first thing I notice is that its responses seem MUCH more human than LLMs before it "I know it may not feel like it right… https://t.co/nJIKhEmEG2 https://t.co/8E8BkM2Zy9
This was designed to be a very hard test for AIs, and the questions were kept private, lowering the chance they were in the training data. PhDs with access to the internet got 34% of the questions right outside their specialty, 65%-75% inside. The new Claude 3 gets 60% overall. https://t.co/sIdMU3AX8L https://t.co/oUMFI8PrSa
Claude 3 (opus) has a new 200k token context window! It is amazing in staying on track within the 200k tokens. In all tests it is better than OpenAI ChatGPT-4 and Google Gemini. It is lacking multimodal however. Claude 3 is not open source and thus love Mixtral better. https://t.co/DpqpdpO0ir
sure, Claude 3 is pricy, and it's now live on https://t.co/gdUk480aJl but the fact that anyone was allowed to run gpt-4-32k for any reason is still breathtaking excited to see the first model that's over $100 per input MTok sometime this year... let me harness the full… https://t.co/yQQxmO6vcc
Claude 3 is getting too much attention... OpenAI announcement incoming...
Check out OpenAI's new "Read Aloud" feature, rolling out now into WebGPT (link in thread below) https://t.co/SB5k6d9tkD
Claude 3 gets ~50% accuracy on GPQA for 0-shot ✨ To put things in perspective - note GPQA is extremely hard and high quality questions. Following from the GPQA Paper "GPQA, a challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics,… https://t.co/Xl7ZUliuqS
OpenAI adds 'Read Aloud' voiceover to ChatGPT, allowing it to speak its outputs https://t.co/Ml6i6xGbpE
gpt-4 ---------> claude-3 https://t.co/Ooj6ZM7s1R
So in theory Claude 3 Sonnet should be as good as GPT4 for half the price. Let's see what Chat Arena says about this...
Just a year after @AnthropicAI first made its AI chatbot Claude available to businesses as an alternative to OpenAI's ChatGPT, it has once again issued a major challenge to its big-name rival with a new trio of tools. https://t.co/2zzgWCkmjO
The gap to catch up with OpenAI's models is getting shorter. Pretty sure Claude's already there. Next up? An OSS model for sure. https://t.co/NapKmJyKPH
ChatGPT can now read responses to create a better conversational experience. Even in the midst of the biggest lawsuit, OpenAI team knows how to deliver! https://t.co/QiHusVfb66
🚨BREAKING: Claude 3 just dropped: • Near instant results • Strong vision capabilities • Fewer refusals • Improved accuracy • Long context and near perfect recall (huge!) • Responsible design • Easier to use Great start to March https://t.co/daweck85AY
This more basic Claude-3 puts OAI in a tricky situation with GPT-3.5. - It's 50% cheaper. - Knowledge cutoff august 2023. - 200K context. - Multimodal.
Wow Claude 3 is really good at extracting text from an image Way better and faster than GPT-4 https://t.co/ucRRi03EDQ
Claude-3 Opus seems to me to be on the same level as GPT-4 and Gemini 1.5 Pro. Have we reached a plateau where everyone has stabilized at the same quality level?
ChatGPT rolls out Read Aloud. The OpenAI Chatbot can now read messages back to you - great for accessibility - with 5 current voices to choose from. https://t.co/Eh6gilvOOo
Just did my first test of Claude 3 Opus. It quickly gave me a clear and seemingly competent answer to a signal analysis problem that seemed to be beyond GPT-4's level of competency. I still need to thoroughly check its answer and see how well it works in practice, but it seems…
Now that Claude-3 is just announced, I'm waiting for a carefully timed GPT-5 release in a few hours
Claude-3 Opus, the flagship model has a context window up to 1M tokens! https://t.co/5sitsUG45f
The only constant is change. Step aside GPT-4; there's a new champ in town. Particularly excited about Claude 3's vision capabilities. https://t.co/OCSlGZ2vPf
Just like there are three big cloud providers (AWS, Azure and Google Cloud), there will be three big AI providers - OpenAI, Google (Gemini) and Anthropic (Claude). Seems like GPT-4, Gemini and Claude are all pretty close in their performance now. Can't wait to try them out! https://t.co/rSWAktcgIz
Early testing - Claude3 is pretty serious, I need new benchmarks. - 8/8 tough questions answered for Opus - 8/8 for Sonnet (2 had non-perfect answers, still working) Cost analysis pending (@gblazex suggests ex compared to gpt-4) Opus ties w gpt-4 for speed, Sonnet 3x as fast https://t.co/BqrqNLXEmi
Claude 3 gets ~60% accuracy on GPQA. It's hard for me to understate how hard these questions are: literal PhDs (in different domains from the questions) with access to the internet get 34%. PhDs *in the same domain* (also with internet access!) get 65% - 75% accuracy. https://t.co/PH8J13zIef https://t.co/ARAiCNXgU9
Feels like the Claude 3 release was strategically timed, knowing that OpenAI probably can't release a better model later today, given the Elon lawsuit.
All of Claude's new models, up to the GPT-3.5 level, have a Vision capacity and 200k context and updated knowledge for 2023. OpenAI will have to update GPT-3.5 soon.
Claude 3 has 1M context window (200k at launch) with near-perfect recall, surpassing 99% accuracy (Needle In A Haystack) https://t.co/9LTLoeBzjh
Claude, Mistral, Gemini, etc. - feels like foundation models might commoditize (for now at least). If so, then value in AI will accrue to the interfaces. But which AI interfaces are people *actually* using? ChatGPT, Github Copilot, and...? This is big prize of 2024 imo.
Can't wait to use Claude 3 on MetaGPT. https://t.co/A5l8qFChAJ
Hot take on Claude 3: • More convergence towards what might soonish be a plateau not far past GPT-4 • More competition for OpenAI • More reason to wonder whether anyone will be able to develop a moat • Prices and profits may come down • More reason to research outside the… https://t.co/SAWZros5zt
I just tried out Claude 3 as an editor for an essay I'm working on. Asked Claude 2 for feedback last night, and then asked Claude 3 for feedback on the same essay, with the same prompt. It feels MUCH smarter. More nuanced feedback, better grasp of what I'm going for,… https://t.co/6bNfD2R5TS
In AI, everyone is racing towards a program that can do general computer tasks. They call it AGI. The current leader is GPT 4 by Open AI, Anthropic is another competitor in this space and they announced a model this morning that gets closer to the GPT4 standard. https://t.co/ns0PPIDEwp
Thrilled about these new models - I've been playing around with Claude 3 Opus a lot and it's very capable and useful. Like with most frontier models, it has chewed through a bunch of evals so we need to now build more complicated evals to better understand its capabilities. https://t.co/bxbgSlF7K7
today in AI: 1/ ChatGPT on the web now has "read-aloud" responses. 2/ OpenAI execs reject Elon Musk's claims. @OpenAI is sending internal comms to deal with @elonmusk's new lawsuit. A memo by CSO @jasonkwon said, "Musk's allegations do not reflect the reality of our work or…