The AI community is debating the fairness of comparisons between AI models, particularly between Google's Bard (Gemini Pro) and GPT-4-Turbo. Critics point out that Bard's integration with Google's search index via RAG (Retrieval-Augmented Generation) lets it pull live information from the web, giving it an edge in Elo rankings and benchmarks. This has resulted in Bard outperforming older GPT-4 models, but commentators are questioning Bard's practical utility for specific tasks and calling for fairer AI model evaluations.
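The RAG setup the critics describe can be sketched in a few lines: retrieve snippets for the user query, then prepend them to the prompt before calling the base model. Note that `search_web` and `generate` below are hypothetical stand-ins for a live search index and a bare model call, not any real Bard, Gemini, or Google Search API.

```python
# Minimal RAG sketch. Both functions are illustrative stubs, not real APIs.

def search_web(query: str, k: int = 2) -> list[str]:
    """Stand-in for a live search index (the part Bard is said to have)."""
    fake_index = [
        "Snippet A about the query topic.",
        "Snippet B with more recent details.",
        "Unrelated snippet C.",
    ]
    return fake_index[:k]

def generate(prompt: str) -> str:
    """Stand-in for a bare model call (the 'model weights only' setup)."""
    return f"Answer grounded in: {prompt[:60]}..."

def rag_answer(query: str) -> str:
    # Retrieval step: fetch fresh context the base weights don't contain.
    snippets = search_web(query)
    context = "\n".join(f"- {s}" for s in snippets)
    # Augmentation step: the model answers with the retrieved context inline.
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

print(rag_answer("What changed in the latest Chatbot Arena rankings?"))
```

The whole debate below is about whether a system with this extra retrieval step should be ranked against models that only see the bare question.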
Mystery solved - Bard has access to the web while the rest of the services don’t! This gives it an inherent advantage https://t.co/UanRrCKmiB
Bard is definitely winning in the Elo rankings because it can do web searches, and multiple ones at the same time. That's at least contributing. Can't seem to access this functionality via the API. @lmsysorg how are you guys doing it? https://t.co/xkPA0aQR7G
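For reference, the Elo-style rating these tweets keep citing comes from pairwise battles: when model A beats model B, A's rating rises by an amount proportional to how unexpected the win was. The K-factor and ratings below are illustrative, not lmsys's actual parameters.

```python
# Generic Elo update for pairwise model rankings (illustrative parameters).

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one head-to-head battle."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Equal ratings, A wins: the winner gains k/2 = 16 points.
new_a, new_b = elo_update(1200, 1200, a_won=True)
print(new_a, new_b)  # 1216.0 1184.0
```

The complaint in the thread is not with this math but with its inputs: if one "model" in the battle is actually model-plus-search, every win it racks up inflates a rating that is then read as a property of the weights alone.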
Google’s propaganda machine is in full effect. I’ve used Gemini Pro a lot in the last week for testing; I asked it to explain a complex manifold, and that was an interesting answer. I can even make the two talk to each other, and even GPT-3.5 Turbo is better. And there is so much pro-Google… https://t.co/t0rsdIASJl
This is a misleading result. Bard uses RAG (access to Google search) here while the other top competitors like GPT-4, Mistral Medium and Claude are not using search. Access to recent information through search is an advantage that needs to be controlled for or at least pointed… https://t.co/ugQ1x7LMfI
So...can I submit search-powered Mixtral to lmsys as well? If the new Bard ranking has access to search, this is now an uneven comparison 🤔
Isn't the Bard API also doing RAG with web search there? Sounds like a pretty unfair comparison, if that is the case. It would need to be compared to Copilot, Perplexity, etc. https://t.co/kghUvt1NTy
Not sure if that's a fair comparison when Bard is using a search API while GPT-4 and other models are not (example below). The bare-metal Gemini Pro API seems to sit between Mixtral 8x7B and GPT-3.5. So the key difference is search, which greatly improves human preference? https://t.co/2TlebUnJuo https://t.co/uhpTR96K41
Everyone's talking about Gemini outperforming older GPT-4 models on benchmarks, but are you actually USING Gemini over GPT-4 for your work? I ran 1k+ API calls to both models last week and, for my task, Gemini wasn't even close. How does it do for your work? https://t.co/H0ufb9V481
I love Chatbot Arena, but this is crazy. "GPT-4-Turbo" is a bunch of model weights. "Bard (Gemini Pro)" is the Google crawler/index + Gemini. There's no way on earth a live crawler should be compared to base model weights. If anything, it's actually crazy that GPT-4 disconnected… https://t.co/7Xr3pWxgaU