Recent discussions among AI influencers and users on Twitter suggest that Google's Bard, possibly undergoing stealth testing as 'Ultra', is showing improved performance on coding tasks and is even being compared to GPT-4. Critics, however, argue that Bard's use of Google's search API for retrieval-augmented generation (RAG) gives it an unfair advantage over competitors like GPT-4, Mistral Medium, and Claude, which do not use search. The debate centers on whether a system combining a live crawler and index with a model, as the 'Bard (Gemini Pro)' entry does, should be compared against the base model weights of other AI models, or instead against search-backed tools like Copilot and Perplexity. Some users challenge the validity of the comparisons, citing Bard's access to recent information through search as a significant advantage that skews results. Even so, there are acknowledgments of Bard's accuracy improvements, especially after corrections to previously incorrect facts, and its text-to-speech capabilities are rated favorably.
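The distinction the critics are drawing can be made concrete with a minimal sketch. The functions below (`llm_complete`, `web_search`) are hypothetical stand-ins, not real APIs; the point is only the shape of the pipeline: a bare model answers from its frozen training data, while a RAG system first pulls live search snippets into the prompt.

```python
def llm_complete(prompt: str) -> str:
    """Placeholder for a call to base model weights (e.g. a GPT-4-class API)."""
    return f"<answer from training data only for: {prompt!r}>"

def web_search(query: str, k: int = 3) -> list[str]:
    """Placeholder for a live crawler/index lookup (the advantage critics cite)."""
    return [f"<snippet {i} for {query!r}>" for i in range(k)]

def rag_complete(prompt: str) -> str:
    """Augment the prompt with fresh search results before calling the model."""
    snippets = "\n".join(web_search(prompt))
    return llm_complete(f"Context from search:\n{snippets}\n\nQuestion: {prompt}")
```

Under this framing, comparing `rag_complete` against a competitor's bare `llm_complete` conflates the model with the retrieval system wrapped around it, which is the core of the objection.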
Gotta give credit where it's due. Bard's results for my "what's up in UFC?" query can be critiqued for boring format, but not for inaccuracy. The random incorrect facts it used to insert appear to be fixed. Better blending of LLM and search. Text-to-speech 8/10 also. https://t.co/ihiykL436b
This is a misleading result. Bard uses RAG (access to Google search) here while the other top competitors like GPT-4, Mistral Medium and Claude are not using search. Access to recent information through search is an advantage that needs to be controlled for or at least pointed… https://t.co/ugQ1x7LMfI
Isn't the Bard API also doing RAG with web search there? Sounds like a pretty unfair comparison, if that's the case. It would need to be compared to Copilot, Perplexity, etc. https://t.co/kghUvt1NTy
Not sure if that's a fair comparison when Bard is using a search API while GPT-4 and the other models are not (example below). The bare-metal Gemini Pro API seems to sit between Mixtral 8x7B and GPT-3.5. So is the key difference search, which greatly improves human preference? https://t.co/2TlebUnJuo https://t.co/uhpTR96K41
I love Chatbot Arena, but this is crazy. "GPT-4-Turbo" is a bunch of model weights. "Bard (Gemini Pro)" is the Google crawler/index + Gemini. There's no way on earth a live crawler should be compared to base model weights. If anything, it's actually crazy that GPT-4 disconnected… https://t.co/7Xr3pWxgaU
I challenge everyone who is sharing that Bard/Gemini is reaching the same level as GPT-4 to actually test it. Real testing. Difficult problems. Stop being influenced by tweets. The competition is still far away. Very far away.
Google's Bard shines on third-party metrics, and AI influencers are quick to chime in with praise. It does not pass my "UFC test" though: it still makes stuff up and gets dates, facts, and records wrong 🤷‍♂️ https://t.co/jrynAFeSbf https://t.co/NJnLsNF1M4
You are wrapping GPT-4 and calling it "bard", are you not? https://t.co/FORPYBN3DP https://t.co/WNksI8uUry
Someone on my feed last night was saying Bard was suddenly better at code than GPT-4. Google is apparently doing some work over there. Possible pre-release stealth testing of Ultra behind the mask of Bard? https://t.co/A1NSLaNEJU