Concerns have been raised about the fairness of the LMSYS Arena Elo rankings, which Bard currently leads. Critics point out that Bard can perform web searches, giving it an inherent advantage over other top competitors such as GPT-4, Mistral Medium, and Claude, which do not use search. Because the rankings do not control for Bard's access to recent information, critics are calling for a clear distinction between models with and without web access. Bard's ability to run multiple web searches at once, which is believed to contribute to its lead, is reportedly not accessible via the API, prompting questions about how the Arena is using the feature.
If Bard is higher in the @lmsysorg Arena benchmark because of web RAG then obviously it's a flawed comparison. https://t.co/vll8auovbM
Bard is the only model in this list with Internet access, which the table should make clearer. As @appenz aptly put it, open book vs closed book exams should be graded differently https://t.co/cvKJGZOOSW
Mystery solved - Bard has access to the web while the rest of the services don't! This gives it an inherent advantage https://t.co/UanRrCKmiB
Bard is definitely winning in the ELO rankings because it can do web searches, and multiple ones at the same time. It's at least contributing. Can't seem to access this functionality via the API. @lmsysorg how are you guys doing it? https://t.co/xkPA0aQR7G
This is a misleading result. Bard uses RAG (access to Google search) here while the other top competitors like GPT-4, Mistral Medium and Claude are not using search. Access to recent information through search is an advantage that needs to be controlled for or at least pointed… https://t.co/ugQ1x7LMfI