HumanEval News
Top stories
Google Launches High-Performing Gemma 2 Series with 9B and 27B Parameter Models, Excelling in Benchmarks
Authors
48
3 days
AI
Business
Tech
BigCodeBench Introduced by Terry Yue Zhuo to Evaluate LLMs on Realistic Coding Tasks with 50% Success Rate
Authors
11
12 days
AI
Tech
Prediction markets for HumanEval
xAI Grok will beat OpenAI's flagship model on HumanEval benchmarks by the end of 2024.
Nov 24, 4:13 PM
Jan 1, 4:59 AM
10.32% chance
89
3965
Option | Votes
YES | 2330
NO | 752
What will be true of Grok-2?
Mar 29, 7:34 PM
Jan 1, 7:59 AM
24
2100
Will Grok 2 'exceed current [March 28 2024] AI on all metrics'?
Mar 29, 11:21 AM
Jan 1, 4:59 AM
34.96% chance
111
23825
Option | Votes
NO | 1464
YES | 481
Will a new lab create a top-performing AI frontier model before 2028?
Jun 14, 4:48 PM
Jan 1, 10:59 PM
55% chance
36
4857
Option | Votes
NO | 1106
YES | 905
Will Google Gemini perform better (text) than GPT-4?
Sep 24, 7:03 PM
Dec 31, 11:59 PM
32.35% chance
43
5052
Option | Votes
YES | 1400
NO | 355
Will LLaMA-3 be on par with or better than GPT-4?
Nov 16, 11:34 PM
Jan 1, 7:59 AM
81.15% chance
39
5047
Option | Votes
NO | 1639
YES | 693
Is GPT-4 (0613) more capable than GPT-4 (0314)?
Nov 9, 12:32 PM
Jul 1, 4:59 AM
71.47% chance
5
108
Option | Votes
NO | 168
YES | 105
Is GPT-4-turbo (1106-preview) more capable than GPT-4 (0613)?
Nov 9, 12:29 PM
Jul 1, 4:59 AM
78.2% chance
8
312
Option | Votes
NO | 326
YES | 121
Articles
Latest stories
Mistral AI Launches Codestral-22B, an Open-Weight Generative AI Model for Coding in 80 Languages
Authors
70
1 month
AI
Tech
OpenAI Releases GPT-4 Turbo with Vision Support, Scores 88.2% on HumanEval
Authors
5
2 months
AI
Tech
Google Launches Gemini, Beats GPT-4 in Benchmarks, Introduces $19.99/Month Advanced Subscription
Authors
102
5 months
AI
Tech
Google to Release Gemini Ultra Tomorrow, Competitor to GPT-4, Outperforming in Benchmarks, Available via Paid Subscription
Authors
8
5 months
AI
Tech
Meta's Code Llama 70B Scores 67.8, Tops GPT-4, Open-Source on HuggingChat
Authors
21
5 months
AI
Tech
OpenChat-3.5-1210 Surpasses ChatGPT and Grok Models with Near 15 Point Increase in HumanEval
Authors
11
6 months
AI
Tech