HumanEval News

Google Launches High-Performing Gemma 2 Series with 9B and 27B Parameter Models, Excelling in Benchmarks
Authors
48
3 days
AI
Business
Tech
BigCodeBench Introduced by Terry Yue Zhuo to Evaluate LLMs on Realistic Coding Tasks with 50% Success Rate
Authors
11
12 days
AI
Tech

xAI Grok will beat OpenAI's flagship model on HumanEval benchmarks by the end of 2024.

Nov 24, 4:13 PM - Jan 1, 4:59 AM

10.32% chance

893965

Option | Votes

YES

2330

752

What will be true of Grok-2?

Mar 29, 7:34 PM - Jan 1, 7:59 AM

242100

Will Grok 2 'exceed current [March 28 2024] AI on all metrics'?

Mar 29, 11:21 AM - Jan 1, 4:59 AM

34.96% chance

11123825

Option | Votes

YES

1464

481

Will a new lab create a top-performing AI frontier model before 2028?

Jun 14, 4:48 PM - Jan 1, 10:59 PM

55% chance

364857

Option | Votes

YES

1106

905

Will Google Gemini perform better (text) than GPT-4?

Sep 24, 7:03 PM - Dec 31, 11:59 PM

32.35% chance

435052

Option | Votes

YES

1400

355

Will LLaMA-3 be on par with or better than GPT-4?

Nov 16, 11:34 PM - Jan 1, 7:59 AM

81.15% chance

395047

Option | Votes

YES

1639

693

Is GPT-4 (0613) more capable than GPT-4 (0314)?

Nov 9, 12:32 PM - Jul 1, 4:59 AM

71.47% chance

5108

Option | Votes

YES

168

105

Is GPT-4-turbo (1106-preview) more capable than GPT-4 (0613)?

Nov 9, 12:29 PM - Jul 1, 4:59 AM

78.2% chance

8312

Option | Votes

YES

326

121

Latest stories

Mistral AI Launches Codestral-22B, an Open-Weight Generative AI Model for Coding in 80 Languages
Authors
70
1 month
AI
Tech
OpenAI Releases GPT-4 Turbo with Vision Support, Scores 88.2% on HumanEval
Authors
5
2 months
AI
Tech
Google Launches Gemini, Beats GPT-4 in Benchmarks, Introduces $19.99/Month Advanced Subscription
Authors
102
5 months
AI
Tech
Google to Release Gemini Ultra Tomorrow, Competitor to GPT-4, Outperforming in Benchmarks, Available via Paid Subscription
Authors
8
5 months
AI
Tech
Meta's Code Llama 70B Scores 67.8, Tops GPT-4, Open-Source on HuggingChat
Authors
21
5 months
AI
Tech
OpenChat-3.5-1210 Surpasses ChatGPT and Grok Models with Near 15 Point Increase in HumanEval
Authors
11
6 months
AI
Tech
