Cognition Labs Report: Devin's SWE-bench Score Increases to 23% with Unit Tests, Challenging Microsoft and OpenAgents

Providing Devin with prewritten unit tests, i.e., test-driven development, boosts its performance from ~13% to 23% on SWE-bench. This is incredibly impressive. https://t.co/pdpvNFzd6k https://t.co/bQlV6n1JyE

Harry Tormey 🇮🇪 | 🇺🇸| 🇺🇦@htormey

4 mo

Cognition Labs just released their report explaining Devin's 13% score on the SWE-bench benchmark. Mandatory reading if you are interested in AI writing code. https://t.co/bQlV6n1JyE

OpenAgents ⚡@OpenAgentsInc

4 mo

Why did they benchmark against HumanEval? They say their approach goes beyond code snippets. HumanEval is all about code snippets. Test this on SWE-bench! Microsoft vs. Devin vs. OpenAgents. Let's race up the SWE-bench ladder 👍 https://t.co/a4DH01KtCs

Neal Wu@WuNeal

4 mo

Excited to share more details on our state-of-the-art SWE-bench result. We have some analysis of the results plus examples of the code edits Devin made. Check them out! https://t.co/Flw1EJMj7S

Cognition@cognition_labs

4 mo

We’re sharing our technical report for Devin’s results on SWE-bench: https://t.co/9Aoz0MLx7x Highlights in 🧵 https://t.co/8nI7cqsmSY
