Cognition Labs released a technical report on Devin's 13% score on SWE-bench, which rose to 23% when Devin was given prewritten unit tests. One tweet asks why an approach said to go beyond code snippets was benchmarked on HumanEval, and calls for Microsoft, Devin, and OpenAgents to race up the SWE-bench ladder.
Providing Devin with prewritten unit tests, i.e. test-driven development, boosts its performance from ~13% to 23% on SWE-bench. This is incredibly impressive. https://t.co/pdpvNFzd6k https://t.co/bQlV6n1JyE
Cognition Labs just released their report explaining Devin's 13% score on SWE-bench. Mandatory reading if you are interested in AI writing code. https://t.co/bQlV6n1JyE
Why did they benchmark against HumanEval? They say their approach goes beyond code snippets. HumanEval is all about code snippets. Test this on SWE-bench! Microsoft vs Devin vs OpenAgents. Let's race up the SWE-bench ladder. https://t.co/a4DH01KtCs
Excited to share more details on our state of the art SWE-bench result. We have some analysis of the results plus examples of the code edits Devin made. Check them out! https://t.co/Flw1EJMj7S
We're sharing our technical report for Devin's results on SWE-bench: https://t.co/9Aoz0MLx7x Highlights in 🧵 https://t.co/8nI7cqsmSY