Researchers at Princeton have developed SWE-agent, which scores 12.29% on SWE-Bench, close to Devin's 13.84%. Since the underlying LLM does most of the heavy lifting, the more interesting question is how to evaluate the coding LLMs themselves, and SWE-Bench is well suited for that.
How LLMs are trained https://t.co/1wx0nP4ZBL
One of the tasks in SWE-bench, the benchmark used to assess AI agents like Devin, quotes in its issue text the exact line the agent needs to change to fix the bug. This is why it's important to read the data behind a benchmark. You can browse the list of these issues here https://t.co/ABBYWBL66W https://t.co/4Yy94rF3mx
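A minimal sketch of the kind of check this tweet implies: scan SWE-bench instances for cases where a line the gold patch removes already appears verbatim in the issue text. The dataset name and field names (`problem_statement`, `patch`, `instance_id`) follow the Hugging Face release of SWE-bench as I understand it, but treat the schema as an assumption and verify it yourself.

```python
from datasets import load_dataset

# Assumed dataset ID and split; check the actual SWE-bench release.
ds = load_dataset("princeton-nlp/SWE-bench", split="test")

def removed_lines(patch: str):
    # Lines the fix deletes or replaces: unified-diff lines starting
    # with a single "-" (skipping the "---" file header).
    for line in patch.splitlines():
        if line.startswith("-") and not line.startswith("---"):
            stripped = line[1:].strip()
            if stripped:
                yield stripped

leaky = []
for ex in ds:
    issue = ex["problem_statement"]
    if any(line in issue for line in removed_lines(ex["patch"])):
        leaky.append(ex["instance_id"])

print(f"{len(leaky)} / {len(ds)} instances quote a to-be-changed line verbatim")
```

A verbatim-substring match is a crude heuristic; it flags candidates for manual reading rather than proving contamination.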
The moat of software AI agents is not the thin wrapper layer (Devin, SWE-Agent) but the underlying LLM. Rather than benchmarking the wrapper, I think SWE-Bench is excellent for evaluating coding LLMs: hold the agent layer fixed and vary only the LLM backend. Provide all… https://t.co/uublPJfm3f
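A minimal sketch of the evaluation setup this tweet proposes: one fixed agent scaffold, a pluggable LLM backend, and the same tasks for every model, so scores compare LLMs rather than wrappers. `Agent`, `task_is_resolved`, and the backend names are hypothetical stand-ins, not SWE-agent's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# An LLM backend is just a prompt -> completion function.
LLMBackend = Callable[[str], str]

@dataclass
class Agent:
    """Fixed scaffold: identical prompts, tools, and loop for every backend."""
    llm: LLMBackend
    max_steps: int = 20

    def solve(self, issue_text: str) -> str:
        # A real harness would call tools (edit files, run tests, ...);
        # this loop only illustrates that the scaffold never changes.
        transcript = issue_text
        for _ in range(self.max_steps):
            action = self.llm(transcript)
            transcript += "\n" + action
            if action.startswith("SUBMIT"):
                break
        return transcript

def task_is_resolved(transcript: str) -> bool:
    # Placeholder: a real harness would apply the submitted patch and
    # run the repo's test suite.
    return "SUBMIT" in transcript

def evaluate(backends: dict, tasks: list) -> dict:
    """Same tasks, same agent; only the LLM varies."""
    scores = {}
    for name, llm in backends.items():
        agent = Agent(llm=llm)
        resolved = sum(task_is_resolved(agent.solve(t)) for t in tasks)
        scores[name] = resolved / len(tasks)
    return scores

# Usage with trivial stand-in backends:
backends = {
    "model_a": lambda prompt: "SUBMIT patch",
    "model_b": lambda prompt: "look around",
}
print(evaluate(backends, tasks=["fix the off-by-one in the parser"]))
```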
Hector Liu from @llm360 talking about how to pretrain LLMs from scratch - joining us in 2.5 hours in my server! link: https://t.co/C21orV2hzx don't miss it! https://t.co/YuwNpH9SEh
Less than a month after Cognition Labs released Devin, an AI coding agent that apparently solves software bugs better than any prior agent, researchers at Princeton have released SWE-agent, which scored 12.29%—nearly as high as Devin’s 13.84%. https://t.co/1LqOpEAIYi
“The fusion of #Human ingenuity and the #Computational prowess of LLMs heralds a new era of functionality” #FutureWithAI🪩 https://t.co/37VzKHUSaA
Considering a new idea for LLM benchmarking. Given that benchmarks can be beaten by training on the test set, we are thinking of setting up a service where folks can submit their LLMs, and humans combined with AI can test them. The more votes an LLM wins, the higher it floats up the…
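One way such a service could turn pairwise votes into a ranking is a standard Elo update, so models that win more comparisons float up the leaderboard. This is my assumption about the mechanism, not something the tweet specifies; the K-factor and starting rating are conventional defaults.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that A beats B under the Elo logistic model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    # Winner gains, loser loses, scaled by how surprising the outcome was.
    ra, rb = ratings[winner], ratings[loser]
    ea = expected_score(ra, rb)
    ratings[winner] = ra + k * (1 - ea)
    ratings[loser] = rb - k * (1 - ea)

ratings = {"model_a": 1000.0, "model_b": 1000.0}
record_vote(ratings, winner="model_a", loser="model_b")
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because ratings come from live human votes rather than a fixed test set, there is no static benchmark to train on, which is the point of the proposal.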
A wave of pure implementations of LLM training is coming 🤓 https://t.co/qCOaYgnDEh
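In the spirit of the minimal-dependency codebases this tweet anticipates (projects like llm.c or nanoGPT), here is a toy character-level bigram LM trained with a bare cross-entropy loop. An illustrative sketch, not any particular project's code.

```python
import torch
import torch.nn.functional as F

# Toy corpus and vocabulary.
text = "hello world, hello llm training"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

V = len(chars)
# A bigram LM is just a V x V table of next-character logits.
logits_table = torch.zeros((V, V), requires_grad=True)
opt = torch.optim.AdamW([logits_table], lr=0.1)

for step in range(200):
    x, y = data[:-1], data[1:]                    # predict next char from current
    loss = F.cross_entropy(logits_table[x], y)    # (N, V) logits vs (N,) targets
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.3f}")
```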