
Introduction
- This Devin alternative scores 12.3% on the full SWE-bench test set.
- "An open source Devin getting 12.29% on 100% of the SWE Bench test set vs Devin's 13.84% on 25% of the test set!"
- SWE-agent works by interacting with a specialized terminal, which allows it to:
  - 🔍 Open, scroll and search through files
  - ✍️ Edit specific lines with automatic syntax checking
  - 🧪 Write and execute tests
- This custom-built interface is critical for good performance. Simply connecting a language model (LM) to a vanilla bash terminal does not work well.
- "Our key insight is that LMs require carefully designed agent-computer interfaces (similar to how humans like good UI design). E.g. When the LM messes up indentation, our editor prevents it and gives feedback."
- SWE-agent was released by the Princeton NLP team.
- What makes SWE-agent notable is that it performs almost as well as Devin on SWE-bench.
- Performance varies depending on which language model powers the agent.
- Compared with Devin, SWE-agent introduces the following changes:
  - Code is executed locally inside a Docker container.
  - It uses an "Agent-Computer Interface" (ACI): constraining the interface makes it easier for LMs to use. Only a few commands are allowed: run code, search for code, edit code, and submit changes to GitHub.
  - Any code the agent writes goes through a syntax check (linter) before it is accepted. If the syntax is incorrect, the edit is rejected, and the agent gets feedback and must redo it.
  - The agent reads only 100 lines of a file at a time rather than the entire file, which makes the code easier for the language model to follow.
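To make two of these interface ideas concrete, here is a minimal Python sketch of a file viewer that exposes only a 100-line window plus simple search, and an edit command that rejects any change failing a syntax check. It is not SWE-agent's actual code: the class `FileWindow` and its methods `view`, `scroll_down`, `search` and `edit` are hypothetical, and Python's built-in `ast.parse` stands in for whatever linter SWE-agent really runs.

```python
import ast
from dataclasses import dataclass

WINDOW = 100  # lines visible at once, mirroring the 100-line limit described above


@dataclass
class FileWindow:
    """Hypothetical ACI-style editor (not SWE-agent's real implementation):
    shows one window of a file at a time and rejects edits that leave the
    file syntactically invalid."""

    lines: list[str]
    top: int = 0  # 0-based index of the first visible line

    def view(self) -> str:
        # Render the current window with 1-based line numbers and a position header.
        end = min(self.top + WINDOW, len(self.lines))
        header = f"[lines {self.top + 1}-{end} of {len(self.lines)}]"
        body = [f"{i + 1}:{self.lines[i]}" for i in range(self.top, end)]
        return "\n".join([header, *body])

    def scroll_down(self) -> str:
        # Advance one window, clamped so the last window still shows up to WINDOW lines.
        self.top = max(0, min(self.top + WINDOW, len(self.lines) - WINDOW))
        return self.view()

    def search(self, term: str) -> list[int]:
        # Return the 1-based line numbers whose text contains the search term.
        return [i + 1 for i, line in enumerate(self.lines) if term in line]

    def edit(self, start: int, end: int, new_text: str) -> str:
        # Replace 1-based inclusive lines start..end, then syntax-check the result.
        candidate = self.lines[: start - 1] + new_text.splitlines() + self.lines[end:]
        try:
            ast.parse("\n".join(candidate))  # stand-in for a real linter
        except SyntaxError as err:  # also catches IndentationError
            # The edit is discarded, so the file is unchanged and the agent gets feedback.
            return f"Edit rejected: {err.msg} (line {err.lineno}). File unchanged."
        self.lines = candidate
        return "Edit applied.\n" + self.view()


if __name__ == "__main__":
    win = FileWindow(["def add(a, b):", "    return a + b"])
    print(win.edit(2, 2, "return a - b"))      # missing indentation: rejected
    print(win.edit(2, 2, "    return a - b"))  # valid: applied, new window shown
```

The key design choice is that a failed check leaves the file untouched: a broken indentation never carries over into later edits, and the agent only sees the error message and tries again.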