Automatic benchmarking of gpt-engineer with swe-bench #913

Open
AntonOsika opened this issue Dec 18, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@AntonOsika
Owner

AntonOsika commented Dec 18, 2023

Feature description

We have a way to easily add benchmarks:

https://www.loom.com/share/206805143fbb4302b5455a5329eaab17?sid=f689608f-8e49-44f7-b55f-4c81e9dc93e6

This issue is about investigating whether swe-bench is a good benchmark to add and, if so, adding a simple version of it.
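
As a rough illustration of what a simple version might score, the sketch below computes a resolved rate the way SWE-bench defines it: an instance counts as resolved only if every test in its FAIL_TO_PASS and PASS_TO_PASS lists passes after the generated patch is applied. The two list names mirror fields in the SWE-bench dataset; the `TaskResult` class, the `resolved_rate` function, and the sample instances are hypothetical, and applying patches and running the tests is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """Outcome of one benchmark instance (hypothetical container, not a gpt-engineer type)."""
    instance_id: str
    fail_to_pass: list[str]   # tests that must go from failing to passing
    pass_to_pass: list[str]   # tests that must keep passing
    passed_tests: set[str] = field(default_factory=set)  # tests observed to pass after the patch

    def is_resolved(self) -> bool:
        # Resolved only if every required test is among the passing ones.
        required = set(self.fail_to_pass) | set(self.pass_to_pass)
        return required <= self.passed_tests


def resolved_rate(results: list[TaskResult]) -> float:
    """Fraction of instances that are resolved; 0.0 for an empty run."""
    if not results:
        return 0.0
    return sum(r.is_resolved() for r in results) / len(results)


# Made-up sample data, for illustration only.
sample = [
    TaskResult("example__repo-1", ["test_fix"], ["test_old"], {"test_fix", "test_old"}),
    TaskResult("example__repo-2", ["test_fix"], ["test_old"], {"test_old"}),
]
rate = resolved_rate(sample)
```

Here the first instance is resolved and the second is not, because its FAIL_TO_PASS test still fails.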

@AntonOsika AntonOsika added enhancement New feature or request triage Interesting but stale issue. Will be close if inactive for 3 more days after label added. labels Dec 18, 2023
@AntonOsika AntonOsika changed the title Automatic benchmarking of gpt-engineer with [swe-bench](https://github.com/princeton-nlp/SWE-bench) Automatic benchmarking of gpt-engineer with swe-bench Dec 18, 2023
@viborc viborc moved this to Todo in gpt-engineer roadmap Feb 8, 2024
@ErikBjare
Collaborator

Tempted to prioritize this higher after the Devin announcement (just as @batwood001 in #1062).

@viborc
Collaborator

viborc commented Mar 13, 2024

Makes sense. Let's figure it out this Thursday at our tech planning meeting, depending on people's availability.

@AntonOsika AntonOsika removed the triage Interesting but stale issue. Will be close if inactive for 3 more days after label added. label Mar 14, 2024
@viborc viborc moved this from Todo to In Progress in gpt-engineer roadmap Mar 28, 2024
@Mohit-Dhawan98

@viborc can you assign this to me?

@viborc
Collaborator

viborc commented Mar 28, 2024

> @viborc can you assign this to me?

Done!

@viborc
Collaborator

viborc commented May 4, 2024

This is more of a general update to the community than anything else. The work on this issue is ongoing, and @Mohit-Dhawan98 is working on it with @ATheorell's support. We'll likely have SWE-bench support in the near future!

@viborc
Collaborator

viborc commented Jul 18, 2024

Someone from OpenDevin suggested we look into their work, learn from it, and possibly reuse it if needed. Putting this here for our reference: https://github.com/OpenDevin/OpenDevin/tree/main/evaluation/swe_bench

4 participants