Bench config #1126
Conversation
❌ Changes requested.
- Reviewed the entire pull request up to f4aa2d1
- Looked at 542 lines of code in 12 files
- Took 1 minute and 48 seconds to review
- Skipped 0 files when reviewing
- Skipped posting 5 additional comments because they didn't meet the confidence threshold of 50%
1. /gpt_engineer/benchmark/__main__.py:112 (assessed confidence: 80%, grade: 0%)
- Comment: The `get_benchmark` function is now expected to receive a `config` parameter, but the `get_benchmark` function in the `benchmarks.load` module does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a `config` parameter elsewhere in the codebase. Consider providing a default value for the `config` parameter in the `get_benchmark` function definition.
- Reasoning: The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
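The suggested fix, giving the `config` parameter a default, might look like the following minimal sketch. The `BenchConfig` field shown here is a hypothetical stand-in for illustration, not the actual contents of `bench_config.py`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BenchConfig:
    # Hypothetical field; the real class is defined in bench_config.py
    active_benchmarks: tuple = ("apps", "mbpp")


def get_benchmark(name: str, config: Optional[BenchConfig] = None) -> dict:
    """Look up a benchmark; fall back to a default config when none is supplied."""
    # Avoid a mutable/shared default argument by constructing the fallback here
    config = config if config is not None else BenchConfig()
    return {"name": name, "config": config}


# Callers that predate the new parameter keep working:
print(get_benchmark("apps")["config"].active_benchmarks)
```

With this pattern, existing call sites that never pass `config` continue to work, while new call sites can pass an explicit configuration.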
2. /gpt_engineer/benchmark/benchmarks/apps/load.py:78 (assessed confidence: 80%, grade: 0%)
- Comment: The `load_apps` function now expects a `config` parameter, but does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a `config` parameter elsewhere in the codebase. Consider providing a default value for the `config` parameter in the `load_apps` function definition.
- Reasoning: same as item 1.
3. /gpt_engineer/benchmark/benchmarks/gpteng/load.py:206 (assessed confidence: 80%, grade: 30%)
- Comment: The `load_gpteng` function now expects a `config` parameter, but does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a `config` parameter elsewhere in the codebase. Consider providing a default value for the `config` parameter in the `load_gpteng` function definition.
- Reasoning: same as item 1.
4. /gpt_engineer/benchmark/benchmarks/gptme/load.py:19 (assessed confidence: 80%, grade: 0%)
- Comment: The `load_gptme` function now expects a `config` parameter, but does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a `config` parameter elsewhere in the codebase. Consider providing a default value for the `config` parameter in the `load_gptme` function definition.
- Reasoning: same as item 1.
5. /gpt_engineer/benchmark/benchmarks/mbpp/load.py:65 (assessed confidence: 80%, grade: 0%)
- Comment: The `load_mbpp` function now expects a `config` parameter, but does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a `config` parameter elsewhere in the codebase. Consider providing a default value for the `config` parameter in the `load_mbpp` function definition.
- Reasoning: same as item 1.
gpteng: GptengConfig = field(default_factory=GptengConfig)

@classmethod
def from_toml(cls, config_file: Path | str):
The `from_toml` method in the `BenchConfig` class does not seem to handle the case where the provided `config_file` does not exist or is not a valid TOML file. This could lead to unhandled exceptions if the method is called with an invalid file path. Consider adding error handling to this method to provide a more informative error message in such cases.
👍 Looks good to me!
- Performed an incremental review on dca2dc0
- Looked at 11 lines of code in 1 file
- Took 1 minute and 1 second to review
- Skipped 0 files when reviewing
- Skipped posting 1 additional comment because it didn't meet the confidence threshold of 50%
1. pyproject.toml:103 (assessed confidence: 0%)
- Comment: Good addition of a newline at the end of the file. This adheres to POSIX standards and ensures the file can be easily concatenated with other files.
- Reasoning: The change adds a newline at the end of the file, which is good practice and adheres to the POSIX standard.
Added a .toml file for configuring the benchmarks, for increased customization and better UX. The style leans heavily on project_config.py @ErikBjare.
Note: this CHANGES THE COMMAND LINE INTERFACE of the bench application.
It would be great if we can get this integrated relatively soon, since I think future work should use this config. #913
Opinions? @ErikBjare @azrv @Mohit-Dhawan98 @AntonOsika
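For a sense of what such a config file could look like, here is a hypothetical sketch; the section and key names are illustrative assumptions, not the actual contents of `default_bench_config.toml` in this PR:

```toml
# Hypothetical layout; see default_bench_config.toml in the PR for the real keys
[apps]
active = true

[mbpp]
active = false

[gpteng]
active = true
```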
Summary:
This PR introduces a new configuration system for the benchmarking tool, allowing for greater customization of the benchmarks to run, and includes minor changes to `pyproject.toml`.

Key points:
- Introduced a `BenchConfig` class in `bench_config.py` for handling benchmark configurations.
- Updated `__main__.py` in the benchmark directory to use the new configuration system.
- Removed the `task_name` argument from the `run` function in `run.py`.
- Added `default_bench_config.toml`.
- Added `test_BenchConfig.py` to test the new configuration system.
- Made minor changes to `pyproject.toml`.

Generated with ❤️ by ellipsis.dev
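As a rough sketch of how a config class of this shape might gate which benchmarks run: the sub-config classes and their `active` flag below are illustrative assumptions, not the PR's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class AppsConfig:
    active: bool = True  # hypothetical flag for illustration


@dataclass
class MbppConfig:
    active: bool = False


@dataclass
class BenchConfig:
    # One nested section per benchmark, mirroring the TOML layout
    apps: AppsConfig = field(default_factory=AppsConfig)
    mbpp: MbppConfig = field(default_factory=MbppConfig)


def active_benchmarks(config: BenchConfig) -> list:
    """Return names of the benchmark sections whose `active` flag is set."""
    return [name for name, section in vars(config).items() if section.active]


print(active_benchmarks(BenchConfig()))  # → ['apps']
```

The benchmark runner would then iterate only over the names this helper returns, instead of taking a single `task_name` on the command line.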