Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bench config #1126

Merged
merged 8 commits into from
Apr 25, 2024
Merged

Bench config #1126

merged 8 commits into from
Apr 25, 2024

Conversation

ATheorell
Copy link
Collaborator

@ATheorell ATheorell commented Apr 21, 2024

Added a .toml for configuration of the benchmarks for increased customization and better UX. Style leans heavily on project_config.py @ErikBjare.

CHANGES COMMAND LINE INTERFACE TO THE BENCH APPLICATION.

Would be great if we can get this integrated relatively soon, since I think future work should use this config #913

Opinions? @ErikBjare @azrv @Mohit-Dhawan98 @AntonOsika


🚀 This description was created by Ellipsis for commit dca2dc0

Summary:

This PR introduces a new configuration system for the benchmarking tool, allowing for greater customization of the benchmarks to run, and includes minor changes to pyproject.toml.

Key points:

  • Added a .toml for configuration of the benchmarks for increased customization and better UX.
  • Style leans heavily on project_config.py @ErikBjare.
  • CHANGES COMMAND LINE INTERFACE TO THE BENCH APPLICATION.
  • Added BenchConfig class in bench_config.py for handling benchmark configurations.
  • Updated __main__.py in the benchmark directory to use the new configuration system.
  • Updated the loading functions for each benchmark to take a configuration object as an argument.
  • Removed the task_name argument from the run function in run.py.
  • Added a default configuration file default_bench_config.toml.
  • Added a new test file test_BenchConfig.py to test the new configuration system.
  • Minor changes to pyproject.toml.

Generated with ❤️ by ellipsis.dev

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Changes requested.

  • Reviewed the entire pull request up to f4aa2d1
  • Looked at 542 lines of code in 12 files
  • Took 1 minute and 48 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 5 additional comments because they didn't meet confidence threshold of 50%.
1. /gpt_engineer/benchmark/__main__.py:112:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The get_benchmark function is now expected to receive a config parameter, but the get_benchmark function in the benchmarks.load module does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a config parameter elsewhere in the codebase. Consider providing a default value for the config parameter in the get_benchmark function definition.
  • Reasoning:
    The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
2. /gpt_engineer/benchmark/benchmarks/apps/load.py:78:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The load_apps function now expects a config parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a config parameter elsewhere in the codebase. Consider providing a default value for the config parameter in the load_apps function definition.
  • Reasoning:
    The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
3. /gpt_engineer/benchmark/benchmarks/gpteng/load.py:206:
  • Assessed confidence : 80%
  • Grade: 30%
  • Comment:
    The load_gpteng function now expects a config parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a config parameter elsewhere in the codebase. Consider providing a default value for the config parameter in the load_gpteng function definition.
  • Reasoning:
    The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
4. /gpt_engineer/benchmark/benchmarks/gptme/load.py:19:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The load_gptme function now expects a config parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a config parameter elsewhere in the codebase. Consider providing a default value for the config parameter in the load_gptme function definition.
  • Reasoning:
    The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.
5. /gpt_engineer/benchmark/benchmarks/mbpp/load.py:65:
  • Assessed confidence : 80%
  • Grade: 0%
  • Comment:
    The load_mbpp function now expects a config parameter, but the function does not seem to handle the case where this parameter is not provided. This could lead to errors if the function is called without a config parameter elsewhere in the codebase. Consider providing a default value for the config parameter in the load_mbpp function definition.
  • Reasoning:
    The PR introduces a new configuration system for benchmarks, which allows users to specify which benchmarks to run and their parameters. The changes seem to be well implemented, with the new configuration system being integrated into the existing benchmarking system. However, there are a few potential issues that need to be addressed.

Workflow ID: wflow_E5aNaWWb4Zd6TX0y


Want Ellipsis to fix these issues? Tag @ellipsis-dev in a comment. We'll respond in a few minutes. Learn more here.

⌛ You have 5 hours remaining in your free trial. Upgrade at https://app.ellipsis.dev for $20/developer/month. If you have any questions, reach us at [email protected]

gpteng: GptengConfig = field(default_factory=GptengConfig)

@classmethod
def from_toml(cls, config_file: Path | str):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The from_toml method in the BenchConfig class does not seem to handle the case where the provided config_file does not exist or is not a valid TOML file. This could lead to unhandled exceptions if the method is called with an invalid file path. Consider adding error handling to this method to provide a more informative error message in such cases.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me!

  • Performed an incremental review on dca2dc0
  • Looked at 11 lines of code in 1 files
  • Took 1 minute and 1 seconds to review
More info
  • Skipped 0 files when reviewing.
  • Skipped posting 1 additional comments because they didn't meet confidence threshold of 50%.
1. pyproject.toml:103:
  • Assessed confidence : 0%
  • Comment:
    Good addition of a newline at the end of the file. This adheres to POSIX standards and ensures the file can be easily concatenated with other files.
  • Reasoning:
    The change in the PR is adding a newline at the end of the file. This is a good practice as it ensures that the file can be easily concatenated with other files. It also adheres to the POSIX standards.

Workflow ID: wflow_I5JIqbvh7ESkjAWV


Not what you expected? You can customize the content of the reviews using rules. Learn more here.

⌛ You have 5 hours remaining in your free trial. Upgrade at https://app.ellipsis.dev for $20/developer/month. If you have any questions, reach us at [email protected]

@viborc viborc merged commit 7630560 into main Apr 25, 2024
4 checks passed
@viborc viborc deleted the bench_config branch April 25, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants