Merge branch 'main' into 1023-ruff-linting-for-python-code
# Conflicts:
#	tests/core/default/test_steps.py
similato87 committed May 3, 2024
2 parents d23545b + a0794bf commit 673a72a
Showing 24 changed files with 1,405 additions and 971 deletions.
14 changes: 7 additions & 7 deletions .github/CONTRIBUTING.md
@@ -1,8 +1,8 @@
# Contributing to GPT-engineer
# Contributing to gpt-engineer

GPT-engineer is a community project and lives from your contributions - they are warmly appreciated. The main contribution avenues are:
- Bug report: report when something in GPT-engineer doesn't work. Do not report errors in programs written _by_ GPT-engineer.
- Feature request: provide a detailed sketch about something you want to have implemented in GPT-engineer. There is no guarantee that features will be implemented.
gpt-engineer is a community project and lives from your contributions - they are warmly appreciated. The main contribution avenues are:
- Bug report: report when something in gpt-engineer doesn't work. Do not report errors in programs written _by_ gpt-engineer.
- Feature request: provide a detailed sketch about something you want to have implemented in gpt-engineer. There is no guarantee that features will be implemented.
- Discussion: raise awareness of a potential improvement. This is often a good starting point before making a detailed feature request.
- Pull request: implement code and have it reviewed and potentially merged by the maintainers. Implementations of existing feature requests or fixes to bug reports are likely to be merged.

@@ -12,7 +12,7 @@ By participating in this project, you agree to abide by the [code of conduct](ht
Code that is likely to introduce breaking changes, or significantly change the user experience for users and developers, require [board approval](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/GOVERNANCE.md) to be merged. Smaller code changes can be merged directly.
As a rule, cosmetic pull requests, for example rephrasing the readme or introducing more compact syntax, that do not yield clear practical improvements are not merged. Such pull requests are generally discouraged, both to save time for the maintainers and to establish a lower bar for becoming a contributor.

## Getting Started with Pull Requests to GPT-engineer
## Getting Started with Pull Requests to gpt-engineer

To get started with contributing, please follow these steps:

@@ -115,6 +115,6 @@ At the beginning this might seem like a tedious process (having to add the file

## Licensing

By contributing to GPT Engineer, you agree that your contributions will be licensed under the [LICENSE](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/LICENSE) file of the project.
By contributing to gpt-engineer, you agree that your contributions will be licensed under the [LICENSE](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/LICENSE) file of the project.

Thank you for your interest in contributing to GPT Engineer! We appreciate your support and look forward to your contributions.
Thank you for your interest in contributing to gpt-engineer! We appreciate your support and look forward to your contributions.
27 changes: 15 additions & 12 deletions README.md
@@ -1,11 +1,14 @@
# GPT-Engineer
# gpt-engineer

[![Discord Follow](https://dcbadge.vercel.app/api/server/8tcDQ89Ej2?style=flat)](https://discord.gg/8tcDQ89Ej2)
[![GitHub Repo stars](https://img.shields.io/github/stars/gpt-engineer-org/gpt-engineer?style=social)](https://github.com/gpt-engineer-org/gpt-engineer)
[![Discord Follow](https://dcbadge.vercel.app/api/server/8tcDQ89Ej2?style=flat)](https://discord.gg/8tcDQ89Ej2)
[![License](https://img.shields.io/github/license/gpt-engineer-org/gpt-engineer)](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/LICENSE)
[![GitHub Issues or Pull Requests](https://img.shields.io/github/issues/gpt-engineer-org/gpt-engineer)](https://github.com/gpt-engineer-org/gpt-engineer/issues)
![GitHub Release](https://img.shields.io/github/v/release/gpt-engineer-org/gpt-engineer)
[![Twitter Follow](https://img.shields.io/twitter/follow/antonosika?style=social)](https://twitter.com/antonosika)

GPT-engineer lets you:
- Specify a software in natural language
gpt-engineer lets you:
- Specify software in natural language
- Sit back and watch as an AI writes and executes the code
- Ask the AI to implement improvements

@@ -23,7 +26,7 @@ For **development**:
- `poetry install`
- `poetry shell` to activate the virtual environment

We actively support Python 3.10 - 3.12. The last version to support python 3.8 - 3.9 was [0.2.6](https://pypi.org/project/gpt-engineer/0.2.6/).
We actively support Python 3.10 - 3.12. The last version to support Python 3.8 - 3.9 was [0.2.6](https://pypi.org/project/gpt-engineer/0.2.6/).

### Setup API Key

@@ -36,7 +39,7 @@ Choose **one** of:
- Custom model:
- See [docs](https://gpt-engineer.readthedocs.io/en/latest/open_models.html), supports local model, azure, etc.

Check the [Windows README](./WINDOWS_README.md) for windows usage.
Check the [Windows README](./WINDOWS_README.md) for Windows usage.

**Other ways to run:**
- Use Docker ([instructions](docker/README.md))
@@ -58,9 +61,9 @@ Check the [Windows README](./WINDOWS_README.md) for windows usage.
By running gpt-engineer you agree to our [terms](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/TERMS_OF_USE.md).


## Relation to gptengineer.app
[gptengineer.app](https://gptengineer.app/) is a commercial project for automatic generation of web-apps.
It features a UI for non-technical users, connected to a git controlled codebase.
## Relation to gptengineer.app (GPT Engineer)
[gptengineer.app](https://gptengineer.app/) is a commercial project for the automatic generation of web apps.
It features a UI for non-technical users connected to a git-controlled codebase.
The gptengineer.app team is actively supporting the open source community.


@@ -73,13 +76,13 @@ Editing the `preprompts` is how you make the agent remember things between proje

### Vision

By default, GPT Engineer expects text input via a `prompt` file. It can also accept imagine inputs for vision capable models. This can be useful for adding UX or architecture diagrams as additional context for GPT Engineer. You can do this by specifiying an image directory with the --image_directory flag and setting a vision capable model in the second cli argument.
By default, gpt-engineer expects text input via a `prompt` file. It can also accept image inputs for vision-capable models. This can be useful for adding UX or architecture diagrams as additional context for gpt-engineer. You can do this by specifying an image directory with the `--image_directory` flag and setting a vision-capable model in the second CLI argument.

E.g. `gpte projects/example-vision gpt-4-vision-preview --prompt_file prompt/text --image_directory prompt/images -i`

### Open source, local and alternative models

By defaul GPT Engineer supports OpenAI Models via the OpenAI API or Azure Open AI API, and Anthropic models.
By default, gpt-engineer supports OpenAI models via the OpenAI API or Azure OpenAI API, as well as Anthropic models.

With a little extra setup, you can also run with open-source models like WizardCoder. See the [documentation](https://gpt-engineer.readthedocs.io/en/latest/open_models.html) for example instructions.

@@ -93,7 +96,7 @@ If you want to see our broader ambitions, check out the [roadmap](https://github
[discord](https://discord.gg/8tcDQ89Ej2)
to get input on how you can [contribute](.github/CONTRIBUTING.md) to it.

gpt-engineer is [governed](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/GOVERNANCE.md) by a board of long term contributors. If you contribute routinely and have an interest in shaping the future of gpt-engineer, you will be considered for the board.
gpt-engineer is [governed](https://github.com/gpt-engineer-org/gpt-engineer/blob/main/GOVERNANCE.md) by a board of long-term contributors. If you contribute routinely and have an interest in shaping the future of gpt-engineer, you will be considered for the board.

## Example

6 changes: 3 additions & 3 deletions gpt_engineer/applications/cli/learning.py
@@ -79,7 +79,7 @@ class Learning:
Attributes
----------
prompt : str
The initial prompt provided to the GPT Engineer.
A JSON string representing the prompt provided to GPT Engineer.
model : str
The name of the model used during the session.
temperature : float
Expand All @@ -98,7 +98,7 @@ class Learning:
The version of the learning data schema.
"""

prompt: Prompt
prompt: str
model: str
temperature: float
config: str
@@ -266,7 +266,7 @@ def extract_learning(
An instance of Learning containing all the session details and user feedback.
"""
return Learning(
prompt=prompt,
prompt=prompt.to_json(),
model=model,
temperature=temperature,
config=json.dumps(config),
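The `learning.py` change above stores the prompt as a JSON string rather than a `Prompt` object, keeping the `Learning` record plainly serializable. A minimal sketch of that pattern, with a hypothetical stand-in for `gpt_engineer.core.prompt.Prompt` (the real class has more fields):

```python
import json
from dataclasses import dataclass


@dataclass
class Prompt:
    # Hypothetical stand-in for gpt_engineer.core.prompt.Prompt.
    text: str

    def to_json(self) -> str:
        return json.dumps({"text": self.text})


@dataclass
class Learning:
    prompt: str  # a JSON string, not a Prompt object, so the record serializes cleanly
    model: str


# Serialize the prompt at construction time, as extract_learning now does.
record = Learning(prompt=Prompt("build a todo app").to_json(), model="gpt-4-turbo")
print(record.prompt)  # → {"text": "build a todo app"}
```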
2 changes: 1 addition & 1 deletion gpt_engineer/applications/cli/main.py
@@ -247,7 +247,7 @@ def prompt_yesno() -> bool:
)
def main(
project_path: str = typer.Argument(".", help="path"),
model: str = typer.Argument("gpt-4-0125-preview", help="model id string"),
model: str = typer.Argument("gpt-4-turbo", help="model id string"),
temperature: float = typer.Option(
0.1,
"--temperature",
Empty file.
43 changes: 33 additions & 10 deletions gpt_engineer/benchmark/__main__.py
@@ -20,6 +20,7 @@
The standard boilerplate for invoking the main function when the script is executed.
"""
import importlib
import os.path

from typing import Annotated, Optional

@@ -29,9 +30,12 @@
from langchain.globals import set_llm_cache

from gpt_engineer.applications.cli.main import load_env_if_needed
from gpt_engineer.benchmark.bench_config import BenchConfig
from gpt_engineer.benchmark.benchmarks.load import get_benchmark
from gpt_engineer.benchmark.run import print_results, run

app = typer.Typer() # creates a CLI app


def get_agent(path):
"""
@@ -52,19 +56,24 @@ def get_agent(path):
return agent_module.default_config_agent()


@app.command(
help="""
Run any benchmark(s) against the specified agent.
\b
Currently available benchmarks are: apps and mbpp
"""
)
def main(
path_to_agent: Annotated[
str,
typer.Argument(
help="python file that contains a function called 'default_config_agent'"
),
],
benchmarks: Annotated[
str, typer.Argument(help="benchmark name(s) separated by ','")
],
task_name: Annotated[
bench_config: Annotated[
        Optional[str], typer.Argument(help="path to the benchmark config toml file")
] = None,
] = os.path.join(os.path.dirname(__file__), "default_bench_config.toml"),
verbose: Annotated[
bool, typer.Option(help="print results for each task", show_default=False)
] = False,
@@ -78,8 +87,8 @@ def main(
The file path to the Python module that contains a function called 'default_config_agent'.
benchmarks : str
A comma-separated string of benchmark names to run.
task_name : Optional[str], default=None
An optional task name to run within the benchmark.
bench_config : Optional[str], default=default_bench_config.toml
Configuration file for choosing which benchmark problems to run. See default config for more details.
verbose : bool, default=False
A flag to indicate whether to print results for each task.
@@ -89,13 +98,27 @@
"""
set_llm_cache(SQLiteCache(database_path=".langchain.db"))
load_env_if_needed()
config = BenchConfig.from_toml(bench_config)
print("using config file: " + bench_config)
benchmarks = list()
for specific_config_name in vars(config):
specific_config = getattr(config, specific_config_name)
if hasattr(specific_config, "active"):
if specific_config.active:
benchmarks.append(specific_config_name)

benchmarks = benchmarks.split(",")
for benchmark_name in benchmarks:
benchmark = get_benchmark(benchmark_name)
benchmark = get_benchmark(benchmark_name, config)
if len(benchmark.tasks) == 0:
print(
benchmark_name
+ " was skipped, since no tasks are specified. Increase the number of tasks in the config file at: "
+ bench_config
)
continue
agent = get_agent(path_to_agent)

results = run(agent, benchmark, task_name, verbose=verbose)
results = run(agent, benchmark, verbose=verbose)
print(
f"\n--- Results for agent {path_to_agent}, benchmark: {benchmark_name} ---"
)
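The new active-benchmark selection in `main()` iterates `vars(config)` and keeps every sub-config whose `active` flag is set. A minimal standalone sketch, using simple stand-in dataclasses rather than the real `BenchConfig`:

```python
from dataclasses import dataclass


@dataclass
class _SubConfig:
    # Stand-in mirroring the per-benchmark configs in bench_config.py.
    active: bool = True


@dataclass
class _Config:
    apps: _SubConfig
    mbpp: _SubConfig


config = _Config(apps=_SubConfig(active=True), mbpp=_SubConfig(active=False))

# Collect the names of all sub-configs whose `active` flag is truthy,
# as the new main() does with vars(config) / getattr.
benchmarks = []
for name in vars(config):
    sub = getattr(config, name)
    if hasattr(sub, "active") and sub.active:
        benchmarks.append(name)

print(benchmarks)  # → ['apps']
```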
56 changes: 56 additions & 0 deletions gpt_engineer/benchmark/bench_config.py
@@ -0,0 +1,56 @@
from dataclasses import dataclass, field
from pathlib import Path

from gpt_engineer.core.project_config import read_config


@dataclass
class AppsConfig:
active: bool | None = True
test_start_index: int | None = 0
test_end_index: int | None = 1
train_start_index: int | None = 0
train_end_index: int | None = 0


@dataclass
class MbppConfig:
active: bool | None = True
test_len: int | None = 1
train_len: int | None = 0


@dataclass
class GptmeConfig:
active: bool | None = True


@dataclass
class GptengConfig:
active: bool | None = True


@dataclass
class BenchConfig:
"""Configuration for the GPT Engineer CLI and gptengineer.app via `gpt-engineer.toml`."""

apps: AppsConfig = field(default_factory=AppsConfig)
mbpp: MbppConfig = field(default_factory=MbppConfig)
gptme: GptmeConfig = field(default_factory=GptmeConfig)
gpteng: GptengConfig = field(default_factory=GptengConfig)

@classmethod
def from_toml(cls, config_file: Path | str):
if isinstance(config_file, str):
config_file = Path(config_file)
config_dict = read_config(config_file)
return cls.from_dict(config_dict)

@classmethod
def from_dict(cls, config_dict: dict):
return cls(
apps=AppsConfig(**config_dict.get("apps", {})),
mbpp=MbppConfig(**config_dict.get("mbpp", {})),
gptme=GptmeConfig(**config_dict.get("gptme", {})),
gpteng=GptengConfig(**config_dict.get("gpteng", {})),
)
32 changes: 17 additions & 15 deletions gpt_engineer/benchmark/benchmarks/apps/load.py
@@ -16,8 +16,8 @@

from datasets import Dataset, DatasetDict, load_dataset, load_from_disk

from gpt_engineer.benchmark.bench_config import AppsConfig
from gpt_engineer.benchmark.benchmarks.apps.problem import Problem
from gpt_engineer.benchmark.benchmarks.apps.problems import PROBLEM_IDS
from gpt_engineer.benchmark.types import Assertable, Benchmark, Task
from gpt_engineer.core.default.disk_execution_env import DiskExecutionEnv
from gpt_engineer.core.files_dict import FilesDict
@@ -57,12 +57,12 @@ def _get_dataset() -> Union[Dataset, DatasetDict]:
print("Dataset not found locally, downloading...")

dataset = load_dataset("codeparrot/apps", trust_remote_code=True)
dataset.save_to_disk(DATASET_PATH)
dataset.save_to_disk(str(DATASET_PATH))

return dataset


def load_apps():
def load_apps(config: AppsConfig) -> Benchmark:
"""
    Loads the APPS benchmark, which consists of a series of coding problems.
@@ -73,17 +73,19 @@ def load_apps():
"""
dataset = _get_dataset()
tasks = []

problems = [
Problem(
id=problem["problem_id"],
question=problem["question"],
input_output=problem["input_output"],
starter_code=problem["starter_code"],
)
for problem in dataset["test"]
if problem["problem_id"] in PROBLEM_IDS
]
problems = list()
for dataset_type in ["test", "train"]:
problems += [
Problem(
id=problem["problem_id"],
question=problem["question"],
input_output=problem["input_output"],
starter_code=problem["starter_code"],
)
for index, problem in enumerate(dataset[dataset_type])
if (index < config.__getattribute__(dataset_type + "_end_index"))
and (index >= config.__getattribute__(dataset_type + "_start_index"))
]

for problem in problems:
prompt = Prompt(
Expand All @@ -110,6 +112,6 @@ def load_apps():
)

return Benchmark(
name="APPS",
name="apps",
tasks=tasks,
)
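The reworked `load_apps` above selects an index window per dataset split by looking up `<split>_start_index` / `<split>_end_index` on the config. A minimal sketch of that filtering, with a plain dict standing in for the HuggingFace dataset and hypothetical problem IDs (`getattr` is the idiomatic spelling of the `__getattribute__` call in the diff):

```python
from dataclasses import dataclass


@dataclass
class AppsConfig:
    # Index windows per split, as in bench_config.py.
    test_start_index: int = 0
    test_end_index: int = 2
    train_start_index: int = 0
    train_end_index: int = 1


# Hypothetical stand-in for the APPS dataset's "test" and "train" splits.
dataset = {
    "test": ["t0", "t1", "t2", "t3"],
    "train": ["r0", "r1"],
}

config = AppsConfig()
problems = []
for dataset_type in ["test", "train"]:
    start = getattr(config, dataset_type + "_start_index")
    end = getattr(config, dataset_type + "_end_index")
    # Keep only problems whose index falls inside the configured window.
    problems += [p for i, p in enumerate(dataset[dataset_type]) if start <= i < end]

print(problems)  # → ['t0', 't1', 'r0']
```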
6 changes: 4 additions & 2 deletions gpt_engineer/benchmark/benchmarks/gpteng/load.py
@@ -19,11 +19,13 @@

from pathlib import Path

from gpt_engineer.benchmark.bench_config import GptengConfig
from gpt_engineer.benchmark.benchmarks.gpteng.eval_tools import (
check_evaluation_component,
)
from gpt_engineer.benchmark.types import Assertable, Benchmark, Task
from gpt_engineer.core.chat_to_files import chat_to_files_dict
from gpt_engineer.core.prompt import Prompt

evaluations = [
{
@@ -192,7 +194,7 @@ def eval_to_task(case):
return Task(
name=case["name"],
initial_code=chat_to_files_dict(Path(case["code_blob"]).read_text()),
prompt=prompt,
prompt=Prompt(prompt),
command=None,
assertions={
f"{e['type']}_{i}": expect_to_assertion(e)
@@ -201,7 +203,7 @@
)


def load_gpteng():
def load_gpteng(config: GptengConfig) -> Benchmark:
"""
Loads the GPT-Eng benchmark, which consists of a series of tasks for evaluation.