AGBenchmark: Codebase clean-up #6650

Pwuts · 2024-01-01T23:35:17Z

Codebase is a huge mess. This should fix the worst of it.

Changes 🏗️

Deduplicate configuration loading logic
Fix type errors, linting errors, and clean up CLI validation in main.py
Lint and typefix app.py
Replace .agent_protocol_client by agent-protocol-client, clean up schema.py
Use pathlib in agent_interface.py and agent_api_interface.py
Fix path prefix stacking in AgentApi requests
Improve typing, response validation, and readability in app.py
Clean up logging and print statements
Remove unused server.py and agent_interface.py::run_agent
Clean up conftest.py
Clean up generate_test.py file
Fix and add type annotations in execute_sub_process.py
Simplify const determination in agent_interface.py
Register category markers to prevent warnings
Fix indentation in 4_revenue_retrieval_2/data.json
Update agent_api_interface.py
Improve and centralize pathfinding
Clean up and improve CLI
Move AgentBenchmarkConfig and related functions to config.py
Fix ReportManager init parameter types and use pathlib
Improve typing surrounding ChallengeData and clean up its implementation
Clean up generate_test.py, conftest.py and main.py
Merge AGBenchmarkPathManager into AgentBenchmarkConfig and reduce fragmented/global state
Configurable port for serve subcommand
Add config subcommand
Gracefully handle incompatible challenge spec files in app.py
Move run_benchmark entrypoint to main.py, use it in /reports endpoint
Remove unused /updates endpoint and all related code
Clean up and update docstrings on AgentBenchmarkConfig
Restore mechanism to select (optional) categories in agent benchmark config

PR Quality Scorecard ✨

Have you used the PR description template? +2 pts
Is your pull request atomic, focusing on a single change? +5 pts
Have you linked the GitHub issue(s) that this PR addresses? +5 pts
Have you documented your changes clearly and comprehensively? +5 pts
Have you changed or added a feature? -4 pts
- Have you added/updated corresponding documentation? +4 pts
- Have you added/updated corresponding integration tests? +5 pts
Have you changed the behavior of AutoGPT? -5 pts
- Have you also run agbenchmark to verify that these changes do not regress performance? +10 pts

- Move the configuration loading logic to a separate `load_agbenchmark_config` function in the `agbenchmark/config.py` module.
- Replace the duplicate loading logic in `conftest.py`, `generate_test.py`, `ReportManager.py`, `reports.py`, and `__main__.py` with calls to the `load_agbenchmark_config` function.
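To illustrate the deduplication, here is a minimal sketch of what a single shared loader might look like. Only the function name `load_agbenchmark_config` comes from the commit message; the `config_dir` parameter, the `config.json` file name, and the `dict` return type are assumptions made for illustration.

```python
import json
from pathlib import Path


def load_agbenchmark_config(config_dir: Path) -> dict:
    """Read the benchmark config from one place.

    Assumption: the config lives in `config_dir / "config.json"`.
    The real signature and file layout may differ.
    """
    config_file = config_dir / "config.json"
    with config_file.open("r", encoding="utf-8") as f:
        return json.load(f)
```

Call sites such as `conftest.py` or `reports.py` would then call this one function rather than each opening and parsing the file themselves.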

…idation in __main__.py

- Fixed type errors and linting errors in `__main__.py`
- Improved the readability of CLI argument validation by introducing a separate function for it

- Rearranged and cleaned up import statements
- Fixed type errors caused by improper use of `psutil` objects
- Simplified a number of `os.path` usages by converting to `pathlib`
- Use `Task` and `TaskRequestBody` classes from `agent_protocol_client` instead of `.schema`

…ol-client`, clean up schema.py

- Remove `agbenchmark.agent_protocol_client` (an offline copy of `agent-protocol-client`).
- Add `agent-protocol-client` as a dependency and change imports to `agent_protocol_client`.
- Fix type annotation on `agent_api_interface.py::upload_artifacts` (`ApiClient` -> `AgentApi`).
- Remove all unused types from schema.py (= most of them).

…interface.py

…lity in app.py

- Simplified response generation by leveraging type checking and conversion by FastAPI.
- Introduced use of `HTTPException` for error responses.
- Improved naming, formatting, and typing in `app.py::create_evaluation`.
- Updated the docstring on `app.py::create_agent_task`.
- Fixed return type annotations of `create_single_test` and `create_challenge` in generate_test.py.
- Added default values to optional attributes on models in report_types_v2.py.
- Removed unused imports in `generate_test.py`

- Introduced use of the `logging` library for unified logging and better readability.
- Converted most print statements to use `logger.debug`, `logger.warning`, and `logger.error`.
- Improved descriptiveness of log statements.
- Removed unnecessary print statements.
- Added log statements to unspecific and non-verbose `except` blocks.
- Added `--debug` flag, which sets the log level to `DEBUG` and enables a more comprehensive log format.
- Added `.utils.logging` module with `configure_logging` function to easily configure the logging library.
- Converted raw escape sequences in `.utils.challenge` to use `colorama`.
- Renamed `generate_test.py::generate_tests` to `load_challenges`.
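As a rough sketch of the `configure_logging` idea described above, assuming a plain standard-library setup: the two format strings and the `level` parameter are illustrative guesses, not the module's actual contents.

```python
import logging

# Hypothetical formats: a short one for normal runs, a detailed one for --debug.
SIMPLE_LOG_FORMAT = "%(asctime)s %(levelname)s  %(message)s"
DEBUG_LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s:%(lineno)d  %(message)s"


def configure_logging(level: int = logging.INFO) -> None:
    """Configure the root logger; use the detailed format at DEBUG level."""
    log_format = DEBUG_LOG_FORMAT if level == logging.DEBUG else SIMPLE_LOG_FORMAT
    # force=True replaces any handlers installed by an earlier call (Python 3.8+).
    logging.basicConfig(level=level, format=log_format, force=True)
```

A `--debug` CLI flag could then map to `configure_logging(logging.DEBUG)`, while modules obtain their own logger with `logging.getLogger(__name__)` instead of calling `print`.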

…run_agent

- Remove unused server.py file
- Remove unused run_agent function from agent_interface.py

- Fix and add type annotations
- Rewrite docstrings
- Disable or remove unused code
- Fix definition of arguments and their types in `pytest_addoption`

- Refactored the `create_single_test` function for clarity and readability
- Removed unused variables
- Made creation of `Challenge` subclasses more straightforward
- Made bare `except` more specific
- Renamed `Challenge.setup_challenge` method to `run_challenge`
- Updated type hints and annotations
- Made minor code/readability improvements in `load_challenges`
- Added a helper function `_add_challenge_to_module` for attaching a Challenge class to the current module

- Simplify the logic that determines the value of `HELICONE_GRAPHQL_LOGS`

- Use the `pytest_configure` hook to register the known challenge categories as markers. Otherwise, Pytest will raise "unknown marker" warnings at runtime.
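A minimal sketch of registering markers from `conftest.py` follows. The category names in `CHALLENGE_CATEGORIES` are placeholders (the real list is derived from the challenge specs); `config.addinivalue_line("markers", ...)` is pytest's documented way to register a marker programmatically.

```python
# Placeholder category names, for illustration only.
CHALLENGE_CATEGORIES = ["coding", "data", "scrape_synthesize"]


def pytest_configure(config) -> None:
    """Register each known challenge category as a pytest marker."""
    for category in CHALLENGE_CATEGORIES:
        config.addinivalue_line("markers", f"{category}: challenge category")
```

With the markers registered, `@pytest.mark.coding` on a generated test no longer triggers an "unknown marker" warning.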

…l_2/data.json

- Add type annotations to `copy_agent_artifacts_into_temp_folder` function
- Add note about broken endpoint in the `agent_protocol_client` library
- Remove unused variable in `run_api_agent` function
- Improve readability and resolve linting error

- Search path hierarchy for applicable `agbenchmark_config`, rather than assuming it's in the current folder.
- Create `agbenchmark.utils.path_manager` with `AGBenchmarkPathManager` and exporting a `PATH_MANAGER` const.
- Replace path constants defined in __main__.py with usages of `PATH_MANAGER`.
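The hierarchy search could be sketched roughly as below. The function name `find_config_folder` and its `start` parameter are hypothetical; only the folder name `agbenchmark_config` is taken from the commit message.

```python
from pathlib import Path
from typing import Optional

CONFIG_DIR_NAME = "agbenchmark_config"


def find_config_folder(start: Optional[Path] = None) -> Path:
    """Walk from `start` (default: cwd) up to the filesystem root and
    return the first `agbenchmark_config` directory found."""
    current = (start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        candidate = directory / CONFIG_DIR_NAME
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"No '{CONFIG_DIR_NAME}' directory found in '{current}' or its parents"
    )
```

This lets the benchmark be invoked from any subdirectory of an agent project, not only from the folder that contains the config.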

- Updated commands, options, and their descriptions to be more intuitive and consistent
- Moved slow imports into the entrypoints that use them to speed up application startup
- Fixed type hints to match output types of Click options
- Hid deprecated `agbenchmark start` command
- Refactored code to improve readability and maintainability
- Moved main entrypoint into `run` subcommand
- Fixed `version` and `serve` subcommands
- Added `click-default-group` package to allow using `run` implicitly (for backwards compatibility)
- Renamed `--no_dep` to `--no-dep` for consistency
- Fixed string formatting issues in log statements

…ctions to config.py

- Move the `AgentBenchmarkConfig` class from `utils/data_types.py` to `config.py`.
- Extract the `calculate_info_test_path` function from `utils/data_types.py` and move it to `config.py` as a private helper function `_calculate_info_test_path`.
- Move `load_agent_benchmark_config()` to `AgentBenchmarkConfig.load()`.
- Changed simple getter methods on `AgentBenchmarkConfig` to calculated properties.
- Update all code references according to the changes mentioned above.
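The shift from getter methods to calculated properties, together with a `load()` classmethod, might look roughly like this. The class name `AgentBenchmarkConfigSketch`, the `reports_folder` property, and the simplified `load()` (which only resolves a directory and reads no file) are all assumptions for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class AgentBenchmarkConfigSketch:
    """Illustrative stand-in; not the real AgentBenchmarkConfig."""

    config_dir: Path

    @classmethod
    def load(cls, config_dir: Optional[Path] = None) -> "AgentBenchmarkConfigSketch":
        # Simplification: a real loader would also read and validate a config file.
        return cls(config_dir or Path.cwd() / "agbenchmark_config")

    @property
    def reports_folder(self) -> Path:
        # Calculated from config_dir instead of exposed via a get_*() method.
        return self.config_dir / "reports"
```

Callers then write `config.reports_folder` rather than calling a getter, and derived paths always stay consistent with `config_dir`.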

…athlib

- Fix the type annotation of the `benchmark_start_time` parameter in `ReportManager.__init__`, which was mistyped as `str` instead of `datetime`.
- Change the type of the `filename` parameter in the `ReportManager.__init__` method from `str` to `Path`.
- Rename `self.filename` to `self.report_file` in `ReportManager`.
- Change the way the report file is created, opened and saved to use the `Path` object.
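A hedged sketch of a `Path`-based report manager is shown below. The class name `ReportManagerSketch`, the `tests` attribute, and the `load`/`save` methods are illustrative; only the `report_file` attribute and the `Path` and `datetime` parameter types come from the commit message.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


class ReportManagerSketch:
    """Illustrative stand-in; not the real ReportManager."""

    def __init__(self, report_file: Path, benchmark_start_time: datetime):
        self.report_file = report_file
        self.benchmark_start_time = benchmark_start_time
        self.tests: dict = {}
        self.load()

    def load(self) -> None:
        # Create an empty report (and its parent folders) on first use.
        if not self.report_file.exists():
            self.report_file.parent.mkdir(parents=True, exist_ok=True)
            self.report_file.write_text("{}", encoding="utf-8")
        with self.report_file.open("r", encoding="utf-8") as f:
            self.tests = json.load(f)

    def save(self) -> None:
        with self.report_file.open("w", encoding="utf-8") as f:
            json.dump(self.tests, f, indent=4)
```

Using `Path` methods (`exists`, `open`, `write_text`) removes the need for `os.path` string handling around the report file.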

…an up its implementation

- Use `ChallengeData` objects instead of untyped `dict` in app.py, generate_test.py, reports.py.
- Remove unnecessary methods `serialize`, `get_data`, `get_json_from_path`, `deserialize` from `ChallengeData` class.
- Remove unused methods `challenge_from_datum` and `challenge_from_test_data` from `ChallengeData` class.
- Update function signatures and annotations of `create_challenge` and `generate_single_test` functions in generate_test.py.
- Add types to function signatures of `generate_single_call_report` and `finalize_reports` in reports.py.
- Remove unnecessary `challenge_data` parameter (in generate_test.py) and fixture (in conftest.py).

…n__.py

- Cleaned up generate_test.py and conftest.py
- Consolidated challenge creation logic in the `Challenge` class itself, most notably the new `Challenge.from_challenge_spec` method.
- Moved challenge selection logic from generate_test.py to the `pytest_collection_modifyitems` hook in conftest.py.
- Converted methods in the `Challenge` class to class methods where appropriate.
- Improved argument handling in the `run_benchmark` function in `__main__.py`.

…nchmarkConfig and reduce fragmented/global state

- Merge the functionality of `AGBenchmarkPathManager` into `AgentBenchmarkConfig` to consolidate the configuration management.
- Remove the `.path_manager` module containing `AGBenchmarkPathManager`.
- Pass the `AgentBenchmarkConfig` and its attributes through function arguments to reduce global state and improve code clarity.

- Added `--port` option to `serve` subcommand to allow for specifying the port to run the API on.
- If no `--port` option is provided, the port will default to the value specified in the `PORT` environment variable, or 8080 if not set.
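The fallback order described above (explicit `--port`, then the `PORT` environment variable, then 8080) can be sketched in plain Python. The helper name `resolve_port` is hypothetical; the actual CLI uses Click, and how it wires this default is not shown in the source.

```python
import os
from typing import Optional

DEFAULT_PORT = 8080


def resolve_port(cli_port: Optional[int] = None) -> int:
    """Return the port to serve on: CLI value, else $PORT, else 8080."""
    if cli_port is not None:
        return cli_port
    return int(os.environ.get("PORT", DEFAULT_PORT))
```

For example, `resolve_port(9000)` returns 9000 regardless of the environment, while `resolve_port()` consults `PORT` before falling back to the default.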

- Added a new subcommand `config` to the AGBenchmark CLI, to display information about the present AGBenchmark config.

github-actions · 2024-01-01T23:35:38Z

This PR exceeds the recommended size of 500 lines. Please make sure you are NOT addressing multiple issues with one PR.

netlify · 2024-01-01T23:36:19Z

✅ Deploy Preview for auto-gpt-docs ready!

Name	Link
🔨 Latest commit	`2135019`
🔍 Latest deploy log	https://app.netlify.com/sites/auto-gpt-docs/deploys/6594502fa8296100085460d4
😎 Deploy Preview	https://deploy-preview-6650--auto-gpt-docs.netlify.app

benchmark/agbenchmark/__main__.py

benchmark/agbenchmark/app.py

benchmark/agbenchmark/config.py

…n app.py

- Added a check to skip deprecated challenges
- Added logging to allow debugging of the loading process
- Added handling of validation errors when parsing challenge spec files
- Added missing `spec_file` attribute to `ChallengeData`
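One way to picture the graceful handling is the sketch below, which uses only the standard library. The real code validates specs against the `ChallengeData` model; here the `REQUIRED_KEYS` set, the "deprecated" path check, and the function name `load_challenge_specs` are stand-in assumptions.

```python
import json
import logging
from pathlib import Path
from typing import Iterable

logger = logging.getLogger(__name__)

# Hypothetical minimal schema, standing in for model validation.
REQUIRED_KEYS = {"name", "task"}


def load_challenge_specs(spec_files: Iterable[Path]) -> list:
    """Load spec files, skipping deprecated or incompatible ones."""
    challenges = []
    for spec_file in spec_files:
        # Assumption: deprecated challenges live under a "deprecated" folder.
        if "deprecated" in spec_file.parts:
            logger.debug("Skipping deprecated challenge spec %s", spec_file)
            continue
        try:
            data = json.loads(spec_file.read_text(encoding="utf-8"))
            if not isinstance(data, dict):
                raise ValueError("spec is not a JSON object")
            missing = REQUIRED_KEYS - data.keys()
            if missing:
                raise ValueError(f"missing keys: {sorted(missing)}")
        except ValueError as e:  # json.JSONDecodeError subclasses ValueError
            logger.warning("Skipping incompatible spec %s: %s", spec_file, e)
            continue
        data["spec_file"] = spec_file  # remember where the spec came from
        challenges.append(data)
    return challenges
```

The point of the pattern is that one malformed spec file produces a warning and is skipped instead of aborting the whole loading process.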

…it in `/reports` endpoint

- Move `run_benchmark` and `validate_args` from __main__.py to main.py
- Replace agbenchmark subprocess in `app.py:run_single_test` with `run_benchmark`
- Move `get_unique_categories` from __main__.py to challenges/__init__.py
- Move `OPTIONAL_CATEGORIES` from __main__.py to challenge.py
- Reduce operations on updates.json (including `initialize_updates_file`) outside of API

…d code

- Remove `updates_json_file` attribute from `AgentBenchmarkConfig`
- Remove `get_updates` and `_initialize_updates_file` in app.py
- Remove `append_updates_file` and `create_update_json` functions in agent_api_interface.py
- Remove call to `append_updates_file` in challenge.py

…enchmarkConfig`

- Add and update docstrings
- Change base class from `BaseModel` to `BaseSettings`, allow extras for backwards compatibility
- Make naming of path attributes on `AgentBenchmarkConfig` more consistent
- Remove unused `agent_home_directory` attribute
- Remove unused `workspace` attribute

…agent benchmark config

Pwuts · 2024-01-02T17:01:27Z

Blocked by this PR:

Fix Python client Div99/agent-protocol#93

jzanecook · 2024-01-02T17:08:53Z

Blocked by this PR:

Fix Python client AI-Engineer-Foundation/agent-protocol#93

Should be good upstream now.

…g` path attributes

- Fixes issue with fetching task artifact listings

Pwuts added 25 commits December 28, 2023 16:28

fix(benchmark): Fix type errors, linting errors, and clean up CLI val…

c14cfd8

…idation in __main__.py

refactor(benchmark): Use pathlib in agent_interface.py and agent_api_…

9fb7b75

…interface.py

fix(benchmark): Fix path prefix stacking in AgentApi requests

14d52b8

refactor(benchmark): Remove unused server.py and agent_interface.py::…

56d8d83

…run_agent

refactor(benchmark): Clean up conftest.py

1aa1261

fix(benchmark): Fix and add type annotations in execute_sub_process.py

294f6ff

refactor(benchmark): Simplify const determination in agent_interface.py

1ea4123

fix(benchmark): Register category markers to prevent warnings

c7cf2c7

refactor(benchmark/challenges): Fix indentation in 4_revenue_retrieva…

1db4bdc

…l_2/data.json

feat(benchmark/cli): Add config subcommand

116f8c9

github-actions bot added the size/xl label Jan 1, 2024

Pwuts added code quality ⬆️ PRs that improve code quality Classic Benchmark labels Jan 1, 2024

ntindle reviewed Jan 2, 2024

Pwuts added 5 commits January 2, 2024 16:49

fix(benchmark): Restore mechanism to select (optional) categories in …

7b92e81

…agent benchmark config

lint(benchmark): Fix unused imports

2b56e67

Pwuts force-pushed the benchmark/clean-up branch from a202a7a to f8a97f9 Compare January 2, 2024 16:52

Pwuts marked this pull request as ready for review January 2, 2024 17:54

Pwuts requested a review from a team January 2, 2024 17:54

Pwuts requested a review from a team as a code owner January 2, 2024 17:54

Pwuts added 2 commits January 2, 2024 19:03

fix(benchmark): Rename left-behind references to `AgentBenchmarkConfi…

25c1aae

…g` path attributes

fix(benchmark): Update agent-protocol-client to v1.1.0

2135019

- Fixes issue with fetching task artifact listings

Pwuts force-pushed the benchmark/clean-up branch from a72eac2 to 2135019 Compare January 2, 2024 18:04

Pwuts merged commit 25cc6ad into master Jan 2, 2024
12 of 13 checks passed

Pwuts deleted the benchmark/clean-up branch January 2, 2024 21:23
