Re-arch #4770
Note for #4787 and other File Abilities: do we want to keep the operation log that logs all file operations?
Let's start with the answer being no, and talk about it together later. I need a little more context about what it's for. My understanding is that it was primarily there to keep the system from trying to write the same file over and over, and similar issues. That's not just a file-operation problem, though. We want smarter planning logic generally so that something like that doesn't happen (which we're working on).
Rough sketching out of a hello world using our refactored autogpt library. See the tracking issue here: #4770.

# Run instructions

There are two client applications for Auto-GPT included.

## CLI Application

:star2: **This is the reference application I'm working with for now** :star2:

The first app is a straight CLI application. I have not done anything yet to port all the friendly display stuff from the `logger.typewriter_log` logic.

- [Entry Point](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_app/cli.py)
- [Client Application](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_app/main.py)

To run, you first need a settings file. Run

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_app/cli.py make-settings
```

where `REPOSITORY_ROOT` is the root of the Auto-GPT repository on your machine. This will write a file with all the user-modifiable configuration keys to `~/auto-gpt/default_agent_settings.yml` (and make the `auto-gpt` directory in your user directory if it doesn't exist). At a bare minimum, you'll need to set `openai.credentials.api_key` to your OpenAI API key to run the model.

You can then launch the interaction loop with

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_app/cli.py run
```

## CLI Web App

The second app is still a CLI, but it sets up a local webserver that the client application talks to rather than invoking calls to the Agent library code directly. This application is essentially a sketch at this point, as the folks who were driving it have had less time (and likely not enough clarity) to proceed.
- [Entry Point](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/cli.py)
- [Client Application](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/client/client.py)
- [Server API](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/server/api.py)

To run, you still need to generate a default configuration:

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_web_app/cli.py make-settings
```

This invokes the same command as the bare CLI app, so follow the instructions above about setting your API key. Then run

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_web_app/cli.py client
```

This will launch a webserver and then start the client CLI application to communicate with it.

:warning: I am not actively developing this application. It is a very good place to get involved if you have web application design experience and are looking to get involved in the re-arch.

---------

Co-authored-by: David Wurtz <[email protected]>
Co-authored-by: Media <[email protected]>
Co-authored-by: Richard Beales <[email protected]>
Co-authored-by: Daryl Rodrigo <[email protected]>
Co-authored-by: Daryl Rodrigo <[email protected]>
Co-authored-by: Swifty <[email protected]>
Co-authored-by: Nicholas Tindle <[email protected]>
Co-authored-by: Merwane Hamadi <[email protected]>
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

This issue was closed automatically because it has been stale for 10 days with no activity.
Overview
Key Documents
The Motivation
The `master` branch of Auto-GPT is an organically grown amalgamation of many thoughts and ideas about agent-driven autonomous systems. It lacks clear abstraction boundaries, has issues of global state and poorly encapsulated state, and is generally just hard to make effective changes to. And research in the field is moving fast, so we want to be able to try new ideas quickly.

Initial Planning
A large group of maintainers and contributors met to discuss the architectural challenges associated with the existing codebase. Many much-desired features (building new user interfaces, enabling project-specific agents, enabling multi-agent systems) are bottlenecked by the global state in the system. We discussed the tradeoffs between an incremental system transition and a big breaking version change and decided to go for the breaking version change. We justified this by saying:
- We can maintain, in essence, the same user experience as now even with a radical restructuring of the codebase.
- Our developer audience is struggling to use the existing codebase to build applications and libraries of their own, so this breaking change will largely be welcome.
Primary Goals
Secondary goals
The Branches
Base Feature Branch
This branch was the start of the re-arch effort where we sketched out the original interfaces. The current intention is to PR systems with stabilized interfaces into this branch so they can go through a round of cleanup and review.
This branch has mostly been dormant as we pivoted to a `running-agent-first` method in the next branch. There are now several stabilized systems that can be brought in.

Hello World Branch
This branch was spun off to take a `running-agent-first` methodology to the interface development. That is, rather than measuring our progress on the re-arch by which systems we've buttoned up and have PR'ed, we measure progress by how far the agent can run through its logic. This lets us battle-test the interfaces we sketched out initially. Ideally, once the interfaces and implementations stabilize, we can PR them to the base feature branch. @collijk has been using this as his working branch and pushing directly to it. We'll likely need a revised workflow.

Run instructions for the hello world branch can be found in the PR: #3969
The Agent Subsystems
Configuration
We want a lot of things from a configuration system. We lean heavily on it in the `master` branch to allow several parts of the system to communicate with each other. Recent work has made it so that the config is no longer a singleton object that is materialized from the import state, but it's still treated as a god object containing all information about the system and critically allowing any system to reference configuration information about other parts of the system.

What we want
Configuration should be scoped to the `Agent`.

System Status

- `Configurable` mixin for system components so we can walk the system to collate system configuration

Workspace
There are two ways to think about the workspace:
In the existing system there is one workspace. And because the workspace holds so much agent state, that means a user can only work with one agent at a time.
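Per-agent workspace scoping can be sketched roughly like this. It is a minimal illustration with hypothetical names; `Workspace` and `get_path` here are not taken from the re-arch code.

```python
from pathlib import Path


class Workspace:
    """One filesystem root per agent, so multiple agents can run side by side.

    Hypothetical sketch -- names are illustrative, not the re-arch API.
    """

    def __init__(self, root: Path):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def get_path(self, relative_path: str) -> Path:
        """Resolve a path inside this workspace, rejecting '../' escapes."""
        full = (self.root / relative_path).resolve()
        if full != self.root and self.root not in full.parents:
            raise ValueError(f"{relative_path!r} escapes the workspace root")
        return full


# Each agent gets its own scratch space instead of sharing one global workspace:
agent_a_ws = Workspace(Path("/tmp/demo-agents/agent-a"))
agent_b_ws = Workspace(Path("/tmp/demo-agents/agent-b"))
```

Routing all file abilities through such an object is what turns workspace state from global into per-agent state.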
System Status
Memory
The memory system has been under extremely active development. See #3536 and #4208 for discussion and work in the `master` branch. The TL;DR is that we noticed a couple of months ago that the `Agent` performed worse with permanent memory than without it. Since then the knowledge storage and retrieval system has been redesigned and partially implemented in the `master` branch.

System Status
Planning/Prompt-Engineering
The planning system is the system that translates user desires/agent intentions into language model prompts. In the course of development, it has become pretty clear that `Planning` is the wrong name for this system.

What we want
Planning Strategies
The new agent workflow has many, many interaction points for language models. We really would like to not distribute prompt templates and raw strings all through the system. The re-arch solution is to encapsulate language model interactions into planning strategies. These strategies are defined by:

- the `LanguageModelClassification` they use (`FAST` or `SMART`)
- a `build_prompt` that takes strategy-specific arguments and constructs a `LanguageModelPrompt` (a simple container for lists of messages and functions to pass to the language model)
- a `parse_content` that parses the response content (a dict) into a better formatted dict

Contracts here are intentionally loose and will tighten once we have at least one other language model provider.

System Status
- `Planner` system to take in args, build prompts, interact with a language model, and get responses
- `PromptStrategy` abstraction to encapsulate a parameterizable interaction with a language model
- `PromptStrategy` instances exposed to the user so they can do prompt tuning without touching code

Resources
Resources are kinds of services we consume from external APIs. They may have associated credentials and costs we need to manage. Management of those credentials is implemented as manipulation of the resource configuration. We currently have two categories of resources.
What we want
System Status
Abilities
Along with planning and memory usage, abilities are one of the major augmentations of augmented language models. They allow us to expand the scope of what language models can do by hooking them up to code they can execute to obtain new knowledge or influence the world.
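The hookup described here, model-invokable code with a name and a description, can be sketched as follows. This is illustrative only: `Ability`, `ReadFile`, and the registry are hypothetical names, not the actual interfaces.

```python
from abc import ABC, abstractmethod


class Ability(ABC):
    """A named, described piece of code the language model may choose to run.

    Hypothetical sketch; the real ability interface may differ.
    """

    name: str
    description: str

    @abstractmethod
    def __call__(self, **kwargs) -> str:
        """Execute and return an observation string for the agent."""


class ReadFile(Ability):
    name = "read_file"
    description = "Read the contents of a text file."

    def __call__(self, *, path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()


# The planner advertises each ability's name/description to the model, then
# dispatches the model's chosen ability with its parsed arguments:
registry = {ability.name: ability for ability in [ReadFile()]}
```

The observation string returned by an ability is how executed code feeds new knowledge back into the agent's next planning step.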
What we want
System Status
Plugins
Users want to add lots of features that we don't want to support as first-party. Our solution to this is a plugin system to allow users to plug in their functionality or to construct their agent from a public plugin marketplace. Our primary concern in the re-arch is to build a stateless plugin service interface and a simple implementation that can load plugins from installed packages or from zip files. Future efforts will expand this system to allow plugins to load from a marketplace or some other kind of service.
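A stateless loader covering the two sources named above (installed packages and zip files) might look like the sketch below. This is not the actual plugin service interface; the assumption that zip plugins expose a top-level `plugin` module is mine.

```python
import importlib
import sys


def load_plugin(source: str):
    """Load a plugin from an installed package (dotted module name) or a
    zip file path. Stateless: no registry, no caching beyond sys.modules.

    Hypothetical sketch -- the real plugin service interface may differ.
    """
    if source.endswith(".zip"):
        # Python's import machinery can import from zip archives on sys.path;
        # we assume the archive contains a top-level `plugin` module.
        sys.path.insert(0, source)
        try:
            return importlib.import_module("plugin")
        finally:
            sys.path.remove(source)
    return importlib.import_module(source)
```

A marketplace backend would slot in as a third branch that downloads the archive first, then reuses the zip path.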
What is a Plugin
"Plugin" is a kind of garbage term. It refers to a number of different things.
Usage in the existing system
The current plugin system is hook-based. This means plugins don't correspond to kinds of objects in the system, but rather to times in the system at which we defer execution to them. The main advantage of this setup is that user code can hijack pretty much any behavior of the agent by injecting code that supersedes the normal agent execution. The disadvantages of this approach are numerous:
What we want
System status
User Interfaces
There are two client applications for Auto-GPT included. Applications have responsibility for all user interaction (anything that shows up on the user's display that isn't actual system logs).
The CLI app
🌟 This is the reference application I'm working with for now 🌟
This application is essentially implemented all the way through the run loop, but it is missing some logic to handle things aside from user confirmation of next actions. It makes no effort to display nice output to the user at this point. It directly invokes methods on the Agent as its primary form of interaction with the codebase.
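That interaction pattern — propose, confirm, execute, directly against the agent object — can be sketched like this. The method names are my placeholders, not the library API.

```python
def run_interaction_loop(agent, input_fn=input, output_fn=print) -> None:
    """Propose-confirm-execute loop calling the agent library directly.

    Sketch only: `determine_next_ability` and `execute_next_ability` are
    hypothetical names for the agent's methods.
    """
    while True:
        # Ask the agent what it wants to do next and show it to the user.
        proposal = agent.determine_next_ability()
        output_fn(f"Next action: {proposal}")
        # The CLI's only real job right now: user confirmation.
        answer = input_fn("Press Enter to continue, or 'q' to quit: ")
        if answer.strip().lower() == "q":
            break
        result = agent.execute_next_ability()
        output_fn(f"Result: {result}")
```

`input_fn`/`output_fn` are injected so the loop stays testable and a richer display (e.g. the old `typewriter_log` niceties) can replace plain `print` later.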
Status
The CLI web-app
The second app is still a CLI, but it sets up a local webserver that the client application talks to rather than invoking calls to the Agent library code directly. This application is essentially a sketch at this point as the folks who were driving it have had less time (and likely not enough clarity) to proceed.
The Agent Run Loop
Status
(Checklist for the planned agent workflow)

- `ready_criteria` to see if we can make progress (we assume we can make progress in the meantime)
- ~~Handle user allowing multiple future actions.~~ This is an application concern, I think

Major Roadblocks
The core of the agent loop is under active development, particularly with respect to memory storage, knowledge summarization, and memory retrieval. This is a complex research area and should be out of scope for the re-arch. However, the existing agent loop is not useful, and the updated memory abstractions are implemented but not yet used (so their interfaces in the `master` branch are not stable). @Pwuts and I put together a proposed new agent workflow to use the abstractions he's built, and I've begun implementing that workflow in the hello world branch as he's been occupied with other things. Ideally we would have a reference implementation in the `master` branch to guide us.

How can you help get the re-arch over the finish line
Good things to work on
- Port `commands` in the master branch into their new `Ability` interfaces (one command per PR please!)
- Remove the `embedding` subpackage and lift out references to the objects referenced from it. Embeddings will be managed by the memory system and created with an already extant `EmbeddingModelProvider`, so we don't need the extra abstraction layer.

Things that definitely need work but have a plan for already (or need more things to be finished first)
… `master` branch.

Things that will be rejected and make me mildly annoyed