Re-arch #4770
Note for #4787 and other File Abilities: do we want to keep the operation log that logs all file operations?
Let's start with the answer being no, and talk about it together later. I need a little more context about what it's for. My understanding is that it was primarily there to keep the system from trying to write the same file over and over, and similar issues. That's not just a file-operation problem, though. We want smarter planning logic generally so that something like that doesn't happen (which we're working on).
Rough sketching out of a hello world using our refactored autogpt library. See the tracking issue here: #4770.

# Run instructions

There are two client applications for Auto-GPT included.

## CLI Application

:star2: **This is the reference application I'm working with for now** :star2:

The first app is a straight CLI application. I have not done anything yet to port all the friendly display stuff from the `logger.typewriter_log` logic.

- [Entry Point](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_app/cli.py)
- [Client Application](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_app/main.py)

To run, you first need a settings file. Run

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_app/cli.py make-settings
```

where `REPOSITORY_ROOT` is the root of the Auto-GPT repository on your machine. This will write a file with all the user-modifiable configuration keys to `~/auto-gpt/default_agent_settings.yml` (and make the `auto-gpt` directory in your user directory if it doesn't exist). At a bare minimum, you'll need to set `openai.credentials.api_key` to your OpenAI API key to run the model.

You can then launch the interaction loop with

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_app/cli.py run
```

## CLI Web App

The second app is still a CLI, but it sets up a local webserver that the client application talks to rather than invoking calls to the Agent library code directly. This application is essentially a sketch at this point, as the folks who were driving it have had less time (and likely not enough clarity) to proceed.
- [Entry Point](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/cli.py)
- [Client Application](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/client/client.py)
- [Server API](https://github.com/Significant-Gravitas/Auto-GPT/blob/re-arch/hello-world/autogpt/core/runner/cli_web_app/server/api.py)

To run, you still need to generate a default configuration:

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_web_app/cli.py make-settings
```

This invokes the same command as the bare CLI app, so follow the instructions above about setting your API key. Then run

```
python REPOSITORY_ROOT/autogpt/core/runner/cli_web_app/cli.py client
```

This will launch a webserver and then start the client CLI application to communicate with it.

:warning: I am not actively developing this application. It is a very good place to get involved if you have web application design experience and are looking to get involved in the re-arch.

---------

Co-authored-by: David Wurtz <[email protected]>
Co-authored-by: Media <[email protected]>
Co-authored-by: Richard Beales <[email protected]>
Co-authored-by: Daryl Rodrigo <[email protected]>
Co-authored-by: Daryl Rodrigo <[email protected]>
Co-authored-by: Swifty <[email protected]>
Co-authored-by: Nicholas Tindle <[email protected]>
Co-authored-by: Merwane Hamadi <[email protected]>
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days.

This issue was closed automatically because it has been stale for 10 days with no activity.
Overview
Key Documents
The Motivation
The `master` branch of Auto-GPT is an organically grown amalgamation of many thoughts and ideas about agent-driven autonomous systems. It lacks clear abstraction boundaries, has issues of global state and poorly encapsulated state, and is generally just hard to make effective changes to. And research in the field is moving fast, so we want to be able to try new ideas quickly.

Initial Planning
A large group of maintainers and contributors met to discuss the architectural challenges associated with the existing codebase. Many much-desired features (building new user interfaces, enabling project-specific agents, enabling multi-agent systems) are bottlenecked by the global state in the system. We discussed the tradeoffs between an incremental system transition and a big breaking version change and decided to go for the breaking version change. We justified this by saying:
- We can maintain, in essence, the same user experience as now even with a radical restructuring of the codebase.
- Our developer audience is struggling to use the existing codebase to build applications and libraries of their own, so this breaking change will largely be welcome.
Primary Goals
Secondary goals
The Branches
Base Feature Branch
This branch was the start of the re-arch effort where we sketched out the original interfaces. The current intention is to PR systems with stabilized interfaces into this branch so they can go through a round of cleanup and review.
This branch has mostly been dormant as we pivoted to a `running-agent-first` method in the next branch. There are now several stabilized systems that can be brought in.

Hello World Branch
This branch was spun off to take a `running-agent-first` methodology to the interface development. That is, rather than measuring our progress on the re-arch by which systems we've buttoned up and have PR'ed, we measure progress by how far the agent can run through its logic. This lets us battle-test the interfaces we sketched out initially. Ideally, once the interfaces and implementations stabilize, we can PR them to the base feature branch. @collijk has been using this as his working branch and pushing directly to it. We'll likely need a revised workflow.

Run instructions for the hello world branch can be found in the PR: #3969
The Agent Subsystems
Configuration
We want a lot of things from a configuration system. We lean heavily on it in the `master` branch to allow several parts of the system to communicate with each other. Recent work has made it so that the config is no longer a singleton object that is materialized from the import state, but it's still treated as a god object containing all information about the system and critically allowing any system to reference configuration information about other parts of the system.

What we want
Configuration should be scoped to the `Agent`.

System Status

- `Configurable` mixin for system components so we can walk the system to collate system configuration

Workspace
There are two ways to think about the workspace:
In the existing system there is one workspace. And because the workspace holds so much agent state, that means a user can only work with one agent at a time.
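Per-agent workspace scoping can be sketched roughly like this. It is a minimal illustration with hypothetical names; `Workspace` and `get_path` here are not taken from the re-arch code.

```python
from pathlib import Path


class Workspace:
    """One filesystem root per agent, so multiple agents can run side by side.

    Hypothetical sketch -- names are illustrative, not the re-arch API.
    """

    def __init__(self, root: Path):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def get_path(self, relative_path: str) -> Path:
        """Resolve a path inside this workspace, rejecting '../' escapes."""
        full = (self.root / relative_path).resolve()
        if full != self.root and self.root not in full.parents:
            raise ValueError(f"{relative_path!r} escapes the workspace root")
        return full


# Each agent gets its own scratch space instead of sharing one global workspace:
agent_a_ws = Workspace(Path("/tmp/demo-agents/agent-a"))
agent_b_ws = Workspace(Path("/tmp/demo-agents/agent-b"))
```

Routing all file abilities through such an object is what turns workspace state from global into per-agent state.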
System Status
Memory
The memory system has been under extremely active development. See #3536 and #4208 for discussion and work in the `master` branch. The TL;DR is that we noticed a couple of months ago that the `Agent` performed worse with permanent memory than without it. Since then the knowledge storage and retrieval system has been redesigned and partially implemented in the `master` branch.

System Status
Planning/Prompt-Engineering
The planning system is the system that translates user desires/agent intentions into language model prompts. In the course of development, it has become pretty clear that `Planning` is the wrong name for this system.

What we want
Planning Strategies
The new agent workflow has many, many interaction points for language models. We really would like to not distribute prompt templates and raw strings all through the system. The re-arch solution is to encapsulate language model interactions into planning strategies. These strategies are defined by:

- the `LanguageModelClassification` they use (`FAST` or `SMART`)
- a `build_prompt` that takes strategy-specific arguments and constructs a `LanguageModelPrompt` (a simple container for lists of messages and functions to pass to the language model)
- a `parse_content` that parses the response content (a dict) into a better formatted dict

Contracts here are intentionally loose and will tighten once we have at least one other language model provider.

System Status
- `Planner` system to take in args, build prompts, interact with a language model, and get responses
- `PromptStrategy` abstraction to encapsulate a parameterizable interaction with a language model
- `PromptStrategy` instances exposed to the user so they can do prompt tuning without touching code

Resources
Resources are kinds of services we consume from external APIs. They may have associated credentials and costs we need to manage. Management of those credentials is implemented as manipulation of the resource configuration. We currently have two categories of resources.
What we want
System Status
Abilities
Along with planning and memory usage, abilities are one of the major augmentations of augmented language models. They allow us to expand the scope of what language models can do by hooking them up to code they can execute to obtain new knowledge or influence the world.
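The hookup described here, model-invokable code with a name and a description, can be sketched as follows. This is illustrative only: `Ability`, `ReadFile`, and the registry are hypothetical names, not the actual interfaces.

```python
from abc import ABC, abstractmethod


class Ability(ABC):
    """A named, described piece of code the language model may choose to run.

    Hypothetical sketch; the real ability interface may differ.
    """

    name: str
    description: str

    @abstractmethod
    def __call__(self, **kwargs) -> str:
        """Execute and return an observation string for the agent."""


class ReadFile(Ability):
    name = "read_file"
    description = "Read the contents of a text file."

    def __call__(self, *, path: str) -> str:
        with open(path, encoding="utf-8") as f:
            return f.read()


# The planner advertises each ability's name/description to the model, then
# dispatches the model's chosen ability with its parsed arguments:
registry = {ability.name: ability for ability in [ReadFile()]}
```

The observation string returned by an ability is how executed code feeds new knowledge back into the agent's next planning step.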
What we want
System Status
Plugins
Users want to add lots of features that we don't want to support as first-party. Our solution to this is a plugin system to allow users to plug in their functionality or to construct their agent from a public plugin marketplace. Our primary concern in the re-arch is to build a stateless plugin service interface and a simple implementation that can load plugins from installed packages or from zip files. Future efforts will expand this system to allow plugins to load from a marketplace or some other kind of service.
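A stateless loader covering the two sources named above (installed packages and zip files) might look like the sketch below. This is not the actual plugin service interface; the assumption that zip plugins expose a top-level `plugin` module is mine.

```python
import importlib
import sys


def load_plugin(source: str):
    """Load a plugin from an installed package (dotted module name) or a
    zip file path. Stateless: no registry, no caching beyond sys.modules.

    Hypothetical sketch -- the real plugin service interface may differ.
    """
    if source.endswith(".zip"):
        # Python's import machinery can import from zip archives on sys.path;
        # we assume the archive contains a top-level `plugin` module.
        sys.path.insert(0, source)
        try:
            return importlib.import_module("plugin")
        finally:
            sys.path.remove(source)
    return importlib.import_module(source)
```

A marketplace backend would slot in as a third branch that downloads the archive first, then reuses the zip path.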
What is a Plugin
"Plugin" is a kind of garbage term. It refers to a number of different things.
Usage in the existing system
The current plugin system is hook-based. This means plugins don't correspond to kinds of objects in the system, but rather to times in the system at which we defer execution to them. The main advantage of this setup is that user code can hijack pretty much any behavior of the agent by injecting code that supersedes the normal agent execution. The disadvantages of this approach are numerous:
What we want
System status
User Interfaces
There are two client applications for Auto-GPT included. Applications have responsibility for all user interaction (anything that shows up on the user's display that isn't actual system logs).
The CLI app
🌟 This is the reference application I'm working with for now 🌟
This application is essentially implemented all the way through the run loop, but it is missing some logic to handle things aside from user confirmation of next actions. It makes no effort to display nice output to the user at this point. It directly invokes methods on the Agent as its primary form of interaction with the codebase.
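That interaction pattern — propose, confirm, execute, directly against the agent object — can be sketched like this. The method names are my placeholders, not the library API.

```python
def run_interaction_loop(agent, input_fn=input, output_fn=print) -> None:
    """Propose-confirm-execute loop calling the agent library directly.

    Sketch only: `determine_next_ability` and `execute_next_ability` are
    hypothetical names for the agent's methods.
    """
    while True:
        # Ask the agent what it wants to do next and show it to the user.
        proposal = agent.determine_next_ability()
        output_fn(f"Next action: {proposal}")
        # The CLI's only real job right now: user confirmation.
        answer = input_fn("Press Enter to continue, or 'q' to quit: ")
        if answer.strip().lower() == "q":
            break
        result = agent.execute_next_ability()
        output_fn(f"Result: {result}")
```

`input_fn`/`output_fn` are injected so the loop stays testable and a richer display (e.g. the old `typewriter_log` niceties) can replace plain `print` later.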
Status
The CLI web-app
The second app is still a CLI, but it sets up a local webserver that the client application talks to rather than invoking calls to the Agent library code directly. This application is essentially a sketch at this point as the folks who were driving it have had less time (and likely not enough clarity) to proceed.
The Agent Run Loop
Status
(Checklist for the planned agent workflow)

- `ready_criteria` to see if we can make progress (we assume we can make progress in the meantime)
- ~~Handle user allowing multiple future actions.~~ This is an application concern, I think

Major Roadblocks
The core of the agent loop is under active development, particularly with respect to memory storage, knowledge summarization, and memory retrieval. This is a complex research area and should be out of scope for the re-arch. However, the existing agent loop is not useful, and the updated memory abstractions are implemented but not yet used (so their interfaces in the `master` branch are not stable). @Pwuts and I put together a proposed new agent workflow to use the abstractions he's built, and I've begun implementing that workflow in the hello world branch as he's been occupied with other things. Ideally we would have a reference implementation in the `master` branch to guide us.

How can you help get the re-arch over the finish line
Good things to work on
- Port `commands` in the master branch into their new `Ability` interfaces (one command per PR please!)
- Remove the `embedding` subpackage and lift out references to the objects referenced from it. Embeddings will be managed by the memory system and created with an already extant `EmbeddingModelProvider`, so we don't need the extra abstraction layer.

Things that definitely need work but have a plan for already (or need more things to be finished first)
… `master` branch.

Things that will be rejected and make me mildly annoyed