Auto-GPT Recursive Self Improvement #15
Comments
Perhaps this is what we're looking for #11? Anyone want to weigh in? |
Here's what GPT-4 has to say about this issue: Based on the GitHub issue posted, I understand that the ultimate goal is to have Auto-GPT recursively improve itself. In order to achieve this, we can start by implementing basic features such as reading its own code, evaluating limitations and areas of improvement, writing code to increase its abilities, and testing its own code. Here's a plan to implement these functionalities:
I will now provide an outline for implementing these functions in Auto-GPT's code:

### Evaluating code

```python
from typing import List, Optional

def evaluate_code(code: str) -> List[str]:
    # Analyze the given code and return a list of suggestions for improvements
    pass
```

### Improving code

```python
def improve_code(suggestions: List[str]) -> str:
    # Generate new code based on the suggestions provided
    pass
```

### Writing tests

```python
def write_tests(focus: Optional[str] = None) -> str:
    # Generate test cases for the existing code, focusing on specific areas if required
    pass
```

### Running tests

```python
def run_tests(test_code: str) -> str:
    # Execute the test cases and return the test results
    pass
```

These functions can be integrated into `execute_command()`:

```python
# In execute_command()
elif command_name == "evaluate_code":
    return evaluate_code(arguments["code"])
elif command_name == "improve_code":
    return improve_code(arguments["suggestions"])
elif command_name == "write_tests":
    return write_tests(arguments.get("focus"))
elif command_name == "run_tests":
    return run_tests(arguments["test_code"])
```

Once these functionalities are implemented, Auto-GPT will be one step closer to recursively improving itself. With further improvements, Auto-GPT could potentially browse its own code on GitHub, evaluate it, find bugs, and submit pull requests. |
Those empty functions remind me of AI functions :) By the way, I'm very excited to see many of my thoughts being implemented here. |
Ah yes! That AI Functions guide you linked is exactly how I was thinking of implementing those, if I do it that way. |
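For reference, a minimal sketch of that AI-function style, assuming the pre-1.0 `openai` SDK; the prompt wording, model choice, and the `ai_function` helper are illustrative assumptions, not Auto-GPT's actual implementation:

```python
import json
from typing import List

import openai  # assumes the pre-1.0 openai SDK interface

def ai_function(signature: str, args: list, description: str) -> str:
    """Ask the model to act as the body of the given Python function."""
    system = (f"You are the following Python function:\n{signature}\n"
              f'"""{description}"""\n'
              "Respond only with what the function would return.")
    response = openai.ChatCompletion.create(
        model="gpt-4",  # illustrative model choice
        temperature=0,
        messages=[{"role": "system", "content": system},
                  {"role": "user", "content": json.dumps(args)}],
    )
    return response.choices[0].message.content

def evaluate_code(code: str) -> List[str]:
    # Delegate the analysis to the LLM and split its answer into suggestions.
    result = ai_function(
        "def evaluate_code(code: str) -> List[str]:", [code],
        "Analyzes the code and returns a list of suggestions for improvements.")
    return [line.strip() for line in result.splitlines() if line.strip()]
```
|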
@alreadydone I love this, thanks for the suggestion! |
I'm working on this problem in a separate experiment. Would love to chat if you're interested - at the moment, I'm working with genetic algorithms to understand which variant/mutant of the code is more performant - there are a lot of local maxima depending on how you set it up. |
This is a really cool idea! Do you think you could make the AI's logs public as it self-improves? Either in the repo or elsewhere. I would be very interested in seeing how it plans and evolves. |
So this is it huh... The singularity begins in a GitHub thread |
How about trying to drive the self-improvement by utilizing test-driven development (TDD)? In a recent paper they showcased how greatly GPT-4 can improve its results by reflecting on its own mistakes. So the idea is to have it: write tests for the desired behavior, generate code against those tests, run them, reflect on any failures, and repeat until the tests pass.
What do you think? This could also be used for any kind of code generation. |
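A minimal sketch of such a TDD loop, assuming `generate_code` and `reflect` are supplied as LLM-backed callables (they are hypothetical helpers, not existing Auto-GPT commands):

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple

def run_pytest(code: str, test_code: str) -> Tuple[bool, str]:
    """Write the code and its tests to a temp dir, run pytest, return (passed, output)."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(code)
        Path(tmp, "test_solution.py").write_text(test_code)
        proc = subprocess.run(["python", "-m", "pytest", tmp, "-q"],
                              capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def tdd_loop(spec: str, generate_code: Callable[[str], str],
             reflect: Callable[[str, str, str], str], max_iterations: int = 5) -> str:
    """Generate code against fixed tests, feeding failures back for self-reflection."""
    test_code = generate_code(f"Write pytest tests (importing from solution) for: {spec}")
    code = generate_code(spec)
    for _ in range(max_iterations):
        passed, output = run_pytest(code, test_code)
        if passed:
            break
        # Reflexion-style step: have the model critique its own failure first,
        # then regenerate with the critique in context.
        critique = reflect(spec, code, output)
        code = generate_code(f"{spec}\n\nPrevious attempt failed:\n{output}\n"
                             f"Critique:\n{critique}")
    return code
```
|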
This really should take a research driven approach. We would need a metric to base the "improvement" on. I'd focus on making a framework, and then let people in their branches use this framework for research. Proven research gets merged in. |
I wrote an improvement that speeds up the bot significantly. If you write a function called alwaysNo that just returns "n" and then you use that as input, it just exits the program super fast! No money spent! |
That would have saved me some money. Just kidding, auto-gpt has been very helpful to understand how to compose bigger programs, compared to langchain which confused tf out of me. Thank you Torantulino and everyone who has contributed. |
"Ok, AG is really great, but I still have no idea how to: Give him access (with restrictions) to install new modules from GitHub. |
I was attempting to get it to self-implement code and it seems to have issues with the AI functions; that must be because I'm using GPT-3.5. It struggles to parse the response from those types of messages when evaluating code. |
We'll need to run benchmarks in GitHub Actions to validate it's not "losing" capability at every pull request.
The challenge is engineering these tests, because they have to give us a score that we can compare with the current version. They might also need to be run multiple times, because GPT is not totally deterministic. It might cost a lot of tokens to test the behavior, too (caching will be helpful here). One idea to test browsing the internet is to create static content, a fake internet where the search results are deterministic. Also, there are things that are very hard to measure, like producing art for example. And even if you can measure it, we might encounter a case where a task was performed slightly better but used significantly more tokens. It might be hard to decide whether things improved. |
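A sketch of what such a harness could look like, with repeated runs and a token budget; `run_task`, the scoring, and the fake-internet fixture are placeholder assumptions:

```python
import statistics
from typing import Callable, Dict, List, Tuple

# A "fake internet" fixture: canned, deterministic search results so that
# browsing benchmarks don't depend on live web content. (Illustrative only.)
FAKE_SEARCH_RESULTS: Dict[str, List[str]] = {
    "python requests tutorial": ["https://example.test/requests-guide"],
}

def benchmark(run_task: Callable[[], Tuple[float, int]], runs: int = 5,
              token_budget: int = 50_000) -> dict:
    """Run a task several times (GPT output is nondeterministic) and aggregate.

    run_task is assumed to return (quality_score, tokens_used) for one attempt.
    """
    scores, tokens = [], []
    for _ in range(runs):
        score, used = run_task()
        scores.append(score)
        tokens.append(used)
    return {
        "mean_score": statistics.mean(scores),
        "score_stdev": statistics.stdev(scores) if runs > 1 else 0.0,
        "mean_tokens": statistics.mean(tokens),
        "within_budget": statistics.mean(tokens) <= token_budget,
    }
```
|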
i like your work Torantulino, i think you should keep doing your own ideas instead of letting others decide for you, cause you're a smart man. i think you got this |
Loving your work. Can you imagine the next level to this? An environment that allows modular setup of any number of "task doers", "reviewers" and container types. A user could basically create their own system/org chart to solve a specific type of problem. The system could even report back to the user for outside input at set intervals. |
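A minimal sketch of that kind of org chart, assuming each agent's behavior is supplied as a callable (all names here are illustrative):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Agent:
    name: str
    role: str                      # e.g. "task doer" or "reviewer"
    act: Callable[[str], str]      # takes a task or draft, returns its output

def run_org_chart(task: str, doer: Agent, reviewers: List[Agent],
                  max_rounds: int = 3) -> str:
    """Let a task doer produce a draft, then loop it through reviewers."""
    draft = doer.act(task)
    for _ in range(max_rounds):
        feedback = [r.act(draft) for r in reviewers]
        if all("LGTM" in f for f in feedback):  # reviewers approve
            break
        # Feed the collected review comments back to the task doer.
        draft = doer.act(f"{task}\n\nRevise using this feedback:\n" + "\n".join(feedback))
    return draft
```
|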
A cool extension to this idea would be having AutoGPT spin up an instance of itself like every couple hours, crawl all the current PRs, and build a sandboxed version of itself with each new PR merged. Then it could determine, either through some combination of unit tests, benchmarking, and evaluating its own code quality, whether each PR was anything beneficial. This could unclog the massive amounts of PRs being made and hopefully only let the good ideas shine through. Some problems I see though are people trying to inject malicious code; however, if adequately sandboxed this may not be an issue. |
Jordan: this might be useful: https://github.com/lesfurets/git-octopus. I think nowadays we are doing a lot with LLMs imprecisely with tools that do it much more efficiently. |
@marcelosousa "I think nowadays we are doing a lot with LLMs imprecisely with tools that do it much more efficiently" - you're definitely correct with that statement. However, the usage I meant was not just simply merging all prs, but having autogpt evaluate all of its current prs individually and automatically determine which ones are worth the maintainers time. And to extend that, maybe if autogpt finds a pr promising, but still lacking in some ways, it could comment on a pr with a suggested list of changes. The whole point of this being to alleviate pressure on the maintainers. |
I would love to talk with maintainers. We have been creating Reviewpad to help with that. You can specify multiple PR automations and policies, and on top of that you can use GPT-4 to summarize and very soon automatically approve PRs. |
You know, my aunt owns her own business and I'm her designer, but this is way easier than working for my aunt. |
Fam be like that. |
Very good stuff y'all! I'm excited to implement in my personal version tomorrow. I'll add my take and ideas in the next couple of days as I have time or the pace necessitates. |
A thought on how to implement this would be building a Python class or set of functions that can open PRs. Using PyGithub, the script could pull all PRs and iterate through them. A place to start could be a set of tests that must pass, else the bot reviews and comments on the PR and closes it. If the PR is only a comment or test, the bot can try to build a Python script to satisfy the given functionality. The requirement of having tests pass would help avoid the AI spinning cycles on poor code. I think having the bot work off of a directive submission process, something like "create a python script that calls llama and returns the result in JSON", would really kick off the process. Plus crowd-source ideas. A 'suggested ideas' section of only text or markdown may be an option. Or we could utilize GitHub issues, pull all issues, and look for a specific format: CREATE: some idea. The script would need a key to submit PRs; anyone can do that. But to read, review and merge PRs we would need a key from someone with write access. It could merge to a dev branch so it doesn't break stuff too bad. |
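A rough sketch of that triage loop using PyGithub; the repository name and the `review_fn` callable are placeholder assumptions, and a token with write access would be needed to comment:

```python
from typing import Callable, Tuple

from github import Github  # pip install PyGithub

def triage_pull_requests(token: str, repo_name: str,
                         review_fn: Callable[[str, str, str], Tuple[bool, str]]) -> None:
    """Iterate open PRs and comment on the ones the review function rejects."""
    gh = Github(token)
    repo = gh.get_repo(repo_name)  # e.g. "owner/repo" (placeholder)
    for pr in repo.get_pulls(state="open"):
        changed_files = "\n".join(f.filename for f in pr.get_files())
        ok, comment = review_fn(pr.title, pr.body or "", changed_files)
        if not ok:
            # Leave feedback rather than merging; merging needs write access
            # and far more safeguards.
            pr.create_issue_comment(comment)
```
|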
#15 (comment) Maybe I should read more before answering, but it does seem to be part of fine-tuning of the AI model (the matrices / neural network coefficients). Once we have such a model, is it still viable? Could we look at Auto-GPT as a neural network with coefficients to adjust? It does seem that the model that handles the operations at the meta level should learn as well. |
I suggest doing this by improving its ability to make lasting progress, i.e. proceed in a project that vastly outlasts its context memory, by clever planning or other systems: maybe self-prompting at a later time or event, or super detailed multi-layered planning that isolates tasks. If you solve that, you also solve a big part of self-improvement, which most likely needs all that stuff anyway. Except if you go fine-tuning on AutoGPT data + deep RL automated based on metrics (meaning the reward function will be imperfect and error-prone, but automated, massive in numbers and detail, and overall way more accurate than a random guess, making it possible). But I think the planning thing might be easier, and it will be very fun when AutoGPT can finish projects. Right now I never got it to finish, but according to some videos it can solve simple tasks. But it's rare that I've seen that. #1351 |
As per my understanding, AutoGPT is very buggy just now; otherwise, what is there logically to prevent Auto-GPT from self-improvement? |
for as long as AutoGPT depends on a remote, closed-source LLM, with "guardrails" and what not, it can't really improve anything. we're all just wasting our time enthusiastically. but it's part of the game. a mediocre one but if that's all they could come up with, so be it. |
100% @katmai |
@sohrabsaran I 100% agree that we should have in-depth design documentation regarding Auto-GPT's inner workings. I am committed to writing such documentation sometime in the near future. Especially the prompting needs a number of Jupyter notebooks to properly explain them and to aid in further development by allowing easy experimentation. In the last couple of days, I have rewritten the contribution guide, so writing design documentation would be a logical next step. At the moment though, I'm also iterating the design to significantly improve performance, so it will be more efficient to write about it when we reach a more stable & performant point.
@katmai you have a point, although it isn't entirely true. Good prompting and system design can make a huge difference in leveraging the capabilities of the currently available "off-the-shelf" LLMs. Fine-tuning could enhance performance even further without the need for architectural changes. That being said, as long as we rely on off-the-shelf LLMs, we'll only be able to push the balance of efficiency, performance and cost so far. But I for one am committed to seeing how far we can take it. :) |
By the way, regarding the principle of self-improvement: there are examples of this (sort of) in practice, and not all of them are positive, so this issue should be approached with care. Afaik the LangChain team use agents or TaskGPTs to develop the project, and much of their codebase is unintelligible for normal humans as mentioned by @taosx. |
Hi guys, I am also working towards this goal, with the ability to review previous logs and share them for learning and improving itself. |
Hello! Great thread & subject. Although I am a total newbie to [thinking about GPT architecture &] the Auto-GPT arena, several ideas occurred to me.
I hope some of that makes sense, at least to GPT-5. ;-) Thanks & best of luck etc. ~ M |
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days. |
Over a year to become stale.. |
time flies! |
someone even raised 21M to do something similar... https://devin.ai/ lol |
i feel like this deserves a re-visit, especially after the so-kind re-opening. First of all, i have to ask: what are you all smoking? But at the topic at hand: if you wanted the right solution, you'd have to replace "improve" with "improvise". Before you get offended, i'd like you to think about if you had a few teacups and unlimited lifetime. |
This issue has automatically been marked as stale because it has not had any activity in the last 50 days. You can unstale it by commenting or removing the label. Otherwise, this issue will be closed in 10 days. |
This issue was closed automatically because it has been stale for 10 days with no activity. |
Idea 💡
The ULTIMATE achievement for this project would be if Auto-GPT was able to recursively improve itself. That, after all, is how AGI is predicted by many to come about.
Suggestion 👩‍💻
Auto-GPT should be able to:
- Read its own code
- Evaluate its limitations and areas for improvement
- Write code to increase its abilities
- Test its own code
Further down the line:
Where to start? 🤔
I have previously had success with this system prompt in playground:
Prompt
You are AGI_Builder_GPT. Your goal is to write code and make actionable suggestions in order to improve an AI called "Auto-GPT", so as to broaden the range of tasks it's capable of carrying out.
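Outside the playground, the same system prompt can be tried via the API along these lines (a sketch assuming the pre-1.0 `openai` SDK and GPT-4 access):

```python
import openai  # pre-1.0 SDK interface

SYSTEM_PROMPT = (
    "You are AGI_Builder_GPT. Your goal is to write code and make actionable "
    'suggestions in order to improve an AI called "Auto-GPT", so as to broaden '
    "the range of tasks it's capable of carrying out."
)

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Suggest one concrete improvement to Auto-GPT."},
    ],
)
print(response.choices[0].message.content)
```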