Refactor/remove abstract singleton as voice base parent #4931
Conversation
✅ Deploy Preview for auto-gpt-docs ready!
Codecov Report
Patch coverage:
Additional details and impacted files
@@ Coverage Diff @@
## master #4931 +/- ##
==========================================
+ Coverage 50.73% 51.35% +0.62%
==========================================
Files 128 119 -9
Lines 5517 4899 -618
Branches 759 642 -117
==========================================
- Hits 2799 2516 -283
+ Misses 2507 2201 -306
+ Partials 211 182 -29
☔ View full report in Codecov by Sentry.
This pull request has conflicts with the base branch; please resolve them so we can evaluate the pull request.
@dayofthedave would you mind testing this, to make sure the speech function still fulfils its purpose?
I'm not quite sure what function speech is expected to fulfill. For me personally, I use it to monitor the agent in continuous mode so I don't have to keep my eyes glued to the screen, but I can still know if it gets stuck or starts doing something I don't want. Would others agree that this is the purpose of speech, or is there something else I should be testing?

Speech seems to be somewhat broken on master, and I see the same behavior on this branch. I haven't worked with Auto-GPT in a while, so I'm not sure how it's expected to function now, but it seems to be locked in Google TTS mode. Perhaps I'm doing something wrong, but I can't get any other TTS engines to work anymore via the TEXT_TO_SPEECH_PROVIDER setting in .env. Auto-GPT always ignores this setting and uses Google TTS.

Question: Is multiple voice support still needed? I've been submitting PRs to enhance this feature, but it no longer appears to be a feature. At one point, Auto-GPT was spinning up helper agents all the time to complete certain tasks, and I had these helper agents speaking in different voices on MacOS, StreamElements and ElevenLabs. I haven't been able to trigger this helper agent behavior for quite a while now, however, so presumably it's no longer a thing, and there's no need to continue working on multiple voice support.
> Would others agree that this is the purpose of speech, or is there something else I should be testing?

I'd say that's the most important use case, yes.

> I can't get any other TTS engines to work anymore via the TEXT_TO_SPEECH_PROVIDER setting in .env. Auto-GPT always ignores this setting and uses Google TTS.

Thank you for bringing that to our attention. PR to fix it: #5005

> Question: Is multiple voice support still needed?

Not strictly needed. But very nice to have? Definitely! :)

The sub-agent implementation was removed because it didn't perform very well, and there were numerous problems with it. It will soon be replaced by a much better, more capable implementation.
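(As an aside for readers following the TEXT_TO_SPEECH_PROVIDER discussion above, here is a minimal sketch of how such a setting could map to a TTS engine. The class names and accepted values below are assumptions based on the engines mentioned in this thread, not the actual Auto-GPT implementation.)

```python
import os


class GTTSVoice:
    """Stand-in for a Google TTS engine wrapper (illustrative only)."""

    def say(self, text: str) -> None:
        print(f"[gtts] {text}")


class ElevenLabsVoice:
    """Stand-in for an ElevenLabs engine wrapper (illustrative only)."""

    def say(self, text: str) -> None:
        print(f"[elevenlabs] {text}")


def get_voice_engine():
    """Pick a TTS engine based on the TEXT_TO_SPEECH_PROVIDER environment variable."""
    provider = os.getenv("TEXT_TO_SPEECH_PROVIDER", "gtts").lower()
    engines = {
        "gtts": GTTSVoice,
        "elevenlabs": ElevenLabsVoice,
    }
    # Unknown or missing values fall back to Google TTS.
    return engines.get(provider, GTTSVoice)()


engine = get_voice_engine()
engine.say("Hello from the configured TTS engine.")
```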
Sounds great! I'm hoping to get more time in the near future to spend on this project.

Also, on the multiple voice support: it sounds a bit silly, but it was really cool to hear different agents speaking in different voices. Really seemed to give them different personalities, and the interplay between them was fun to listen to. Can't wait for this feature to return!
Conflicts have been resolved! 🎉 A maintainer will review the pull request shortly.
Background
This is one of several PRs aimed at removing global state from the system by removing globally invoked singletons (see #4227 #4737 #4901 #4900).
In the existing implementation we stand up a particular instance of a `VoiceBase` every time we invoke the `say_text` function (which is invoked in multiple places), with branching statements based on configuration values.

This PR adds a TTS provider class that loads and holds on to the configured TTS service (thus removing the need for a singleton) and handles the invocation logic in a single place. A new method is added to the logger class (which also serves as our de facto UI through the `typewriter_log` method) that allows TTS to be invoked even when we're not doing the `typewriter_log` (needed to preserve backwards compatibility with current UI functionality; otherwise the system will say "speak" every time the assistant thoughts come back with something to say).
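To make the intended structure concrete, below is a minimal, self-contained sketch: a provider object constructed once with the configured engine, and a logger that exposes both `typewriter_log` and a new `say` method. The names `TextToSpeechProvider`, `GTTSVoice`, `speak_text` and `enabled` are illustrative assumptions, not the exact code in this PR.

```python
from typing import Protocol


class VoiceBase(Protocol):
    """Minimal interface each TTS engine is expected to expose."""

    def say(self, text: str) -> None:
        ...


class GTTSVoice:
    """Placeholder engine; the real code would wrap the configured TTS service."""

    def say(self, text: str) -> None:
        print(f"(speaking) {text}")


class TextToSpeechProvider:
    """Constructed once; owns the engine and all speech-invocation logic."""

    def __init__(self, voice: VoiceBase, enabled: bool = True) -> None:
        self._voice = voice
        self._enabled = enabled

    def say(self, text: str) -> None:
        if self._enabled:
            self._voice.say(text)


class Logger:
    """De facto UI: prints via typewriter_log, and can speak without printing via say."""

    def __init__(self, tts: TextToSpeechProvider) -> None:
        self._tts = tts

    def typewriter_log(self, title: str, content: str, speak_text: bool = False) -> None:
        if speak_text:
            self._tts.say(content)
        print(f"{title} {content}")

    def say(self, text: str) -> None:
        # Speak without writing to the console, preserving current UI behaviour.
        self._tts.say(text)


logger = Logger(TextToSpeechProvider(GTTSVoice()))
logger.typewriter_log("SPEAK:", "Starting on the next task.", speak_text=True)
logger.say("I am going to browse the web for recent news.")
```

The design point is that the engine is selected and constructed exactly once, and every call site speaks through the provider (or the logger) instead of re-deciding which engine to use.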
Changes
- Wraps the `say_text` command in a class that holds on to the configured provider
- Speech is now invoked via `logger.typewriter_log` or `logger.say` (since the logger is our current UI, unfortunately)
- Removes `AbstractSingleton` as a base class for `VoiceBase`
- Removes `AbstractSingleton` 🎉

Documentation
Test Plan
PR Quality Checklist