refactor(agent): Refactor & improve create_chat_completion
#7082
Conversation
- Include all decoding errors when raising a `ValueError` on decode failure
- Use errors returned by `return_errors` instead of an error buffer
- Fix check for decode failure
…hat_completion`
- Rearrange to reduce complexity, improve separation/abstraction of concerns, and allow multiple points of failure during parsing
- Move conversion from `ChatMessage` to `openai.types.ChatCompletionMessageParam` to `_get_chat_completion_args`
- Move token usage and cost tracking boilerplate code to `_create_chat_completion`
- Move tool call conversion/parsing to `_parse_assistant_tool_calls` (new)
…t_completion`
- Amend `model_providers.schema`: change type of `arguments` from `str` to `dict[str, Any]` on `AssistantFunctionCall` and `AssistantFunctionCallDict`
- Implement robust and transparent parsing in `OpenAIProvider._parse_assistant_tool_calls`
- Remove now unnecessary `json_loads` calls throughout the codebase
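For illustration, here is a rough sketch of what the amended schema and error-collecting parsing could look like. The helper name and signature are simplified assumptions, and plain `json.loads` stands in for the project's fault-tolerant `json_loads` (with `return_errors`); this is not the actual implementation.

```python
import json
from typing import Any

from pydantic import BaseModel


class AssistantFunctionCall(BaseModel):
    name: str
    arguments: dict[str, Any]  # was `str` before this change


class AssistantToolCall(BaseModel):
    id: str
    type: str
    function: AssistantFunctionCall


def parse_assistant_tool_calls(raw_tool_calls) -> tuple[list[AssistantToolCall], list[Exception]]:
    """Parse raw tool calls, collecting *all* decode errors instead of stopping at the first."""
    tool_calls: list[AssistantToolCall] = []
    errors: list[Exception] = []

    for raw in raw_tool_calls:  # items shaped like OpenAI's tool call objects
        try:
            # The real code uses the project's fault-tolerant `json_loads`;
            # plain json.loads stands in for it here.
            arguments = json.loads(raw.function.arguments)
        except json.JSONDecodeError as e:
            errors.append(e)
            continue

        tool_calls.append(
            AssistantToolCall(
                id=raw.id,
                type=raw.type,
                function=AssistantFunctionCall(name=raw.function.name, arguments=arguments),
            )
        )

    # The caller can raise a single ValueError that includes every collected error.
    return tool_calls, errors
```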
✅ Deploy Preview for auto-gpt-docs canceled.
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7082      +/-   ##
==========================================
- Coverage   45.44%   45.24%    -0.20%
==========================================
  Files         139      139
  Lines        6569     6595       +26
  Branches      924      932        +8
==========================================
- Hits         2985     2984        -1
- Misses       3434     3461       +27
  Partials      150      150
I think the changes are good; moving `json_loads` inside the provider makes it easier to use.

Regarding complexity: using `create_chat_completion` with a large parsing function and all the mixins made it difficult to understand the order of execution in the agent.

Maybe we could introduce some higher-level wrapper that would handle different LLM APIs? It could accept a parsing function and handle retries instead of the `model_provider`s.
-    completion_kwargs = self._get_completion_kwargs(model_name, functions, **kwargs)
-    tool_calls_compat_mode = functions and "tools" not in completion_kwargs
-    if "messages" in completion_kwargs:
-        model_prompt += completion_kwargs["messages"]
-        del completion_kwargs["messages"]
+    openai_messages, completion_kwargs = self._get_chat_completion_args(
+        model_prompt, model_name, functions, **kwargs
+    )
+    tool_calls_compat_mode = bool(functions and "tools" not in completion_kwargs)
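For context, a simplified sketch of what the new `_get_chat_completion_args` helper might return, per the PR description (the message conversion shown here is stripped down and doesn't cover tool-call fields or the real type annotations):

```python
def _get_chat_completion_args(self, model_prompt: list, model_name: str, functions=None, **kwargs):
    """Convert the prompt to OpenAI's message format and assemble the completion kwargs."""
    completion_kwargs = self._get_completion_kwargs(model_name, functions, **kwargs)
    # Simplified conversion from ChatMessage to ChatCompletionMessageParam-shaped dicts
    openai_messages = [{"role": m.role, "content": m.content} for m in model_prompt]
    return openai_messages, completion_kwargs
```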
This may be out of scope, but can we make it possible to use `create_chat_completion` without providing `model_name`? It could be a default or chosen by the config or the agent.
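One way that suggestion could look, purely as an illustration (the settings class and its `default_model` field are made up):

```python
from dataclasses import dataclass


@dataclass
class ProviderSettings:
    default_model: str = "gpt-4-turbo"  # hypothetical config field


class ExampleProvider:
    def __init__(self, settings: ProviderSettings):
        self._settings = settings

    def create_chat_completion(self, model_prompt: list, model_name: str | None = None, **kwargs):
        # Fall back to the configured default when the caller doesn't pick a model.
        model_name = model_name or self._settings.default_model
        print(f"Would call {model_name} with {len(model_prompt)} messages")
```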
Yes, but I'd rather implement that at a higher level. I want to have an LLM call multiplexer (a sort of wrapper function) through which it's possible to use all available models/providers. That way it would be easy to use multiple providers throughout the application.

> Maybe we could introduce some higher-level wrapper that would handle different LLM APIs?

Yes :)
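For illustration, such a multiplexer could be little more than a lookup from model name to provider; every name below is hypothetical, and the providers are assumed to expose an async `create_chat_completion`:

```python
from typing import Any

# Hypothetical registry mapping model names to the provider instance that serves them,
# e.g. {"gpt-4-turbo": openai_provider, "mistral-7b": llamafile_provider}
MODEL_PROVIDERS: dict[str, Any] = {}


async def create_chat_completion(model_name: str, *args, **kwargs):
    """Route a chat completion request to whichever provider serves the requested model."""
    provider = MODEL_PROVIDERS.get(model_name)
    if provider is None:
        raise ValueError(f"No provider registered for model {model_name!r}")
    return await provider.create_chat_completion(*args, model_name=model_name, **kwargs)
```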
> It could accept a parsing function and handle retries instead of the `model_provider`s.

That would be a nice clean solution, but some `ModelProvider`s will still need internal retry mechanisms to ensure reliable output. OpenAI's function call argument parsing is one example, and I think we'll find more once we have a working llamafile integration.
When there are multiple points at which parsing can fail, it's most (time+cost) efficient to parse with a fail-soft strategy: running as many of the parsing steps as possible before stopping and regenerating the response with the feedback collected from the failed parsing steps. This is only possible if the `ModelProvider` itself implements the retry mechanism, because it isn't allowed to return broken output.
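A minimal sketch of that fail-soft loop, with made-up helper names, just to illustrate the flow:

```python
def complete_with_fail_soft_parsing(generate, parse_steps, max_attempts: int = 3):
    """Run *every* parse step on each attempt, then regenerate with the collected feedback.

    `generate(feedback)` returns a raw model response; each step in `parse_steps` either
    returns a parsed value or raises. All names here are illustrative, not the real API.
    """
    feedback: list[str] = []
    for _ in range(max_attempts):
        response = generate(feedback)
        results, errors = [], []
        for step in parse_steps:
            try:
                results.append(step(response))
            except Exception as e:
                # Fail soft: keep running the remaining parse steps to collect all feedback.
                errors.append(f"{step.__name__}: {e}")
        if not errors:
            return results
        feedback = errors  # fed back into the next generation attempt
    raise ValueError(f"Parsing still failed after {max_attempts} attempts: {errors}")
```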
An additional potential benefit of implementing the retry mechanism in the `ModelProvider` is that we can tailor the feedback prompt to the LLM that is being used.
What we could implement at a higher level is more high-level parsing functionality. For example, pass in a prompt and a Pydantic model, and it ensures that the output is converted into that Pydantic model.
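A rough sketch of what that could look like, purely illustrative: `complete` stands in for whichever provider call the wrapper would route to, and the schema-in-prompt approach is just one option.

```python
import json
from typing import Callable, TypeVar

from pydantic import BaseModel, ValidationError

M = TypeVar("M", bound=BaseModel)


def prompt_into_model(complete: Callable[[str], str], prompt: str, model: type[M], retries: int = 3) -> M:
    """Ask the LLM for JSON matching `model`, re-prompting with the validation errors on failure."""
    request = (
        f"{prompt}\n\nRespond with JSON matching this schema:\n"
        f"{json.dumps(model.model_json_schema())}"
    )
    last_error = ""
    for _ in range(retries):
        suffix = f"\n\nYour previous reply was invalid:\n{last_error}" if last_error else ""
        raw = complete(request + suffix)
        try:
            return model.model_validate_json(raw)  # Pydantic v2 API
        except ValidationError as e:
            last_error = str(e)
    raise ValueError(f"Could not get a valid {model.__name__} after {retries} attempts")
```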