[Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled #9693
Comments
@K-Mistele can you take a look into this?
I've been debugging the issue on my own and think I've identified the solution. After testing the API, I noticed that it currently generates tool_calls where the function name and arguments are in separate yield statements, which is causing issues. Here's an example of the current output:
In this example, the function name is yielded separately from its arguments. However, for functionality like chatbot integration and API calls—where multiple frameworks expect the tool_call to be complete in a single field—it would be more efficient if both the name and arguments were generated in the same yield statement. Expected Behavior: The API should generate tool_calls with the function name and arguments combined, so the function can be utilized directly without additional processing. Here’s an example of the ideal output:
hi @ankush13r! You are correct that the function name and function arguments are handled separately. Here's an example request you can make with Postman or something similar to illustrate what the streamed server-sent events will look like according to OpenAI's standard:
{
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Can you tell me the weather in dallas in fahrenheit?"
}
],
"stream": true,
"temperature": 0.7,
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather in a given location",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city to find the weather for, e.g. 'San Francisco'"
},
"state": {
"type": "string",
"description": "the two-letter abbreviation for the state that the city is in, e.g. 'CA' which would mean 'California'"
},
"unit": {
"type": "string",
"description": "The unit to fetch the temperature in",
"enum": [
"celsius",
"fahrenheit"
]
}
}
}
}
}
]
}

Here is what this request generates from OpenAI using streaming:

(Long list of server-sent events from OpenAI, collapsed in the original comment.)
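For reference, an abbreviated, editor-added illustration of what those streamed chunks typically look like in OpenAI's format (the IDs and token boundaries here are made up; this is not the original capture):

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc123","type":"function","function":{"name":"get_current_weather","arguments":""}}]},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\""}}]},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":": \"Dallas\", \"state\": \"TX\", \"unit\": \"fahrenheit\"}"}}]},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"tool_calls"}]}
data: [DONE]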
There are a couple of important things to observe here:
This is the OpenAI standard for server-sent events for tool streaming, and this is the standard that vLLM follows. A function's name is always streamed before argument deltas arrive, and argument deltas will never be streamed in the same event as the function's name. Multiple argument deltas will be received that must be concatenated; the entire arguments string should never be received all at once. When you're receiving deltas from vLLM, are these (below) the only deltas that you are receiving before the stream ends, or are you receiving additional deltas with argument diffs like shown above?

ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
ChoiceDeltaToolCall(index=0, id='chatcmpl-tool-ac7886c6cea04451b439d4e24b21ab7a', function=ChoiceDeltaToolCallFunction(arguments=None, name='sum'), type='function')
ChoiceDelta(content='', function_call=None, refusal=None, role=None, tool_calls=None)

If these are the only deltas you receive, that probably indicates a bug, since you should receive argument deltas as well. If you do receive additional deltas, you just need to handle concatenating and parsing them as described above and in the docs example that I linked to. Can you please share your entire vLLM start command, the entire request, and all received deltas so that I can help you debug it? You should be able to see an example of how this works, including delta processing for arguments, in this example from the vLLM docs. I actually created this demo with Hermes, so it should work for testing your purposes.
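To make that client-side delta handling concrete, here is a minimal, editor-added sketch (not the linked vLLM docs demo itself) of accumulating streamed tool-call deltas with the OpenAI Python client; the base URL, API key, and model name are placeholders:

import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # placeholder vLLM endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {"type": "object", "properties": {
            "city": {"type": "string"},
            "state": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        }},
    },
}]

stream = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Can you tell me the weather in dallas in fahrenheit?"}],
    tools=tools,
    stream=True,
)

# Accumulate the function name and argument fragments per tool-call index;
# the name arrives once, the arguments arrive as several deltas to concatenate.
tool_calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    for tc in delta.tool_calls or []:
        entry = tool_calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            entry["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            entry["arguments"] += tc.function.arguments

# Only after the stream ends is the concatenated arguments string valid JSON.
for call in tool_calls.values():
    print(call["name"], json.loads(call["arguments"]))

The key point matches the observations above: parse the arguments only once the stream has finished, since an individual delta is generally not valid JSON on its own.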
Now I see that the arguments are being yielded separately. However, I found a bug in the Hermes parser during debugging, which causes it to return a response without arguments. Below is an example of the output received:
Debug Findings:
Proposed Solution: The fix that mitigates this bug is to add a check verifying that delta_text exists within cur_arguments_json before attempting to find its index, and to check that current_tool_call is not None. Here's the current and modified code:

function_name: Union[str, None] = current_tool_call.get("name")
cur_arguments = current_tool_call.get("arguments")
# get the location where previous args differ from current
args_delta_start_loc = cur_arguments_json.index(delta_text) \
    + len(delta_text)
arguments_delta = cur_arguments_json[:args_delta_start_loc]

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py#L227C51-L227C72

Updated Code:
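(The "Updated Code" block from the original comment is not captured above. As a rough, editor-added sketch only, not the author's actual patch, the two checks described there might look something like the following, where delta_text and current_tool_call come from the parser's streaming state:)

import json
from typing import Optional

def compute_arguments_delta(current_tool_call: Optional[dict], delta_text: str) -> Optional[str]:
    """Guarded version of the diff logic quoted above (illustrative only)."""
    if current_tool_call is None:
        return None
    cur_arguments = current_tool_call.get("arguments")
    if cur_arguments is None:
        return None
    cur_arguments_json = json.dumps(cur_arguments)
    # Only compute the diff when the newly decoded text actually occurs in
    # the serialized arguments; otherwise emit no arguments delta yet.
    if delta_text not in cur_arguments_json:
        return None
    args_delta_start_loc = cur_arguments_json.index(delta_text) + len(delta_text)
    return cur_arguments_json[:args_delta_start_loc]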
This fixes both bugs. However, it still produces responses with
To prevent empty responses, the solution is to check if
Let me know if you think this should fix the bug or if the issue lies with the model's response generation. I'm open to collaborating to resolve the bug and can make a pull request.
Can you please share the request you're using (messages, tools, vLLM config) so that I can try to reproduce the issue? It's not impossible that there's a bug in the Hermes tool parser, but it has been used and tested pretty robustly, so I'm curious what's different about this and I'd like to be able to step through the streaming parsing.
I'm sending you the configuration here. The model I'm using is our own, and we can't publish it yet since it's still in testing. However, I tried to reproduce the bug with a Hermes model. vLLM config (I'm running with Singularity, but I believe Docker or running directly would have the same effect):
Python OpenAI client:
The error occurs if the text generated by the model follows this format, where the arguments appear first and the name is at the end:
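(For instance, as an editor-added, hypothetical illustration using the sum tool name seen in the deltas above, and assuming Hermes-style <tool_call> wrapping, the generated text might look like:)

<tool_call>
{"arguments": {"a": 3, "b": 5}, "name": "sum"}
</tool_call>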
Thanks
Okay, I can see a couple of places where there would be a problem.
{"properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}} which is {"name": <function-name>, "arguments": <args-dict>} You can see that Qwen adopted this format as well in their Therefore, Alternatively, if you positively need the Hopefully that helps! |
Thank you;
Hi @K-Mistele, I updated my model's chat_template, and that resolved the previous error related to arguments. However, it didn't solve the 'NoneType' error:
The error occurs in the following lines, where it attempts to call the
A solution would be to check if
Working on this now. I am actually not sure what the cause of this issue is, because not only does it not occur frequently for me, I can't actually reproduce the issue at all. I can understand logically how it might happen based on the references that you provided in the source, but I've never seen this happen before. I wonder if this has to do with a tool call being generated slightly differently in some circumstances (e.g. extra whitespace where none was expected) resulting in this edge case being tripped. If you could share your configuration so that I can reproduce the issue, then I'm happy to try and reproduce it on my end. Otherwise, I think the best path forward will be for me to open a PR with the patched tool parser, and then to assess if it fixes your issue, you can load the tool parser as a plugin at runtime (see tool parser plugins in the docs) instead of using the hermes one in the current version.
Please check #9908 :) |
Your current environment
Dockerfile: vllm/vllm-openai:v0.6.3
Parameters:
--enable-auto-tool-choice --tool-call-parser hermes
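For context, a launch command along these lines (an editor's sketch with a placeholder model name, not the reporter's exact command) would pass those flags to the OpenAI-compatible server in that image:

docker run --gpus all -p 8000:8000 vllm/vllm-openai:v0.6.3 \
    --model <your-model> \
    --enable-auto-tool-choice \
    --tool-call-parser hermes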
Model Input Dumps
No response
🐛 Describe the bug
I'm using the vLLM library with a Docker container as a REST API, specifically the v1/chat/completions endpoint with the OpenAI client. When I run chat completions without streaming, it returns tool_calls with the tool name and its arguments as expected. However, when I enable the streaming option, it only returns the tool name, with arguments set to None. I'm not sure why this is happening. I've tried searching for related issues but haven't found anything helpful.

I have tried stream_options={"include_usage": True} and it gives the same output.

The model generates this output:

Output:
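(The generated output itself is not captured above.) To make the report easier to reproduce, here is a minimal, editor-added sketch of the kind of comparison being described; the server URL, API key, and model name are placeholders, and the tools list mirrors the weather tool used in the example request earlier in this thread:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")  # placeholder vLLM endpoint

tools = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {"type": "object", "properties": {
            "city": {"type": "string"},
            "state": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        }},
    },
}]
messages = [{"role": "user", "content": "Can you tell me the weather in dallas in fahrenheit?"}]

# Without streaming, the tool call arrives complete on the final message.
response = client.chat.completions.create(model="my-model", messages=messages, tools=tools)
print(response.choices[0].message.tool_calls)

# With streaming, the same request yields a series of deltas instead;
# the reported issue is that the argument deltas never arrive.
stream = client.chat.completions.create(model="my-model", messages=messages, tools=tools, stream=True)
for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta)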