[Bug] Streaming output error of tool calling has still not been resolved. #10589
I got the same problem: ERROR 11-23 01:34:01 hermes_tool_parser.py:338] Error trying to handle streaming tool call. Besides, when I print the tool_calls function.arguments in stream mode like this:

for chunk in create:
    try:
        print(chunk.choices[0].delta.tool_calls[0].function.arguments, end="")
    except Exception:
        pass

the output looks like: None{"args":"entity": ""\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8, "func": ""get_company_funding{"args": {"entity": "\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8"}, "func": "get_company_funding"}

This is very unfriendly to parse as JSON.
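One way to make those fragments easier to handle, independent of the parser bug, is to accumulate the per-index argument deltas and parse the JSON only once the stream has finished. A minimal sketch, assuming the streaming chunk objects of the OpenAI Python client pointed at the vLLM server (the helper name is not from this thread):

import json

def collect_tool_calls(stream):
    # `stream` is the iterator returned by
    # client.chat.completions.create(..., stream=True).
    calls = {}  # tool-call index -> {"name": str, "arguments": str}
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        for tc in delta.tool_calls or []:
            entry = calls.setdefault(tc.index, {"name": "", "arguments": ""})
            if tc.function and tc.function.name:
                entry["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                entry["arguments"] += tc.function.arguments
    # Parse once at the end; this will still fail if the server emits
    # malformed fragments, which is exactly the symptom reported above.
    return [
        {"name": c["name"], "arguments": json.loads(c["arguments"])}
        for c in calls.values()
    ]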
cc @K-Mistele
Supplementary note: before using the new parser, the LLM displayed the tool calling arguments in the content of the response instead of in tool_calls. When I modify the code using the above method, the LLM directly returns the error. Hope this problem can be solved as soon as possible, thank you!
During the startup of the api server, the setup function is called multiple times (every 5s). So the longer the startup time (generally for larger models), the more consumers are contending for the output. This can then lead to a race condition where the order of the answer tokens is wrong. Introduced here: vllm-project#9973 References: vllm-project#10376 vllm-project#10589 vllm-project#10782 Signed-off-by: Jannis Schönleber <[email protected]>
@cedonley @joennlae I have read [Bugfix] Multiple fixes to tool streaming when using auto tool choice and [Bugfix] fix race condition that leads to wrong order of token returned, and installed the latest version of vllm using the instructions from here: …

The version is …, but the vllm server still has the same error: …

Did I do something wrong? Or is this bug not resolved yet?
The PR is not yet merged, but based on the error in the logs, I believe it may be resolved once the PR is merged, as the error raised is one of several that I resolved in my commits.
Thanks for your reply, I will wait for the PR to be merged!
This output doesn't look like the model is trying to generate a valid tool call; the structure is off. Possibly a chat template issue, or precision loss with the AWQ quant? Can you share your chat template and the tools you're passing to the model? It's hard to debug without these.
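For readers following along, the tools being asked about here would normally be an OpenAI-style function schema. A hypothetical reconstruction for the get_company_funding function that appears in the garbled output above; the reporter's actual schema was not shared and may differ:

# Hypothetical reconstruction of the tool definition; the real schema
# and parameter names used by the reporter may differ.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_company_funding",
            "description": "Query funding records for a company.",
            "parameters": {
                "type": "object",
                "properties": {
                    "entity": {
                        "type": "string",
                        "description": "Full legal name of the company.",
                    }
                },
                "required": ["entity"],
            },
        },
    }
]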
Hi @K-Mistele, they are not printing the full tool call, only the arguments. The issues (outlined in my PR mentioned earlier) are: …
Looking forward to this being merged; I'm also blocked by this. Thanks @cedonley!
FYI, it is likely that the version of vLLM you pulled from the nightly build did not contain these fixes, as both of the linked PRs are still open and have not been merged into main.
I think it makes the most sense to either test with those branches, or wait for them to be merged and then test with the nightly build, before trying to debug other issues here, since otherwise we could either be (a) debugging an issue that has already been solved, or (b) trying to debug a compound issue, which would be quite difficult.
@cedonley @joennlae I used the branch you provided to build vllm, but still got the same error. Error messages: …

In addition, vllm may still send back a response after an error occurs. I used the same code to retrieve results in both versions.

Retrieval results in …:

Retrieval results in the … branch:

Hope you can solve this problem, thank you!
Here is another related error. I originally only added the tool message. In order to let the LLM know the history of tool calls, I added an assistant message containing tool_calls:

[
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "幫我把音檔轉成逐字稿。檔名: http://example.com/audio.mp3"
},
{
"role": "assistant",
"tool_calls": [
{
"id": "chatcmpl-tool-cb9685ce207a4bcfa461eafea3d6e801",
"function": {
"arguments": "{\"AudioPath\": \"http://example.com/audio.mp3\", \"language\": \"zh\", \"function_one_name_called\": true}",
"name": "voice_to_text"
},
"type": "function"
}
]
},
{
"role": "tool",
"tool_call_id": "5097b499-6da1-4343-ae3c-4134b660e065",
"name": "voice_to_text",
"content": "Testing"
}
]

It works when …

Version: …
I've written a short script that tries to replicate your tool call. Note that you need to add a final "assistant" response to your example messages that shows the assistant's typical reply to the tool information. I start my server with the following arguments: …

The script makes a stream=False call followed by a separate stream=True call for the same request, and prints both outputs: …

As you can see above, I'm not seeing any issues when using my PR. If you don't get this output from the script, run the script after starting vLLM with debug logs enabled and provide the relevant debug lines.
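A rough sketch of what such a replication script could look like, assuming an OpenAI-compatible vLLM server on localhost:8000; the model name, the voice_to_text schema, and the final assistant/user turns are placeholders inferred from the messages above, not the actual script from this thread:

import json
from openai import OpenAI

# All of the following values are illustrative assumptions.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
MODEL = "Qwen/Qwen2.5-72B-Instruct-AWQ"  # placeholder model name

# Hypothetical reconstruction of the voice_to_text tool schema.
tools = [{
    "type": "function",
    "function": {
        "name": "voice_to_text",
        "description": "Transcribe an audio file into text.",
        "parameters": {
            "type": "object",
            "properties": {
                "AudioPath": {"type": "string"},
                "language": {"type": "string"},
            },
            "required": ["AudioPath"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Please transcribe this audio file: http://example.com/audio.mp3"},
    {
        "role": "assistant",
        "tool_calls": [{
            "id": "call_0",
            "type": "function",
            "function": {
                "name": "voice_to_text",
                "arguments": json.dumps({"AudioPath": "http://example.com/audio.mp3", "language": "zh"}),
            },
        }],
    },
    {"role": "tool", "tool_call_id": "call_0", "name": "voice_to_text", "content": "Testing"},
    # The final assistant reply to the tool result, as recommended above.
    {"role": "assistant", "content": "The transcript is: Testing"},
    # A follow-up user turn so the model produces a fresh (tool-calling) response.
    {"role": "user", "content": "Now transcribe http://example.com/audio2.mp3 as well."},
]

# Non-streaming request first.
resp = client.chat.completions.create(model=MODEL, messages=messages, tools=tools)
print("stream=False:", resp.choices[0].message)

# Then the identical request with streaming enabled.
stream = client.chat.completions.create(model=MODEL, messages=messages, tools=tools, stream=True)
print("stream=True:", end=" ")
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="")
    for tc in delta.tool_calls or []:
        if tc.function and tc.function.arguments:
            print(tc.function.arguments, end="")
print()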
I used the hermes_tool_parser.py as tool-parser-plugin and registered the parser as hermes_patched, but still have the same problem. Already referred to #9874, #10395 and #10398.

Here is how I start the vllm service with the latest package: …

I also tried using the Docker images v0.6.3.post1, v0.6.4 and v0.6.4.post1.

Originally posted by @Sala8888 in #10398 (comment)