
[Bug] Streaming output error of tool calling has still not been resolved. #10589

Closed
Sala8888 opened this issue Nov 23, 2024 · 14 comments · Fixed by #10979

@Sala8888

Sala8888 commented Nov 23, 2024

I used hermes_tool_parser.py as a tool-parser plugin and registered the parser as hermes_patched, but I still have the same problem.

I have already referred to #9874, #10395, and #10398.

Traceback (most recent call last):
  File "/app/hermes_tool_parser.py", line 228, in extract_tool_calls_streaming
    function_name: Union[str, None] = current_tool_call.get("name")
                                      ^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
Error trying to handle streaming tool call.
Traceback (most recent call last):
  File "/app/hermes_tool_parser.py", line 292, in extract_tool_calls_streaming
    args_delta_start_loc = cur_arguments_json.index(delta_text) \
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: substring not found
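For what it's worth, the first traceback is the incremental JSON parser handing back None (or a non-dict) while the tool-call payload is still incomplete, and the code calling .get() on it unguarded. A minimal sketch of a defensive guard (illustrative only, not the actual vLLM patch; safe_function_name is a made-up helper):

from typing import Any, Optional

def safe_function_name(current_tool_call: Any) -> Optional[str]:
    # While the model is still streaming the tool-call JSON, the partial
    # parser may yield None or a non-dict; bail out instead of crashing.
    if not isinstance(current_tool_call, dict):
        return None
    name = current_tool_call.get("name")
    return name if isinstance(name, str) else None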

Here is how I start the vLLM service with the latest package:

python3 -m vllm.entrypoints.openai.api_server \
--model /app/Qwen2.5-72B-Instruct-AWQ \
--port 7415 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.95 \
--max-model-len 64000 \
--enforce-eager \
--disable_custom_all_reduce \
--enable-auto-tool-choice \
--tool-parser-plugin /app/hermes_tool_parser.py \
--tool-call-parser hermes_patched  \
--chat-template /app/qwen.jinja

I also tried the Docker images v0.6.3.post1, v0.6.4, and v0.6.4.post1.

Originally posted by @Sala8888 in #10398 (comment)

@Sala8888 changed the title to "[Bug] Streaming output error of tool calling has still not been resolved." Nov 23, 2024
@sycamore792

I got the same problem:

ERROR 11-23 01:34:01 hermes_tool_parser.py:338] Error trying to handle streaming tool call.
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] Traceback (most recent call last):
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] File "/home/sycamore/.conda/envs/llm_env/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py", line 227, in extract_tool_calls_streaming
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] function_name: Union[str, None] = current_tool_call.get("name")
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] AttributeError: 'NoneType' object has no attribute 'get'
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] Error trying to handle streaming tool call.
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] Traceback (most recent call last):
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] File "/home/sycamore/.conda/envs/llm_env/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py", line 291, in extract_tool_calls_streaming
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] args_delta_start_loc = cur_arguments_json.index(delta_text)
ERROR 11-23 01:34:01 hermes_tool_parser.py:338] ValueError: substring not found
ERROR 11-23 01:34:02 hermes_tool_parser.py:338] Error trying to handle streaming tool call.
ERROR 11-23 01:34:02 hermes_tool_parser.py:338] Traceback (most recent call last):
ERROR 11-23 01:34:02 hermes_tool_parser.py:338] File "/home/sycamore/.conda/envs/llm_env/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py", line 291, in extract_tool_calls_streaming
ERROR 11-23 01:34:02 hermes_tool_parser.py:338] args_delta_start_loc = cur_arguments_json.index(delta_text)
ERROR 11-23 01:34:02 hermes_tool_parser.py:338] ValueError: substring not found

Besides, when I print the tool_calls function.arguments in stream mode like this:

for chunk in create:
    try:
        print(chunk.choices[0].delta.tool_calls[0].function.arguments, end="")
    except Exception:
        pass
        

the output looks like:

None{"args":"entity": ""\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8, "func": ""get_company_funding{"args": {"entity": "\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8"}, "func": "get_company_funding"}

This is very unfriendly to parse as JSON.
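As a client-side mitigation (a sketch, assuming the standard openai streaming client; collect_tool_calls is a made-up helper), accumulating the argument deltas per tool-call index and parsing once the stream ends at least confines the JSON failure to a single place:

import json
from collections import defaultdict

def collect_tool_calls(stream):
    names, args = {}, defaultdict(str)
    for chunk in stream:
        if not chunk.choices:
            continue
        for tc in chunk.choices[0].delta.tool_calls or []:
            if tc.function is None:
                continue
            if tc.function.name:
                names[tc.index] = tc.function.name
            if tc.function.arguments:
                args[tc.index] += tc.function.arguments
    # json.loads still fails if the server emitted corrupted deltas, but the
    # failure is now one diagnosable exception per tool call.
    return {i: (names.get(i), json.loads(args[i])) for i in args}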

@DarkLight1337
Member

cc @K-Mistele

@Sala8888
Author

Sala8888 commented Nov 23, 2024

Supplementary note.

Before using the new parser, the LLM displayed the tool-calling arguments in the content of the response instead of in tool_calls, and at the same time the vLLM server reported the same error.
Here are some examples of responses:

Now I will query the DB for all controller information.
<tool_call>
{"name": "text_to_sql", "arguments": {"retrieve_steps": "查全部的controller", "columns": "*", "analysis_method": "summary the data and get max", "db_name": "21", "plot": false, "return_sql": false}}}
</tool_call>
Please wait; I will complete the analysis and provide it to you as soon as possible.

<tool_response>
{"name": "voice_to_text", "arguments": {"AudioPath": "["/path/example.mp3"]", "KeywordsPath": "None", "language": "zh"}
</tool_response>

When I modified the code using the method above, the LLM's response produced the error directly.

I hope this problem can be solved as soon as possible, thank you!
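Both example payloads above are malformed JSON: the first has an extra closing brace, and the second has stray quotes around the list value plus a missing closing brace. A quick check makes that concrete (the payloads are abbreviated here):

import json

payloads = [
    '{"name": "text_to_sql", "arguments": {"db_name": "21", "return_sql": false}}}',
    '{"name": "voice_to_text", "arguments": {"AudioPath": "["/path/example.mp3"]", "language": "zh"}',
]
for p in payloads:
    try:
        json.loads(p)
    except json.JSONDecodeError as e:
        print("invalid JSON:", e)  # both payloads land here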

joennlae added a commit to 44ai-labs/vllm that referenced this issue Dec 1, 2024
During the startup of the API server, the setup function is called
multiple times (every 5s). So the longer the startup time (generally
for larger models), the more consumers are contending for the output.
This can then lead to a race condition where the order of the answer
tokens is wrong.

Introduced here: vllm-project#9973

References:
vllm-project#10376
vllm-project#10589
vllm-project#10782

Signed-off-by: Jannis Schönleber <[email protected]>
@Sala8888
Author

Sala8888 commented Dec 2, 2024

@cedonley @joennlae
Thank you for your fix, but the problem is still not solved.

I have read [Bugfix] Multiple fixes to tool streaming when using auto tool choice and [Bugfix] fix race condition that leads to wrong order of token returned, and installed the latest version of vLLM using the instructions from here:

pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl

The version is 0.6.4.post2.dev202+ge25810ae

But vllm server still has the same error:

INFO 12-02 12:13:00 engine.py:267] Added request chatcmpl-906ae28088d14caca3e3a355ac0a3036.
INFO 12-02 12:13:00 metrics.py:460] Avg prompt throughput: 171.5 tokens/s, Avg generation throughput: 4.2 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
ERROR 12-02 12:13:02 hermes_tool_parser.py:337] Error trying to handle streaming tool call.
ERROR 12-02 12:13:02 hermes_tool_parser.py:337] Traceback (most recent call last):
ERROR 12-02 12:13:02 hermes_tool_parser.py:337]   File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py", line 228, in extract_tool_calls_streaming
ERROR 12-02 12:13:02 hermes_tool_parser.py:337]     function_name: Union[str, None] = current_tool_call.get("name")
ERROR 12-02 12:13:02 hermes_tool_parser.py:337] AttributeError: 'NoneType' object has no attribute 'get'
INFO 12-02 12:13:05 metrics.py:460] Avg prompt throughput: 296.4 tokens/s, Avg generation throughput: 15.3 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.8%, CPU KV cache usage: 0.0%.
INFO 12-02 12:13:20 metrics.py:460] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 8.4 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.

Did I do something wrong? Or is this bug not resolved yet?

@cedonley
Contributor

cedonley commented Dec 2, 2024

The PR is not yet merged, but based on the error in the logs, I believe this will be resolved once the PR is merged, as the error raised is one of several that I fixed in my commits.

@Sala8888
Author

Sala8888 commented Dec 2, 2024

Thanks for your reply, I will wait for the PR to be merged!

@K-Mistele
Contributor

(Quoting @sycamore792's comment above: the same error log and the garbled streamed-arguments output.)

This:

None{"args":"entity": ""\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8, "func": ""get_company_funding{"args": {"entity": "\u5e7f\u4e1c\u641c\u4e00\u641c\u79d1\u6280\u6709\u9650\u516c\u53f8"}, "func": "get_company_funding"}

This doesn't look like the model is trying to generate a valid tool call; the structure is off. Possibly a chat template issue, or precision loss with the AWQ quant?

Can you share your chat template and the tools you're passing to the model? It's hard to debug without them.

@cedonley
Contributor

cedonley commented Dec 2, 2024

Hi @K-Mistele, they are not printing the full tool call, only the arguments. The issues (outlined in my PR mentioned earlier) are:

  1. We're extensively using json.loads()/json.dumps() calls, which explains why the model returns UTF-8 without the tool parsing but encodes non-ASCII characters within the tool call. This isn't a showstopper, except that we're mixing ensure_ascii=False with the default (True) across the various parsers, which at times breaks non-ASCII arguments because the "diffs" won't match (note the extra quotes and the missing braces that invalidate this JSON; this comes from the diffs getting misaligned). See the sketch after this list.

  2. The first argument looks to be "args": {"entity". The argument name is very short and likely runs into the issue mentioned in my PR, where the chunks come back as '"ar' and then 'gs": {"ent'. Before my fix, the first argument can be corrupted in this case. I'm not 100% sure that's what is happening to Sala8888, but prior to my fix it was very common when the first argument is short.

  3. What's worse, you can see the full (and correct) arguments repeated after the first occurrence (right after the first appearance of get_company_funding in the string). Because the first instance was corrupted relative to the "expected" string, the diff calculation is completely wrong, causing the full arguments to be returned in the "closing" part of the tool-parser loop.
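A tiny sketch of point 1 (the entity value is borrowed from the streamed output above): the same arguments serialize to two different strings depending on ensure_ascii, so a delta computed against one serialization is never found in the other, which is exactly the "substring not found" failure:

import json

args = {"entity": "广东搜一搜科技有限公司"}
escaped = json.dumps(args)                  # non-ASCII becomes \uXXXX escapes
raw = json.dumps(args, ensure_ascii=False)  # non-ASCII kept as-is

print(escaped == raw)     # False
print("广东" in escaped)   # False: a raw-text delta won't match the escaped form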

@rmalde

rmalde commented Dec 2, 2024

Looking forward to this being merged, also blocked by this. Thanks @cedonley !

@K-Mistele
Contributor

(Quoting @Sala8888's comment above: the nightly-wheel install and the unchanged server error log.)

FYI, it is likely that the version of vLLM you pulled from the nightly build did not contain these fixes, as both of the linked PRs are still open and have not been merged into main. You might try pulling the PRs themselves and checking that way.

@K-Mistele
Contributor

I think it makes the most sense to either test with those PRs, or wait for them to be merged and then test with the nightly build, before trying to debug other issues here. Otherwise we could be either (a) debugging an issue that has already been solved, or (b) trying to debug a compound issue, which would be quite difficult.

@Sala8888
Author

Sala8888 commented Dec 4, 2024

@cedonley @joennlae I built vLLM from the branches you provided, but I still got the same error.
The branches I used: cedonley:fix_toolstream_truncate and 44ai-labs:fix-racecondition-generation

Error messages:

ERROR 12-04 14:00:42 hermes_tool_parser.py:337] Error trying to handle streaming tool call.
ERROR 12-04 14:00:42 hermes_tool_parser.py:337] Traceback (most recent call last):
ERROR 12-04 14:00:42 hermes_tool_parser.py:337]   File "/app/vllm/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py", line 228, in extract_tool_calls_streaming
ERROR 12-04 14:00:42 hermes_tool_parser.py:337]     function_name: Union[str, None] = current_tool_call.get("name")
ERROR 12-04 14:00:42 hermes_tool_parser.py:337]                                       ^^^^^^^^^^^^^^^^^^^^^
ERROR 12-04 14:00:42 hermes_tool_parser.py:337] AttributeError: 'NoneType' object has no attribute 'get'

In addition, vLLM may still send back a response after the error occurs. I used the same code to retrieve the tool_calls information. The format and content retrieved in v0.6.3.post1 were correct; however, with the branch you provided the retrieved format was wrong, causing JSON decode errors.

Retrieval results in v0.6.3.post1:

{'tool_calls': [{'index': 0, 'id': 'chatcmpl-tool-8882ac18eed149139d5f950e4c01390e', 'type': 'function', 'function': {'name': 'voice_to_text', 'arguments': '{"AudioPath": "http://example.com/audio.mp3", "language": "zh", "function_one_name_called": true}'}}]}

Retrieval results with the branch:

{'tool_calls': [{'index': 0, 'id': 'chatcmpl-tool-d462cd97e6c940f2be6c33ee015f8eab', 'type': 'function', 'function': {'name': 'voice_to_text', 'arguments': ' true}'}}]}

I hope you can solve this problem, thank you!

@Sala8888
Author

Sala8888 commented Dec 4, 2024

Here is another related error.

Originally I only added the tool message (role: tool) to messages after getting the tool's response, and then entered the next LLM iteration, so the messages contained: system-user-tool.

To let the LLM know the history of tool calls, I added an assistant message containing the tool_calls information before the tool message, so the messages contained: system-user-assistant-tool.
Example:

[
    {
        "role": "system",
        "content": "You are a helpful assistant."
    },
    {
        "role": "user",
        "content": "幫我把音檔轉成逐字稿。檔名: http://example.com/audio.mp3"
    },
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": "chatcmpl-tool-cb9685ce207a4bcfa461eafea3d6e801",
                "function": {
                    "arguments": "{\"AudioPath\": \"http://example.com/audio.mp3\", \"language\": \"zh\", \"function_one_name_called\": true}",
                    "name": "voice_to_text"
                },
                "type": "function"
            }
        ]
    },
    {
        "role": "tool",
        "tool_call_id": "5097b499-6da1-4343-ae3c-4134b660e065",
        "name": "voice_to_text",
        "content": "Testing"
    }
]

It works when stream=False, but when stream=True the error occurs as follows:

ERROR 12-04 07:30:15 serving_chat.py:156] Error in applying chat template from request
ERROR 12-04 07:30:15 serving_chat.py:156] Traceback (most recent call last):
ERROR 12-04 07:30:15 serving_chat.py:156]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/serving_chat.py", line 124, in create_chat_completion
ERROR 12-04 07:30:15 serving_chat.py:156]     conversation, mm_data_future = parse_chat_messages_futures(
ERROR 12-04 07:30:15 serving_chat.py:156]                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-04 07:30:15 serving_chat.py:156]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 529, in parse_chat_messages_futures
ERROR 12-04 07:30:15 serving_chat.py:156]     sub_messages = _parse_chat_message_content(msg, mm_tracker)
ERROR 12-04 07:30:15 serving_chat.py:156]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-04 07:30:15 serving_chat.py:156]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/chat_utils.py", line 475, in _parse_chat_message_content
ERROR 12-04 07:30:15 serving_chat.py:156]     result_msg["tool_calls"] = list(parsed_msg["tool_calls"])
ERROR 12-04 07:30:15 serving_chat.py:156]                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 12-04 07:30:15 serving_chat.py:156] pydantic_core._pydantic_core.ValidationError: 1 validation error for ValidatorIterator
ERROR 12-04 07:30:15 serving_chat.py:156] 0.index
ERROR 12-04 07:30:15 serving_chat.py:156]   Extra inputs are not permitted [type=extra_forbidden, input_value=0, input_type=int]
ERROR 12-04 07:30:15 serving_chat.py:156]     For further information visit https://errors.pydantic.dev/2.9/v/extra_forbidden
INFO:     192.168.54.92:39236 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request

Version: v0.6.3.post1
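For context, the validator appears to be rejecting an extra index field on the assistant message's tool_calls, which is present in streamed delta chunks but not accepted back in the request schema. A hedged workaround sketch (a guess, not an official fix; sanitize_assistant_message is a made-up helper) is to strip that field before resending the history:

def sanitize_assistant_message(msg: dict) -> dict:
    # Drop the streaming-only `index` key from each tool call before the
    # assistant message is echoed back in `messages`.
    tool_calls = [
        {k: v for k, v in tc.items() if k != "index"}
        for tc in msg.get("tool_calls") or []
    ]
    return {**msg, "tool_calls": tool_calls} if tool_calls else msg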

@cedonley
Contributor

cedonley commented Dec 4, 2024

@Sala8888

I've written a short script that tries to replicate your tool call. Note that you need to add a final assistant response to your example messages that shows the assistant's typical reply to the tool information.

gist with python test script

I start my server with the following arguments:

python -m vllm.entrypoints.openai.api_server --model /ai/models/Qwen2.5-72B-Instruct-AWQ --host 0.0.0.0 --port 5005 --served-model-name qwen2.5-large --disable-log-requests --kv-cache-dtype auto --enable-auto-tool-choice --tool-call-parser hermes -tp 2 --gpu-memory-utilization 0.95 --distributed-executor-backend ray --enable-prefix-caching  --enable-chunked-prefill

The script outputs the stream=False results, followed by a separate stream=True call for the same request:

❯ python test_vllm_bug10831.py
stream=False results:
[ChatCompletionMessageToolCall(id='chatcmpl-tool-66fe6d006e4f484aa27a32c0c3078d06', function=Function(arguments='{"AudioPath": "https://example.com/real_audio.mp3", "language": "zh", "function_one_name_called": true}', name='voice_to_text'), type='function')]



streamed tool call id: chatcmpl-tool-1b5f49b78bfe4095bdcf4788c1ecb307
streamed tool call name: voice_to_text
streamed tool call arguments: {"AudioPath": "https://example.com/real_audio.mp3", "language": "zh", "function_one_name_called": true}

As you can see above, I'm not seeing any issues when using my PR.

If you don't get this output from the script, try running the script after starting vLLM with debug logging enabled, and provide the relevant debug lines:

export VLLM_LOGGING_LEVEL=DEBUG

joennlae added a commit to 44ai-labs/vllm that referenced this issue Dec 15, 2024
During the startup of the API server, the setup function is called
multiple times (every 5s). So the longer the startup time (generally
for larger models), the more consumers are contending for the output.
This can then lead to a race condition where the order of the answer
tokens is wrong.

Introduced here: vllm-project#9973

References:
vllm-project#10376
vllm-project#10589
vllm-project#10782

Signed-off-by: Jannis Schönleber <[email protected]>