[Bug]: Hermes tool choice cannot support format 'string' #11250
Comments
Can you show the error log?
I set export VLLM_TRACE_FUNCTION=1 and export VLLM_LOGGING_LEVEL=DEBUG (as sketched below), but the function call produces no error message; tool choice just reports that it is unavailable.
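For reference, the debug flags were set like this before relaunching the server (a sketch; the serve command itself is abbreviated here and shown in full under "Model Input Dumps" below):

```shell
# Turn on vLLM function tracing and debug-level logging before launching
export VLLM_TRACE_FUNCTION=1
export VLLM_LOGGING_LEVEL=DEBUG
vllm serve ...   # full serve command as given in the report below
```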
Do you get a similar issue in the latest release version of vLLM (v0.6.4)?
I am using v0.6.4.post1.
The output of `python collect_env.py`:

My info grabbed the wrong machine; I'll re-upload it.

```
Collecting environment information...
OS: Ubuntu 22.04.4 LTS (x86_64)
Python version: 3.11.11 (main, Dec 11 2024, 16:28:39) [GCC 11.2.0] (64-bit runtime)
Nvidia driver version: 550.120
NIC0: mlx5_0
NCCL_P2P_DISABLE=1
```
Thanks for providing this info! @K-Mistele can you help look into this?
Your current environment
The output of `python collect_env.py`
Model Input Dumps
```shell
vllm serve /model/models/calme-3.2-instruct-78b/ \
    --guided-decoding-backend xgrammar \
    --block-size 32 \
    --max-num-seqs 100 \
    --port xxxxxxxxxx \
    --api-key xxxxxxxxxxxxxxxx \
    -tp 8 \
    --served-model-name Qwen2.5-72B-Instruct \
    --dtype float16 \
    --max-model-len 65536 \
    --enable-chunked-prefill false \
    --seed 818 \
    --multi-step-stream-outputs true \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --tokenizer-pool-size 50
```
🐛 Describe the bug
```
INFO 12-17 04:39:28 llm_engine.py:446] init engine (profile, create kv cache, warmup model) took 119.34 seconds
INFO 12-17 04:39:28 api_server.py:578] Using supplied chat template:
INFO 12-17 04:39:28 api_server.py:578] None
INFO 12-17 04:39:28 serving_chat.py:74] "auto" tool choice has been enabled please note that while the parallel_tool_calls client option is preset for compatibility reasons, it will be ignored.
INFO 12-17 04:39:28 launcher.py:19] Available routes are:
INFO 12-17 04:39:28 launcher.py:27] Route: /openapi.json, Methods: GET, HEAD
INFO 12-17 04:39:28 launcher.py:27] Route: /docs, Methods: GET, HEAD
INFO 12-17 04:39:28 launcher.py:27] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 12-17 04:39:28 launcher.py:27] Route: /redoc, Methods: GET, HEAD
INFO 12-17 04:39:28 launcher.py:27] Route: /health, Methods: GET
INFO 12-17 04:39:28 launcher.py:27] Route: /tokenize, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /detokenize, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /v1/models, Methods: GET
INFO 12-17 04:39:28 launcher.py:27] Route: /version, Methods: GET
INFO 12-17 04:39:28 launcher.py:27] Route: /v1/chat/completions, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /v1/completions, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /v1/embeddings, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /score, Methods: POST
INFO 12-17 04:39:28 launcher.py:27] Route: /v1/score, Methods: POST
INFO:     Started server process [2694250]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:xxxxxxxxxxxx (Press CTRL+C to quit)
INFO 12-17 04:39:51 chat_utils.py:331] Detected the chat template content format to be 'string'. You can set --chat-template-content-format to override this.
INFO:     192.254.90.4:56388 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     192.254.90.4:56388 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
INFO:     192.254.90.4:56388 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
```
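For context, a request of roughly this shape against the server above is what returns those 500s (a minimal sketch: the tool schema, prompt, host, port, and API key are made-up placeholders, not taken from the report):

```shell
# Hypothetical tool-call request against the server started above; adjust host/port/key.
curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer $VLLM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen2.5-72B-Instruct",
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }],
        "tool_choice": "auto"
      }'
```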
This is a fine-tuned model based on Qwen2.5-72B that is currently ranked #1 on Hugging Face, and I would like to use it, but an exception occurs in the handling of tool choice. I contacted the model's author, who told me the model supports Hermes's chat-template format, yet after starting vLLM I still got the error above. I also tried the --chat-template-content-format string method (sketched below), but it still failed. I hope to get some help, thanks a lot!
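The workaround attempt looked roughly like this (a sketch; only the flags relevant to the problem are repeated from the full serve command above):

```shell
# Force the chat template content format instead of letting vLLM auto-detect it
vllm serve /model/models/calme-3.2-instruct-78b/ \
    --enable-auto-tool-choice \
    --tool-call-parser hermes \
    --chat-template-content-format string
```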
The model's author replied to me with this link: https://huggingface.co/MaziyarPanahi/calme-3.2-instruct-78b/discussions/8