[Bugfix] Hermes tool parser fails to check for & handle None values in some cases #9908
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Signed-off-by: Kyle Mistele <[email protected]>
Force-pushed from ab05e8c to 9ada2b4
Hello @K-Mistele, thanks for the ping on the two issues #9693 and #9674. Here is what I tried:
The patch did not fix the issue; tools with streaming still don't work, and there is a new error.
I confirm that normal discussion works and that tools without streaming work.
Thanks for this @githebs - I will take another crack at it and see if I can't resolve that issue; failing that, reproducing on a screenshare would be awesome. I'm beginning to suspect this is related to the Qwen models doing something slightly different from Hermes; I need to dig into their chat template a little more deeply. I've seen a couple of open bug issues related to this, and they're all about Qwen models rather than Hermes. I'll follow up as soon as I have an update :)
Just in case it might help, I've included a draft of a FastAPI endpoint to test vLLM with Qwen; this is what made me discover the issue. I might be totally wrong in the implementation.

import logging
import json
from datetime import datetime
from typing import Dict, Any, List, Optional, AsyncGenerator
import httpx
from fastapi import APIRouter, HTTPException, Request
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from pathlib import Path

# Ensure logs directory exists
Path("logs").mkdir(exist_ok=True)

# API Constants
VLLM_API_BASE = "http://192.168.1.100:9000/v1/chat/completions"
DEFAULT_MODEL = "/models/Qwen-Qwen2.5-7B-Instruct"

# Define tools at module level
DEFAULT_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_public_ip",
            "description": "Get the public IP address of the server",
            "parameters": {
                "type": "object",
                "properties": {},
                "required": []
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "add_numbers",
            "description": "Add two numbers together",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number", "description": "First number"},
                    "b": {"type": "number", "description": "Second number"}
                },
                "required": ["a", "b"]
            }
        }
    }
]

router = APIRouter()


class ChatRequest(BaseModel):
    messages: List[Dict[str, str]] = [
        {"role": "assistant", "content": "You are a useful assistant."},
        {"role": "user", "content": "hello"}
    ]
    temperature: Optional[float] = 0.7
    tools: Optional[List[Dict]] = DEFAULT_TOOLS


class RequestLogger:
    def __init__(self):
        self.start_time = datetime.now()
        # Create a unique log file for this request using detailed timestamp
        log_dir = Path("logs")
        log_dir.mkdir(exist_ok=True)
        # Format: chat_YYYYMMDD_HHMMSS_microseconds.log
        self.request_id = self.start_time.strftime("%Y%m%d_%H%M%S_%f")
        self.file_handler = logging.FileHandler(
            f'logs/chat_{self.request_id}.log'
        )
        # Configure formatter
        formatter = logging.Formatter(
            '%(asctime)s - %(levelname)s - %(message)s'
        )
        self.file_handler.setFormatter(formatter)
        # Create a logger specific to this request
        self.logger = logging.getLogger(f'chat.{self.request_id}')
        self.logger.setLevel(logging.INFO)
        self.logger.addHandler(self.file_handler)
        # Prevent propagation to root logger
        self.logger.propagate = False

    def __call__(self, msg: str, data: Any = None, level: str = 'info'):
        elapsed = (datetime.now() - self.start_time).total_seconds()
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'request_id': self.request_id,
            'elapsed_seconds': round(elapsed, 3),
            'message': msg,
            'data': data
        }
        getattr(self.logger, level)(
            json.dumps(log_entry, ensure_ascii=False)
        )

    def __del__(self):
        # Clean up handlers when the logger is destroyed
        if hasattr(self, 'file_handler'):
            self.file_handler.close()
            self.logger.removeHandler(self.file_handler)


def add_numbers(a: float, b: float) -> float:
    """Simple addition function."""
    return a + b


def get_public_ip() -> str:
    """Get the public IP address using ifconfig.me."""
    try:
        response = httpx.get("https://ifconfig.me", timeout=5)
        return response.text.strip()
    except httpx.RequestError as e:
        return f"Failed to get IP: {str(e)}"


async def stream_response(payload: dict, log: RequestLogger) -> AsyncGenerator[str, None]:
    """Handle streaming response from vLLM."""
    async with httpx.AsyncClient() as client:
        try:
            async with client.stream(
                'POST',
                VLLM_API_BASE,
                json=payload,
                headers={"Content-Type": "application/json"},
                timeout=30.0
            ) as response:
                if response.status_code != 200:
                    error_msg = f"vLLM API error: {response.status_code}"
                    log(error_msg, level='error')
                    yield f"data: {json.dumps({'error': error_msg})}\n\n"
                    return
                async for line in response.aiter_lines():
                    if not line or not line.startswith('data: '):
                        continue
                    line = line.removeprefix('data: ')
                    if line.strip() == '[DONE]':
                        log("Stream completed")
                        yield 'data: [DONE]\n\n'
                        break
                    try:
                        parsed = json.loads(line)
                        log("Streaming chunk", parsed)
                        # Handle tool calls in streaming response
                        if 'choices' in parsed and parsed['choices']:
                            choice = parsed['choices'][0]
                            if 'delta' in choice and 'tool_calls' in choice['delta']:
                                tool_call = choice['delta']['tool_calls'][0]
                                if ('function' in tool_call and
                                        'name' in tool_call['function'] and
                                        'arguments' in tool_call['function']):
                                    func_name = tool_call['function']['name']
                                    args = json.loads(tool_call['function']['arguments'])
                                    if func_name == 'add_numbers':
                                        result = add_numbers(args['a'], args['b'])
                                        yield f'data: {json.dumps({"choices": [{"delta": {"content": str(result)}}]})}\n\n'
                                        continue
                        yield f'data: {line}\n\n'
                    except json.JSONDecodeError as e:
                        log(f"Failed to parse streaming response: {str(e)}", level='error')
                        continue
        except httpx.RequestError as e:
            error_msg = f"Streaming request failed: {str(e)}"
            log(error_msg, level='error')
            yield f"data: {json.dumps({'error': error_msg})}\n\n"
    log("Stream connection closed")


@router.post("/chat/completions")
async def chat_completion(request: ChatRequest, raw_request: Request):
    log = RequestLogger()
    # Parse raw request to check for streaming
    request_body = await raw_request.json()
    is_streaming = request_body.get('stream', False)
    # Use default tools if none provided
    tools = request.tools if request.tools is not None else DEFAULT_TOOLS
    log("Received chat completion request", {
        "messages": request.messages,
        "temperature": request.temperature,
        "tools": tools,
        "stream": is_streaming
    })
    try:
        payload = {
            "model": DEFAULT_MODEL,
            "messages": request.messages,
            "temperature": request.temperature,
            "stream": is_streaming,
            "tools": tools
        }
        log("Sending request to vLLM", payload)
        if is_streaming:
            return StreamingResponse(
                stream_response(payload, log),
                media_type='text/event-stream'
            )
        # Non-streaming response handling
        async with httpx.AsyncClient() as client:
            response = await client.post(
                VLLM_API_BASE,
                headers={"Content-Type": "application/json"},
                json=payload,
                timeout=30.0
            )
            if not response.content:
                log("Empty response from vLLM API", level='error')
                raise HTTPException(status_code=500, detail="Empty response from API")
            result = response.json()
            log("Received response from vLLM", result)
            # Handle tool calls if present
            if "choices" in result and result["choices"] and "message" in result["choices"][0]:
                message = result["choices"][0]["message"]
                if "tool_calls" in message and message["tool_calls"]:
                    log("Processing tool calls", message["tool_calls"])
                    # Add assistant's tool call message to conversation
                    messages = request.messages + [{
                        "role": "assistant",
                        "content": None,
                        "tool_calls": message["tool_calls"]
                    }]
                    for tool_call in message["tool_calls"]:
                        try:
                            func_name = tool_call["function"]["name"]
                            args = json.loads(tool_call["function"]["arguments"])
                            if func_name == "get_public_ip":
                                tool_result = get_public_ip()
                            elif func_name == "add_numbers":
                                tool_result = add_numbers(args["a"], args["b"])
                            else:
                                raise ValueError(f"Unknown tool: {func_name}")
                            log(f"Tool execution result for {func_name}", tool_result)
                            # Add tool result to messages
                            messages.append({
                                "role": "tool",
                                "tool_call_id": tool_call["id"],
                                "name": func_name,
                                "content": str(tool_result)
                            })
                        except Exception as e:
                            error_msg = f"Tool execution failed: {str(e)}"
                            log(error_msg, level='error')
                            raise HTTPException(status_code=500, detail=error_msg)
                    # Make another call to vLLM with tool results
                    final_payload = {
                        "model": DEFAULT_MODEL,
                        "messages": messages,
                        "temperature": request.temperature,
                        "stream": is_streaming,
                        "tools": tools
                    }
                    final_response = await client.post(
                        VLLM_API_BASE,
                        headers={"Content-Type": "application/json"},
                        json=final_payload,
                        timeout=30.0
                    )
                    result = final_response.json()
                    log("Received final response from vLLM", result)
            return result
    except httpx.RequestError as e:
        error_msg = f"Request to vLLM failed: {str(e)}"
        log(error_msg, level='error')
        raise HTTPException(status_code=500, detail=error_msg)
    except Exception as e:
        error_msg = f"Unexpected error: {str(e)}"
        log(error_msg, level='error')
        raise HTTPException(status_code=500, detail=error_msg)
Out of curiosity, are you trying to get the model to call the function with an empty arguments object? I wonder if that's the issue; the parser may not handle that case properly due to how the streaming extraction works. If so, can you try to get it to call a version of the function with non-empty arguments and see if that resolves the issue? I will dig in more in the meantime.
I re-implemented from scratch, both with and without the openai package, and I can confirm that all parser versions fail if the tool doesn't have arguments. I can reproduce the bug every time, and it goes away (with streaming) when arguments are present.
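For reference, a minimal reproduction along these lines (a sketch only; the server URL, model name, and prompt are assumptions, not taken from this thread) streams a completion that should trigger the argument-less tool:

# Minimal reproduction sketch: stream a completion that should trigger
# a tool whose parameters schema is empty (assumed base_url and model).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")

tools = [{
    "type": "function",
    "function": {
        "name": "get_public_ip",
        "description": "Get the public IP address of the server",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

stream = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "What is the server's public IP?"}],
    tools=tools,
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # With the unpatched parser, the name/arguments here could come back as None.
        print(delta.tool_calls[0].function)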
OK, check out #10398 and see if this fixes it for you.
This PR patches a bug in the Hermes tool parser where certain edge conditions (which are not fully understood, as I have not been able to replicate them) can result in values being set to None, which causes downstream issues due to a lack of handling for None in those values.
While I have not been able to replicate the issue, @ankush13r was kind enough to point me at the specific lines causing it. Appropriate checks and handling have been added.
cc @DarkLight1337 @githebs @ankush13r
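As an illustration only (a simplified, hypothetical helper rather than the literal code in this PR), the defensive pattern amounts to guarding every lookup on values that may be None during streaming:

from typing import Optional

def safe_get_arguments(tool_call: Optional[dict]) -> Optional[str]:
    # During streaming, the partially parsed tool call or its "function" field
    # can be None; guard before calling .get() instead of assuming a dict.
    if not tool_call:
        return None
    function = tool_call.get("function") or {}
    return function.get("arguments")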
FIX #9693
FIX #9874
Recommended debugging process for @githebs @ankush13r to determine if this fixes the behavior you are experiencing:
Copy the entire updated HermesToolParser from this PR onto your local disk. Change the plugin name in the ToolParserManager decorator from this:
@ToolParserManager.register_module("hermes")
to this:
@ToolParserManager.register_module("hermes_patched")
This prevents the patched version that you'll be testing with from conflicting with the parser that vLLM comes packaged with.
Run vLLM with the following configuration for tools:
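For example (an assumed invocation, not the exact configuration; the model and plugin path are placeholders, and the flag names should be checked against vllm serve --help on your version):

vllm serve Qwen/Qwen2.5-7B-Instruct --enable-auto-tool-choice --tool-call-parser hermes_patched --tool-parser-plugin /path/to/patched_hermes_tool_parser.py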
Try to reproduce the issue however you encountered it. If you do not have the same issue, then this PR fixes [Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled #9693 and [Bug]: Function calling with Qwen & Streaming ('NoneType' object has no attribute 'get') #9874. If you encounter the same issue, then there is a different problem.
Once this has been tested by @ankush13r and @githebs and they confirm that this fixes their issues, I will move this from a draft to an open PR.
PR Checklist
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
[Bugfix] for bug fixes.
[CI/Build] for build or continuous integration improvements.
[Doc] for documentation fixes and improvements.
[Model] for adding a new model or improving an existing model. Model name should appear in the title.
[Frontend] for changes on the vLLM frontend (e.g., OpenAI API server, LLM class, etc.)
[Kernel] for changes affecting CUDA kernels or other compute kernels.
[Core] for changes in the core vLLM logic (e.g., LLMEngine, AsyncLLMEngine, Scheduler, etc.)
[Hardware][Vendor] for hardware-specific changes. Vendor name should appear in the prefix (e.g., [Hardware][AMD]).
[Misc] for PRs that do not fit the above categories. Please use this sparingly.
Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
Please use format.sh to format your code.
Please add documentation to docs/source/ if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.
Adding or changing kernels
Each custom kernel needs a schema and one or more implementations to be registered with PyTorch.
Custom operations that return Tensors require meta-functions. Meta-functions should be implemented and registered in python so that dynamic dims can be handled automatically. See above documents for a description of meta-functions.
Use torch.library.opcheck() to test the function registration and meta-function for any registered ops. See tests/kernels for examples.
Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with rfc-required and might not go through the PR.
What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:
After the review, the reviewer will put an action-required label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.
Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!