[Bug]: Guided Decoding Broken in Streaming mode #10376
Labels: bug
joennlae added a commit to 44ai-labs/vllm that referenced this issue on Dec 1, 2024:
During the startup of the API server, the setup function is called multiple times (every 5s). The longer the startup takes (generally for larger models), the more consumers end up contending for the output. This can lead to a race condition where the order of the answer tokens is wrong. Introduced here: vllm-project#9973. References: vllm-project#10376, vllm-project#10589, vllm-project#10782. Signed-off-by: Jannis Schönleber <[email protected]>
joennlae added a commit to 44ai-labs/vllm that referenced this issue on Dec 15, 2024, with the same commit message.
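For context, below is a minimal, generic sketch of the kind of race the commit message describes. This is illustrative asyncio code, not vLLM's actual implementation: if the same output stream accidentally ends up with more than one consumer, tokens can be delivered out of order even though the producer enqueues them in order.

```python
# Generic illustration of the multiple-consumer race (not vLLM code).
import asyncio
import random


async def producer(queue: asyncio.Queue) -> None:
    # Enqueue tokens strictly in order, then a sentinel to mark end of stream.
    for i in range(10):
        await queue.put(f"tok{i}")
    await queue.put(None)


async def consumer(queue: asyncio.Queue, out: list) -> None:
    while True:
        tok = await queue.get()
        if tok is None:
            await queue.put(None)  # re-enqueue sentinel so other consumers also stop
            return
        await asyncio.sleep(random.random() * 0.01)  # simulate variable processing time
        out.append(tok)


async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    out: list = []
    # Two consumers contend for the same stream; with a single consumer the order is stable.
    await asyncio.gather(producer(queue), consumer(queue, out), consumer(queue, out))
    print(out)  # tokens frequently appear out of order, e.g. ['tok1', 'tok0', ...]


asyncio.run(main())
```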
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
Guided decoding is broken in streaming mode after commit 04cef2c.
Previous commits work fine, and non-streaming mode also works fine.
Dataset to test: https://raw.githubusercontent.com/JC1DA/SharedData/refs/heads/main/gsm8k_luca_input_prompts/dataset.json
Test script:
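The test script itself is not reproduced here. The following is a minimal sketch of how guided decoding in streaming mode can be exercised against the dataset, assuming a vLLM OpenAI-compatible server on localhost:8000. The model name, JSON schema, and dataset layout (a list of prompt strings) are placeholders rather than the original script's values; it uses vLLM's `guided_json` extra-body parameter and checks whether the concatenated streamed output is valid JSON.

```python
# Hedged reproduction sketch; schema, model name, and dataset layout are assumptions.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Hypothetical schema; the dataset's actual guided-decoding schema may differ.
schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

with open("dataset.json") as f:
    prompts = json.load(f)  # assumed: a JSON list of prompt strings

failed = 0
for prompt in prompts:
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        extra_body={"guided_json": schema},
    )
    # Concatenate the streamed deltas into the full response text.
    text = "".join(chunk.choices[0].delta.content or "" for chunk in stream)
    try:
        json.loads(text)  # guided decoding should always yield schema-valid JSON
    except json.JSONDecodeError:
        failed += 1

print(f"failed {failed} / {len(prompts)}")
```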
Result:
Commit 6e056bc: failed 1 / 1318
Commit 04cef2c: failed 263 / 1318