[Bugfix] fix race condition that leads to wrong order of token returned #10802

joennlae · 2024-12-01T00:46:22Z

The setup function is called multiple times (every 5 seconds) during the API server's startup. The longer the startup takes (generally for larger models), the more consumers end up contending for the output. This can lead to a race condition in which the answer tokens are returned in the wrong order.
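To illustrate the effect in isolation, here is a minimal asyncio sketch. It is not vLLM code: `consume`, `run`, and the delay values are hypothetical names and numbers chosen only to show how a second consumer on the same queue can reorder tokens.

```python
import asyncio


async def consume(queue: asyncio.Queue, out: list, delay: float) -> None:
    # Hypothetical consumer: take the next token, do some "work", emit it.
    while not queue.empty():
        token = queue.get_nowait()
        await asyncio.sleep(delay)  # another consumer can run during this wait
        out.append(token)


async def run(delays: list) -> list:
    # One shared queue of tokens, one consumer per entry in `delays`.
    queue: asyncio.Queue = asyncio.Queue()
    for token in range(4):
        queue.put_nowait(token)
    out: list = []
    await asyncio.gather(*(consume(queue, out, d) for d in delays))
    return out


if __name__ == "__main__":
    print(asyncio.run(run([0.01])))       # single consumer: order preserved
    print(asyncio.run(run([0.2, 0.01])))  # two consumers: order can change
```

With a single consumer the tokens come out in queue order. With two consumers, the slower one holds an early token while the faster one emits later tokens first, which is the kind of reordering described above.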

Here is where the setup function is called:

vllm/vllm/entrypoints/openai/api_server.py

Lines 205 to 217 in c11f172

    try:
        while True:
            try:
                await mq_engine_client.setup()
                break
            except TimeoutError:
                if (not engine_process.is_alive()
                        or not engine_alive.value):
                    raise RuntimeError(
                        "Engine process failed to start. See stack "
                        "trace for the root cause.") from None

        yield mq_engine_client  # type: ignore[misc]

Introduced here: #9973

References:
#10376
#10589
#10782
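One possible way to make repeated setup calls safe is to guard the consumer so it starts at most once. The sketch below is an assumption for illustration only and is not necessarily the change made in this PR: `HypotheticalClient`, `_output_loop`, `fail_first`, and `start_with_retries` are made-up names, and the retry loop only mirrors the shape of the one quoted above.

```python
import asyncio
from typing import Optional


class HypotheticalClient:
    """Stand-in for an engine client; NOT vLLM's actual client class."""

    def __init__(self, fail_first: int) -> None:
        self.fail_first = fail_first  # number of setup() calls that time out
        self.calls = 0
        self.consumers_started = 0
        self._output_task: Optional[asyncio.Task] = None

    async def _output_loop(self) -> None:
        # Placeholder for a loop that would read the engine's output.
        await asyncio.sleep(3600)

    async def setup(self) -> None:
        self.calls += 1
        # Guard: start the output consumer only on the first call,
        # so retries do not add competing consumers.
        if self._output_task is None:
            self._output_task = asyncio.create_task(self._output_loop())
            self.consumers_started += 1
        if self.calls <= self.fail_first:
            raise TimeoutError("engine not ready yet")

    def close(self) -> None:
        if self._output_task is not None:
            self._output_task.cancel()


async def start_with_retries(fail_first: int) -> HypotheticalClient:
    client = HypotheticalClient(fail_first)
    while True:
        try:
            await client.setup()
            break
        except TimeoutError:
            continue  # retry, as the startup loop does
    client.close()
    return client
```

In this sketch, three timed-out attempts followed by a successful one still leave exactly one consumer task, so there is nothing to contend with for the output.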

github-actions · 2024-12-01T00:46:34Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

During the startup of the API server, the setup function is called multiple times (every 5 s). So the longer the startup time (generally for larger models), the more consumers are contending for the output. This can then lead to a race condition where the order of the answer tokens is wrong.

Introduced here: vllm-project#9973

References: vllm-project#10376 vllm-project#10589 vllm-project#10782

Signed-off-by: Jannis Schönleber <[email protected]>

joennlae force-pushed the fix-racecondition-generation branch from 90944bc to c8f5a34 Compare December 1, 2024 00:49

Sala8888 mentioned this pull request Dec 2, 2024

[Bug] Streaming output error of tool calling has still not been resolved. #10589

Closed

DarkLight1337 requested a review from robertgshaw2-neuralmagic December 2, 2024 18:00

joennlae force-pushed the fix-racecondition-generation branch from 5221600 to ccdb92f Compare December 15, 2024 21:10
