[Bugfix] fix race condition that leads to wrong order of token returned
During startup of the API server, the setup function is called
multiple times (every 5s). So the longer the startup takes
(generally the case for larger models), the more consumers contend for the
output. This can lead to a race condition in which the answer tokens
are returned in the wrong order.
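
To illustrate the failure mode described above, here is a minimal sketch (not vLLM code) in which several consumer tasks drain one asyncio.Queue and deliver tokens in a different order than they were produced. The names drain and run, the token values, and the artificial delays are assumptions made for the illustration.

```python
import asyncio


async def drain(source: asyncio.Queue, sink: list, delay: float) -> None:
    """Hypothetical consumer: take a token, 'process' it, then deliver it."""
    while True:
        token = await source.get()
        await asyncio.sleep(delay)  # stand-in for per-message processing time
        sink.append(token)
        source.task_done()


async def run(num_consumers: int) -> list:
    source: asyncio.Queue = asyncio.Queue()
    for token in range(4):
        source.put_nowait(token)
    sink: list = []
    # The first consumer is slow; any additional consumers are fast.
    delays = [0.05] + [0.0] * (num_consumers - 1)
    tasks = [asyncio.create_task(drain(source, sink, d)) for d in delays]
    await source.join()  # wait until every token has been delivered
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return sink


single = asyncio.run(run(1))  # one consumer: production order is preserved
double = asyncio.run(run(2))  # two consumers: delivery order can change
print(single)
print(double)
```

With a single consumer the tokens arrive in production order. With two consumers the slow one is still holding the first token while the fast one delivers the rest, so the same tokens arrive in a different order.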

Introduced in: #9973

References:
#10376
#10589
#10782

Signed-off-by: Jannis Schönleber <[email protected]>
joennlae committed Dec 1, 2024
1 parent 7e4bbda commit 90944bc
Showing 1 changed file with 9 additions and 3 deletions.
12 changes: 9 additions & 3 deletions vllm/engine/multiprocessing/client.py
@@ -255,7 +255,12 @@ async def setup(self):
         """Setup the client before it starts sending server requests."""

         # Start output_loop
-        self.output_loop = asyncio.create_task(self.run_output_handler_loop())
+        if self.output_loop is None:
+            # only generate once to avoid multiple concurrent output_loops
+            # this will lead to race conditions and wrong orders of tokens returned
+            # by the engine
+            # setup will be called multiple times during the startup of the engine
+            self.output_loop = asyncio.create_task(self.run_output_handler_loop())

         with self.get_data_socket() as socket:
             # Wait until server is ready.
@@ -264,8 +269,9 @@ async def setup(self):
             self.tracing_flag = response.tracing_enabled

         # Start health_loop.
-        self.health_loop = asyncio.create_task(
-            self.run_heartbeat_loop(timeout=VLLM_RPC_TIMEOUT))
+        if self.health_loop is None:
+            self.health_loop = asyncio.create_task(
+                self.run_heartbeat_loop(timeout=VLLM_RPC_TIMEOUT))

     def close(self):
         """Destroy the ZeroMQ Context."""

Check failures (GitHub Actions / ruff (3.12)), Ruff (E501):
vllm/engine/multiprocessing/client.py:260:81: E501 Line too long (83 > 80)
vllm/engine/multiprocessing/client.py:262:81: E501 Line too long (82 > 80)
vllm/engine/multiprocessing/client.py:263:81: E501 Line too long (82 > 80)
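
The guard used in this change can be tried in isolation. The following is a minimal sketch, assuming the task attribute starts out as None (the diff does not show where it is initialised). The Client class, its _output_handler_loop method, and the loops_started counter are hypothetical stand-ins, not the real client.

```python
import asyncio
from typing import Optional


class Client:
    """Hypothetical stand-in for the real client; only the guard is modelled."""

    def __init__(self) -> None:
        # Assumption: the attribute is None until the first setup() call.
        self.output_loop: Optional[asyncio.Task] = None
        self.loops_started = 0

    async def _output_handler_loop(self) -> None:
        self.loops_started += 1
        await asyncio.Event().wait()  # block until the task is cancelled

    async def setup(self) -> None:
        # Create the background task only on the first call, so repeated
        # setup() calls never add a second consumer.
        if self.output_loop is None:
            self.output_loop = asyncio.create_task(self._output_handler_loop())

    def close(self) -> None:
        if self.output_loop is not None:
            self.output_loop.cancel()


async def main() -> int:
    client = Client()
    for _ in range(3):  # mimic repeated setup() calls during startup
        await client.setup()
    await asyncio.sleep(0)  # yield once so the background task can start
    client.close()
    return client.loops_started


started = asyncio.run(main())
print(started)
```

Without the `is None` check, each of the three setup() calls would create its own task, which is the multiple-consumer situation the commit message describes.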
