bug fix for issue 9688
weilong.yu committed Dec 13, 2024
1 parent 00c1bde commit 348855f
Showing 2 changed files with 4 additions and 3 deletions.
4 changes: 4 additions & 0 deletions vllm/worker/model_runner.py
@@ -1327,6 +1327,10 @@ def profile_run(self) -> None:
         self.execute_model(model_input, kv_caches, intermediate_tensors)
         torch.cuda.synchronize()
+        # Cleanup
+        if self.lora_config:
+            assert self.lora_manager is not None
+            self.remove_all_loras()
         return

     def remove_all_loras(self):
3 changes: 0 additions & 3 deletions vllm/worker/worker.py
@@ -252,9 +252,6 @@ def determine_num_available_blocks(self) -> Tuple[int, int]:
             available_kv_cache_memory / (1024**3),
             self.cache_config.gpu_memory_utilization)

-        # Final cleanup
-        if self.model_runner.lora_manager:
-            self.model_runner.remove_all_loras()
         gc.collect()

         return num_gpu_blocks, num_cpu_blocks
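
Taken together, the two hunks relocate the post-profiling LoRA cleanup: Worker.determine_num_available_blocks no longer calls remove_all_loras() itself, and ModelRunner.profile_run now does so right before it returns. Below is a minimal, self-contained sketch of that restructuring; ModelRunnerSketch, WorkerSketch, and the placeholder return values are hypothetical stand-ins, not vLLM's actual implementation.

# Minimal sketch of the restructuring in this commit. Only the placement of
# the LoRA cleanup mirrors the diff above; everything else is illustrative.
from typing import List, Optional, Tuple


class ModelRunnerSketch:
    def __init__(self, lora_config: Optional[dict]) -> None:
        self.lora_config = lora_config
        # Stands in for the dummy LoRAs loaded while profiling.
        self.lora_manager: Optional[List[str]] = (
            ["dummy_lora"] if lora_config else None)

    def profile_run(self) -> None:
        # ... a worst-case dummy forward pass would run here ...
        # Cleanup (added by this commit): the runner drops its own
        # profiling-only LoRA state before returning to the worker.
        if self.lora_config:
            assert self.lora_manager is not None
            self.remove_all_loras()
        return

    def remove_all_loras(self) -> None:
        if self.lora_manager is not None:
            self.lora_manager.clear()


class WorkerSketch:
    def __init__(self, model_runner: ModelRunnerSketch) -> None:
        self.model_runner = model_runner

    def determine_num_available_blocks(self) -> Tuple[int, int]:
        self.model_runner.profile_run()
        # Removed by this commit: the worker no longer calls
        # self.model_runner.remove_all_loras() after profiling.
        return 1024, 512  # placeholder block counts


if __name__ == "__main__":
    worker = WorkerSketch(ModelRunnerSketch(lora_config={"max_loras": 1}))
    print(worker.determine_num_available_blocks())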
