[Feature]: Improve V1 startup error handling #11109

Open
1 task done
robertgshaw2-neuralmagic opened this issue Dec 11, 2024 · 0 comments · May be fixed by #11420

robertgshaw2-neuralmagic commented Dec 11, 2024

🚀 The feature, motivation and pitch

Improve error handling during startup for vLLM V0 and V1.

Right now, the flow for V1 is:

  • Start the EngineCore process in the background; the model is loaded there. It's pretty common for this step to fail (e.g., someone puts a model that is too big onto a GPU without enough memory, or uses a bad config). When this happens, we log an error and raise an exception, so the process dies.
  • The main process (LLMEngine or AsyncLLM) detects that the EngineCore has died and shuts itself down with a message like "EngineCoreProc failed to start". This prints a large stack trace, and that trace is what the user sees, so the root cause is hidden. (Here's where it happens: https://github.com/vllm-project/vllm/blob/main/vllm/v1/engine/core.py#L178)
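A minimal, self-contained sketch of the current flow, assuming plain `multiprocessing` rather than vLLM's actual process management. `engine_core_main` and `start_engine_core` are hypothetical stand-ins, and the failure is simulated with a `MemoryError`. The point is that the parent only observes a non-zero exit code, so the error it raises carries none of the child's context:

```python
import multiprocessing as mp


def engine_core_main() -> None:
    # Hypothetical EngineCore entry point; simulates a model-loading failure.
    raise MemoryError("model does not fit in GPU memory")


def start_engine_core() -> None:
    # "fork" keeps this runnable as a flat script on POSIX; with "spawn",
    # callers would need an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=engine_core_main)
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        # The child printed its traceback to its own stderr; the parent only
        # knows the exit code, so this generic error is what the user sees last.
        raise RuntimeError("EngineCoreProc failed to start")
```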

It would be a much better user experience to present the root-cause error clearly at the bottom of the stack trace. I'm not sure there is a clean way to do this in Python other than catching the exception in the EngineCore, sending it to the main process over IPC before shutdown, and re-raising it there, but I wanted to look into it.
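One way the IPC idea could look, sketched with the standard library's `multiprocessing` (hypothetical names `load_model`, `engine_core_main`, `start_engine_core`, and `EngineStartupError`; a simulated failure; not vLLM's real API). The child catches its startup exception, sends the formatted traceback over a one-way `Pipe`, and still re-raises so it logs and dies as before. The parent then raises a single error whose message ends with the child's traceback:

```python
import multiprocessing as mp
import traceback
from multiprocessing.connection import Connection
from multiprocessing.process import BaseProcess


class EngineStartupError(RuntimeError):
    """Hypothetical error raised in the main process when EngineCore startup fails."""


def load_model() -> None:
    # Hypothetical model load; simulates a GPU out-of-memory failure.
    raise MemoryError("model does not fit in GPU memory")


def engine_core_main(conn: Connection) -> None:
    try:
        load_model()
    except Exception:
        # Report the root cause as a plain string (always picklable),
        # then re-raise so the child still logs the error and exits non-zero.
        conn.send(("failed", traceback.format_exc()))
        conn.close()
        raise
    conn.send(("ready", None))
    conn.close()
    # A real engine would enter its busy loop here.


def start_engine_core() -> BaseProcess:
    # "fork" keeps this runnable as a flat script on POSIX; with "spawn",
    # callers would need an `if __name__ == "__main__":` guard.
    ctx = mp.get_context("fork")
    recv_conn, send_conn = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=engine_core_main, args=(send_conn,))
    proc.start()
    send_conn.close()  # the parent only reads

    try:
        status, payload = recv_conn.recv()
    except EOFError:
        # The child died before reporting anything (e.g. killed by the OS).
        status, payload = "failed", "EngineCore exited without reporting a status."
    finally:
        recv_conn.close()

    if status != "ready":
        proc.join()
        # The child's traceback ends the message, so it is the last thing printed.
        raise EngineStartupError(f"EngineCore failed to start. Root cause:\n{payload}")
    return proc
```

Sending the traceback as a string rather than the exception object avoids failures when an exception type can't be round-tripped through pickle. Embedding it in the message, instead of chaining with `raise ... from exc`, keeps the root cause last: Python prints a chained cause above the new exception, which is the opposite of what we want here.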

Alternatives

No response

Additional context

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.