
[Usage]: Is pipeline parallelism supported on machines that are not in the same local network? #11285

Open
oldcpple opened this issue Dec 18, 2024 · 1 comment
Labels
usage How to use vllm

Comments

oldcpple commented Dec 18, 2024

How would you like to use vllm

Hi there, since communication between nodes is done by NCCL (which I guess typically relies on RDMA), I wonder if I can set up an inference pipeline with machines in different networks, for example one on Google Cloud and another on AWS, through vLLM's pipeline parallelism?
Thanks a lot if anyone can answer this.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
oldcpple added the usage label on Dec 18, 2024
noooop (Contributor) commented Dec 18, 2024

Do you really want to do this?

A typical vLLM step takes about 20 ms, and copying an intermediate result (a large tensor) over the network is very slow.

Also, vLLM currently schedules synchronously, so the delay from transmitting intermediate results over the network will greatly reduce GPU utilization, increase latency, and reduce throughput.
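A rough back-of-the-envelope calculation illustrates the problem. The numbers below (hidden size, in-flight token count, link bandwidth) are illustrative assumptions, not vLLM measurements; real cross-cloud links also add tens of milliseconds of round-trip latency on top of the bandwidth cost:

```python
# Sketch: estimate how long one pipeline hop spends shipping hidden states
# across a WAN link, assuming fp16 activations. All parameters are assumptions.

def transfer_ms(num_tokens: int, hidden_size: int,
                bytes_per_elem: int = 2, bandwidth_gbps: float = 1.0) -> float:
    """Milliseconds to send one activation tensor over the network."""
    payload_bits = num_tokens * hidden_size * bytes_per_elem * 8
    return payload_bits / (bandwidth_gbps * 1e9) * 1e3

# Assumed example: a 13B-class model (hidden_size ~5120), 256 in-flight
# tokens, fp16 (2 bytes/element), 1 Gbps cross-cloud link.
t = transfer_ms(num_tokens=256, hidden_size=5120, bandwidth_gbps=1.0)
print(f"{t:.1f} ms per pipeline hop")
```

Under these assumptions a single hop already costs on the order of 20 ms, comparable to the whole compute step, before any round-trip latency is counted, which is why a synchronous schedule stalls badly.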

You can follow the progress of disaggregated prefilling.

It seems to be asynchronous, which is awesome.
