
[Usage]: Is pipeline parallelism supported on machines that are not in the same local network? #11285

Open
oldcpple opened this issue Dec 18, 2024 · 1 comment
Labels
usage How to use vllm

Comments

oldcpple commented Dec 18, 2024

How would you like to use vllm

Hi there, since communication between nodes is done by NCCL (which I guess typically relies on RDMA), I wonder if I can set up an inference pipeline with machines in different networks, for example one on Google Cloud and another on AWS, through vLLM's pipeline parallelism?
Thanks a lot if anyone can answer this.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
oldcpple added the usage label on Dec 18, 2024
noooop (Contributor) commented Dec 18, 2024

Do you really want to do this?

A typical vLLM step takes about 20 ms, and copying an intermediate result (a large tensor) over the network is very slow.

Also, vLLM currently schedules synchronously, so the delay from transmitting intermediate results over the network will greatly reduce GPU utilization, increase latency, and reduce throughput.
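A rough back-of-the-envelope calculation illustrates the problem. The numbers below (hidden size, in-flight token count, link bandwidth) are illustrative assumptions, not vLLM measurements; real cross-cloud links also add tens of milliseconds of round-trip latency on top of the bandwidth cost:

```python
# Sketch: estimate how long one pipeline hop spends shipping hidden states
# across a WAN link, assuming fp16 activations. All parameters are assumptions.

def transfer_ms(num_tokens: int, hidden_size: int,
                bytes_per_elem: int = 2, bandwidth_gbps: float = 1.0) -> float:
    """Milliseconds to send one activation tensor over the network."""
    payload_bits = num_tokens * hidden_size * bytes_per_elem * 8
    return payload_bits / (bandwidth_gbps * 1e9) * 1e3

# Assumed example: a 13B-class model (hidden_size ~5120), 256 in-flight
# tokens, fp16 (2 bytes/element), 1 Gbps cross-cloud link.
t = transfer_ms(num_tokens=256, hidden_size=5120, bandwidth_gbps=1.0)
print(f"{t:.1f} ms per pipeline hop")
```

Under these assumptions a single hop already costs on the order of 20 ms, comparable to the whole compute step, before any round-trip latency is counted, which is why a synchronous schedule stalls badly.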

You can follow the progress of disaggregated prefilling.

It seems to be asynchronous, which is awesome.
