My requirement is to run the following two tasks asynchronously within a single process:
1. an expensive batched p2p communication over an old communicator;
2. creating a new communicator, with some of the old processes removed and new processes joining, to be used later.
Since both tasks incur considerable overhead, I would like them to execute asynchronously (or even simultaneously). Is it possible to assign them to different threads? Thanks!
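For concreteness, here is a minimal sketch of the kind of split being asked about, with each task on its own std::thread. Everything below is a hypothetical illustration, not code from this thread: it assumes oldComm is an existing communicator, p2pStream is a dedicated CUDA stream, the device buffers are already allocated, and newId is a ncclUniqueId for the new group that has already been exchanged out of band.

```cpp
// Sketch only: error handling omitted; all names (oldComm, newId, P2pOp, ...)
// are hypothetical placeholders rather than anything from this issue.
#include <functional>
#include <thread>
#include <vector>
#include <cuda_runtime.h>
#include <nccl.h>

struct P2pOp { float* sendBuf; float* recvBuf; size_t count; int peer; };

// Task 1: an expensive batched p2p exchange over the old communicator.
void runBatchP2p(int cudaDev, ncclComm_t oldComm,
                 const std::vector<P2pOp>& ops, cudaStream_t stream) {
  cudaSetDevice(cudaDev);               // device selection is per host thread
  ncclGroupStart();
  for (const P2pOp& op : ops) {
    ncclSend(op.sendBuf, op.count, ncclFloat, op.peer, oldComm, stream);
    ncclRecv(op.recvBuf, op.count, ncclFloat, op.peer, oldComm, stream);
  }
  ncclGroupEnd();                       // returns once the work is enqueued
  cudaStreamSynchronize(stream);        // wait for the exchange to finish
}

// Task 2: create the new communicator (some old ranks removed, new ranks joining).
void initNewComm(int cudaDev, ncclComm_t* newComm, int newNranks,
                 ncclUniqueId newId, int newRank) {
  cudaSetDevice(cudaDev);
  ncclCommInitRank(newComm, newNranks, newId, newRank);
}

// Run both tasks concurrently within one process, each on its own thread.
void overlapP2pAndInit(int cudaDev, ncclComm_t oldComm,
                       const std::vector<P2pOp>& ops, cudaStream_t p2pStream,
                       ncclComm_t* newComm, int newNranks,
                       ncclUniqueId newId, int newRank) {
  std::thread p2pThread(runBatchP2p, cudaDev, oldComm, std::cref(ops), p2pStream);
  std::thread initThread(initNewComm, cudaDev, newComm, newNranks, newId, newRank);
  p2pThread.join();
  initThread.join();
}
```

This is exactly the pattern the reply below warns about: whether it is safe depends on what ncclCommInitRank does internally while the p2p kernels are in flight.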
That's a good question. In general I'd think it should work, but there may be CUDA calls during ncclCommInitRank which could cause an implicit inter-device synchronization. If that is the case, you could end up with a deadlock if:
1. the p2p communication launches on GPU A but not on GPU B;
2. the init is blocking the launch on GPU B, waiting for GPU A to complete its CUDA work, including the NCCL operation, which is stuck.
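One way to rule out condition 1, sketched here purely as an editorial illustration (it is not something proposed in this thread), is to make sure every rank has launched its p2p work before any rank enters ncclCommInitRank, e.g. with a host-side barrier from whatever bootstrap is available (MPI is assumed below). The GPU-side p2p then still overlaps with the host-side init, but no rank can be stuck waiting for a peer whose launch is blocked.

```cpp
// Editorial sketch, not from this thread: enqueue the p2p work everywhere,
// pass a host-side barrier, and only then create the new communicator.
// Assumes MPI has been initialized elsewhere; all names are hypothetical,
// and the batch is reduced to a single send/recv pair for brevity.
#include <mpi.h>
#include <cuda_runtime.h>
#include <nccl.h>

void enqueueP2pThenInit(int cudaDev, ncclComm_t oldComm,
                        float* sendBuf, float* recvBuf, size_t count, int peer,
                        cudaStream_t stream, ncclComm_t* newComm,
                        int newNranks, ncclUniqueId newId, int newRank) {
  cudaSetDevice(cudaDev);

  // 1) Enqueue the batched p2p; with a (default) blocking communicator,
  //    ncclGroupEnd returns once the operations are launched onto the stream.
  ncclGroupStart();
  ncclSend(sendBuf, count, ncclFloat, peer, oldComm, stream);
  ncclRecv(recvBuf, count, ncclFloat, peer, oldComm, stream);
  ncclGroupEnd();

  // 2) Host-side barrier: nobody proceeds until every rank has launched its
  //    p2p work, so "launched on GPU A but not on GPU B" cannot happen.
  MPI_Barrier(MPI_COMM_WORLD);

  // 3) Create the new communicator. If its internal CUDA calls wait on
  //    outstanding GPU work, they wait on a p2p exchange that can complete,
  //    not one stuck waiting for a missing peer launch.
  ncclCommInitRank(newComm, newNranks, newId, newRank);

  // 4) Finally wait for the p2p exchange itself.
  cudaStreamSynchronize(stream);
}
```

The trade-off: if ncclCommInitRank really does synchronize with outstanding GPU work, this ordering avoids the deadlock but may serialize part of the init behind the p2p rather than fully overlapping them.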
Thank you for your reply! Unfortunately, my batch p2p communication is very complicated, so deadlock condition #1 usually does occur in practice (during the call to ncclCommInitRank). I'm a bit confused about why the initialization would cause an inter-device synchronization; shouldn't it just set up network-related parameters? And are there alternatives that would achieve what I need?
> I'm a bit confused about why the initialization would cause an inter-device synchronization
In theory it should not, and in NCCL 2.19 we have replaced a lot of CUDA calls with cuMem* calls, so the situation should improve, but we might still have some calls causing syncs, in particular when we share buffers between CUDA devices and map them on remote GPUs.
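Since the behavior depends on the NCCL version in use (the cuMem* path mentioned above arrived in 2.19), a small check like the following, given here only as a convenience sketch, reports the version the binary was built against and the one actually loaded at runtime, using the ncclGetVersion call and the NCCL_VERSION/NCCL_VERSION_CODE macros from nccl.h.

```cpp
// Convenience sketch: compare the compile-time and runtime NCCL versions and
// report whether the loaded library is at least 2.19.
#include <cstdio>
#include <nccl.h>

int main() {
  int runtimeVersion = 0;
  if (ncclGetVersion(&runtimeVersion) != ncclSuccess) {
    std::fprintf(stderr, "ncclGetVersion failed\n");
    return 1;
  }
  std::printf("built against NCCL %d.%d.%d (code %d)\n",
              NCCL_MAJOR, NCCL_MINOR, NCCL_PATCH, NCCL_VERSION_CODE);
  std::printf("runtime NCCL version code: %d\n", runtimeVersion);
  std::printf("runtime library is %s 2.19\n",
              runtimeVersion >= NCCL_VERSION(2, 19, 0) ? "at least" : "older than");
  return 0;
}
```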