NCCL allreduce hangs when cudaFreeHost is called #48
Comments
I forgot to add some info: 1) I'm using NCCL 1.1.1.
Closed
Hi Fmana, any update?
Should be fixed since NCCL 2.1. Please re-open if still a problem.
minsii added commits to minsii/nccl that referenced this issue (Mar 22 and Mar 24, 2024):
Summary: Pull Request resolved: facebookresearch#48 Differential Revision: D55174222
Hi NCCL team,
I have integrated your library into my application. The integration was done in a multi-threaded scenario: each thread uses allreduce, and the allreduce is called inside a loop.
The first part of the loop body computes intermediate data, and at the end of the loop body I call allreduce.
It works perfectly, but from time to time I run into a deadlock. Attaching to the process with gdb,
I can see that (N-1) threads are inside cudaStreamSynchronize() (each allreduce has its own
CUDA stream), while 1 thread is inside cudaFreeHost() (I use the CUDA allocator for both GPU and pinned CPU memory).
What happens is that, during the first part of the loop body, one thread needs to reallocate
some memory before doing its processing, while the other (N-1) threads finish their own processing
and proceed into the NCCL allreduce.
From time to time this creates a deadlock. My guess is that there is some timing condition between the threads' actions that produces it.
It is not deterministic: the need to reallocate occurs deterministically after some iterations, but it does not always produce the deadlock.
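To make the loop structure concrete, here is a minimal sketch of the pattern I described above. It is only a sketch, not my real code: bytes_needed() and compute_intermediate() are placeholder names for my application logic, and the buffer sizes and types are illustrative.

#include <cuda_runtime.h>
#include <nccl.h>

/* Placeholders for my application code. */
size_t bytes_needed(int iter);
void   compute_intermediate(float *d_buf, float *h_staging,
                            size_t bytes, cudaStream_t stream);

/* Roughly what each worker thread runs. */
void worker_loop(ncclComm_t comm, cudaStream_t stream,
                 float *d_buf, size_t count, int num_iters)
{
    float *h_staging = NULL;          /* pinned host buffer */
    size_t staging_bytes = 0;

    for (int iter = 0; iter < num_iters; ++iter) {
        /* Phase 1: compute intermediate data. Occasionally the pinned
         * buffer must grow; the cudaFreeHost below is where one thread
         * gets stuck while the others are already in phase 2. */
        size_t needed = bytes_needed(iter);
        if (needed > staging_bytes) {
            if (h_staging)
                cudaFreeHost(h_staging);             /* thread hangs here */
            cudaHostAlloc((void **)&h_staging, needed, cudaHostAllocDefault);
            staging_bytes = needed;
        }
        compute_intermediate(d_buf, h_staging, needed, stream);

        /* Phase 2: allreduce on this thread's private stream, then wait. */
        ncclAllReduce(d_buf, d_buf, count, ncclFloat, ncclSum, comm, stream);
        cudaStreamSynchronize(stream);               /* N-1 threads wait here */
    }
    if (h_staging)
        cudaFreeHost(h_staging);
}

The hang shows up when the cudaFreeHost in phase 1 of one thread overlaps with the other threads already blocked in phase 2.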
Could you help me in some way?
It is not clear to me whether this is a CUDA issue, an NCCL/CUDA bug, or a CUDA limitation.
Does any memory management action (alloc/free of CPU or GPU memory) require the GPUs to be idle?
I rely on the GPU-based synchronization of the NCCL allreduce; no CPU-based barrier is introduced before entering allreduce().
Do I need to add a CPU-based barrier? Is there any safe C/C++ code to use in that case?
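For reference, the kind of CPU-side barrier I am asking about would look roughly like the sketch below (pthread-based; g_barrier and reallocate_pinned are illustrative names, and I have not verified that this actually avoids the hang):

#include <pthread.h>
#include <cuda_runtime.h>

/* Assumed to be initialized once at startup with
 * pthread_barrier_init(&g_barrier, NULL, num_threads); every thread must
 * reach the barrier on every iteration for the counts to match. */
extern pthread_barrier_t g_barrier;

/* Grow a pinned host buffer only while no thread is inside allreduce. */
void reallocate_pinned(float **h_buf, size_t *cur_bytes, size_t needed)
{
    /* Wait until every thread has finished the previous allreduce
     * (i.e. returned from cudaStreamSynchronize), so no NCCL kernel is
     * in flight when cudaFreeHost blocks. */
    pthread_barrier_wait(&g_barrier);

    if (needed > *cur_bytes) {
        if (*h_buf)
            cudaFreeHost(*h_buf);
        cudaHostAlloc((void **)h_buf, needed, cudaHostAllocDefault);
        *cur_bytes = needed;
    }

    /* Let the other threads proceed to their allreduce only after the
     * reallocation has completed. */
    pthread_barrier_wait(&g_barrier);
}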
Thanks a lot,
Franco
Below are the gdb backtraces of the thread stuck in cuMemFreeHost and of one of the threads stuck in cudaStreamSynchronize:
(gdb) where
#0 0x00007fffc6bffa11 in clock_gettime ()
#1 0x0000003ab7a03e46 in clock_gettime () from /lib64/librt.so.1
#2 0x00007fc415a821de in ?? () from /usr/lib64/libcuda.so.1
#3 0x00007fc4154377ab in ?? () from /usr/lib64/libcuda.so.1
#4 0x00007fc41538ffde in ?? () from /usr/lib64/libcuda.so.1
#5 0x00007fc415412916 in ?? () from /usr/lib64/libcuda.so.1
#6 0x00007fc415412fa8 in ?? () from /usr/lib64/libcuda.so.1
#7 0x00007fc4153793fc in ?? () from /usr/lib64/libcuda.so.1
#8 0x00007fc415347392 in cuMemFreeHost () from /usr/lib64/libcuda.so.1
#9 0x00007fc41ac6284d in ?? () from /usr/local/cuda-7.5//lib64/libcudart.so.7.5
#10 0x00007fc41ac4782c in ?? () from /usr/local/cuda-7.5//lib64/libcudart.so.7.5
(gdb) where
#0 0x00007fffc6bffa11 in clock_gettime ()
#1 0x0000003ab7a03e46 in clock_gettime () from /lib64/librt.so.1
#2 0x00007fc415a821de in ?? () from /usr/lib64/libcuda.so.1
#3 0x00007fc4154377ab in ?? () from /usr/lib64/libcuda.so.1
#4 0x00007fc415414e33 in ?? () from /usr/lib64/libcuda.so.1
#5 0x00007fc415414f89 in ?? () from /usr/lib64/libcuda.so.1
#6 0x00007fc415388c87 in ?? () from /usr/lib64/libcuda.so.1
#7 0x00007fc4153610c2 in cuStreamSynchronize () from /usr/lib64/libcuda.so.1
#8 0x00007fc41ac40d90 in ?? () from /usr/local/cuda-7.5//lib64/libcudart.so.7.5
#9 0x00007fc41ac781fd in cudaStreamSynchronize () from /usr/local/cuda-7.5//lib64/libcudart.so.7.5