How does NCCL register the memory region with ibv_reg_xxxx? #1512

Closed
jinhao2 opened this issue Nov 12, 2024 · 1 comment

jinhao2 commented Nov 12, 2024

While debugging an ncclSend/ncclRecv program, I only found the following stack; ibv_reg_dmabuf_mr is not called anywhere else.

#0 0x00007fffc8214f80 in ibv_reg_dmabuf_mr () from /lib/x86_64-linux-gnu/libibverbs.so
#1 0x00007fffd770e4de in wrap_direct_ibv_reg_dmabuf_mr (pd=0x5555755643f0, offset=0, length=0, iova=0, fd=-1, access=0) at misc/ibvwrap.cc:182
#2 0x00007fffd77533ed in ncclIbDmaBufSupport (dev=0) at transport/net_ib.cc:640
#3 0x00007fffd77535a5 in ncclIbGetProperties (dev=0, props=0x7fffffffda50) at transport/net_ib.cc:669
#4 0x00007fffd76b5771 in ncclNetCheckDeviceVersion (comm=0x555555f8a9f0, net=0x7ffff7f9ce40 , dev=0) at net.cc:549
#5 0x00007fffd76b5cac in ncclNetInit (comm=0x555555f8a9f0) at net.cc:608
#6 0x00007fffd76a003e in commAlloc (comm=0x555555f8a9f0, parent=0x0, ndev=2, rank=1) at init.cc:330
#7 0x00007fffd76a8a3f in ncclCommInitRankFunc (job_=0x555555f88ea0) at init.cc:1392
#8 0x00007fffd7698d2b in ncclAsyncLaunch (job=0x555555f88ea0, func=0x7fffd76a8418 <ncclCommInitRankFunc(ncclAsyncJob*)>, undo=0x0,
destructor=0x7fffd6e503e0 <__GI___libc_free>, comm=0x555555f8a9f0) at group.cc:37
#9 0x00007fffd76aa552 in ncclCommInitRankDev (newcomm=0x7fffffffdfa0, nranks=2, commId=..., myrank=1, cudaDev=0, config=0x7fffffffddf0) at init.cc:1667
#10 0x00007fffd76aa8f6 in ncclCommInitRank (newcomm=0x7fffffffdfa0, nranks=2, commId=..., myrank=1) at init.cc:1706

ibv_reg_dmabuf_mr is called with length=0 and offset=0. I don't think it only checks whether dmabuf is supported.
So how does the dmabuf get registered as a memory region?
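
One hedged reading of the zero-length call in frame #2 (fd=-1, length=0) is that it is a probe: the call cannot succeed, and only the resulting errno is inspected. The sketch below illustrates that idea without linking against libibverbs. The names `reg_dmabuf_fn`, `probe_dmabuf_support`, `fake_reg_supported` and `fake_reg_unsupported` are hypothetical, and the errno convention (EOPNOTSUPP or EPROTONOSUPPORT meaning "not supported", anything else such as EBADF meaning "supported") is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in with the same parameter order as frame #1 of the
 * backtrace (pd, offset, length, iova, fd, access). Returns NULL and sets
 * errno on failure. */
typedef void *(*reg_dmabuf_fn)(void *pd, uint64_t offset, size_t length,
                               uint64_t iova, int fd, int access);

/* Returns 1 if DMA-BUF registration appears to be supported, 0 otherwise. */
static int probe_dmabuf_support(reg_dmabuf_fn reg, void *pd) {
  assert(reg != NULL);
  errno = 0;
  /* Dummy call: fd = -1 is never a valid DMA-BUF, so nothing is registered. */
  (void)reg(pd, 0, 0, 0, -1, 0);
  /* Assumption: an unsupported stack reports EOPNOTSUPP or EPROTONOSUPPORT,
   * while a supporting one rejects the bad fd with a different error. */
  return (errno != EOPNOTSUPP && errno != EPROTONOSUPPORT) ? 1 : 0;
}

/* Fake backend that behaves like a stack with DMA-BUF support. */
static void *fake_reg_supported(void *pd, uint64_t offset, size_t length,
                                uint64_t iova, int fd, int access) {
  (void)pd; (void)offset; (void)length; (void)iova; (void)fd; (void)access;
  errno = EBADF;
  return NULL;
}

/* Fake backend that behaves like a stack without DMA-BUF support. */
static void *fake_reg_unsupported(void *pd, uint64_t offset, size_t length,
                                  uint64_t iova, int fd, int access) {
  (void)pd; (void)offset; (void)length; (void)iova; (void)fd; (void)access;
  errno = EOPNOTSUPP;
  return NULL;
}
```

Under this reading, the call seen during ncclCommInitRank never registers anything; the actual buffer registration would have to happen later, on a different code path.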

Another question: if dmabuf has the same performance as nvidia-peermem, is there any reason to install nvidia-peermem on the latest Linux?

jinhao2 commented Nov 18, 2024

#0 ncclIbRegMrDmaBuf (comm=0x7fffb0015000, data=0x7ffee6400000, size=4194304, type=1, offset=0, fd=-1, mhandle=0x7fffb0009e30) at transport/net_ib.cc:1534
#1 0x00007fffd7e0c118 in ncclIbRegMr (comm=0x7fffb0015000, data=0x7ffee6400000, size=4194304, type=1, mhandle=0x7fffb0009e30) at transport/net_ib.cc:1547
#2 0x00007fffd7df5fcc in sendProxyConnect (connection=0x7fffb0000d90, proxyState=0x55557521a380, reqBuff=0x7fffb000f150, reqSize=128,
respBuff=0x7fffb000f1e0, respSize=21040, done=0x7fffc2bf899c) at transport/net.cc:796
#3 0x00007fffd7cc4454 in proxyProgressAsync (op=0x7fffb0000ba0, proxyState=0x55557521a380, asyncOpCount=0x7fffc2bf89f4, peer=0x7fffc2bf8c70,
connectionPool=0x7fffc2bf8a50) at proxy.cc:1304
#4 0x00007fffd7cc5378 in ncclProxyService (_args=0x55557521a380) at proxy.cc:1481
#5 0x00007fffd6e9ca94 in start_thread (arg=) at ./nptl/pthread_create.c:447
#6 0x00007fffd6f29c3c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:78

Here is where the DMA-BUF is registered as a memory region.
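
As a rough sketch of what a registration helper with the arguments in frame #0 might do, the code below page-aligns the buffer and picks a path based on `fd`. Everything here is an assumption for illustration: `reg_plan`, `plan_registration` and the `REG_PATH_*` labels are hypothetical names, the 4096-byte page size is assumed, and treating `fd == -1` as "no DMA-BUF file descriptor supplied, use a regular registration" is a reading of the trace, not something the thread confirms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the two possible registration paths. */
enum reg_path { REG_PATH_DMABUF, REG_PATH_REGULAR };

/* Hypothetical description of what would be handed to the verbs layer. */
struct reg_plan {
  enum reg_path path;
  uintptr_t aligned_addr; /* start address rounded down to a page boundary */
  size_t pages;           /* number of pages covering [data, data + size) */
  size_t length;          /* pages * page_size */
};

/* page_size must be a power of two. */
static struct reg_plan plan_registration(uintptr_t data, size_t size, int fd,
                                         size_t page_size) {
  struct reg_plan plan;
  /* Round the start address down to the containing page. */
  plan.aligned_addr = data & ~((uintptr_t)page_size - 1);
  /* Round the end up so the whole buffer is covered. */
  plan.pages = (data + size - plan.aligned_addr + page_size - 1) / page_size;
  plan.length = plan.pages * page_size;
  /* Assumption: a valid fd selects the DMA-BUF path, -1 selects the regular one. */
  plan.path = (fd != -1) ? REG_PATH_DMABUF : REG_PATH_REGULAR;
  return plan;
}
```

With the values from frame #0 (data=0x7ffee6400000, size=4194304, fd=-1) and a 4096-byte page, this plan covers 1024 pages starting at the same address and selects the regular path, which may be worth checking against the conclusion above.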

jinhao2 closed this as completed Nov 18, 2024