While debugging an ncclSend/ncclRecv program, I found only the following stack; ibv_reg_dmabuf_mr is not called anywhere else.
#0 0x00007fffc8214f80 in ibv_reg_dmabuf_mr () from /lib/x86_64-linux-gnu/libibverbs.so
#1 0x00007fffd770e4de in wrap_direct_ibv_reg_dmabuf_mr (pd=0x5555755643f0, offset=0, length=0, iova=0, fd=-1, access=0) at misc/ibvwrap.cc:182
#2 0x00007fffd77533ed in ncclIbDmaBufSupport (dev=0) at transport/net_ib.cc:640
#3 0x00007fffd77535a5 in ncclIbGetProperties (dev=0, props=0x7fffffffda50) at transport/net_ib.cc:669
#4 0x00007fffd76b5771 in ncclNetCheckDeviceVersion (comm=0x555555f8a9f0, net=0x7ffff7f9ce40, dev=0) at net.cc:549
#5 0x00007fffd76b5cac in ncclNetInit (comm=0x555555f8a9f0) at net.cc:608
#6 0x00007fffd76a003e in commAlloc (comm=0x555555f8a9f0, parent=0x0, ndev=2, rank=1) at init.cc:330
#7 0x00007fffd76a8a3f in ncclCommInitRankFunc (job_=0x555555f88ea0) at init.cc:1392
#8 0x00007fffd7698d2b in ncclAsyncLaunch (job=0x555555f88ea0, func=0x7fffd76a8418 <ncclCommInitRankFunc(ncclAsyncJob*)>, undo=0x0,
destructor=0x7fffd6e503e0 <__GI___libc_free>, comm=0x555555f8a9f0) at group.cc:37
#9 0x00007fffd76aa552 in ncclCommInitRankDev (newcomm=0x7fffffffdfa0, nranks=2, commId=..., myrank=1, cudaDev=0, config=0x7fffffffddf0) at init.cc:1667
#10 0x00007fffd76aa8f6 in ncclCommInitRank (newcomm=0x7fffffffdfa0, nranks=2, commId=..., myrank=1) at init.cc:1706
ibv_reg_dmabuf_mr is called with length=0 and offset=0 (and fd=-1). I don't think it only checks whether dmabuf is supported.
If it does, how is a dmabuf actually registered as a memory region?
Another question: if dmabuf has the same performance as nvidia-peermem, is there any reason to install nvidia-peermem on the latest Linux?
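
For context on the zero-length call in frame #1: with fd=-1 and length=0 the registration cannot succeed, so a call like this could only be informative through the errno it leaves behind. Below is a minimal sketch of that kind of probe logic. It assumes, and the stack above does not confirm, that a kernel or driver without dmabuf support reports EOPNOTSUPP or EPROTONOSUPPORT, while a supporting one rejects the dummy fd with some other error such as EBADF. The helper name `dmabuf_probe_says_supported` is hypothetical and is not an NCCL or libibverbs function.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper (not part of NCCL or libibverbs).
 * Interprets the errno left by a dummy probe call such as
 *   ibv_reg_dmabuf_mr(pd, 0, 0, 0, -1, 0)
 * Assumption: "not supported" surfaces as EOPNOTSUPP or EPROTONOSUPPORT;
 * any other errno (for example EBADF for the invalid fd) is taken to mean
 * the dmabuf registration path exists. */
static bool dmabuf_probe_says_supported(int probe_errno)
{
    return probe_errno != EOPNOTSUPP && probe_errno != EPROTONOSUPPORT;
}
```

If that assumption holds, a real registration would be a separate, later call that passes a valid dmabuf fd and a non-zero length, which is the call I could not find in the debugger.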