cannot use two roce nics #1527

Open
gongysh2004 opened this issue Nov 29, 2024 · 1 comment

Comments

@gongysh2004

I have two servers, each with 8 GPUs, two RoCE cards (mlx5_0:1 and mlx5_2:1), and one Ethernet card (ens5f0np0). The two RoCE cards are connected to the same fast switch.
My test commands are:

  1. With one RoCE card, mlx5_0:1, it works.
export ibcards="mlx5_0:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8 
  2. With the other RoCE card alone, mlx5_2:1, it works too.
export ibcards="mlx5_2:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8 
  3. With both RoCE cards, it fails.
export ibcards="mlx5_0:1,mlx5_2:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8 

log2.txt
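
The three runs above differ only in the value of NCCL_IB_HCA (the third also adds NET to NCCL_DEBUG_SUBSYS). A minimal dry-run sketch that prints one launch line per NIC selection makes that easier to see; `build_cmd` is a hypothetical helper introduced here, not part of NCCL or nccl-tests, and it uses the third run's debug settings for every case.

```shell
#!/bin/sh
# Dry-run sketch: print the all_reduce_perf launch line for a given
# NCCL_IB_HCA value. Nothing is executed.
# build_cmd is a hypothetical helper, not part of NCCL or nccl-tests.
build_cmd() {
  hca_list=$1
  # printf repeats the '%s ' format for every argument that follows.
  printf '%s ' \
    mpirun --allow-run-as-root --map-by node -np 2 \
    -H node6-1:8,node6-2:8 \
    -x NCCL_IB_DISABLE=0 -x "NCCL_IB_HCA=${hca_list}" \
    -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET \
    -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
    -mca pml ob1 -mca btl '^openib' --mca btl_tcp_if_include ens5f0np0 \
    build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8
  printf '\n'
}

# The three NIC selections from the report: each card alone, then both.
for hca in "mlx5_0:1" "mlx5_2:1" "mlx5_0:1,mlx5_2:1"; do
  build_cmd "$hca"
done
```

Only the last printed command, with both devices listed, corresponds to the failing case.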

How can I use two RoCE NICs? I run into the same issue when running DeepSpeed training. Thanks.

@kiskra-nvidia
Member

Are you sure that the two RoCE NICs can communicate with each other (i.e., that sending via mlx5_0 and receiving via mlx5_2 works)? You should verify that with low-level tests such as ib_write_bw (see https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#networking-issues), not only between the two nodes but also between the two NICs on the same node. That matters because it looks like NCCL thinks that communication between GPUs attached to two different CPUs on the same node will be faster over the network than through shared memory. You can try running with NCCL_CROSS_NIC=0 to avoid cross-NIC traffic.
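
As a rough sketch of the low-level check suggested above, the script below only prints the ib_write_bw server/client pairs worth trying, covering cross-node, same-node, and cross-NIC combinations. The device and host names come from the report. The GID index is an assumption (RoCE setups usually need one passed with `-x`, and the right value depends on the system), and `print_pairs` is a hypothetical helper.

```shell
#!/bin/sh
# Dry-run sketch: print the ib_write_bw server/client pairs to try.
# Nothing is executed; run each printed command on the host in brackets,
# starting the server side first.
NICS="mlx5_0 mlx5_2"      # RoCE devices from the report
HOSTS="node6-1 node6-2"   # hosts from the report
GID_INDEX=3               # ASSUMPTION: check the GID table on your system

# print_pairs is a hypothetical helper, not part of perftest or NCCL.
print_pairs() {
  for server_host in $HOSTS; do
    for client_host in $HOSTS; do
      for server_nic in $NICS; do
        for client_nic in $NICS; do
          # Skip a NIC talking to itself on the same host.
          if [ "$server_host" = "$client_host" ] &&
             [ "$server_nic" = "$client_nic" ]; then
            continue
          fi
          echo "[$server_host] ib_write_bw -d $server_nic -i 1 -x $GID_INDEX"
          echo "[$client_host] ib_write_bw -d $client_nic -i 1 -x $GID_INDEX $server_host"
        done
      done
    done
  done
}

print_pairs
```

If any cross-NIC pair fails or is much slower than the same-NIC pairs, that would be consistent with the explanation above. In that case, adding `-x NCCL_CROSS_NIC=0` to the failing mpirun command line is the suggested thing to try.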
