cannot use two roce nics #1527

gongysh2004 · 2024-11-29T05:32:02Z

I have two servers, each with 8 GPUs, two RoCE NICs (mlx5_0:1 and mlx5_2:1), and one Ethernet NIC (ens5f0np0). Both RoCE NICs are connected to the same fast switch.
My test commands are:

With only one RoCE NIC, mlx5_0:1, it works.

export ibcards="mlx5_0:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8

With only the other RoCE NIC, mlx5_2:1, it works too.

export ibcards="mlx5_2:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8

With both RoCE NICs, it fails.

export ibcards="mlx5_0:1,mlx5_2:1"
mpirun  --allow-run-as-root --map-by node  -np 2 \
  -H node6-1:8,node6-2:8    -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=$ibcards \
  -x NCCL_DEBUG=INFO  -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include  ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8

log2.txt

How can I use two RoCE NICs? I run into the same issue when running DeepSpeed training. Thanks.


kiskra-nvidia · 2024-12-02T22:14:55Z

Are you sure that the two RoCE NICs can communicate with each other (i.e., that sending via mlx5_0 and receiving via mlx5_2 works)? You should verify that with low-level tests such as ib_write_bw (see https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#networking-issues), not only between the two nodes but also between the two NICs on the same node, because it looks like NCCL thinks that communicating between the GPUs attached to two different CPUs on the same node will be faster over the network than using shared memory. You can try running with NCCL_CROSS_NIC=0 to avoid cross-NIC traffic.
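
For reference, the checks suggested above could look roughly like the following. This is only a sketch written as a dry-run script: it prints the commands to run by hand and does not execute them. The host names (node6-1, node6-2), device names, and port 1 are taken from the original report. The helper names `print_pair` and `print_nccl_rerun` are made up for this example. RoCE setups often also need a GID index passed to ib_write_bw with `-x`; that value is not known from this thread, so it is left out.

```shell
#!/bin/sh
# Dry run: prints the commands, does not run them.

# print_pair SERVER_HOST SERVER_DEV CLIENT_HOST CLIENT_DEV
# ib_write_bw (perftest) runs as a server when no host is given,
# and as a client when the server host is passed as the last argument.
print_pair() {
  echo "on $1: ib_write_bw -d $2 -i 1"
  echo "on $3: ib_write_bw -d $4 -i 1 $1"
}

# Between the two nodes: every NIC combination, including the
# cross-NIC cases (mlx5_0 <-> mlx5_2) that the reply asks about.
for server_dev in mlx5_0 mlx5_2; do
  for client_dev in mlx5_0 mlx5_2; do
    print_pair node6-1 "$server_dev" node6-2 "$client_dev"
  done
done

# Between the two NICs on the same node.
print_pair node6-1 mlx5_0 node6-1 mlx5_2
print_pair node6-2 mlx5_0 node6-2 mlx5_2

# The failing two-NIC command from the report, with NCCL_CROSS_NIC=0 added.
print_nccl_rerun() {
  cat <<'EOF'
mpirun --allow-run-as-root --map-by node -np 2 \
  -H node6-1:8,node6-2:8 -x NCCL_IB_DISABLE=0 -x NCCL_IB_HCA=mlx5_0:1,mlx5_2:1 \
  -x NCCL_CROSS_NIC=0 \
  -x NCCL_DEBUG=INFO -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET -x PATH -x NCCL_SOCKET_IFNAME=ens5f0np0 \
  -mca pml ob1 -mca btl ^openib --mca btl_tcp_if_include ens5f0np0 \
  build/all_reduce_perf -b 2M -e 4096M -f 2 -n 20 -g 8
EOF
}
print_nccl_rerun
```

If any of the cross-NIC ib_write_bw pairs fails or hangs while the same-NIC pairs succeed, that would point at the network configuration, not NCCL.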
