Hi, when I ran the alltoall test, I found that bandwidth was unstable at 8 nodes, and the instability became more pronounced as the number of nodes increased. Is this expected behavior?
With 8 nodes, there was a bandwidth drop at a data size of 1G.
With 16 nodes, the bandwidth drops considerably.
The test command is as follows
Other information:
Network Configuration: 8 × 400Gb
CUDA Version: 12.4
Driver Version: 550.54.15
Network Type: RoCE
If I increase the number of QPs (queue pairs), the problem is slightly alleviated.
I would like to know why this happens, and I look forward to your reply!
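For anyone trying to reproduce the QP observation above, here is a minimal sketch of how a QP sweep could be set up. It assumes the alltoall test runs on NCCL over its IB/RoCE transport, which this post does not state explicitly; under that assumption, the relevant knob is the `NCCL_IB_QPS_PER_CONNECTION` environment variable. The helper name `env_with_qps` and the sweep values are hypothetical, chosen only for illustration.

```python
import os


def env_with_qps(qps_per_connection, base_env=None):
    """Hypothetical helper: copy an environment and set the QP count.

    Assumes an NCCL-based test, where NCCL_IB_QPS_PER_CONNECTION
    controls how many queue pairs are used per connection.
    """
    if qps_per_connection < 1:
        raise ValueError("qps_per_connection must be >= 1")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_IB_QPS_PER_CONNECTION"] = str(qps_per_connection)
    return env


# Illustrative sweep values (not taken from the original report).
sweep = [env_with_qps(n, base_env={}) for n in (1, 2, 4, 8)]
```

Each resulting environment could then be passed to the launcher (for example via `subprocess.run(cmd, env=...)`) so that the same test command is repeated once per QP setting and the bandwidth curves can be compared.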