Does the app pass without Unitrace?
Yes. Without Unitrace the app finishes with exit status 0.
Does it fail even with a smaller number of ranks?
I tested with mpiexec -n 2 -ppn 2 and got this error:
/run_mpi.sh: line 7: 169430 Segmentation fault (core dumped) python bin/sr.py
[INFO] Log is stored in /home/test10/results.169391.0.csv
[INFO] Timeline is stored in /home/test10/run_mpi.sh.169391.0.json
hostname: rank 0 exited with code 139
hostname: rank 1 died from signal 15
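For reference, the two statuses in this log can be decoded with a short Python sketch. It assumes the usual POSIX shell convention that an exit status above 128 means the process was killed by signal (status - 128). describe_exit is a hypothetical helper written for illustration; it is not part of unitrace or mpiexec.

```python
import signal


def describe_exit(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: statuses above 128 encode "killed by signal (code - 128)",
    as POSIX shells report them.
    """
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a signal number known on this platform.
            return f"exited with status {code} (unknown signal {code - 128})"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited normally with status {code}"


# Rank 0 in the log above exited with code 139.
print(describe_exit(139))
# Rank 1 died from signal 15.
print(signal.Signals(15).name)
```

Under that convention, 139 corresponds to SIGSEGV, which matches the segmentation fault reported for line 7 of run_mpi.sh. Signal 15 is SIGTERM, so rank 1 may simply have been terminated by the launcher after rank 0 crashed rather than failing on its own.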
run_mpi.sh contains the entire app command. This is the mpiexec command with unitrace included:
I launched unitrace inside an mpiexec command:
mpiexec -n 12 -ppn 12 --pmi=pmix ~/pti-gpu/tools/unitrace/build/unitrace --separate-tiles --chrome-device-logging --ccl-summary-report --output-dir-path /home/test --output /home/test/test.csv python bin/sr.py
This runs on a single node and creates 12 processes, but when they finish I get this error from one process and the entire mpiexec run fails:
hostname: rank 0 died from signal 15
I also hit the unitrace error reported in #25. Is that error the cause of signal 15?