forked from mosaicml/composer
Commit
- Added a timeout to the hparams for initialize_process_group. The default of 30 minutes was too long for failing tests and prevented getting any meaningful log.
- In cleanup(), using os.killpg to terminate DDP subprocesses instead of just subprocess.kill(). Zombie processes (e.g. from DDP or dataloader workers) were sometimes left behind.
- In cleanup(), attempting a SIGTERM before resorting to a SIGKILL 5 seconds later. Graceful cleanup is preferred!
- Directing stdout and stderr to tempfiles instead of subprocess.PIPE, which can hang if a subprocess generates significant output.
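The cleanup and logging changes described above can be sketched roughly as follows. This is a minimal illustration for POSIX systems, not the code from this commit: the function names `launch` and `cleanup` and the `grace_seconds` parameter are hypothetical, and `start_new_session=True` is one assumed way to give the child its own process group so that `os.killpg` reaches its workers too. The process-group timeout change is not shown.

```python
import os
import signal
import subprocess
import sys
import tempfile


def launch(cmd):
    """Start cmd in its own process group, writing output to temp files.

    Temp files are used instead of subprocess.PIPE because a pipe that
    nobody drains can fill up and block a chatty child.
    """
    stdout = tempfile.TemporaryFile()
    stderr = tempfile.TemporaryFile()
    proc = subprocess.Popen(
        cmd,
        stdout=stdout,
        stderr=stderr,
        start_new_session=True,  # assumption: child becomes its own group leader
    )
    return proc, stdout, stderr


def cleanup(proc, grace_seconds=5.0):
    """SIGTERM the whole process group, then SIGKILL it after a grace period."""
    pgid = proc.pid  # equals the group id because of start_new_session=True
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return proc.wait()  # group is already gone
    try:
        proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        pass  # leader ignored SIGTERM; fall through to SIGKILL
    try:
        # Also catches stragglers (e.g. worker processes) left in the group.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # everything exited during the grace period
    return proc.wait()
```

After `cleanup` returns, the captured output can be read back with `stdout.seek(0)` followed by `stdout.read()`. Reading only once the child has exited avoids disturbing the file offset that parent and child share.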
1 parent 26920d4, commit 016c2f5
Showing 2 changed files with 77 additions and 32 deletions.