You'll find the child process stuck: the loop makes no progress and no error is raised. The child process just waits; if you stop the program manually, you'll see:
KeyboardInterrupt Traceback (most recent call last)
Cell In[1], line 39
36 print("epoch%s!!!" % i)
37 with Pool(num_processes) as pool:
38 # targets = pool.starmap(load_targets, [(sample_name, cool_dir, test_regions) for sample_name in test_samples])
---> 39 targets = pool.map(load_targets, [(sample_name, cool_dir, test_regions) for sample_name in test_samples])
40 # print(targets)
41 targets = torch.cat(targets)
File ~/miniconda3/envs/env/lib/python3.9/multiprocessing/pool.py:364, in Pool.map(self, func, iterable, chunksize)
359 def map(self, func, iterable, chunksize=None):
360 '''
361 Apply `func` to each element in `iterable`, collecting the results
362 in a list that is returned.
363 '''
--> 364 return self._map_async(func, iterable, mapstar, chunksize).get()
File ~/miniconda3/envs/env/lib/python3.9/multiprocessing/pool.py:765, in ApplyResult.get(self, timeout)
764 def get(self, timeout=None):
--> 765 self.wait(timeout)
766 if not self.ready():
767 raise TimeoutError
File ~/miniconda3/envs/env/lib/python3.9/multiprocessing/pool.py:762, in ApplyResult.wait(self, timeout)
761 def wait(self, timeout=None):
--> 762 self._event.wait(timeout)
File ~/miniconda3/envs/env/lib/python3.9/threading.py:581, in Event.wait(self, timeout)
579 signaled = self._flag
580 if not signaled:
--> 581 signaled = self._cond.wait(timeout)
582 return signaled
File ~/miniconda3/envs/env/lib/python3.9/threading.py:312, in Condition.wait(self, timeout)
310 try: # restore state no matter what (e.g., KeyboardInterrupt)
311 if timeout is None:
--> 312 waiter.acquire()
313 gotit = True
314 else:
KeyboardInterrupt:
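For context, here is a minimal sketch of the pattern the traceback comes from. The body of load_targets, the file layout, and the sample/region values are assumptions on my part; only the loop, the pool.map call, and the torch.cat line mirror the code shown in the traceback.

```python
import cooler
import torch
from multiprocessing import Pool

num_processes = 1

def load_targets(args):
    # Assumed shape of the worker: open a .cool file and fetch some regions.
    sample_name, cool_dir, regions = args
    clr = cooler.Cooler(f"{cool_dir}/{sample_name}.cool")   # hypothetical file layout
    mats = [clr.matrix(balance=False).fetch(r) for r in regions]
    return torch.stack([torch.as_tensor(m, dtype=torch.float32) for m in mats])

if __name__ == "__main__":
    cool_dir = "cools"                        # hypothetical inputs
    test_samples = ["sample1", "sample2"]
    test_regions = ["chr1:0-1000000"]
    for i in range(10):
        print("epoch%s!!!" % i)
        with Pool(num_processes) as pool:
            targets = pool.map(load_targets,
                               [(s, cool_dir, test_regions) for s in test_samples])
        targets = torch.cat(targets)          # the manipulation that triggers the hang
```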
Interestingly, this only happens if you manipulate the results returned from the pool. If you don't use these lines:
targets = torch.cat(targets)
# or
targets = torch.stack(targets)
# or some other manipulation I haven't tested
then the loop keeps going and finishes successfully.
I've tried several ways to figure it out: deleting the variables after using them, closing the file handles of the cool files (although they are closed automatically anyway), putting the code inside an if __name__ == "__main__": block, changing the multiprocessing start method from fork to spawn, using a multiprocessing lock, and cloning the tensors in targets before manipulating them. None of these helped. The only thing I know is that manipulating the results blocks (at least one) new child process from fetching data from the cool file. Since pdb cannot be used in a child process, I don't know how to debug this further.
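For reference, here is roughly how two of those attempts looked (a sketch reusing load_targets and the other names from the snippet above); neither the spawn start method nor the extra clone() changed the behavior for me.

```python
import multiprocessing as mp
import torch

if __name__ == "__main__":
    ctx = mp.get_context("spawn")             # attempt: spawn instead of fork
    with ctx.Pool(num_processes) as pool:
        targets = pool.map(load_targets,
                           [(s, cool_dir, test_regions) for s in test_samples])
    targets = torch.cat([t.clone() for t in targets])   # attempt: clone before cat
```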
Luckily, I found a solution at last. Instead of using multiprocessing or torch.multiprocessing, importing the pool as from pathos.multiprocessing import ProcessingPool as Pool avoids the problem. Exploring the cause of this behavior is beyond my ability; I just hope this will be helpful for anyone trying to process cool files with multiprocessing.
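Concretely, the only change is the Pool import. A sketch, again reusing the names from the snippet above; with a single iterable, pathos' map is called the same way as the standard-library pool.map here.

```python
from pathos.multiprocessing import ProcessingPool as Pool
import torch

pool = Pool(num_processes)
targets = pool.map(load_targets,
                   [(s, cool_dir, test_regions) for s in test_samples])
pool.close()
pool.join()
targets = torch.cat(targets)   # no longer hangs on later epochs
```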
I'm using the latest version of cooler (0.9.2). For clarity: even with num_processes set to 1, the problem is the same, and the output is the same as shown above.