
Reconcile PythonLayer with Multi-GPU #2936

Closed
ronghanghu opened this issue Aug 16, 2015 · 17 comments

@ronghanghu
Member

Right now PythonLayer cannot be parallelized in multi-GPU training, which is a bug. The Python Global Interpreter Lock (GIL) only allows one thread at a time to run in the Python interpreter. This issue is reported in #2923.

We should find a solution to this bug, or only allow a shared PythonLayer and serialize its forward pass (currently via share_in_parallel: true) in multi-GPU training (then we would need a backward lock too, but it is more complicated than that; see #2923 (comment)). Perhaps we could use the multiprocessing module to fix this bug? But on second thought that doesn't seem like a good idea either.

@thatguymike
Contributor

This is not going to work that cleanly. Multi-process means multiple GPU contexts and a different multi-GPU design, leading to some performance loss. Moreover, data exchange in general becomes much more complex.

It still amazes me that Python doesn't support threading in a reasonable way.

@ronghanghu
Member Author

This is not going to work that cleanly. Multi-process means multiple GPU contexts and a different multi-GPU design, leading to some performance loss. Moreover, data exchange in general becomes much more complex.

Yes, I totally agree with you. On second thought maybe Python's multiprocessing module is not suitable here.

ronghanghu changed the title from "Use multiprocessing module to parallelize PythonLayer in Multi-GPU" to "PythonLayer crash in Multi-GPU due to GIL" on Aug 16, 2015
@jmschrei

Joblib allows you to use a threading backend when used in conjunction with Cython and releasing the GIL. Maybe that would be appropriate here?

@bhack
Contributor

bhack commented Aug 16, 2015

A small hand-rolled example of working around the GIL: http://pieceofpy.com/2010/02/26/boost-python-threads-and-releasing-the-gil/

ronghanghu changed the title from "PythonLayer crash in Multi-GPU due to GIL" to "Reconcile PythonLayer with Multi-GPU" on Aug 16, 2015
@ronghanghu
Member Author

The straightforward (and perhaps naive) workaround seems to be to serialize all calls into Python code, i.e. allow only one PythonLayer execution at a time, as before. This can be done via a global Python lock, perhaps as a static member variable of the PythonLayer class, as sketched below.

From my perspective, serializing Python execution is reasonable for a CPU-bound PythonLayer, since Python interpretation is always serialized by the GIL and never actually runs in parallel. One may argue about the case of an I/O-bound PythonLayer (such as a PythonLayer used as a data layer), but I would not expect to run into that case, since people should set share_in_parallel: true to share the layer among workers and load data sequentially (avoiding loading the same data in every solver). And after all, even for an I/O-bound PythonLayer with share_in_parallel: false, serialized execution is still better than a crash.
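
For illustration, a minimal sketch of what such a class-wide lock could look like (the names mirror Caffe's PythonLayer, but this is an assumption, not the actual patch):

```cpp
// Illustrative sketch only (not the actual patch): a single static mutex shared
// by all PythonLayer instances, so at most one solver thread executes Python
// code at any time. Layer/Blob are Caffe's types.
#include <boost/thread/mutex.hpp>

template <typename Dtype>
class PythonLayer : public Layer<Dtype> {
 public:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
                           const vector<Blob<Dtype>*>& top) {
    boost::mutex::scoped_lock lock(mutex_);  // serialize every Python call
    // ... invoke the layer's Python forward() implementation here ...
  }

 private:
  static boost::mutex mutex_;  // one lock shared by the whole class
};

template <typename Dtype>
boost::mutex PythonLayer<Dtype>::mutex_;
```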

Any comment?

@seanbell

In the short term until this gets sorted out, perhaps setting share_in_parallel to false on the PythonLayer should print a fatal message explaining that the unshared case is not implemented yet.
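
For illustration, a minimal sketch of such a guard, assuming share_in_parallel is read from the layer's python_param (the exact check and wording are assumptions, not an actual patch):

```cpp
// Illustrative guard, e.g. in PythonLayer::LayerSetUp: fail fast with a clear
// message instead of crashing later inside the interpreter. The exact check
// and wording are assumptions, not an actual patch.
if (Caffe::solver_count() > 1 &&
    !this->layer_param_.python_param().share_in_parallel()) {
  LOG(FATAL) << "Unshared PythonLayer is not yet supported with multiple GPUs; "
             << "set share_in_parallel: true or train on a single GPU.";
}
```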

@ronghanghu
Member Author

But even for the shared case, backward does not work correctly (#2923 (comment)); in #2903 I only expected a PythonLayer to be shared when it is used as a data layer and performs no backward pass.

@seanbell

You could give a fatal message for that case too.

I agree that this is tricky and don't have a solution, but failing cleanly is a reasonable placeholder.

@BlGene
Contributor

BlGene commented Aug 20, 2015

This is probably a stupid idea, but would it be possible to start several Python processes, each with its own GIL? My bad.

@bhack
Contributor

bhack commented Aug 20, 2015

@BlGene see @thatguymike's and @ronghanghu's comments above on Python multiprocessing.

@ronghanghu
Member Author

I'm leaving this issue open for further discussion. #2939 provides a workaround for the crash, but it is indeed not very satisfactory, and I don't like it either. On the other hand, multi-process is a hacky way to go. Everyone is welcome to share good ideas/solutions to this issue.

@alessandroferrari

What about implementing pyforward and pybackward methods in the C++ Net class, which simply release the GIL by means of a ScopedGILRelease and immediately afterwards call the forward (or backward) methods? Then expose the pyforward and pybackward functions through the boost interface.

Alternatively, a PyNet C++ class that inherits from Net and overrides forward and backward, simply releasing the GIL and calling the parent's forward() (or backward()).

At least for the most time-consuming and commonly used methods, it is important to release the GIL; otherwise Caffe is not suited to multi-threaded Python applications in production.
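
Roughly along these lines, using the CPython thread-state API; pyforward/ScopedGILRelease are the names proposed above, and the sketch is illustrative rather than existing Caffe code:

```cpp
// Rough sketch of the idea above (not existing Caffe code): release the GIL
// around Net::Forward() so other Python threads can run during the pass.
#include <Python.h>

class ScopedGILRelease {
 public:
  ScopedGILRelease() : state_(PyEval_SaveThread()) {}    // drop the GIL
  ~ScopedGILRelease() { PyEval_RestoreThread(state_); }  // re-acquire it
 private:
  PyThreadState* state_;
};

// Hypothetical wrapper that would be exposed to Python (e.g. via boost::python
// in _caffe.cpp) instead of binding Net::Forward directly.
template <typename Dtype>
void Net_PyForward(Net<Dtype>& net) {
  ScopedGILRelease no_gil;  // GIL released for the duration of the pass
  net.Forward();            // any PythonLayer must re-acquire the GIL itself
}
```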

@naibaf7
Member

naibaf7 commented Jun 23, 2016

@ronghanghu
I do release the GIL for the forward/backward passes in the OpenCL branch and it seems to work fine so far, also when using multiple GPUs on multiple threads (although for data-parallel computation and not multi-GPU training).
For the Python layer, the GIL gets re-acquired, and it passes the Python runtests like that.
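
For reference, re-acquiring the GIL inside the layer typically uses the standard PyGILState pattern; a minimal sketch (not the actual OpenCL-branch code):

```cpp
// Minimal sketch (not the actual OpenCL-branch code) of re-acquiring the GIL
// before calling back into Python from a thread that does not hold it.
#include <Python.h>

void python_layer_forward() {
  PyGILState_STATE gstate = PyGILState_Ensure();  // safe even if already held
  // ... call the layer's Python forward() implementation here ...
  PyGILState_Release(gstate);                     // restore the previous state
}
```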

@alessandroferrari

I have opened a pull request to fix it: #4360

@cypof
Member

cypof commented Jan 17, 2017

Each net can run in its own fork now; there is an example in /python/train.py. #4563

@DenisKuplyakov

I can't understand why this was closed. #4563 doesn't release the GIL; it only implements multi-GPU training. I want to have multiple workers with shared memory and a single GPU thread streaming forward passes only, but everything gets blocked during the forward pass. What's wrong with #4360, apart from code style?

@soulslicer

Sorry, so is there any way to create a custom data layer that works with multiple GPUs?
