fetch hangs when remote process is terminated #2793
Comments
Essentially a dup of #217.
Actually, I think #217 is pretty broad, covering system-provided fault-tolerant programming models, whereas this is a specific issue. Anything waiting on a RemoteRef should get an error code or an exception. That makes it easy to write simple fault-tolerant code. I notice that …
I am not sure this is a dupe: my calls to pmap() worked flawlessly in Julia v0.1.2, but now, with the exact same code on v0.2.0, they frequently hang with the same error. This may be an I/O or serialization bug.
David, if you can come up with a reduced test case in which this error occurs during pmap, please file an issue for it. Thanks.
Hi Jeff, my apologies: my error seems to be the result of an uncaught exception on one of the remote processes (I think it's DataFrame related, not an I/O bug). So my original claim is bogus, although the original issue filed by Tanmay still holds (and frequently causes me problems unless I make sure to catch all remotely executed errors).
That sort of error should be propagated and reported more cleanly, so it is still somewhat of an issue, although a different one.
If a remote Julia process is killed, fetch on a pending RemoteRef never returns control, even though the failure is detected (as an end-of-stream exception).
This can be simulated with the following commands:
At this point, fetch is waiting for the remote process.
Kill the remote Julia process to simulate an abnormal termination.
The following is displayed at the REPL, but the fetch call does not return.
It must be interrupted with Ctrl+C to regain control.
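The behavior the report asks for is that a detected end-of-stream should become an error for whoever is waiting. The following minimal Python sketch illustrates it as an analogy only; it is not Julia's API. A parent process waits for a one-line reply from a child process. If the child's pipe hits end-of-stream, the parent raises instead of continuing to wait. The names fetch_reply and WorkerDied are hypothetical, and os._exit stands in for killing the worker.

```python
import subprocess
import sys


class WorkerDied(Exception):
    """Hypothetical error raised when the worker exits before sending its reply."""


def fetch_reply(worker_code: str) -> str:
    """Run worker_code in a child Python process and wait for one line of output.

    The read blocks like a fetch on a pending reference. If the worker exits or
    is killed without replying, the resulting end-of-stream raises WorkerDied
    instead of leaving the caller waiting forever.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", worker_code],
        stdout=subprocess.PIPE,
        text=True,
    )
    line = proc.stdout.readline()  # "" means end of stream: no reply will come
    proc.stdout.close()
    status = proc.wait()  # reap the child so its exit status is available
    if line == "":
        raise WorkerDied(f"worker exited with status {status} before replying")
    return line.rstrip("\n")


if __name__ == "__main__":
    print(fetch_reply("print(6 * 7)"))  # prints 42
    try:
        # os._exit ends the worker abruptly, simulating an abnormal termination.
        fetch_reply("import os; os._exit(1)")
    except WorkerDied as err:
        print("fetch failed:", err)
```

Killing the child externally (for example with kill on POSIX) takes the same end-of-stream path. In that case, Popen.wait() reports a negative status equal to the signal number.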