-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[MXNET-558] Fix 'AttributeError: '_thread._local' object has no attribute 'value'' on distributed processing applications #11332
Conversation
I haven't reproduced the original problem, but what about defining the function and executing it in a separate thread or process (just by using the builtin |
reproduce the issue, and is fixed with this patch |
Just a question, with this public release will one need still to provide a prefix as a solution or this is taken care of automatically? Thank you very much for this!! |
@anirudh2290 it looks like you refactored a lot of this scope code last month, can you review this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for catching and fixing this. Great work!
Would you also be able to add these checks in register.py
for NameManager and Attrscope:
https://github.com/apache/incubator-mxnet/pull/10833/files#diff-768ec5f1dc18b9993c01568d669d2405
python/mxnet/symbol/symbol.py
Outdated
@@ -2451,7 +2451,8 @@ def var(name, attr=None, shape=None, lr_mult=None, wd_mult=None, dtype=None, | |||
handle = SymbolHandle() | |||
check_call(_LIB.MXSymbolCreateVariable(c_str(name), ctypes.byref(handle))) | |||
ret = Symbol(handle) | |||
attr = AttrScope._current.value.get(attr) | |||
with AttrScope(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can include above not hasattr
logic here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That logic is already included in the context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can avoid a dict copy this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair point, I just thought it was cleaner that way and had better separation of concerns. I will update.
thread = threading.Thread(target=f) | ||
thread.start() | ||
thread.join() | ||
assert status[0], "Failed to create a layer within a thread" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add the test for running the following functions inside a thread:
def g():
data = mx.sym.Variable('data', attr={'a': 'b'})
def f():
a = mx.sym.var("a")
b = mx.sym.var("b")
a_ = mx.nd.ones((2, 2))
c_ = a_.copy()
func1 = (a + b).bind(mx.cpu(), args={'a': a_, 'b': c_})
func1.forward()[0].wait_to_read()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do 👍
@piiswrong can you also take a look... |
…bute 'value'' on distributed processing applications (apache#11332) * add scope to NameManager * add AttrScope scope * adding test * update NameManager * Trigger build * Trigger build * Add attribute checks for register module
Description
Using distributed processing frameworks, django or ray, users encountered errors when trying to run mxnet in separate threads.
#11331 and dmlc/gluon-cv#156
Calling the
_current.value.get
within the context of the respective object solved the issue.Does anybody have a suggestion as to how to test this without introducing a dependency on ray?
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.