[Feature request] I think we should hybridize the segmentation models. #547

Open
chinakook opened this issue Jan 2, 2019 · 4 comments
Labels
enhancement New feature or request

Comments

@chinakook
Member

The op F.contrib.BilinearResize2D needs to know the destination size explicitly, so the segmentation models are currently not hybridized; that way the destination size can be determined dynamically.
I think we should hybridize the models later to improve performance and reduce memory consumption.
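
To make the constraint concrete, here is a minimal plain-Python sketch of a bilinear resize. It is not the MXNet operator: the function name `bilinear_resize` is hypothetical, and the corner-aligned sampling convention is an assumption chosen for simplicity. The point is only that `out_h` and `out_w` have to be ordinary values before any output can be computed. An imperative forward pass can read them from the input's shape at run time, which is what the non-hybridized models rely on.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D list of floats to (out_h, out_w) with bilinear interpolation.

    Assumption: corner-aligned sampling, i.e. the first and last output
    pixels map exactly onto the first and last input pixels.
    """
    in_h, in_w = len(image), len(image[0])
    # Step between consecutive output pixels, measured in input coordinates.
    scale_y = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    scale_x = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0

    out = []
    for i in range(out_h):
        y = i * scale_y
        y0 = int(y)                      # row above the sample point
        y1 = min(y0 + 1, in_h - 1)       # row below, clamped to the border
        wy = y - y0                      # vertical interpolation weight
        row = []
        for j in range(out_w):
            x = j * scale_x
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# The caller must supply the target size up front.
resized = bilinear_resize([[0.0, 2.0], [4.0, 6.0]], 3, 3)
```

In a segmentation head the target size is usually the spatial size of the input image, so it changes from batch to batch unless the input size is fixed.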

@zhreshold
Member

Agreed. @zhanghang1989, we might be able to modify the operator so that it can take the destination shape from an NDArray rather than from arguments only.

@bryanyzhu
Collaborator

@chinakook @zhreshold Right now, training the semantic segmentation models depends on sync batchnorm (SyncBN), which uses dataparallel:

https://github.com/dmlc/gluon-cv/blob/master/scripts/segmentation/train.py#L163-L164

And our implementation of dataparallel depends on threads:

https://github.com/dmlc/gluon-cv/blob/master/gluoncv/utils/parallel.py#L3
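
As a rough illustration of what thread-based data parallelism looks like, here is a plain-Python sketch. It is not the code in parallel.py: `split_batch` and `parallel_apply` are hypothetical names, and a list of numbers stands in for a batch. Each shard is processed in its own thread and the results are gathered in shard order.

```python
import threading


def split_batch(batch, num_shards):
    """Split a list into num_shards contiguous, nearly equal chunks."""
    size, remainder = divmod(len(batch), num_shards)
    shards, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < remainder else 0)
        shards.append(batch[start:end])
        start = end
    return shards


def parallel_apply(fn, shards):
    """Run fn on every shard in its own thread; return results in shard order."""
    results = [None] * len(shards)
    errors = [None] * len(shards)

    def worker(index, shard):
        try:
            results[index] = fn(shard)
        except Exception as exc:  # keep the error so the main thread can re-raise it
            errors[index] = exc

    threads = [
        threading.Thread(target=worker, args=(i, shard))
        for i, shard in enumerate(shards)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    for exc in errors:
        if exc is not None:
            raise exc
    return results


shards = split_batch(list(range(10)), 4)
partial_sums = parallel_apply(sum, shards)
```

With this pattern, whatever `fn` calls has to be safe to run from several threads at once, which is where the thread-safety concern comes in.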

I searched online for a while, and it seems MXNet is not thread-safe. So training the semantic segmentation models is not hybridizable as long as we need SyncBN. It fails with an error like this:

AttributeError: 'NoneType' object has no attribute '__exit__'

The inference of semantic segmentation is hybridizable though.

@zhreshold
Member

Maybe we can do local distributed training style instead of multithreading

@zhanghang1989
Contributor

> Maybe we can do local distributed training style instead of multithreading

SyncBN is implemented at the operator level, which does not support distributed training. The synchronization happens inside the operator and bypasses the engine.
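
For context on what gets synchronized: synchronized batch norm generally normalizes with one mean and variance computed over the whole batch, rather than with separate statistics per device. A minimal plain-Python sketch of that reduction follows, assuming each device contributes a list of values. The function name `sync_batch_stats` is hypothetical and this is not the operator's actual code.

```python
def sync_batch_stats(shards):
    """Combine per-device partial sums into a global mean and (biased) variance.

    Each shard stands in for the activations held by one device. Only three
    numbers per shard need to be exchanged: count, sum, and sum of squares.
    """
    count = sum(len(shard) for shard in shards)
    total = sum(sum(shard) for shard in shards)
    total_sq = sum(sum(v * v for v in shard) for shard in shards)

    mean = total / count
    variance = total_sq / count - mean * mean  # E[x^2] - (E[x])^2
    return mean, variance


# Two "devices", each holding half of a batch of four values.
mean, variance = sync_batch_stats([[1.0, 2.0], [3.0, 4.0]])
```

Computed per device, the same data would give means of 1.5 and 3.5, which is the mismatch that synchronization avoids.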
