
good job! I have a question #2

Open
lmk123568 opened this issue Apr 20, 2021 · 12 comments

Comments

@lmk123568

Looking at the source code, does Meta-ACON automatically learn self.p, self.q, and the per-layer weights of the conv that generates β?

@nmaac
Owner

nmaac commented Apr 20, 2021

Correct. p1, p2, and the conv weights are all learnable parameters.
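As an illustration of how those learnable pieces fit together, here is a minimal NumPy sketch of a Meta-ACON forward pass. The actual repo uses PyTorch 1×1 convs; the function and weight names here are hypothetical, and biases and BatchNorm are omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def meta_acon_forward(x, p1, p2, w1, w2):
    """Simplified Meta-ACON forward pass on an (N, C, H, W) array.

    p1, p2 : per-channel learnable scalars, shape (C,)
    w1, w2 : weights of the small beta-generating network,
             shapes (C, C//r) and (C//r, C); biases omitted.
    All four arrays would be trained jointly with the rest of the model.
    """
    # channel-wise mean over the spatial dims feeds the beta generator
    s = x.mean(axis=(2, 3))                      # (N, C)
    beta = sigmoid(s @ w1 @ w2)                  # (N, C), values in (0, 1)
    beta = beta[:, :, None, None]
    p1 = p1[None, :, None, None]
    p2 = p2[None, :, None, None]
    # ACON-C: (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x
    return (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x
```

On channel means, the paper's two 1×1 convs act like the two matrix multiplies above, so the sketch keeps the same parameter structure.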

@xuhao-anhe

Hi, I'd like to ask what exactly r does in Meta-ACON, and whether the default value can be changed freely.

@nmaac
Owner

nmaac commented Apr 22, 2021

r is the channel reduction ratio; you can adjust it to suit your needs, and it has little impact on accuracy. It is a very common trick for reducing the parameter count, and was already standard practice as early as HyperNetworks in 2016.

To answer this, I can quote the explanation from the original paper: “a one-layered hypernetwork would have Nz × Nin × fsize × Nout × fsize learnable parameters which is usually much bigger than a two-layered hypernetwork does.”
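For intuition, a back-of-envelope comparison of a direct C→C beta generator versus the bottleneck C→C//r→C design that the reduction ratio enables. The channel count and r below are hypothetical, and bias terms are ignored:

```python
# Parameter counts for the small beta-generating network in Meta-ACON,
# illustrating why a bottleneck with reduction ratio r saves parameters.

def one_layer_params(c):
    # direct C -> C mapping (a single 1x1 conv)
    return c * c

def two_layer_params(c, r):
    # bottleneck C -> C//r -> C, enabled by the reduction ratio r
    return c * (c // r) + (c // r) * c

print(one_layer_params(256))      # 65536
print(two_layer_params(256, 16))  # 8192, an 8x saving at r=16
```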

@lucasjinreal

lucasjinreal commented Apr 25, 2021

@nmaac Did you test how much of a speed drop there is when using Meta-ACON compared with a normal activation without learnable params? I didn't see such a comparison in the paper, but it seems it would introduce a latency increase.

@nmaac
Owner

nmaac commented Apr 25, 2021

@jinfagang It depends on the hardware platform; normally a 10%-20% latency increase.

@lucasjinreal

@nmaac Oh....

@nmaac
Owner

nmaac commented Apr 25, 2021

@jinfagang But ACON is a good choice: it has the same speed as Swish, and both have the same speed as ReLU if implemented with a hard sigmoid :)

@lucasjinreal

@nmaac You mean ACON-ABC?

@nmaac
Owner

nmaac commented Apr 25, 2021

@jinfagang I suggest ACON-C, which improves performance with negligible overhead and shows a good accuracy-speed tradeoff.
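To make the relation to Swish and ReLU concrete, here is a small NumPy check (a sketch, not the repo's code) that ACON-C reduces to Swish at p1=1, p2=0, beta=1, and approaches ReLU as beta grows large:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def acon_c(x, p1, p2, beta):
    """ACON-C from the paper: (p1-p2)*x*sigmoid(beta*(p1-p2)*x) + p2*x."""
    return (p1 - p2) * x * sigmoid(beta * (p1 - p2) * x) + p2 * x

x = np.linspace(-5.0, 5.0, 11)
# p1=1, p2=0, beta=1 recovers Swish: x * sigmoid(x)
swish_gap = np.abs(acon_c(x, 1.0, 0.0, 1.0) - x * sigmoid(x)).max()
# as beta grows, ACON-C approaches max(p1*x, p2*x) -- ReLU here
relu_gap = np.abs(acon_c(x, 1.0, 0.0, 50.0) - np.maximum(x, 0.0)).max()
```

Both gaps are numerically tiny, which is why swapping ACON-C for Swish costs essentially nothing at inference time.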

@yxNONG

yxNONG commented Jun 7, 2021

@nmaac I have a question about beta in Meta-ACON. The paper mentions that as beta goes to infinity, the activation approaches max(x1, x2). However, in Meta-ACON beta is generated by a sigmoid function, which means the range of beta is (0, 1).
Is there any reason for this choice?

@nmaac
Owner

nmaac commented Jun 7, 2021

@yxNONG Meta-ACON uses a small network to generate beta; in this work we tried several network designs, and sigmoid showed good performance. Further choices and designs for this small network are not the focus of this work, but they are a promising future direction.

@yxNONG

yxNONG commented Jun 7, 2021

@nmaac Got it. I will try ReLU and identity, which make much more sense to me. Thanks for your reply!
