
Could you please share training hyper-parameters? #9

Open
stevehuanghe opened this issue May 12, 2021 · 1 comment

Hello,

This work is really inspiring, and thanks for sharing the code. Could you please also share the training hyper-parameters (e.g., learning rate, optimizer, warmup learning rate, warmup epochs)? I would like to train the model myself to get a deeper understanding of it.

Thanks,
Steve

giladsharir (Collaborator) commented May 13, 2021

Hi,
thanks for taking an interest in this work.
The training hyper-parameters for stam_16 are:

- batch size: 64
- optimizer: AdamW with weight decay 1e-3
- schedule: 100 epochs with cosine annealing and learning-rate warmup over the first 10 epochs
- base learning rate: 1e-5
- model EMA enabled

For stam_64, the settings are the same except batch size 16 and base learning rate 2.5e-6.
The models were trained on a single 8×V100 machine.
Hope you find this useful.
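
For illustration, a minimal PyTorch sketch of the recipe above might look like the following. The placeholder model, the exact warmup-then-cosine lambda, and the EMA decay value of 0.9999 are assumptions for the sketch, not values taken from this repo:

```python
import math

import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

# Placeholder module standing in for the STAM model (swap in stam_16 / stam_64).
model = nn.Linear(768, 400)

base_lr = 1e-5        # stam_16; use 2.5e-6 for stam_64
epochs = 100
warmup_epochs = 10

# AdamW with weight decay 1e-3, as stated in the comment above.
optimizer = AdamW(model.parameters(), lr=base_lr, weight_decay=1e-3)

def warmup_cosine(epoch):
    # Linear warmup over the first 10 epochs, then cosine annealing to 0.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda=warmup_cosine)

# Model EMA: maintain an exponential moving average copy of the weights.
ema_model = nn.Linear(768, 400)
ema_model.load_state_dict(model.state_dict())
ema_decay = 0.9999  # assumed value; the comment does not state the EMA decay

@torch.no_grad()
def update_ema(ema, online, decay=ema_decay):
    for ema_p, p in zip(ema.parameters(), online.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1.0 - decay)
```

In a training loop you would call `update_ema(ema_model, model)` after each optimizer step and `scheduler.step()` once per epoch, with batch size 64 for stam_16 or 16 for stam_64 as described above.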
