Hello,
This work is really inspiring, and thanks for sharing the code. Could you please also share the training hyper-parameters (e.g., learning rate, optimizer, warmup lr, warmup epochs)? I would really like to train the model myself to get a deeper understanding of it.
Thanks,
Steve
Hi,
thanks for taking an interest in this work.
The training hyper-parameters for stam_16 are: batch size 64, AdamW optimizer with weight decay 1e-3, 100 epochs with a cosine annealing schedule and learning-rate warm-up over the first 10 epochs, a base learning rate of 1e-5, and model EMA.
For stam_64 the settings are the same, except batch size 16 and learning rate 2.5e-6.
The models were trained on a single 8xV100 machine.
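For reference, here is a minimal PyTorch sketch of how this recipe could be wired up. This is not the repo's actual training script; `create_stam_model`, `ModelEma`, and `train_loader` are placeholders standing in for the repo's model factory, EMA helper, and data pipeline.

```python
# Sketch of the described recipe (stam_16 values); not the authors' training script.
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

EPOCHS = 100
WARMUP_EPOCHS = 10
BASE_LR = 1e-5          # stam_16; use 2.5e-6 for stam_64
BATCH_SIZE = 64         # stam_16; use 16 for stam_64
WEIGHT_DECAY = 1e-3

model = create_stam_model()  # placeholder for the STAM model constructor
optimizer = AdamW(model.parameters(), lr=BASE_LR, weight_decay=WEIGHT_DECAY)

def lr_lambda(epoch):
    # Linear warm-up for the first 10 epochs, then cosine annealing to 0.
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
ema = ModelEma(model)  # placeholder EMA wrapper (e.g. timm.utils.ModelEma)

for epoch in range(EPOCHS):
    for clips, labels in train_loader:  # batches of BATCH_SIZE clips
        loss = torch.nn.functional.cross_entropy(model(clips), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        ema.update(model)
    scheduler.step()
```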
Hope you find this useful.