Different hyper-parameters used for different models in image task. #34
Comments
@MostafaDehghani for clarity on the image configs.
@mlpen We ran an extensive hp search for every single model to make sure we got the best possible results from each. Especially for the CIFAR task, given that the results of different models are close, we wanted a rather large grid for searching the hp of each model separately. That is why you see different values for the number of layers, number of heads, etc. Basically, we prioritized getting the best possible result over keeping the number of trainable parameters similar across models.
Were the hyperparameters from the hp search used to produce Table 1, or was the hp search done after Table 1? I'm confused because the article says all Transformer models used the same fixed hyperparameters, but the hp search gave different hyperparameters: "The large search space motivates us to follow a set of fixed hyperparameters (number of layers, heads, embedding dimensions, etc) for all models."
Hi,
I found that different hyper-parameters (number of layers, dimension, etc.) are used for different models.
Can you clarify how the baselines are compared?
For example,
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/longformer_base.py
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/performer_base.py
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/reformer_base.py
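To make the comparison concrete, something like the following could list which fields differ between configs. This is only a minimal sketch, assuming each config has been reduced to a flat dict by hand. The helper `diff_hparams`, the names `model_a`/`model_b`/`model_c`, and all values are hypothetical placeholders, not the contents of the linked files.

```python
def diff_hparams(configs):
    """Return {key: {model: value}} for every key whose value differs across models.

    `configs` maps a model name to a flat dict of hyperparameters.
    A key missing from a model's dict is reported as None for that model.
    """
    keys = set()
    for hparams in configs.values():
        keys.update(hparams)

    diffs = {}
    for key in sorted(keys):
        values = {name: hparams.get(key) for name, hparams in configs.items()}
        # Compare via repr so unhashable values (e.g. lists) do not raise.
        if len({repr(v) for v in values.values()}) > 1:
            diffs[key] = values
    return diffs


# Placeholder values for illustration only (NOT taken from the real configs).
example_configs = {
    "model_a": {"num_layers": 2, "num_heads": 4, "emb_dim": 128},
    "model_b": {"num_layers": 4, "num_heads": 4, "emb_dim": 128},
    "model_c": {"num_layers": 2, "num_heads": 8, "emb_dim": 128},
}

if __name__ == "__main__":
    for key, per_model in diff_hparams(example_configs).items():
        print(key, per_model)
```

With real values filled in from the three files above, the output would show exactly which fields (layers, heads, dimensions) are not held fixed across the baselines.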