Different hyper-parameters used for different models in image task. #34

Open
mlpen opened this issue Aug 26, 2021 · 3 comments
mlpen commented Aug 26, 2021

Hi,

I found that different hyper-parameters (number of layers, dimension, etc.) are used for different models.
Can you clarify how the baselines are compared?

For example,
https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/longformer_base.py

config.model_type = "longformer"
config.model.num_layers = 4
config.model.emb_dim = 128
config.model.qkv_dim = 64
config.model.mlp_dim = 128
config.model.num_heads = 4
config.model.classifier_pool = "MEAN"

https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/performer_base.py

config.model_type = "performer"
config.model.num_layers = 1
config.model.emb_dim = 128
config.model.qkv_dim = 64
config.model.mlp_dim = 128
config.model.num_heads = 8
config.model.classifier_pool = "CLS"

https://github.com/google-research/long-range-arena/blob/main/lra_benchmarks/image/configs/cifar10/reformer_base.py

config.model_type = "reformer"
config.model.num_layers = 4
config.model.emb_dim = 64
config.model.qkv_dim = 32
config.model.mlp_dim = 64
config.model.num_heads = 8
config.model.classifier_pool = "CLS"
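
To make the comparison concrete, here is a rough back-of-the-envelope sketch (not the actual LRA model code) that estimates the trainable parameters in the encoder blocks implied by the three configs above. It assumes a standard Transformer block in which qkv_dim is the total attention width shared across heads, and it ignores biases, layer norms, embeddings, and the classifier head.

# Rough estimate of encoder-block parameters from the configs quoted above.
# Assumption: qkv_dim is the total attention width shared across all heads,
# so num_heads does not change the count; biases, layer norms, embeddings,
# and the classifier head are ignored.
configs = {
    "longformer": dict(num_layers=4, emb_dim=128, qkv_dim=64, mlp_dim=128),
    "performer": dict(num_layers=1, emb_dim=128, qkv_dim=64, mlp_dim=128),
    "reformer": dict(num_layers=4, emb_dim=64, qkv_dim=32, mlp_dim=64),
}

def approx_params(num_layers, emb_dim, qkv_dim, mlp_dim):
    attn = 4 * emb_dim * qkv_dim  # Q, K, V, and output projections
    mlp = 2 * emb_dim * mlp_dim   # the two dense layers of the feed-forward block
    return num_layers * (attn + mlp)

for name, cfg in configs.items():
    print(f"{name}: ~{approx_params(**cfg):,} encoder parameters")

Under these simplified assumptions, the longformer config has roughly four times the encoder parameters of the other two, which is the disparity being asked about.
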
vanzytay (Collaborator) commented:

@MostafaDehghani for clarity on the image configs.

MostafaDehghani (Collaborator) commented:

@mlpen We ran an extensive hyperparameter search for every single model to make sure we got the best possible results from each. Especially for the CIFAR task, where the results of different models are close, we wanted a rather large search grid for each model separately, which is why you see different values for the number of layers, number of heads, and so on. Basically, we prioritized getting the best possible result over keeping the number of trainable parameters similar across models.
Hope this answers your question. Let us know if you have any issues reproducing the results, or if you end up with hyperparameters for any of these models that give better results than what we reported in the paper.
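
As an illustration of what such a per-model search could look like, here is a minimal sketch. The value ranges and the train_and_eval stand-in are hypothetical; they are not the grid the authors actually used.

# Hypothetical per-model grid search; the ranges below are illustrative only.
import itertools
import random

search_space = {
    "num_layers": [1, 2, 4, 6],
    "emb_dim": [64, 128, 256],
    "qkv_dim": [32, 64, 128],
    "mlp_dim": [64, 128, 256],
    "num_heads": [4, 8],
    "classifier_pool": ["CLS", "MEAN"],
}

def train_and_eval(model_type, **hp):
    # Stand-in: a real run would train `model_type` with these
    # hyperparameters and return its validation accuracy.
    return random.random()

best_acc, best_hp = -1.0, None
for values in itertools.product(*search_space.values()):
    hp = dict(zip(search_space, values))
    acc = train_and_eval("performer", **hp)
    if acc > best_acc:
        best_acc, best_hp = acc, hp

print(f"best accuracy {best_acc:.3f} with {best_hp}")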

alexmathfb commented Aug 29, 2021

"We ran an extensive hyperparameter search for every single model to make sure we got the best possible results from each."

Were the hyperparameters from that search used to produce Table 1, or was the search done after Table 1? I'm confused because the article says all Transformer models used the same fixed hyperparameters, yet the search gave different hyperparameters:

" The large search space motivates us to follow a set of fixed hyperparameters (number of layers, heads, embedding dimensions, etc) for all models. "
