Can I train OpenFlamingo without the LAION dataset? #215
Comments
Hi, thanks for your question! The code is not currently configured for this, but it wouldn't be hard to implement (similar to #145). If you'd like to contribute a PR, this would make a great first issue!

Regarding nan losses: great question. That bit of code was originally meant to catch cases where the mmc4 sequence looks like "text text ". In that case all labels are masked to -100, because no text tokens follow the image tokens, so the loss has nothing to average over. We later updated data.py upstream to prevent these sequences from being sampled, so that issue is resolved. There may still be nan cases when training at scales larger than 9B, but we have not worked at those scales yet and so have not observed them.
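To illustrate why a fully masked sequence gives a nan loss, here is a minimal pure-Python sketch. It is not the OpenFlamingo training code: `masked_mean_cross_entropy` and `has_supervised_token` are hypothetical helpers written for this example. The first mimics a mean-reduced cross-entropy that skips labels equal to -100; the second shows the kind of check a data loader could apply before sampling a sequence.

```python
import math

IGNORE_INDEX = -100  # label value that the loss skips


def masked_mean_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over positions whose label is not ignore_index."""
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == ignore_index:
            continue  # masked position contributes nothing
        # log-sum-exp with max subtraction for numerical stability
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[label]
        count += 1
    # With zero unmasked positions the mean is 0/0; represent that as nan
    return total / count if count else float("nan")


def has_supervised_token(labels, ignore_index=IGNORE_INDEX):
    """True if at least one label would contribute to the loss."""
    return any(label != ignore_index for label in labels)


logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
partly_masked = [0, IGNORE_INDEX]
fully_masked = [IGNORE_INDEX, IGNORE_INDEX]

print(masked_mean_cross_entropy(logits, partly_masked))  # finite loss
print(masked_mean_cross_entropy(logits, fully_masked))   # nan
print(has_supervised_token(fully_masked))                # False
```

Filtering on something like `has_supervised_token` at sampling time avoids the nan at its source, rather than catching it after the forward pass.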
Thanks for your kind reply. I am planning to contribute a PR to make this project more complete :). I will close this issue after I finish the PR.
Good point, we need to be able to train on a smaller dataset. I wish we could get an example workflow.
Thanks for your great work. I wonder whether we can train OpenFlamingo with the MMC4 dataset only, and why the loss from the MMC4 dataset could be nan. Thanks for your explanation.