Problem in position embedding #4
Comments
Good catch! The blow-up curves you are seeing are similar to the ones we were seeing before we introduced qk-norm for the smaller models. Will do some testing with this fix on my end as well. Would you like to open a PR?
Wow, amazing catch! We really appreciate this.
We've added your name to the README because this is a very substantial bug catch. It's pretty interesting that our first 1B/7B runs do pretty well even without proper posembeds, but we should fix this going forward.
Great code base by the way. It's a pleasure to read.
Looking into a way to implement this directly with the xformers API. Thanks so much @jmercat!
(open_lm/open_lm/model.py, line 129 at 9b3ca53)
The problem actually seems to be upstream in xformers. Opened an issue here: facebookresearch/xformers#841
(open_lm/open_lm/model.py, line 129 at 619a8b3)
It seems to me that the rotary position embedding is being applied to the head dimension (dim -2) of the vectors q and k instead of the sequence dimension (dim 1).
I think the head and sequence dimensions should be swapped before applying the position embedding.
(see https://github.com/facebookresearch/xformers/blob/748c159096d4f9fcfe3eaf22801e5aed4777210b/xformers/components/positional_embedding/rotary.py#L85)
What I'm proposing is simply to re-write RotaryWithCast as follows:
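The proposed `RotaryWithCast` rewrite itself was not captured in this extract. As an illustrative sketch only (the `apply_rotary` helper, shapes, and frequency scheme below are assumptions for demonstration, not the open_lm or xformers implementation), the dimension bug and the transpose-based fix can be shown in NumPy:

```python
import numpy as np

def apply_rotary(x, seq_axis):
    """Toy rotary position embedding: rotate consecutive feature pairs by an
    angle that grows with the index along `seq_axis`. Illustration only."""
    d = x.shape[-1]
    n = x.shape[seq_axis]
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, d, 2) / d))
    theta = np.outer(np.arange(n), inv_freq)            # (n, d/2)
    shape = [1] * x.ndim                                # broadcast shape
    shape[seq_axis], shape[-1] = n, d // 2              # vary along seq_axis
    cos, sin = np.cos(theta).reshape(shape), np.sin(theta).reshape(shape)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# q shaped (batch, seq_len, n_heads, head_dim), as in the issue.
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 2, 8))

# Buggy behavior: the rotation varies along dim -2, i.e. the head dimension.
buggy = apply_rotary(q, seq_axis=-2)

# Proposed fix: swap head/seq dims, rotate along dim -2, then swap back...
fixed = apply_rotary(q.transpose(0, 2, 1, 3), seq_axis=-2).transpose(0, 2, 1, 3)

# ...which matches rotating along the sequence dimension (dim 1) directly.
assert np.allclose(fixed, apply_rotary(q, seq_axis=1))
assert not np.allclose(buggy, fixed)
```

The same transpose trick is what swapping the head and sequence dimensions before the rotary call would accomplish in the model code.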