
NaN #4

Open
jon-hotaisle opened this issue Sep 23, 2024 · 4 comments

Comments

@jon-hotaisle

Just doing a bit of debugging.

"val loss" outputs nan, so I figured I'd start there...

val loss 1 nan
val loss 2 nan

(screenshot: 2024-09-23 at 20 40 10.png)

But digging higher up, val_num_batches is set to 20, so I'm not sure how this is turning into nan so easily. Feels like something else is up...
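The batch count can't mask a bad batch: in an llm.c-style averaging loop (a hypothetical Python sketch, not the project's actual code), a single NaN per-batch loss makes the reported mean NaN no matter what `val_num_batches` is.

```python
import math

def mean_val_loss(batch_losses):
    """Average per-batch validation losses the way a typical
    training loop does: accumulate, then divide by the count."""
    total = 0.0
    for loss in batch_losses:
        total += loss  # one NaN here poisons the whole sum
    return total / len(batch_losses)

# 19 healthy batches plus a single NaN batch -> NaN overall
losses = [2.5] * 19 + [float("nan")]
print(math.isnan(mean_val_loss(losses)))  # True
```

So a nan printout doesn't implicate the averaging or `val_num_batches`; it only says at least one per-batch loss was already NaN.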

@jon-hotaisle (Author)

@anthonix bump.

@anthonix (Owner)

Will try to reproduce -- it's on the list of things to do when I have some spare cycles.

@jon-hotaisle (Author)

Oh, I'm blind (and probably dumb). val_loss must be 0, hence the nan. So it must be something in gpt2_validate() returning all zeros.
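For reference, here is one way an all-zero buffer turns into NaN under IEEE-754 float rules (a hypothetical illustration; the exact path inside gpt2_validate() may differ): a cross-entropy term multiplies a probability by a log, and log(0) is -inf, so a zeroed buffer yields 0 * -inf, which is NaN.

```python
import math

# Stand-in for log(0.0): IEEE-754 float math gives -inf
# (Python's math.log raises on 0, so use the value directly).
log_of_zero = float("-inf")

# Cross-entropy term p * log(q) with p == q == 0: 0 * -inf is NaN.
term = 0.0 * log_of_zero
print(math.isnan(term))  # True
```

In C the same thing happens silently (as does 0.0/0.0, e.g. when normalizing an all-zero buffer), so a zeroed output can surface downstream as nan rather than 0.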

@anthonix (Owner)

In the meantime, can you verify that some other training works, like the tinyllama code AMD recently released? Or their JAX GPT2 training?
