
how much time to evaluate? #11

Open
YueFan1014 opened this issue Jul 25, 2023 · 6 comments

Comments

@YueFan1014

Hello, I am running

python pasture_runner.py -a src.models.agent_fbe_owl -n 8 --arch B32 --center

on a single RTX 4090, and after three hours no results have been produced. I also hit the problem raised in [https://github.com//issues/4], but the processes are still running on the GPU.

So I am wondering how long evaluation usually takes to finish. Or is the program simply stuck somewhere, which is why it produces no results? Thanks.

@sagadre
Collaborator

sagadre commented Jul 25, 2023

Hi @YueFan1014, is anything getting written to a results/ folder? Are you able to follow the pointers here? I am trying to understand whether things are running slowly or not at all. Thanks!
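One way to tell a slow run from a stalled one is to check whether the number of files under the run's results directory grows over time. A minimal sketch, assuming (not confirmed in this thread) that the runner writes at least one file per finished episode somewhere under results/; the helper names are hypothetical, and the run directory is the one reported in this issue.

```python
import time
from pathlib import Path


def count_result_files(results_dir: str) -> int:
    """Count regular files anywhere under results_dir (0 if it does not exist yet)."""
    root = Path(results_dir)
    if not root.is_dir():
        return 0
    return sum(1 for p in root.rglob("*") if p.is_file())


def new_files_since(results_dir: str, interval_s: float) -> int:
    """Hypothetical helper: how many result files appeared during an interval_s-second wait."""
    before = count_result_files(results_dir)
    time.sleep(interval_s)
    return count_result_files(results_dir) - before


if __name__ == "__main__":
    # Run directory name taken from this issue; adjust for your own run.
    run_dir = "results/longtail_longtail_fbe_owl-b32-openai-center"
    print(f"{count_result_files(run_dir)} result file(s) so far in {run_dir}")
```

Calling new_files_since(run_dir, 600) and getting 0 repeatedly suggests the run is locked up rather than merely slow, under the assumption above.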

@YueFan1014
Author

A folder named longtail_longtail_fbe_owl-b32-openai-center was created under results/, but it is still empty after 15 hours.

@sagadre
Collaborator

sagadre commented Jul 26, 2023

Something appears to be locked up. Can you try running with -n 1 instead of -n 8, especially if you are using only one GPU? In my experience, spawning too many THOR processes on one GPU can lead to problems.

@YueFan1014
Author

Thanks, I will try it. Also, roughly how long does python pasture_runner.py -a src.models.agent_fbe_owl -n 1 --arch B32 --center take to finish on a single GPU?

@hszhoushen

I used the same number of GPUs, but the queue.Empty error from #4 still occurs. How do I fix this?
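For context, queue.Empty is the exception Python's queue module raises when Queue.get(timeout=...) gives up because no item arrived before the timeout; multiprocessing.Queue.get raises the same exception. A minimal sketch, assuming (not confirmed in this thread) that the runner waits on a queue for results from its worker processes: if a worker stalls, the waiting side sees queue.Empty instead of a result. A thread stands in for a THOR worker to keep the example self-contained; stalled_worker, wait_for_result, and the timeouts are hypothetical.

```python
import queue
import threading
import time


def stalled_worker(results: queue.Queue, delay_s: float) -> None:
    # Hypothetical worker standing in for a simulator process; it only
    # produces a result after delay_s seconds.
    time.sleep(delay_s)
    results.put("episode finished")


def wait_for_result(timeout_s: float, worker_delay_s: float) -> str:
    results: queue.Queue = queue.Queue()
    threading.Thread(
        target=stalled_worker, args=(results, worker_delay_s), daemon=True
    ).start()
    try:
        # get() blocks for at most timeout_s seconds, then raises queue.Empty.
        return results.get(timeout=timeout_s)
    except queue.Empty:
        return "timed out: no result from worker"


if __name__ == "__main__":
    print(wait_for_result(timeout_s=0.5, worker_delay_s=5.0))  # worker too slow
    print(wait_for_result(timeout_s=5.0, worker_delay_s=0.1))  # worker in time
```

Under this assumption, the error is a symptom of a worker that never reports back, not the root cause, so the hang itself is what needs investigating.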

@AmingWu

AmingWu commented Jan 5, 2024

@hszhoushen, how long does training take for you? Thank you.
