Minimum requirements of VRAM #16
This might depend on your PyTorch/CUDA/cuDNN version. With the versions I use, this line is necessary to prevent a CUDA error. If it works fine without it in your setup, you could comment that line out; I don't think it will affect the predictions. See: https://pytorch.org/docs/1.7.1/_modules/torch/nn/modules/rnn.html#RNNBase.flatten_parameters I use 4 Titan (Pascal) GPUs.
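For context, a minimal sketch of what flatten_parameters() does on a generic LSTM; the module and the shapes here are illustrative, not the ones LAV actually uses:

```python
import torch
import torch.nn as nn

# Illustrative LSTM; LAV's actual module and sizes may differ.
rnn = nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True).cuda()

# Compacts the RNN weights into a single contiguous chunk of GPU memory.
# On some PyTorch/cuDNN combinations, skipping this triggers a warning or a
# CUDA error once the weights become non-contiguous (e.g. after loading a
# checkpoint or moving the module between devices).
rnn.flatten_parameters()

x = torch.randn(8, 20, 64, device="cuda")  # (batch, seq_len, features)
out, _ = rnn(x)
```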
Thank you for the quick reply! I have another question regarding your Dockerfile.
Hello, we did not use Docker for training but used a conda env to manage the dependencies. Let me know if you have any issues with the dependencies and I am happy to take a look.
Hi, thank you for your message. I managed to create a Docker image for training and to maintain the dependencies, and all of the modules provided in the lav folder appear to work fine. However, while the individual training runs looked fine in the wandb logs, when I use the segmentation model that I trained myself to perform point painting or to train the full model, the segmentation performance drops considerably, probably due to the seg_model.eval() call (Line 57 in dc9b4cf).
I wonder whether you saw the same issue when you trained the models provided in the weights folder. It seems related to the switching behavior of BatchNorm / Dropout between training and testing, but I have not figured it out yet. Do you have any idea about this?
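For what it's worth, here is a minimal sketch (generic PyTorch, not the LAV code) of why switching to eval mode can change BatchNorm outputs: in train mode BatchNorm normalizes with the current batch's statistics, while in eval mode it uses the accumulated running_mean/running_var, so mismatched running statistics only show up after model.eval():

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(4, 8, 32, 32)

bn.train()
y_train = bn(x)   # normalizes with this batch's mean/var and updates running stats

bn.eval()
y_eval = bn(x)    # normalizes with the accumulated running_mean / running_var

# If the running statistics do not match the data distribution (e.g. the
# checkpoint's stats were collected under different preprocessing), y_eval can
# differ substantially from y_train, which appears as a drop in segmentation
# quality once the model is put into eval mode.
print((y_train - y_eval).abs().max())
```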
Hi Chen, thank you so much for sharing your amazing work!
I was able to run your pretrained agent with the weight checkpoints in the weights folder.
Now I have decided to reproduce your work by following the training steps described in TRAINING.md.
At the moment, I am at the step of training the privileged motion planner and I am getting a CUDA out-of-memory error with your dev_test version of the dataset, which I think is small enough to start with.
Apparently, when I run python -m lav.train_bev, this line of code consumes a lot of GPU memory and causes the above error. By the way, my Ubuntu machine has two somewhat older Titan X cards with 12 GB of VRAM each.
I am wondering what graphics card specifications are required to reproduce this work from scratch.
Is my PC not enough for this, or could you tell me your machine's specifications?
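For anyone debugging the same out-of-memory error, a small generic PyTorch snippet (not part of LAV) for checking how much memory each card has and how much is actually in use while train_bev runs:

```python
import torch

# Generic memory check; the two 12 GB Titan X cards mentioned above are just
# the setup this was written for, any CUDA machine works the same way.
for i in range(torch.cuda.device_count()):
    allocated = torch.cuda.memory_allocated(i) / 1024**3
    reserved = torch.cuda.memory_reserved(i) / 1024**3
    total = torch.cuda.get_device_properties(i).total_memory / 1024**3
    print(f"GPU {i}: {allocated:.2f} GiB allocated, "
          f"{reserved:.2f} GiB reserved, {total:.2f} GiB total")
```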