Excuse me, but when running inference on a single RTX 4090 with `python cli_demo_sat.py --from_pretrained cogcom-base-17b --local_tokenizer tokenizer --english --quant 4`, the output is CUDA out of memory. Does it need more GPUs, or do I need to add some arguments? Thank you!
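For rough context (my own back-of-the-envelope numbers, not measurements from CogCoM), here is why a 17B-parameter model can OOM on a 24 GB card even with `--quant 4`:

```python
# Rough VRAM estimate for the weights of a 17B-parameter model.
# Assumption: parameter count ~17e9; activations, the vision encoder,
# and KV cache add further overhead on top of these figures.
PARAMS = 17e9

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{name:>5}: ~{gib:.1f} GiB for weights alone")

# fp16: ~31.7 GiB -- does not fit in 24 GiB.
# int4:  ~7.9 GiB -- fits, but only if quantization happens during
# loading; materializing the full fp16 checkpoint on the GPU first
# would still OOM before quantization kicks in.
```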
Hi, thanks for your interest! I am currently trying to investigate this quantization problem.
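In case it helps narrow things down, a small diagnostic one could wrap around the loading step to see where memory peaks (`report_peak` is a hypothetical helper, not part of the repo):

```python
import torch

def report_peak(fn, *args, **kwargs):
    """Run fn and report peak CUDA memory, to locate where loading OOMs."""
    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    peak_gib = torch.cuda.max_memory_allocated() / 2**30
    print(f"peak allocated: {peak_gib:.2f} GiB")
    return result

# Stand-in workload for illustration; replace the lambda with the
# actual model-loading call from cli_demo_sat.py.
if torch.cuda.is_available():
    report_peak(lambda: torch.empty(1024, 1024, 1024,
                                    dtype=torch.float16, device="cuda"))
```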