Issues: qwopqwop200/GPTQ-for-LLaMa
- Syntax changed in triton.testing.do_bench() causing error when running llama_inference.py (#285, opened Dec 10, 2023 by prasanna)
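
  Issue #285 stems from a keyword rename in Triton's benchmarking helper: older releases accepted `percentiles`, newer ones use `quantiles`. A minimal compatibility sketch, assuming the caller only needs the renamed keyword handled (the exact call site in llama_inference.py is not shown here):

  ```python
  import triton.testing

  def do_bench_compat(fn, q=(0.5, 0.2, 0.8), **kwargs):
      """Benchmark fn across Triton versions with differing keyword names."""
      try:
          # Newer Triton (>= 2.1): the keyword is `quantiles`
          return triton.testing.do_bench(fn, quantiles=q, **kwargs)
      except TypeError:
          # Older Triton (2.0.x): the keyword was `percentiles`
          return triton.testing.do_bench(fn, percentiles=q, **kwargs)
  ```
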
- error: block with no terminator, has llvm.cond_br %5624, ^bb2, ^bb3 (#283, opened Sep 19, 2023 by Hukongtao)
- Transformers broke again (AttributeError: 'GPTQ' object has no attribute 'inp1') (#280, opened Jul 29, 2023 by EyeDeck)
- Help: Quantized llama-7b model with custom prompt format produces only gibberish (#276, opened Jul 15, 2023 by Glavin001)
- High PPL when groupsize != -1 for the OPT model after replacing linear layers with QuantLinear (#275, opened Jul 6, 2023 by hyx1999)
- Proposed changes to reduce VRAM usage, potentially allowing larger models to be quantized on consumer hardware (#269, opened Jun 25, 2023 by sigmareaver)
- The detected CUDA version (12.1) mismatches the version that was used to compile PyTorch (11.7) (#262, opened Jun 15, 2023 by siddhsql)
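
  The mismatch in #262 is raised when the system CUDA toolkit used to build the repo's extension differs from the toolkit PyTorch was compiled against. A quick diagnostic sketch using standard PyTorch APIs (the remediation comments are general guidance, not steps from the issue):

  ```python
  import torch

  # The CUDA version PyTorch itself was built with, e.g. "11.7"
  print("PyTorch built with CUDA:", torch.version.cuda)

  # Compare against the system toolkit the extension build will use:
  #   $ nvcc --version        (e.g. "release 12.1")
  # If the versions differ, install a PyTorch wheel matching the system
  # toolkit, or point CUDA_HOME at a toolkit matching torch.version.cuda.
  ```
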
- [Question] What is the expected discrepancy between simulated and actually computed values? (#261, opened Jun 13, 2023 by set-soft)