docs: Feedback for “Benchmarking NVIDIA TensorRT-LLM” #4222

Open
Bazza-63 opened this issue Dec 4, 2024 · 0 comments
Labels
type: docs Improvements or additions to documentation

Comments

Bazza-63 commented Dec 4, 2024

Would it be possible for you to compare TensorRT-LLM against MLC LLM with the TVM Unity compiler?

Using a custom, patched ROCm build with the appropriate flags enabled while compiling TVM Unity, I managed 120 tokens per second on Qwen 2.5 8B Q8_0, and that was on an RX 7800 XT.
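
If you want to reproduce a tokens-per-second figure like that on the MLC LLM side, here is a minimal sketch using MLC LLM's OpenAI-style Python API (`MLCEngine`). The model string below is a placeholder, not the exact build I ran; substitute your own compiled Qwen weights, and note that counting streamed chunks is only a rough proxy for decoded tokens:

```python
import time

from mlc_llm import MLCEngine

# Placeholder model ID; point this at whatever Qwen build you compiled
# with your own TVM Unity / ROCm toolchain.
model = "HF://mlc-ai/Qwen2.5-7B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

prompt = "Explain speculative decoding in two sentences."
start = time.perf_counter()
chunks = 0

# Stream a chat completion and count emitted chunks as an
# approximation of decoded tokens.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": prompt}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        if choice.delta.content:
            chunks += 1

elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} tokens/s over {elapsed:.1f}s")

engine.terminate()
```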

If you're wondering where I got the custom ROCm from, you can find it here:

github-project-automation bot moved this to Investigating in Jan & Cortex on Dec 4, 2024
imtuyethan added the "type: docs" label on Dec 5, 2024
imtuyethan changed the title from "Feedback for “Benchmarking NVIDIA TensorRT-LLM”" to "docs: Feedback for “Benchmarking NVIDIA TensorRT-LLM”" on Dec 10, 2024