docs: Feedback for “Benchmarking NVIDIA TensorRT-LLM” #4222

Open
Bazza-63 opened this issue Dec 4, 2024 · 0 comments
Labels
type: docs Improvements or additions to documentation

Comments

Bazza-63 commented Dec 4, 2024

Would it be possible for you to compare TensorRT-LLM against MLC LLM with the TVM Unity compiler?

Using a custom, patched ROCm build with the appropriate flags enabled while compiling TVM Unity, I managed 120 tokens per second on Qwen 2.5 8B Q8_0, and that was on an RX 7800 XT.
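
If you want to reproduce a tokens-per-second figure like that on the MLC LLM side, here is a minimal sketch using MLC LLM's OpenAI-style Python API (`MLCEngine`). The model string below is a placeholder, not the exact build I ran; substitute your own compiled Qwen weights, and note that counting streamed chunks is only a rough proxy for decoded tokens:

```python
import time

from mlc_llm import MLCEngine

# Placeholder model ID; point this at whatever Qwen build you compiled
# with your own TVM Unity / ROCm toolchain.
model = "HF://mlc-ai/Qwen2.5-7B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

prompt = "Explain speculative decoding in two sentences."
start = time.perf_counter()
chunks = 0

# Stream a chat completion and count emitted chunks as an
# approximation of decoded tokens.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": prompt}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        if choice.delta.content:
            chunks += 1

elapsed = time.perf_counter() - start
print(f"~{chunks / elapsed:.1f} tokens/s over {elapsed:.1f}s")

engine.terminate()
```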

If you're wondering where I got the custom ROCm from, you can find it here:

github-project-automation bot moved this to Investigating in Jan & Cortex on Dec 4, 2024
imtuyethan added the "type: docs" label on Dec 5, 2024
imtuyethan changed the title from "Feedback for “Benchmarking NVIDIA TensorRT-LLM”" to "docs: Feedback for “Benchmarking NVIDIA TensorRT-LLM”" on Dec 10, 2024