Would it be possible for you to compare TensorRT-LLM against MLC LLM with the TVM Unity compiler?
Using a custom, patched ROCm build with the appropriate flags enabled when compiling TVM Unity, I managed 120 tokens per second on Qwen 2.5 8B Q8_0. That was on an RX 7800 XT, too.
If you're wondering where I got the custom ROCm from, you can find it here: