You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem? Please describe.
Not a problem. An idea about how the TensorRT performance can be improved.
I checked Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) improvements on multiple projects. The results are available here. According to the tests, these optimizations can help with achieving better performance in many cases for many applications: compilers and interpreters, static analysis, databases, networking, etc. Since this, I think optimizing TensorRT (its C++ part) with PGO and PLO would be a good idea.
Describe the solution you'd like
I can suggest the following things:
Perform PGO benchmarks on TensorRT. If it shows improvements - add a note to the documentation about possible improvements in TensorRT performance with PGO.
Providing an easier way (e.g. a build option) to build scripts with PGO can be helpful for the end-users and maintainers since they will be able to optimize TensorRT according to their workloads.
Optimize pre-built TensorRT binaries
Additional context
As an additional optimization step after PGO, I can suggest Post-Link Optimization (PLO) with a tool like LLVM BOLT. I think it's still worth evaluating it only after the PGO integration into TensorRT.
Do you think this is more geared towards TensorRT itself or the PyTorch extension? This might be more relevant to open in https://github.com/nvidia/pytorch
Is your feature request related to a problem? Please describe.
Not a problem. An idea about how the TensorRT performance can be improved.
I checked Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) improvements on multiple projects. The results are available here. According to the tests, these optimizations can help with achieving better performance in many cases for many applications: compilers and interpreters, static analysis, databases, networking, etc. Since this, I think optimizing TensorRT (its C++ part) with PGO and PLO would be a good idea.
Describe the solution you'd like
I can suggest the following things:
Additional context
As an additional optimization step after PGO, I can suggest Post-Link Optimization (PLO) with a tool like LLVM BOLT. I think it's still worth evaluating it only after the PGO integration into TensorRT.
Here I collected several PGO-related links (more PGO-related materials available at https://github.com/zamazan4ik/awesome-pgo/).
Examples of how PGO optimization is integrated into other projects:
configure
scriptI have some examples of how PGO information looks in the documentation:
Regarding LLVM BOLT integration, I have the following examples:
The text was updated successfully, but these errors were encountered: