-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RTC Address Patching #802
Comments
|
This would be the preferred option. |
My primary concern for this is how we would represent the arrays as symbols.
So really we'd want something like I doubt CUDA would report this out of bounds, and prevent us from doing so. But there's always the potential that it would cause the compiler to optimise differently. |
Would be fairly trivial to expose CUjit options on deserialize, just need to forward them through the lower 2 items of that list. Have applied that here: https://github.com/Robadob/jitify/tree/expose_cujit It's niche enough, that they may request a usage example if a PR is created. Similarly, we'd require some bigger fgpu2 changes, to modify curve to instead deserialize with specific options. Worth doing a smaller proof of concept first. |
I started writing a proof of concept here. When executed, and the call to This isn't among the documented return values, so it's not clear if that's a documentation issue, I've missed something, or I did make a mistake. There's a sample which passes (different) Cujit options here, and I can't see any mistakes in what I've done at a glance. Googling one of the enums, the only relevant result (e.g. not documentation or headers) seems to be this comment on an old numba issue. Where someone suggests it's not clearly documented, but actually appears to refer to copying data from host to device. So it's not clear whether anyone else has used this particular cutjit functionality. Submitting a bug report, with the proof of concept (and stating it gives an undocumented return value), might get you a clearer explanation. |
Nope, just tested it no warning now but same unsupported return code. |
Same unsupported return code when doing this directly in |
I have asked internally about this and I'll let you know when I hear back. If my current approach doesn't work I'll file a bug internally so someone can explain what is happening. If it comes to that do you approve my sharing of the reproducer you discuss above? |
@mattmartineau Of course. No problem. |
This was confirmed by @mattmartineau as not possible. |
RTC implementation currently requires addresses of agent and message vectors to be passed at runtime to avoid recompiling as these addresses change. They change frequently via swaps or whenever they are resized. Baking the raw address into the binary would be much faster. It would allow the compiler to provide a direct lookup in memory and reduce latency of offsetting into the runtime address which must be either cached in registers or looked-up from memory.
This has not previously been explores as RTC compilation and module loading is slooooow. There are two separate issues however. Compilation can be speeded up (#602) AND the resulting ptx could be patched directly and module creation time to embed the address directly.
nvrtc
supports this patching with thecuModuleLoadDataEx
function which has anCUjit_option
. The option allows (amongst other things)CU_JIT_GLOBAL_SYMBOL_NAMES
andCU_JIT_GLOBAL_SYMBOL_ADDRESSES
. It effectively means that once you have ptx you can create a module by patching symbol addresses. E.g. We can store the compiled ptx and recreate the module with patched values whenever they change.Jitify supports the use of
CUjit_options
although the interface for accessing this is unclear.The text was updated successfully, but these errors were encountered: