RTC Address Patching #802

mondus · 2022-02-18T17:17:00Z

The RTC implementation currently requires the addresses of agent and message vectors to be passed at runtime, to avoid recompiling whenever these addresses change. They change frequently, via swaps or whenever the vectors are resized. Baking the raw address into the binary would be much faster: the compiler could emit a direct memory lookup, removing the latency of offsetting from a runtime address that must either be cached in registers or looked up from memory.

This has not previously been explored, as RTC compilation and module loading are slow. There are two separate issues, however: compilation can be sped up (#602), and the resulting PTX could be patched directly at module creation time to embed the address.

The CUDA driver API (used alongside nvrtc) supports this patching with the cuModuleLoadDataEx function, which takes CUjit_option values. The options include (amongst other things) CU_JIT_GLOBAL_SYMBOL_NAMES and CU_JIT_GLOBAL_SYMBOL_ADDRESSES. Effectively, once you have the PTX you can create a module by patching symbol addresses, e.g. we can store the compiled PTX and recreate the module with patched values whenever they change.
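A rough sketch of the shape such a call might take, for discussion only. `SymbolPatch`, `makeSymbolPatch` and `loadPatchedModule` are hypothetical names made up for this illustration; only `cuModuleLoadDataEx` and the `CU_JIT_GLOBAL_SYMBOL_*` enumerators come from the driver API. It is untested against a real module, and whether the driver accepts these options for this purpose is not established here. The driver call is only compiled when `cuda.h` is available, and it assumes a current CUDA context.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

#if defined(__has_include)
#  if __has_include(<cuda.h>)
#    include <cuda.h>
#    define HAVE_CUDA_DRIVER 1
#  endif
#endif

// Hypothetical helper: parallel arrays in the layout the JIT options expect.
struct SymbolPatch {
    std::vector<const char*> names;  // would be passed as CU_JIT_GLOBAL_SYMBOL_NAMES
    std::vector<void*> addresses;    // would be passed as CU_JIT_GLOBAL_SYMBOL_ADDRESSES
};

// Splits (name, address) pairs into the two parallel arrays.
// `symbols` must outlive the result: names point into its strings.
SymbolPatch makeSymbolPatch(
    const std::vector<std::pair<std::string, void*>>& symbols) {
    SymbolPatch patch;
    for (const auto& symbol : symbols) {
        patch.names.push_back(symbol.first.c_str());
        patch.addresses.push_back(symbol.second);
    }
    return patch;
}

#ifdef HAVE_CUDA_DRIVER
// Attempts to (re)create a module from stored PTX with the symbol table
// attached. Returns whatever the driver reports; the caller must check it.
CUresult loadPatchedModule(CUmodule* module, const char* ptx,
                           SymbolPatch& patch) {
    assert(patch.names.size() == patch.addresses.size());
    CUjit_option options[] = {
        CU_JIT_GLOBAL_SYMBOL_NAMES,
        CU_JIT_GLOBAL_SYMBOL_ADDRESSES,
        CU_JIT_GLOBAL_SYMBOL_COUNT,
    };
    void* values[] = {
        patch.names.data(),
        patch.addresses.data(),
        // Scalar option values are passed by casting them to void*.
        reinterpret_cast<void*>(
            static_cast<std::uintptr_t>(patch.names.size())),
    };
    return cuModuleLoadDataEx(module, ptx, 3, options, values);
}
#endif
```

The idea would be to keep the compiled PTX around and call something like `loadPatchedModule` again with a fresh `SymbolPatch` each time a vector is swapped or resized.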

Jitify supports the use of CUjit_options although the interface for accessing this is unclear.


Robadob · 2022-02-18T17:19:36Z

> Jitify supports the use of CUjit_options although the interface for accessing this is unclear.

CUDAKernel is an internal class, as best I can see. I presume you would need to reimplement the deserialize code and call CUDAKernel yourself directly (or modify Jitify to expose the options and put in a pull request).

mondus · 2022-02-18T17:23:10Z

> or modify Jitify to expose the options and put in a pull request

This would be the preferred option.

Robadob · 2022-02-18T17:23:12Z

My primary concern for this is how we would represent the arrays as symbols.

`__device__ void *my_variable_array;` is still a symbol holding a pointer that has to be looked up.

So really we'd want something like `__device__ char my_variable_array[1];` and then read 'out of bounds', knowing the access will still be in bounds of the real allocation, since a device symbol array requires a fixed size. Or some other approach, where we make a false symbol point to the memory we want, then reference it and reinterpret the pointer as the type we actually want.

I doubt CUDA would report this as out of bounds and prevent us from doing so, but there's always the potential that it would cause the compiler to optimise differently.

Robadob · 2022-02-21T16:21:00Z

- The CUDAKernel constructor takes the CUjit options.
- KernelInstantiation creates the CUDAKernel.
- KernelInstantiation::deserialize() creates the KernelInstantiation.
- We call that here and here.

Would be fairly trivial to expose CUjit options on deserialize, just need to forward them through the lower 2 items of that list. Have applied that here: https://github.com/Robadob/jitify/tree/expose_cujit
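To illustrate the forwarding idea in isolation, here is a minimal sketch using stand-in types. `JitOption`, `SketchKernel` and `SketchInstantiation` are hypothetical and are not Jitify's actual classes or signatures; the branch linked above has the real change. The point is only that the innermost class already accepts the options, so the two layers above it each gain a defaulted parameter and pass it down unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a (CUjit_option, value) pair; NOT the real CUDA types.
struct JitOption {
    int key;
    void* value;
};
using JitOptions = std::vector<JitOption>;

// Stand-in for the innermost class, which already accepts the options.
class SketchKernel {
public:
    SketchKernel(std::string ptx, JitOptions opts)
        : ptx_(std::move(ptx)), opts_(std::move(opts)) {
        assert(!ptx_.empty());  // nothing to load without a program image
    }
    std::size_t optionCount() const { return opts_.size(); }

private:
    std::string ptx_;
    JitOptions opts_;
};

// Stand-in for the middle layer: it gains an optional parameter and
// forwards it untouched to the kernel it creates.
class SketchInstantiation {
public:
    explicit SketchInstantiation(const std::string& ptx, JitOptions opts = {})
        : kernel_(ptx, std::move(opts)) {}

    const SketchKernel& kernel() const { return kernel_; }

    // Outer entry point: same optional parameter, defaulted so that
    // existing callers compile unchanged.
    static SketchInstantiation deserialize(const std::string& serialized,
                                           JitOptions opts = {}) {
        return SketchInstantiation(serialized, std::move(opts));
    }

private:
    SketchKernel kernel_;
};
```

Defaulting the new parameter keeps existing call sites source compatible, which is probably what an upstream PR would want.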

It's niche enough that they may request a usage example if a PR is created.

Similarly, we'd require some bigger fgpu2 changes to modify curve to instead deserialize with specific options. Worth doing a smaller proof of concept first.

Robadob · 2022-02-22T17:42:53Z

I started writing a proof of concept here.

When executed, and the call to KernelInstantiation::deserialize() is triggered, CUDA_ERROR_NOT_SUPPORTED is returned by cuModuleLoadDataEx().

This isn't among the documented return values, so it's not clear whether that's a documentation issue, something I've missed, or a mistake on my part.

There's a sample which passes (different) CUjit options here, and I can't see any mistakes in what I've done at a glance.

Googling one of the enums, the only relevant result (i.e. not documentation or headers) seems to be this comment on an old numba issue, where someone suggests it's not clearly documented but actually appears to refer to copying data from host to device. So it's not clear whether anyone else has used this particular CUjit functionality.

Submitting a bug report, with the proof of concept (and stating it gives an undocumented return value), might get you a clearer explanation.

Robadob · 2022-02-23T18:50:31Z

~~One option I didn't try, is passing -rdc=true, and extern the device symbol. (Extern-ing without rdc is redundant, get a warning saying it will be treated as static)~~

Nope, just tested it: no warning now, but the same unsupported return code.

mondus · 2022-02-24T09:19:39Z

Same unsupported return code when doing this directly in nvrtc and outside of jitify: https://github.com/mondus/CUJIT_Experiment

mattmartineau · 2022-05-27T10:27:24Z

I have asked internally about this and I'll let you know when I hear back.

If my current approach doesn't work I'll file a bug internally so someone can explain what is happening. If it comes to that do you approve my sharing of the reproducer you discuss above?

mondus · 2022-05-27T10:58:18Z

@mattmartineau Of course. No problem.

Robadob · 2023-07-03T10:51:48Z

This was confirmed by @mattmartineau as not possible.

Robadob added optimisation RTC labels Feb 21, 2022

Robadob closed this as completed Jul 3, 2023
