Reducing RTC compilation time #402
Comments
While waiting for the Python test suite to run, I've manually looked through the header includes for RTC. Note that this was done mid namespaces branch, so filenames / locations / includes may be a little incorrect / out of date when coming back to this.
Most messaging types do inherit from brute force.
e.g.
```cpp
class MsgSpatial2D::Out : public MsgBruteForce::Out {
    // ...
};
```
It would be useful to trim the repeated includes from the list, because the cost of
a double include shouldn't matter: it's skipped by the include guard (`#ifndef`).
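A minimal sketch of what the guard does on a repeated include. The guard macro `EXAMPLE_MSG_NONE_H_` and the struct `ExampleMsgNone` are hypothetical stand-ins, not FLAMEGPU names, and the header text is pasted inline twice to imitate two `#include` lines. The second copy still has to be scanned by the preprocessor up to its `#endif`, but none of it reaches the compiler.

```cpp
#include <cassert>

// First "inclusion" of a hypothetical guarded header.
#ifndef EXAMPLE_MSG_NONE_H_
#define EXAMPLE_MSG_NONE_H_
struct ExampleMsgNone {
    int id() const { return 1; }
};
#endif  // EXAMPLE_MSG_NONE_H_

// Second "inclusion" of the same text: the guard macro is already defined,
// so everything up to the matching #endif is discarded.
// Without the guard this would be a redefinition error.
#ifndef EXAMPLE_MSG_NONE_H_
#define EXAMPLE_MSG_NONE_H_
struct ExampleMsgNone {
    int id() const { return 1; }
};
#endif  // EXAMPLE_MSG_NONE_H_

// Returns true if the single surviving definition is usable.
inline bool guardedDefinitionIsUsable() {
    ExampleMsgNone m;
    return m.id() == 1;
}
```

Calling `guardedDefinitionIsUsable()` only confirms that the translation unit compiles with one definition; it says nothing about how long the preprocessor spent on the skipped copy.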
On Tue, 29 Jun 2021 at 14:32, Peter Heywood <***@***.***> wrote:
While waiting for the Python test suite to run, I've manually looked through the header includes for RTC.
Reducing some of these may improve RTC build times.
Note that this was done mid namespaces branch, so filenames / locations / includes may be a little incorrect / out of date when coming back to this.
- curve_rtc_dynamic.cu
- FGPUDeviceException.h
- "flamegpu/gpu/CUDAScanCompaction.h"
- Doesn't have any includes, doesn't *need* CUDA currently? (No CUDA symbols in the header.)
- "flamegpu/exception/FLAMEGPUDeviceException_device.h" (if SEATBELTS)
- cuda_runtime - Doesn't seem required?
- device_launch_parameters - Doesn't seem required?
- <cstring>
- DeviceEnvironment.h
- <cstdint>
- <string>
- <cassert>
- _impl.cu
- flamegpu/runtime/DeviceAPI.h
+
- <cassert>
- <cstdint>
- <limits>
- "dynamic/curve_rtc_dynamic.h" ifdef __CUDACC_RTC__
- "flamegpu/runtime/cuRVE/curve.h" ifndef __CUDACC_RTC__
- "flamegpu/runtime/utility/AgentRandom.cuh"
- <curand_kernel.h>
- <cassert>
- "flamegpu/util/StaticAssert.h"
- <cstdint>
- "flamegpu/exception/FLAMEGPUDeviceException.h"
- <cuda_runtime.h> - contains cudaStream_t.
- <device_launch_parameters.h> - not required?
- <string>
- <type_traits>
- "flamegpu/gpu/CUDAScanCompaction.h"
- None
- "flamegpu/exception/FLAMEGPUDeviceException_device.h" if SEATBELTS
- cuda_runtime - Doesn't seem required?
- device_launch_parameters - Doesn't seem required?
- <cstring>
- "flamegpu/runtime/utility/DeviceEnvironment.cuh"
- <cstdint>
- <string>
- <cassert>
- "flamegpu/gpu/CUDAScanCompaction.h"
- None
- "flamegpu/runtime/AgentFunction.cuh"
- <cuda_runtime.h> - __syncthreads() when built by CUDA in a templated method.
- <device_launch_parameters.h> - don't think this is needed in header
- <curand_kernel.h> - needed.
- "flamegpu/defines.h"
- "flamegpu/exception/FLAMEGPUDeviceException.h"
- <cuda_runtime.h> - contains cudaStream_t.
- <device_launch_parameters.h> - not required?
- <string>
- <type_traits>
- "flamegpu/gpu/CUDAScanCompaction.h"
- None
- "flamegpu/exception/FLAMEGPUDeviceException_device.h" if SEATBELTS
- cuda_runtime - Doesn't seem required?
- device_launch_parameters - Doesn't seem required?
- <cstring>
- "flamegpu/runtime/AgentFunction_shim.h"
- None
- "flamegpu/gpu/CUDAScanCompaction.h"
- None
- Uses Curve::NamespaceHash but doesn't directly include the source of that?
- "flamegpu/runtime/AgentFunctionCondition.h"
- <cuda_runtime.h> - __syncthreads in templated method.
- <device_launch_parameters.h> - not needed?
- "flamegpu/runtime/DeviceAPI.h"
- Include cycle, but guarded.
- "flamegpu/runtime/AgentFunctionCondition_shim.h"
- None, just defines a macro (which needs flamegpu::ReadOnlyDeviceAPI) + __device__ / __forceinline__
- "flamegpu/gpu/CUDAScanCompaction.h"
- None
- Uses Curve::NamespaceHash but not directly included.
- Curand type used, but not directly included.
- "flamegpu/runtime/messaging_device.h"
- "flamegpu/runtime/messaging/None/NoneDevice.h"
- "flamegpu/runtime/cuRVE/curve.h" - If __CUDACC_RTC__
- "flamegpu/runtime/messaging/None.h"
- No includes
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h"
- "flamegpu/runtime/messaging/None.h"
- "flamegpu/runtime/messaging/BruteForce.h"
- "flamegpu/runtime/messaging/None.h" - not actually inherited from / required?
- "flamegpu/runtime/messaging/Spatial2D/Spatial2DDevice.h"
- "flamegpu/runtime/messaging/Spatial2D.h"
- "flamegpu/runtime/messaging/BruteForce.h" - not actually inherited from / required?
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h" - Above
- "flamegpu/runtime/messaging/Spatial3D/Spatial3DDevice.h"
- "flamegpu/runtime/messaging/Spatial3D.h"
- "flamegpu/runtime/messaging/BruteForce.h" - not actually inherited from / required?
- "flamegpu/runtime/messaging/Spatial2D/Spatial2DDevice.h" - not required? only needs bruteDevice
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h"
- "flamegpu/runtime/messaging/Array/ArrayDevice.h"
- "flamegpu/runtime/messaging/Array.h"
- "flamegpu/runtime/messaging/BruteForce.h" - not actually required.
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h" - not actually required?
- "flamegpu/runtime/messaging/Array2D/Array2DDevice.h"
- "flamegpu/runtime/messaging/Array2D.h"
- "flamegpu/runtime/messaging/BruteForce.h" - not needed
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h" - not needed
- "flamegpu/runtime/messaging/Array3D/Array3DDevice.h"
- "flamegpu/runtime/messaging/Array3D.h"
- "flamegpu/runtime/messaging/BruteForce.h" - not required
- "flamegpu/runtime/messaging/Array2D.h"
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h" - not required
- "flamegpu/runtime/messaging/Bucket/BucketDevice.h"
- "flamegpu/runtime/messaging/Bucket.h"
- #ifndef __CUDACC_RTC__
- <memory>
- <string>
- "flamegpu/runtime/cuRVE/curve.h"
- "flamegpu/runtime/messaging/None.h" - used for size type
- "flamegpu/runtime/messaging/BruteForce.h" - not required?
- "flamegpu/runtime/messaging/BruteForce/BruteForceDevice.h" - This one actually inherits.
- "flamegpu/defines.h"
The inheritance is mostly in the […]. Yes, compile time shouldn't be affected (significantly) thanks to the macro guards, but I'm assuming there will still be some cost associated (the file has to be parsed again line by line during preprocessing as a minimum), and when the two test suites take 1500s, most of which is RTC compilation, any saving will add up.
As an incredibly rough check I ran […]. I then commented out all messaging other than None and BruteForce from […]. If we could save 30% of RTC time, then running both test suites would be over 5 minutes quicker, so from a development / testing perspective it seems worthwhile, and it will help with large RTC agent models. Alternatively, given the message type is provided in a macro, we could potentially include only the relevant message types in RTC to achieve this saving (if duplicate includes are not the issue; hard to know without trying).
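A rough sketch of the "only include the relevant message types" idea, done on the host side when the RTC source is assembled. Both helper names are hypothetical and not part of FLAMEGPU; the header path pattern is assumed from the include list earlier in the thread (e.g. `messaging/None/NoneDevice.h`).

```cpp
#include <string>
#include <vector>

// Hypothetical helper: device header for one messaging type, assuming the
// "flamegpu/runtime/messaging/<Type>/<Type>Device.h" layout from the list.
inline std::string exampleDeviceHeaderFor(const std::string &msgType) {
    return "flamegpu/runtime/messaging/" + msgType + "/" + msgType + "Device.h";
}

// Hypothetical helper: emit #include lines only for the message types that
// one agent function actually uses (its input and output types).
inline std::string exampleRtcMessagingIncludes(const std::string &msgIn,
                                               const std::string &msgOut) {
    std::vector<std::string> types = {msgIn};
    if (msgOut != msgIn) {
        types.push_back(msgOut);  // avoid emitting the same header twice
    }
    std::string out;
    for (const auto &t : types) {
        out += "#include \"" + exampleDeviceHeaderFor(t) + "\"\n";
    }
    return out;
}
```

For example, `exampleRtcMessagingIncludes("Spatial2D", "BruteForce")` yields two include lines instead of pulling in every messaging type. Whether this recovers the saving seen when the other types were commented out would still need measuring.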
That's a surprising saving imo. We could possibly write something hacky to generate a compact header for RTC if really necessary.
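One way such a "hacky" compact header generator could look, as a sketch only: it works on an in-memory map of header name to header text and inlines each quoted `#include` at most once. All names here are hypothetical, and it assumes every header is guarded and ignores conditional includes (`#ifdef __CUDACC_RTC__` and similar), which a real tool would have to respect.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Hypothetical in-memory "file system": header name -> header text.
using ExampleHeaderMap = std::map<std::string, std::string>;

// Recursively inline `#include "name"` lines, emitting each header at most once.
inline void exampleFlatten(const std::string &name, const ExampleHeaderMap &headers,
                           std::set<std::string> &seen, std::ostringstream &out) {
    if (!seen.insert(name).second) return;  // already emitted, skip the repeat
    const auto it = headers.find(name);
    if (it == headers.end()) {
        // Unknown header: keep the include line so the compiler resolves it.
        out << "#include \"" << name << "\"\n";
        return;
    }
    std::istringstream in(it->second);
    std::string line;
    const std::string prefix = "#include \"";
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            const auto end = line.find('"', prefix.size());
            if (end != std::string::npos) {
                exampleFlatten(line.substr(prefix.size(), end - prefix.size()),
                               headers, seen, out);
                continue;
            }
        }
        out << line << "\n";  // ordinary line (or <...> include): copy through
    }
}

// Produce one flattened header starting from `root`.
inline std::string exampleCompactHeader(const std::string &root,
                                        const ExampleHeaderMap &headers) {
    std::set<std::string> seen;
    std::ostringstream out;
    exampleFlatten(root, headers, seen, out);
    return out.str();
}
```

Angle-bracket includes such as `<cstdint>` are copied through unchanged, so repeats of those would remain in the output.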
Couple of simple changes to […]. Running the C++ test suite took 355s (compared to ~500s before?), 29s with RTC cache hits. Running pytest took 870s (compared to > 1000s before? One CUDA context per test, so ~100s of extra context creation in Python plus other Python overheads, I guess?), 101s with RTC (and Python?) cache hits. 144s with the […]
My issue seems like a dupe of this: #602 |
RTC compilation takes ~5 seconds per agent function.
Currently each agent function is its own Jitify program, so template expansion etc. happens every time.
Jitify/NVRTC appears to support passing multiple kernels (agent functions) to a single program compilation, and then calling individual kernels from that program.
A quick test (adding multiple agent functions to the same string) shows that this has a negligible effect on compilation time, so creating only one program (per agent?) could massively reduce compilation time (for non-trivial models). However, this will require big changes to RTC Curve due to per-agent-function specialisation within CURVE.
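The "multiple agent functions in the same string" test could be sketched on the host side as below. This only builds the combined source; handing it to a single Jitify/NVRTC program is not shown. The struct and function names are hypothetical, and the kernel signature is a simplified placeholder rather than what FLAMEGPU actually generates for an agent function.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one agent function: a kernel name and its body.
struct ExampleAgentFn {
    std::string name;
    std::string body;
};

// Concatenate several agent functions behind one shared preamble, so the
// headers in the preamble would be parsed once per program rather than
// once per agent function.
inline std::string exampleCombinedProgram(const std::string &preamble,
                                          const std::vector<ExampleAgentFn> &fns) {
    std::string src = preamble + "\n";
    for (const auto &fn : fns) {
        src += "extern \"C\" __global__ void " + fn.name + "() {\n" + fn.body + "\n}\n";
    }
    return src;
}

// Count non-overlapping occurrences of `needle` in `haystack`.
inline int exampleCount(const std::string &haystack, const std::string &needle) {
    if (needle.empty()) return 0;  // avoid looping forever on an empty needle
    int n = 0;
    for (auto pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + needle.size())) {
        ++n;
    }
    return n;
}
```

With a preamble of `#include "flamegpu/runtime/DeviceAPI.h"` and two functions, the combined string contains one `#include` and two kernels, which is the shape the quick test relied on. The per-agent-function CURVE specialisation mentioned above is not addressed by this sketch.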