How do I shield host code from the device compiler?
There are many cases where a YAKL_INLINE function contains both host and device code. With CUDA and HIP you will get compile-time warnings, and with SYCL even errors, when the device compiler pass sees code that is valid only on the host, such as manipulating CPU pointers (reference counting, allocation, and deallocation, for instance). To avoid this, you need to shield the host code from the device compiler pass. There are also times you may wish to shield device code from a host compiler pass.
Shielding is frequently needed, for instance, in classes that manage allocation, deallocation, and optionally reference counting in their constructors and destructor. When an object of the class is copied or moved on the host, you often increment a reference counter, and when it is destroyed you decrement it. You also allocate on the host in the constructor and deallocate on the host in the destructor.
To hide allocation, deallocation, and reference counting code from the device compiler, use the YAKL_EXECUTE_ON_HOST_ONLY( // code ) macro function. Its counterpart, YAKL_EXECUTE_ON_DEVICE_ONLY( // code ), hides device code from the host compiler pass. E.g.,
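template <class T>
class Myclass {
  int num;
  T * data;
public:
  YAKL_INLINE Myclass(int num) {
    // Store the size; the parameter shadows the member of the same name
    this->num = num;
    // Allocation is host-only code, so hide it from the device compiler pass
    YAKL_EXECUTE_ON_HOST_ONLY( cudaMalloc( &data , num*sizeof(T) ); )
  }
  YAKL_INLINE ~Myclass() {
    // Deallocation is host-only code as well
    YAKL_EXECUTE_ON_HOST_ONLY( cudaFree( data ); )
  }
};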
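To picture how this kind of shielding can work, here is a minimal sketch of preprocessor switches keyed on the compiler pass. It is not YAKL's actual definition: the names SKETCH_SEPARATE_MEMORY, SKETCH_HOST_ONLY, SKETCH_DEVICE_ONLY, and branches_taken are hypothetical, and the sketch assumes the usual vendor macros (`__CUDACC__` / `__CUDA_ARCH__` for CUDA, `__HIPCC__` / `__HIP_DEVICE_COMPILE__` for HIP, `SYCL_LANGUAGE_VERSION` / `__SYCL_DEVICE_ONLY__` for SYCL) are enough to identify the toolchain and the device pass.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; not YAKL's real macros.
// Assumption: these vendor macros identify a separate-memory toolchain.
#if defined(__CUDACC__) || defined(__HIPCC__) || defined(SYCL_LANGUAGE_VERSION)
  #define SKETCH_SEPARATE_MEMORY 1
#else
  #define SKETCH_SEPARATE_MEMORY 0
#endif

#if SKETCH_SEPARATE_MEMORY && (defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__) || defined(__SYCL_DEVICE_ONLY__))
  // Device pass: host-only code is dropped before the device compiler sees it.
  #define SKETCH_HOST_ONLY(...)
  #define SKETCH_DEVICE_ONLY(...) __VA_ARGS__
#elif SKETCH_SEPARATE_MEMORY
  // Host pass of a separate-memory backend: device-only code is dropped.
  #define SKETCH_HOST_ONLY(...) __VA_ARGS__
  #define SKETCH_DEVICE_ONLY(...)
#else
  // Plain CPU build: there is only one pass, so both macros keep their code.
  #define SKETCH_HOST_ONLY(...) __VA_ARGS__
  #define SKETCH_DEVICE_ONLY(...) __VA_ARGS__
#endif

// Reports which branches were compiled in: bit 0 = host, bit 1 = device.
inline int branches_taken() {
  int mask = 0;
  SKETCH_HOST_ONLY( mask |= 1; )
  SKETCH_DEVICE_ONLY( mask |= 2; )
  assert(mask != 0);  // at least one branch survives in every configuration
  return mask;
}
```

Because the unwanted code is removed by the preprocessor, the device compiler never parses it, which is why this approach silences warnings and errors rather than merely skipping the code at run time.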
You need to be careful when using these macros. On non-separate-memory backends (CPU and OpenMP CPU threading, for instance), both the host-only and the device-only code will always execute, while on separate-memory backends like CUDA, HIP, and SYCL, only one of them is compiled in per compiler pass. Therefore, the following would produce two stdout outputs on non-separate-memory backends but only one on separate-memory backends:
YAKL_EXECUTE_ON_HOST_ONLY( std::cout << "Inform the user: " << __FILE__ << ": " << __LINE__ << std::endl; )
YAKL_EXECUTE_ON_DEVICE_ONLY( printf("Inform the user\n"); )
In this case, it's not so bad, but if the macros wrapped a function call with side effects, that function would execute twice on non-separate-memory backends but only once on separate-memory backends, which results in inconsistent and likely incorrect behavior among the backends.
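The inconsistency can be made concrete with a small model. This is only a sketch under stated assumptions: the CPU_* and GPU_* macros below are hypothetical stand-ins that imitate the behavior described above (they are not YAKL's macros), and log_event is a hypothetical function whose only side effect is counting its calls.

```cpp
#include <cassert>

// Hypothetical models of the two behaviors; not YAKL's definitions.
// Non-separate-memory backend: both macros keep their code.
#define CPU_HOST_ONLY(...)   __VA_ARGS__
#define CPU_DEVICE_ONLY(...) __VA_ARGS__
// Host pass of a separate-memory backend: device-only code is removed.
#define GPU_HOST_ONLY(...)   __VA_ARGS__
#define GPU_DEVICE_ONLY(...)

// Hypothetical function with a side effect: it counts how often it is called.
inline void log_event(int &calls) { ++calls; }

// Same source pattern, modeled on a non-separate-memory backend.
inline int calls_on_cpu_backend() {
  int calls = 0;
  CPU_HOST_ONLY(   log_event(calls); )
  CPU_DEVICE_ONLY( log_event(calls); )
  return calls;
}

// Same source pattern, modeled on the host pass of a separate-memory backend.
inline int calls_on_gpu_host_pass() {
  int calls = 0;
  GPU_HOST_ONLY(   log_event(calls); )
  GPU_DEVICE_ONLY( log_event(calls); )
  return calls;
}

// The two models disagree, which is the inconsistency described above.
inline void compare_backends() {
  assert(calls_on_cpu_backend() != calls_on_gpu_host_pass());
}
```

In this model the call count is 2 for the CPU-style macros and 1 for the host pass of a separate-memory backend, so any logic that depends on the side effect happening exactly once will differ between backends.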
On some backends, you'll get an error when the code inside those macros contains a leading parenthesis, e.g., a reference counter increment:
YAKL_EXECUTE_ON_HOST_ONLY( (*refCount)++; )
In this case, you can sometimes work around the error by placing curly braces around the statement, e.g.:
YAKL_EXECUTE_ON_HOST_ONLY( { (*refCount)++; } )
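Putting the pieces together, a reference-counted class might use the braced form throughout. The following is a minimal sketch, not code from YAKL: SharedBuffer and count_after_copy_and_destroy are hypothetical names, SKETCH_INLINE and SKETCH_HOST_ONLY are pass-through stand-ins so the example builds with a plain C++ compiler, and std::malloc stands in for whatever allocator a real backend would use. In YAKL code you would write YAKL_INLINE and YAKL_EXECUTE_ON_HOST_ONLY instead.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical pass-through stand-ins; not YAKL's definitions.
#define SKETCH_INLINE inline
#define SKETCH_HOST_ONLY(...) __VA_ARGS__

class SharedBuffer {
  double *data     = nullptr;
  int    *refCount = nullptr;
public:
  SKETCH_INLINE explicit SharedBuffer(int n) {
    // Allocation and counter creation are host-only work.
    SKETCH_HOST_ONLY( {
      data     = static_cast<double *>(std::malloc(static_cast<std::size_t>(n) * sizeof(double)));
      refCount = new int(1);
    } )
  }
  SKETCH_INLINE SharedBuffer(const SharedBuffer &rhs) : data(rhs.data), refCount(rhs.refCount) {
    // Braces keep the macro argument from starting with a parenthesis.
    SKETCH_HOST_ONLY( { if (refCount) (*refCount)++; } )
  }
  SharedBuffer &operator=(const SharedBuffer &) = delete;  // omitted for brevity
  SKETCH_INLINE ~SharedBuffer() {
    // Free the storage only when the last owner goes away.
    SKETCH_HOST_ONLY( {
      if (refCount && --(*refCount) == 0) { std::free(data); delete refCount; }
    } )
  }
  int use_count() const { return refCount ? *refCount : 0; }
};

// Copying shares one counter; destroying the copy decrements it again.
inline int count_after_copy_and_destroy() {
  SharedBuffer a(8);
  assert(a.use_count() == 1);
  {
    SharedBuffer b(a);
    assert(a.use_count() == 2);
  }
  return a.use_count();
}
```

Wrapping each host-only body in braces also keeps multi-statement bodies grouped as one block, so the same pattern works for the constructor and destructor as well as the single-statement increment.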