Using and Compiling with YAKL
To use YAKL in a source file, you must include the `YAKL.h` header file, which pulls in nearly all of the YAKL library:
#include "YAKL.h"
int main(int argc, char **argv) {
MPI_Init(&argc , &argv);
yakl::init();
{
...
}
yakl::finalize();
MPI_Finalize();
}
It is important that `yakl::init()` be called before any YAKL operations are performed (creating `Array` objects, launching `parallel_for`s, etc.). It is also important that all operations be completed and all `Array` objects be freed before calling `yakl::finalize()`. If you are using MPI, please be sure to initialize MPI before initializing YAKL: YAKL needs the MPI rank to ensure that only rank 0 writes to stdout when informing the user.
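As a rough illustration of what might go inside that inner scope, here is a minimal sketch (without MPI, for brevity) that allocates a device `Array`, launches a `parallel_for`, and lets the `Array` be freed before `yakl::finalize()`. The `Array`, `Bounds`, and `YAKL_LAMBDA` spellings follow YAKL's C-style API, but check them against the version you are using:

```cpp
#include "YAKL.h"

int main() {
  yakl::init();
  {
    int constexpr n = 1024;
    // Device-resident array of n floats, labeled "a" for debugging messages
    yakl::Array<float,1,yakl::memDevice,yakl::styleC> a("a",n);
    // Launch a kernel over i = 0 .. n-1
    yakl::c::parallel_for( yakl::c::Bounds<1>(n) , YAKL_LAMBDA (int i) {
      a(i) = 2.0f * static_cast<float>(i);
    });
    yakl::fence();   // wait for the kernel to finish
  } // "a" goes out of scope (and is freed) here, before yakl::finalize()
  yakl::finalize();
  return 0;
}
```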
The `YAKL.h` file excludes the following headers, which you will need to include separately if you want that functionality (a brief include sketch follows the list):
- `#include "YAKL_netcdf.h"`: For NetCDF I/O
- `#include "YAKL_pnetcdf.h"`: For Parallel NetCDF I/O
- `#include "YAKL_fft.h"`: For simple and small FFTs performed on `SArray` and `FSArray` objects
- `#include "YAKL_tridiagonal.h"`: For small tridiagonal solves on `SArray` and `FSArray` objects
- `#include "YAKL_pentadiagonal.h"`: For small pentadiagonal solves on `SArray` and `FSArray` objects
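For instance, a source file that needs NetCDF output and small FFTs would include the extra headers alongside the core one. This is just the include pattern, with the header names taken from the list above:

```cpp
#include "YAKL.h"           // core YAKL: Array, SArray, parallel_for, ...
#include "YAKL_netcdf.h"    // optional: NetCDF I/O
#include "YAKL_fft.h"       // optional: small FFTs on SArray / FSArray objects
```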
It is highly recommended to use the YAKL CMake integration via `add_subdirectory(/path/to/yakl /path/to/binary/location)`.

To use YAKL in another CMake project, the following is a template workflow for your `CMakeLists.txt`, assuming a target called `TARGET` and a YAKL repo in the directory `${YAKL_HOME}`:
```cmake
# YAKL_ARCH can be CUDA, HIP, SYCL, OPENMP, or empty (for serial CPU backend)
set(YAKL_ARCH "CUDA")
# Set YAKL_[ARCH_NAME]_FLAGS where ARCH_NAME is CUDA, HIP, SYCL, OPENMP, or CXX (for CPU targets)
set(YAKL_CUDA_FLAGS "-O3 -arch sm_70 -ccbin mpic++")
# YAKL_F90_FLAGS will be used for compiling internal YAKL Fortran 90 files
# Currently only for gator_mod.F90, Fortran hooks to the YAKL pool allocator
# and YAKL init() and finalize()
set(YAKL_F90_FLAGS "-O3")
# Add the YAKL library and perform other needed target tasks
# This provides a CMake status message giving the flags YAKL is using for all YAKL source files
# This will also set ${YAKL_COMPILER_FLAGS} in the parent scope for the user to check
add_subdirectory(${YAKL_HOME} ./yakl)
# Set YAKL properties on all C++ source files in a target,
# link yakl into that target, and set the appropriate C++ standard
include(${YAKL_HOME}/yakl_utils.cmake)
yakl_process_target(TARGET)
```
YAKL's `yakl_process_target()` macro processes the target's C++ source files, automatically links the `yakl` library target into the `TARGET` you pass in, and sets the appropriate C++ standard. For the CUDA backend, YAKL marks all C++ source files as CUDA files for CMake compilation.
Important:

- All of the processed target's C++ files are processed with YAKL's flags and other tasks. If this is not desirable and you would rather deal with an explicit list of C++ source files, you can use `yakl_process_cxx_source_files("${files_list}")`, where `${files_list}` is the list of C++ source files you want to process with YAKL flags and CMake attributes. Be sure not to forget the quotation marks around the variable, or the list may not properly make its way into the macro. When using `yakl_process_cxx_source_files`, you must also call `target_link_libraries()` yourself and set the appropriate C++ standard yourself (see the sketch after this list).
- The only flags that make their way into YAKL source files come from the CMake variable `YAKL_<LANG>_FLAGS`, where `<LANG>` matches the value of `YAKL_ARCH` (e.g., `YAKL_CUDA_FLAGS` when `YAKL_ARCH="CUDA"`), except when `YAKL_ARCH` isn't defined, in which case `<LANG>` is `CXX`. If you have set the `CXXFLAGS` environment variable, those flags will also be included for all backends except CUDA, which includes the `CUDAFLAGS` environment variable instead. YAKL does not use the `CMAKE_CXX_FLAGS` flags.
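If you take the `yakl_process_cxx_source_files` route, a minimal sketch might look like the following, with a hypothetical source list and the same `TARGET` and `${YAKL_HOME}` names as the template above; note that linking `yakl` and setting the C++ standard are now your responsibility:

```cmake
# Hypothetical list of the C++ sources that should get YAKL's flags/attributes
set(MY_YAKL_SOURCES src/kernels.cpp src/driver.cpp)

include(${YAKL_HOME}/yakl_utils.cmake)
# Quotes matter here: they keep the list intact when it is passed to the macro
yakl_process_cxx_source_files("${MY_YAKL_SOURCES}")

# These two steps are what yakl_process_target() would otherwise do for you
target_link_libraries(TARGET yakl)
set_target_properties(TARGET PROPERTIES CXX_STANDARD 17)
```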
The following compiler flags will also affect YAKL behavior:
- `-DYAKL_DEBUG`: Turns on YAKL's debugging, including checks for array index bounds, invalid host and device data accesses, use of uninitialized `Array` objects, and more. This is valid for all hardware backends, and it does significantly slow down the code.
- `-DYAKL_MANAGED_MEMORY`: Leads to the CUDA, HIP, and SYCL backends using managed memory allocations. If `-D_OPENACC` is also specified, the OpenACC runtime is told to ignore memory addresses in the allocated ranges. If `-D_OPENMP45` is also specified, the same is done for the OpenMP runtime. This keeps those runtimes from moving the data, since the managed memory runtime will do so automatically.
- `-DHAVE_MPI`: Highly recommended for all MPI applications so that YAKL only informs the user from the master task (rank 0).
- `-DMEMORY_DEBUG`: Turns on memory debugging output to help the user identify memory issues. This produces a lot of output, and it is recommended only on runs with a single MPI task.
- `-DYAKL_PROFILE`: Turns on the YAKL timers so that calls to `yakl::timer_start()` and `yakl::timer_stop()` actually invoke the timers (see the sketch after this list).
- `-DYAKL_AUTO_PROFILE`: Turns on the YAKL timers and automatically adds timers around all `parallel_for` calls that provide a label.
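As a minimal sketch of the profiling hooks named above: the region label `"my_region"` below is arbitrary, and the calls only record timings when the code is built with `-DYAKL_PROFILE`:

```cpp
#include <cstdio>
#include "YAKL.h"

int main() {
  yakl::init();
  {
    // Time an arbitrary block of work; with -DYAKL_PROFILE these calls feed
    // YAKL's timers, and without it they have no profiling effect.
    yakl::timer_start("my_region");
    double sum = 0;
    for (int i = 0; i < 1000000; i++) { sum += static_cast<double>(i); }
    yakl::timer_stop("my_region");
    printf("sum = %f\n", sum);   // keep the loop from being optimized away
  }
  yakl::finalize();
  return 0;
}
```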
Here is a table describing the CMake options to enable a given backend. YAKL automatically sets the C++ standard to 17 inside CMake for all YAKL source files.
| Backend | `-DYAKL_ARCH` | Setting flags | Notes |
|---|---|---|---|
| None / CPU | `-DYAKL_ARCH=""` | `-DYAKL_CXX_FLAGS="..."` | You must add all flags yourself. |
| OpenMP 3.5 | `-DYAKL_ARCH="OPENMP"` | `-DYAKL_OPENMP_FLAGS="..."` | You must add all flags yourself, including OpenMP-enabling flags. |
| CUDA | `-DYAKL_ARCH="CUDA"` | `-DYAKL_CUDA_FLAGS="..."` | YAKL automatically adds `-x cu --expt-extended-lambda --expt-relaxed-constexpr -Wno-deprecated-gpu-targets -DTHRUST_IGNORE_CUB_VERSION_CHECK -std=c++17`. You must add `-arch sm_??` yourself along with any other CUDA-specific options you want. Please do not specify `-std=c++??` in your `YAKL_CUDA_FLAGS`; CUDA doesn't like multiple definitions of the standard, and if `-std=c++??` is added through other avenues in CMake, this may also cause a problem. CUDA also does not like multiple optimization-level flags `-O[0-4]`. |
| HIP | `-DYAKL_ARCH="HIP"` | `-DYAKL_HIP_FLAGS="..."` | You must add all flags yourself. |
| SYCL | `-DYAKL_ARCH="SYCL"` | `-DYAKL_SYCL_FLAGS="..."` | You must add all flags yourself. |
If you want to use a `Makefile` approach, then to enable a given YAKL backend you'll need to specify `-DYAKL_ARCH_<LANG>`, where `<LANG>` is one of the previously mentioned backend labels (CXX, CUDA, HIP, SYCL, or OPENMP). You're on your own regarding any other compilation issues, but you can use YAKL's `CMakeLists.txt` file for guidance; it is kept as simple as possible for this purpose.