-
Notifications
You must be signed in to change notification settings - Fork 16
Multi Dimensional Dynamically Allocated Arrays
// All YAKL Array classes have the following template parameters
// Default memory space depends on whether YAKL_ARCH is defined or not
// Default style is always C-style
template <class T, int rank, int myMem=memDefault, int myStyle=stylec>
// C-style Arrays
Array();
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, int d0 [, int d1,...]); // Non-owned Array object (not ref counted)
// Fortran-style Arrays
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)
// Fortran-style Arrays with non-1 lower bounds typically use initializer lists
// real(8) :: arr(-1:nx+2)
Array<double,1,memDevice,styleFortran>("arr",{-1,nx+2});
YAKL's multi-dimensional Array
class provides a convenient way to manage multi-dimensional data (up to 8 dimensions). memDevice
Array
objects support any C++ type that is std::is_arithmetic<T>
. memHost
Array
objects support any type, since a constructor can be called on the host. Array
objects come in C-style or Fortran-style. Array
objects can be "owned" (meaning allocation and free is managed by the Array
object, and references are counted); or they can be "non-owned", meaning no allocation, free, or reference counting is handled by the Array
object. Reference counting means keeping track of how many Array
objects are aliasing the same data pointer, to avoid deallocating the data while it's still being used.
Array
objects are indexed via the C++ operator()
overload:
Array<float,3,memHost,styleC> a("a",4,3,2);
for (int k=0; k < 4; k++) {
for (int j=0; j < 3; j++) {
for (int i=0; i < 2; i++) {
a(k,j,i) = -999.;
}
}
}
C-style Array
objects have lower bounds of zero and row-major index ordering (meaning the right-most index varies the fastest).
Fortran-style Array
objects have lower bounds that default to 1 but can change and column-major index ordering (meaning the left-most index varies the fastest).
All YAKL Array
objects have data that is contiguous in memory with no striding.
YAKL Array
objects use what is called "shallow copy" by default. What this means is that the copy and move constructors / assignment operators only copy the metadata of the Array and the pointer address, not the data itself. To copy the data itself is called a "deep copy". Therefore, you can performantly pass Array
objects by value because only a small amount of data is actually copied.
Array<float,2,memDevice,styleC> a, b, c, d;
// This allocates an owned Array object "a"
a = Array<float,2,memDevice,styleC>("a",100);
yakl::memset(a,0.f);
// This causes b to share the data pointer of A, and it's *very cheap* to perform
b = a; // This is cheap
// The reference counter shared by a and b is now "2"
// Also note that b's label is "a" because a's metadata was copied over
// Because b shared a's data pointer, the following changes a's data as well
memset(b,1.f);
std::cout << a; // this will print all "1"s
// This creates a non-owned Array object that uses the data pointer from a and b
c = Array<float,2,memDevice,styleC>("c",b.data(),100); // This is cheap
// Note that c is *not* reference counted, and no metadata is copied into c.
// Therefore, if you said c is 101 elements, you'll get a memory error
// Also, if you deallocate a's and b's shared data pointer while still using c,
// you'll get a memory error.
// non-owned constructors should be used with significant caution.
// Frankly, they should only be used when converting another language's / framework's
// contiguous data into a YAKL Array object (e.g., from Fortran or Kokkos).
memset(c,2.f);
std::cout << a; // This will be all 2's
std::cout << b; // This will be all 2's
memset(b,3.f);
std::cout << c; // This will be all 3's
// d is its own separate Array object of size 100
// It does not share the data pointer of a, b, and c
d = Array<float,2,memDevice,styleC>("d",100);
// Copy the data from the pointer shared by a, b, and c to d's pointer
c.deep_copy_to(d); // This is expensive
// d has a separate data pointer from a, b, and c.
// Therefore, changes to a, b, and c do not affect d and vice versa
memset(d,4.f);
std::cout << d; // This will be all 4's
std::cout << a; // This will be all 3's
// This creates e as an Array<float,1,memDevice,styleC> object of size 100,
// and it deep copies d's data to e's data pointer
auto e = d.createDeviceCopy(); // This is expensive
memset(e,5.f);
std::cout << e; // This will be all 5's
std::cout << d; // This will be all 4's
std::cout << b; // This will be all 3's
Fortran users, please note this: In Fortran, a = b implies a coy of data from b to a. In portable C++ (YAKL), however, a = b only copies the metadata (and the data pointer, but not the data itself).
If you created your Array
object inside a function, including main()
, you do not need to explicitly deallocate. Just like in Fortran, the moment the Array
object falls out of scope, it is automatically deallocated. This is by far the preferred method.
However, there are mainly two cases when explicit deallocation is desirable: (1) the Array
object has a static lifetime (e.g., it lives in global or namespace scope); and (2) the Array
object is large, and you want to deallocate it before allocating something else to avoid excessive memory usage.
There are three ways to deallocate an Array
object:
- Explicitly call the
arr.deallocate()
class method - Replace the
Array
object with anotherArray
object (forcing the previous object to fall out of scope). Replacing with an emptyArray
object is also valid. - Let it fall out of scope naturally
void foo() {
Array<real,1,memDevice,styleC> arr ("arr", 1024*1024*32);
Array<real,1,memDevice,styleC> arr2("arr2",1024*1024*32);
Array<real,1,memDevice,styleC> arr3("arr3",1024*1024*32);
Array<real,1,memDevice,styleC> arr4("arr4",1024*1024*32);
arr.deallocate(); // arr is deallocated
arr2 = Array<real,1,memDevice,styleC>(); // arr2 is deallocated
arr3 = arr4 // arr3 is deallocated and now uses arr4's data pointer and metadata
}
// After this function completes, arr4 is deallocated by falling out of scope
You can reshape an Array
object by either calling the reshape()
member function or the collapse()
member function.
Array<real,3,memDevice,styleC> a("a",10,9,8);
// Create an Array with two dimensions shaped as (10,72)
auto b = a.reshape<2>({10,72}); // This is cheap
// b now shares a's data pointer in a shallow copy, but its dimensions are different
// You'll have a runtime error if the product of dimensions doesn't match
// create an Array with one dimension, collapsing all dimensions into one
auto c = a.collapse(); // This is cheap
// c now shares the data pointer with a and b, but its dimension is 720
The deep_copy_to()
, createHostCopy()
, and createDeviceCopy()
routines manage data between host and device memory spaces.
-
a.deep_copy_to(b);
will copy data froma
tob
, regardless of which memory spacesa
andb
live. -
auto b = a.createHostCopy();
will allocate a newmemHost
Array
object,b
, with the metadata shallow copied froma
and the data deep copied froma
regardless of wherea
lives. -
auto b = a.createDeviceCopy();
will allocate a newmemDevice
Array
object,b
, with the metadata shallow copied froma
and the data deep copied froma
regardless of wherea
lives.
-
YAKL_INLINE T *data()
: Get the raw data pointer used by thisArray
object -
YAKL_INLINE int get_rank()
: Get the number of dimensions -
YAKL_INLINE index_t totElems()
: Get the total number of array elements.index_t
is anunsigned int
by default. -
YAKL_INLINE span_is_contiguous()
: Always true -
YAKL_INLINE initialized()
: Determine if thisArray
object has been allocated -
const char *label()
: Get the label for theArray
object. Returns empty string ifYAKL_DEBUG
is not defined as a macro -
YAKL_INLILNE SArray<index_t,1,rank> get_dimensions()
: Get the dimension sizes of thisArray
object in the form of a staticArray
object. -
YAKL_INLILNE SArray<index_t,1,rank> get_lbounds()
: Get the lower bounds of thisArray
object in the form of a staticArray
object. For C-style arrays, this is always zeroes -
YAKL_INLILNE SArray<index_t,1,rank> get_ubounds()
: Get the dimension sizes of thisArray
object in the form of a staticArray
object. For C-style arrays, this is always the size of the dimension minus 1.
#include <iostream>
Array<float,1,memDevice,styleFortran> arr("arr",10);
memset(arr,10.f);
// This works for host or device arrays. Device arrays are copied to the host before printing
std::cout << arr;
YAKL supports only basic slicing at present, meaning that you must slice out entire dimensions rather than partial dimensions. Also, the slices must be contiguous. The syntax is as follows:
using yakl::COLON;
Array<float,3,memDevice,styleFortran> a("a",nx,ny,nz);
// b = a(:,:,k) -- "Grab a horizontal slice at this vertical level"
auto b = a.slice<2>(COLON,COLON,k);
// The number inside the <> is the number of dimensions in the resulting slice
// Also note that the COLON's must be at the left since it's column-major ordering
Array<float,3,memDevice,styleC> c("c",nz,ny,nx);
auto d = c.slice<2>(k,COLON,COLON);
// C-style is row-major ordering, so COLON's must be at the right
Important: slices are not reference counted or owned. The result of a slice<N>()
call is an Array
object that uses the data pointer from the Array
object it is sliced from. Therefore, if the Array
object you slice from is deallocated while you're still using the slice, you'll have a memory error.
If you want to write to the array slice passed to a function, you must save it as a temporary variable first and pass the temporary variable: E.g., auto tmp = arr.slice<2>({COLON,COLON,ind1,ind2}); myfunc(tmp);
. This is because C++ treats objects returned by value as const
. If you're only reading from the array slice, you can pass it directly inline because it obeys const
To begin, you never allocate an Array
with a const
underlying data type. The reason is that if the underlying data type is const
, then you can't write to it after creating the Array
object and allocating the const
data pointer. Instead, you create an Array
object with non-const
type, write to it, and then convert it to a const
-type array that shares the same data pointer and reference counter pointer (incrementing the reference counter).
Confused? See the example below:
Array<real,1,memDevice,styleC> arrTemp("arr",10);
parallel_for( 10 , YAKL_LAMBDA (int i) { arrTemp(i) = i; })
// Create a const-type array using the initialized non-const-type array
Array<real const,1,memDevice,styleC> arr(arrTemp); // East const for the win!
Now, when you use arr
, any attempts to write to it will be met with a nice comprehensible error during compile time, and its data is protected.
Note that you can create a non-const array inside a function, then convert it to a const array object, and pass the resulting const array object back by value. This is because the const array object shared the reference pointer with the non-const one you created, and therefore, the data pointer lives out of the function scope.
The most common situation by far in which memory leaks associated with Array
objects will happen is when you declare Array
objects with a static lifetime (e.g., you place them in global or namespace scope, or you apply the static
keyword to a class data member), meaning they never fall out of scope until the program exits. It is easy to allocate something and forget to deallocate it explicitly.
If you pass -DYAKL_DEBUG
to the YAKL compiler flags, array index checking is turned on. It will even detect if you create an Array
object on the host and access its data on the device and vice versa (and thrown an error if the YAKL hardware backend has disparate memory spaces between host and device). With -DYAKL_DEBUG
, YAKL will also check to see if you try to access data in an Array
object that hasn't been allocated yet.
Reference counting, allocation, and deallocation are all inside std::mutex
critical regions, and therefore, Array
objects can be safely used inside CPU threaded regions that use pthreads, std::thread
, or OpenMP CPU threads.