Skip to content

Multi Dimensional Dynamically Allocated Arrays

Matt Norman edited this page Dec 11, 2021 · 6 revisions
// All YAKL Array classes have the following template parameters
// Default memory space depends on whether YAKL_ARCH is defined or not
// Default style is always C-style
template <class T, int rank, int myMem=memDefault, int myStyle=stylec>

// C-style Arrays
Array();
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, int d0 [, int d1,...]); // Non-owned Array object (not ref counted)

// Fortran-style Arrays
Array(const char *label, int d0 [, int d1,...]); // Owned Array object (reference counted)
Array(const char *label, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)
Array(const char *label, T *data, std::vector d0 [, std::vector d1,...]); // Owned Array object (reference counted)

// Fortran-style Arrays with non-1 lower bounds typically use initializer lists
// real(8) :: arr(-1:nx+2)
Array<double,1,memDevice,styleFortran>("arr",{-1,nx+2});

YAKL's multi-dimensional Array class provides a convenient way to manage multi-dimensional data (up to 8 dimensions). memDevice Array objects support any C++ type that is std::is_arithmetic<T>. memHost Array objects support any type, since a constructor can be called on the host. Array objects come in C-style or Fortran-style. Array objects can be "owned" (meaning allocation and free is managed by the Array object, and references are counted); or they can be "non-owned", meaning no allocation, free, or reference counting is handled by the Array object. Reference counting means keeping track of how many Array objects are aliasing the same data pointer, to avoid deallocating the data while it's still being used.

Indexing an Array object

Array objects are indexed via the C++ operator() overload:

Array<float,3,memHost,styleC> a("a",4,3,2);
for (int k=0; k < 4; k++) {
for (int j=0; j < 3; j++) {
for (int i=0; i < 2; i++) {
  a(k,j,i) = -999.;
}
}
}

C-style and Fortran-style Arrays

C-style Array objects have lower bounds of zero and row-major index ordering (meaning the right-most index varies the fastest).

Fortran-style Array objects have lower bounds that default to 1 but can change and column-major index ordering (meaning the left-most index varies the fastest).

Data Layout

All YAKL Array objects have data that is contiguous in memory with no striding.

Shallow and Deep Copy

YAKL Array objects use what is called "shallow copy" by default. What this means is that the copy and move constructors / assignment operators only copy the metadata of the Array and the pointer address, not the data itself. To copy the data itself is called a "deep copy". Therefore, you can performantly pass Array objects by value because only a small amount of data is actually copied.

Array<float,2,memDevice,styleC> a, b, c, d;
// This allocates an owned Array object "a"
a = Array<float,2,memDevice,styleC>("a",100);
yakl::memset(a,0.f);
// This causes b to share the data pointer of A, and it's *very cheap* to perform
b = a; // This is cheap
// The reference counter shared by a and b is now "2"
// Also note that b's label is "a" because a's metadata was copied over
// Because b shared a's data pointer, the following changes a's data as well
memset(b,1.f);
std::cout << a; // this will print all "1"s

// This creates a non-owned Array object that uses the data pointer from a and b
c = Array<float,2,memDevice,styleC>("c",b.data(),100); // This is cheap
// Note that c is *not* reference counted, and no metadata is copied into c.
// Therefore, if you said c is 101 elements, you'll get a memory error
// Also, if you deallocate a's and b's shared data pointer while still using c,
//    you'll get a memory error.
// non-owned constructors should be used with significant caution.
// Frankly, they should only be used when converting another language's / framework's
// contiguous data into a YAKL Array object (e.g., from Fortran or Kokkos).

memset(c,2.f);
std::cout << a; // This will be all 2's
std::cout << b; // This will be all 2's

memset(b,3.f);
std::cout << c; // This will be all 3's

// d is its own separate Array object of size 100
// It does not share the data pointer of a, b, and c
d = Array<float,2,memDevice,styleC>("d",100);
// Copy the data from the pointer shared by a, b, and c to d's pointer
c.deep_copy_to(d); // This is expensive
// d has a separate data pointer from a, b, and c.
// Therefore, changes to a, b, and c do not affect d and vice versa

memset(d,4.f);
std::cout << d; // This will be all 4's
std::cout << a; // This will be all 3's

// This creates e as an Array<float,1,memDevice,styleC> object of size 100,
// and it deep copies d's data to e's data pointer
auto e = d.createDeviceCopy(); // This is expensive

memset(e,5.f);
std::cout << e; // This will be all 5's
std::cout << d; // This will be all 4's
std::cout << b; // This will be all 3's

Fortran users, please note this: In Fortran, a = b implies a coy of data from b to a. In portable C++ (YAKL), however, a = b only copies the metadata (and the data pointer, but not the data itself).

How do I deallocate an Array object?

If you created your Array object inside a function, including main(), you do not need to explicitly deallocate. Just like in Fortran, the moment the Array object falls out of scope, it is automatically deallocated. This is by far the preferred method.

However, there are mainly two cases when explicit deallocation is desirable: (1) the Array object has a static lifetime (e.g., it lives in global or namespace scope); and (2) the Array object is large, and you want to deallocate it before allocating something else to avoid excessive memory usage.

There are three ways to deallocate an Array object:

  1. Explicitly call the arr.deallocate() class method
  2. Replace the Array object with another Array object (forcing the previous object to fall out of scope). Replacing with an empty Array object is also valid.
  3. Let it fall out of scope naturally
void foo() {
  Array<real,1,memDevice,styleC> arr ("arr", 1024*1024*32);
  Array<real,1,memDevice,styleC> arr2("arr2",1024*1024*32);
  Array<real,1,memDevice,styleC> arr3("arr3",1024*1024*32);
  Array<real,1,memDevice,styleC> arr4("arr4",1024*1024*32);
  arr.deallocate();                        // arr is deallocated
  arr2 = Array<real,1,memDevice,styleC>(); // arr2 is deallocated
  arr3 = arr4                              // arr3 is deallocated and now uses arr4's data pointer and metadata
}
// After this function completes, arr4 is deallocated by falling out of scope

Reshaping Array objects

You can reshape an Array object by either calling the reshape() member function or the collapse() member function.

Array<real,3,memDevice,styleC> a("a",10,9,8);
// Create an Array with two dimensions shaped as (10,72)
auto b = a.reshape<2>({10,72});  // This is cheap
// b now shares a's data pointer in a shallow copy, but its dimensions are different
// You'll have a runtime error if the product of dimensions doesn't match

// create an Array with one dimension, collapsing all dimensions into one
auto c = a.collapse();  // This is cheap
// c now shares the data pointer with a and b, but its dimension is 720

Copying data between host and device

The deep_copy_to(), createHostCopy(), and createDeviceCopy() routines manage data between host and device memory spaces.

  • a.deep_copy_to(b); will copy data from a to b, regardless of which memory spaces a and b live.
  • auto b = a.createHostCopy(); will allocate a new memHost Array object, b, with the metadata shallow copied from a and the data deep copied from a regardless of where a lives.
  • auto b = a.createDeviceCopy(); will allocate a new memDevice Array object, b, with the metadata shallow copied from a and the data deep copied from a regardless of where a lives.

Miscellaneous routines

  • YAKL_INLINE T *data(): Get the raw data pointer used by this Array object
  • YAKL_INLINE int get_rank(): Get the number of dimensions
  • YAKL_INLINE index_t totElems(): Get the total number of array elements. index_t is an unsigned int by default.
  • YAKL_INLINE span_is_contiguous(): Always true
  • YAKL_INLINE initialized(): Determine if this Array object has been allocated
  • const char *label(): Get the label for the Array object. Returns empty string if YAKL_DEBUG is not defined as a macro
  • YAKL_INLILNE SArray<index_t,1,rank> get_dimensions(): Get the dimension sizes of this Array object in the form of a static Array object.
  • YAKL_INLILNE SArray<index_t,1,rank> get_lbounds(): Get the lower bounds of this Array object in the form of a static Array object. For C-style arrays, this is always zeroes
  • YAKL_INLILNE SArray<index_t,1,rank> get_ubounds(): Get the dimension sizes of this Array object in the form of a static Array object. For C-style arrays, this is always the size of the dimension minus 1.

How do I print an Array object?

#include <iostream>
Array<float,1,memDevice,styleFortran> arr("arr",10);
memset(arr,10.f);
// This works for host or device arrays. Device arrays are copied to the host before printing
std::cout << arr;

How do I slice an Array object

YAKL supports only basic slicing at present, meaning that you must slice out entire dimensions rather than partial dimensions. Also, the slices must be contiguous. The syntax is as follows:

using yakl::COLON;
Array<float,3,memDevice,styleFortran> a("a",nx,ny,nz);
// b = a(:,:,k)    -- "Grab a horizontal slice at this vertical level"
auto b = a.slice<2>(COLON,COLON,k);
// The number inside the <> is the number of dimensions in the resulting slice
// Also note that the COLON's must be at the left since it's column-major ordering

Array<float,3,memDevice,styleC> c("c",nz,ny,nx);
auto d = c.slice<2>(k,COLON,COLON);
// C-style is row-major ordering, so COLON's must be at the right

Important: slices are not reference counted or owned. The result of a slice<N>() call is an Array object that uses the data pointer from the Array object it is sliced from. Therefore, if the Array object you slice from is deallocated while you're still using the slice, you'll have a memory error.

If you want to write to the array slice passed to a function, you must save it as a temporary variable first and pass the temporary variable: E.g., auto tmp = arr.slice<2>({COLON,COLON,ind1,ind2}); myfunc(tmp);. This is because C++ treats objects returned by value as const. If you're only reading from the array slice, you can pass it directly inline because it obeys const

How do I create an Array of const data?

To begin, you never allocate an Array with a const underlying data type. The reason is that if the underlying data type is const, then you can't write to it after creating the Array object and allocating the const data pointer. Instead, you create an Array object with non-const type, write to it, and then convert it to a const-type array that shares the same data pointer and reference counter pointer (incrementing the reference counter).

Confused? See the example below:

Array<real,1,memDevice,styleC> arrTemp("arr",10);
parallel_for( 10 , YAKL_LAMBDA (int i) { arrTemp(i) = i; })
// Create a const-type array using the initialized non-const-type array
Array<real const,1,memDevice,styleC> arr(arrTemp);  // East const for the win!

Now, when you use arr, any attempts to write to it will be met with a nice comprehensible error during compile time, and its data is protected.

Note that you can create a non-const array inside a function, then convert it to a const array object, and pass the resulting const array object back by value. This is because the const array object shared the reference pointer with the non-const one you created, and therefore, the data pointer lives out of the function scope.

In what circumstances can memory leaks happen?

The most common situation by far in which memory leaks associated with Array objects will happen is when you declare Array objects with a static lifetime (e.g., you place them in global or namespace scope, or you apply the static keyword to a class data member), meaning they never fall out of scope until the program exits. It is easy to allocate something and forget to deallocate it explicitly.

How do I do index checking?

If you pass -DYAKL_DEBUG to the YAKL compiler flags, array index checking is turned on. It will even detect if you create an Array object on the host and access its data on the device and vice versa (and thrown an error if the YAKL hardware backend has disparate memory spaces between host and device). With -DYAKL_DEBUG, YAKL will also check to see if you try to access data in an Array object that hasn't been allocated yet.

Thread Safety

Reference counting, allocation, and deallocation are all inside std::mutex critical regions, and therefore, Array objects can be safely used inside CPU threaded regions that use pthreads, std::thread, or OpenMP CPU threads.

Clone this wiki locally