-
Notifications
You must be signed in to change notification settings - Fork 16
Readable CPlusPlus
This is an evolving, incomplete, and certainly flawed document, but it's hopefully a start to the conversation about readability for codes written in C++ portability frameworks for more traditional domain science code developers.
I fear this is likely how C++ and performance experts come across to domain scientist developers. To be fair, this might be how some C++ and performance experts view domain scientists:
C++ performance portable codes (Kokkos, RAJA, SYCL, HC) have developed a reputation for being "unreadable." In some cases, this reputation is justifiable, but I think there is a lot of sociology wrapped up in this statement. I also think that C++ portable codes can be written in a way that is easily read by traditional scientific code developers. A seasoned C++ programmer (which I certainly am not) will probably look at some of these codes and say they are trivially readable. Then again, C++ seems to have at least three main flavors:
- C with ampersands (instead of the god-forsaken asterisks everywhere), basic templates, non-inheriting classes, and an easy-to-use multi-dimensional array class
- Category 1 plus complex inheritance
- Category 1 plus complex templates
Readability constraints only apply to code that is actively being developed by scientists. In some cases, projects can abstract these constructs away from domain developers. In many cases, though, the code cannot be easily abstracted away in a library because it is highly custom to the process being modeled. I'm talking about code that scientific developers have to be able to develop, read, and modify directly.
While readability is not an objective metric, for many science and engineering code developers, we are used to relatively simple looping structures and abstractions. The code is usually located in a set of adjacent loops inside simple functions. The call tree is straightforward, if a bit deep at times. If inheritance is used, it is straightforward, shallow, and clear. I would argue the first flavor of C++ in the previous numbered list is the style best suited for these kinds of developers. Departures from this style should be minimized and decorated with clear comments. The following seems to be a good starting list for constraints on readability for domain scientist developers:
- I need to be able to find the code easily
- I need to be able to recognize a loop body when I see one
- I need to know what the effective looping is doing
- I need to have recognizable data structures (e.g. MD arrays)
I cannot put enough emphasis on the first item in this list. The developer should not have to pass through more than one level of abstraction to find where the code actually lives. Abstraction improves flexibility and reduces code duplication. But it also makes the code very hard to find when taken past one level (two levels at most). Abstraction should be used but strictly limited in terms of the number of layers.
Your neat little trick to do something in one line instead of ten is likely incomprehensible to the person beside you. Go ahead and be verbose in what you're trying to do, and add comments to make it clear. One of the problems with the evolving C++ standard is they are creating more and more ways to combine things in a smaller number of lines of code at the cost of clarity for programmers of every other language. Sometimes it's better to have more lines of code. Nearly everybody can solve a Rubik's cube (https://www.speedcube.com.au/pages/how-to-solve-a-rubiks-cube), but not everybody has the time or desire to memorize how to do it in the fewest moves with a blindfold on (https://ruwix.com/the-rubiks-cube/how-to-solve-the-rubiks-cube-blindfolded-tutorial).
There's a natural tendency for each camp to point the finger at another and say, "they need training." A C++ metaprogramming expert will probably say, "scientists just need to study C++ and become more modern." Scientists will, in turn, say, "No, you need to write more readable code." I think it makes most sense for the two to meet in the middle. C++ experts need to better understand what a domain scientists considers readable. Domain scientists need to take a few steps to understanding how to map what they're used to seeing onto a new context. It might be a difficult task, but I don't think it's impossible.
For instance, C++ portability approaches tend to look like CUDA in the sense that they expose the code as a kernel that operates on a single thread. Domain scientists will find this confusing until we show them that it's actually identical to the code you find inside a set of loop nests. The goal is to take what domain scientists consider confusing and express it in terms they're familiar with. The goal is also to help C++ experts frame their code in simpler expressions that come closer to what a domain scientist is used to seeing.
Domain scientists aren't unintelligent, it's OK to expect them to go through a limited amount of training to bring them from where they are to a place where they can actively work in a C++ portability framework. At the same time, their degrees are not in computer science, and it's unreasonable to expect them to understand complex topics in C++. In particular, jumping through layers of template specializations is not a fair expectation of domain scientists. Using templated expressions in limited and clearly communicated ways is a fair expectation.
Ultimately, I think the burden is probably larger on C++ expert developers to write code that is understandable to non-experts. But regardless, the paradigm where domain scientists write something they understand but performs abysmally and C++ and performance experts then turn that code into something very few can read is unsustainable. Both are going to have to create a common workspace that takes compromises on both parts.
There is also an issue of performance having conflicting constraints with readability. While there are some compromises to be had, in my opinion, it's best to err on the side of readability at the cost of performance unless the degradation is particularly egregious. Readable code is productive code and is much less likely to lead to unintended bugs.
C++ iterators are powerful, but they are very rarely necessary in most scientific codes. Typically, we don't loop in arbitrarily complex ways, and for
loops are easier to understand for domain scientists (and most people in general).
Template syntax gets pretty ugly to a domain scientist. When it's nested and specialized, it gets even uglier. We should hide it from domain scientists when we can. This can be done through typedef
s:
typedef Kokkos::View<real***,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Cuda,Kokkos::CudaUVMSpace>> real3d;
Using this typedef
, the user only sees real3d
in the code to represent a 3-dimensional array. Nasty templated types can be shielded through the auto
keyword as well when functions are returning them. We can also shield some of this from users through simple function and class wrappers. The template decisions, being made at compile-time anyway, can be hidden by a single-level abstraction that developers can easily find if they want to or ignore if they don't want to know about it.
It is OK to expect developers to use basic templated expressions where they are required (e.g., the number of indices to slice out of an array or the number of loop bounds in a parallel for). In these cases, it should be clearly documented and explained.
Abstraction can reduce redundant code and significantly improve flexibility, but when taken past a single level of abstraction, it quickly makes the code you want to find hard to find. I've looked at C++ codes where I've had to duck through as many as 7 or 8 levels of abstraction either through wrappers, templates, or inheritance to find what I want to find, floating through several different files in the code. I've had times when I literally could not find the code unless I knew what some of the code was ahead of time and did a project-wide grep
. This practice is clearly unacceptable for domain scientists and code readability. Whenever possible, indirection should be kept to a single level at most.
Typical domain science codes have loops that appear together inside functions. There is a habit in many C++ codes to over-modularize the code, making every infinitesimal operation its own function. This habit is formed to reduce redundant code, but this is not what domain scientists are used to. Modularization should be a compromise, not a hard and fast rule. If you end up with 50 lines of redundant code, I don't think that's always a bad idea.
There is a conflict between maintainability and readability. Ideally, if you make a small change in an algorithm, you should only have to make it once. Modularity affords you that option. However, if successive loops are clearer when cast together inside a function, making the code more readable to a scientist, yet you have to make that minor algorithmic change 5 times, I think that's a acceptable compromise. It's not easy to determine what the optimal place is for this, but I think it's safe to say that neither completely fine-grained modular codes nor completely redundant codes are optimal.
Continuing the theme of the previous point, I think readable code needs to use C++ Lambdas to express kernels. The alternative is to create a separate functor manually outside the current function, meaning the code is no loner in-line, and you have to go searching for it.
Most of the time, we're going to expose kernels as tightly-nested loops in science codes because it's much easier to expose tightly-nested loops on the GPU, which only has two levels of parallelism. Given this, I think it makes sense to launch the library's equivalent of parallel_for
in-place with a lambda and to leave the equivalent loops commented above the parallel_for
. An example of this in YAKL is below:
// Compute x-direction tendencies as the flux divergence plus the source term
void computeTend_X(real3d &flux, real3d &tend, Domain &dom) {
// for (int l=0; l<numState; l++) {
// for (int j=0; j<dom.ny; j++) {
// for (int i=0; i<dom.nx; i++) {
yakl::c::parallel_for( Bounds<3>(numState,dom.ny,dom.nx) ,
YAKL_LAMBDA (int l, int j, int i) {
tend(l,j,i) = - ( flux(l,j,i+1) - flux(l,j,i) ) / dom.dx + src(l,j,i);
});
}
Also, if they need to change the loops, it's not that hard to parse out what to change to make that happen. Suppose they wanted to loop to dom.nx+1
, for instance. They can pretty easily change the code to:
// Compute x-direction tendencies as the flux divergence plus the source term
void computeTend_X(real3d &flux, real3d &tend, Domain &dom) {
// for (int l=0; l<numState; l++) {
// for (int j=0; j<dom.ny; j++) {
// for (int i=0; i<dom.nx+1; i++) {
yakl::parallel_for( Bounds<3>(numState,dom.ny,dom.nx+1) ,
YAKL_LAMBDA (int l, int j, int i) {
tend(l,j,i) = - ( flux(l,j,i+1) - flux(l,j,i) ) / dom.dx + src(l,j,i);
});
}
Keep SFINAE and other complex concepts away from scientific developers
Substitution Failure Is Not An Error SFINAE is a pretty wild concept domain scientists and should be hidden from any code you plan on them touching. I still don't know what's going on with enable_if
and related capabilities. I only know generally what to do for the result I want:
I'm sure there are a lot of other concepts similar to this as well, but not being a C++ expert myself, I leave those to the experts to decide for themselves. Regardless, some of these concepts are really bizarre to non-C++ experts.
If you do something for which the motivations are not obvious, it's important to comment why you're doing that so someone doesn't come after you and undo your work because they didn't know why you did it. As a simplistic example, suppose you use a Kokkos View
with a memory layout that can potentially have padding or unexpected ordering, but you want to work with a library that doesn't expect this. You'll probably have to deep_copy
the data to a View
with a more controlled memory layout to interface with the library. But another developer might come behind you and say, "Why are we doing this needless copy of the same data? I'm going to delete this." Then you get wrong answers because that same developer didn't read your developer's guide to pay attention to integration tests. There are multiple problems here, of course, but you can help the situation significantly with a simple comment: // Copying to a data structure with the proper memory layout to interface with the library
.
Also, you should set the bar high for what's obvious. Err on the side of clarity.