This document contains developer documentation for BioImage Suite Web.


Interfacing to WebAssembly in BioImage Suite Web

Introduction

BioImage Suite Web is written in a mix of JavaScript and C++ compiled to WebAssembly using Emscripten. Readers unfamiliar with any of these are urged to consult their documentation first; a fair understanding of these tools is needed to follow the rest of this document.

Please note that WebAssembly is completely platform independent. The code needs to be compiled on one platform, but the binaries can be copied between platforms without difficulty.

Note: The C++ code can also be compiled as a native library and accessed from Python. This is the subject of a different document.

All C++ code in the project is located in the cpp directory. The functions that are called from JS are in three pairs of files, namely bisExportedFunctions.h/.cpp, bisExportedFunctions2.h/.cpp and bisTesting.h/.cpp. The corresponding JS utility code is in js/core/bis_wrapperutils.js and js/core/bis_wasmutils.js.

Much of the description below is specific to how things are done in BioImage Suite Web.


Emscripten Modules

The low-level WebAssembly compilation results in two files:

  1. build/libbiswasm.js
  2. build/libbiswasm.wasm

The .wasm file is the binary WebAssembly bytecode. The .js file is a JavaScript wrapper, automatically generated by Emscripten, that provides a more user-friendly interface to the code. We will call this the Emscripten Module in our documentation. In this document, all references to "WebAssembly" mean either the low-level code or the Emscripten Module.

The BisWeb build process creates two additional files:

  1. build/libbiswasm_wasm.js
  2. build/libbiswasm_wrapper.js

These are created by two custom scripts that are run as part of the CMake process. These will be described in more detail below.

Compilation details

When compiling a WASM library with Emscripten there is the option to provide additional JS code that will go into the Emscripten module. This code is passed in through the linker flags given to CMake:

-DCMAKE_EXE_LINKER_FLAGS="--pre-js ${DIR}/../cpp/libbiswasm_pre.js --post-js ${DIR}/../cpp/libbiswasm_post.js"

The Wasm code is loaded through a wrapper file produced by the custom tool compiletools/bis_create_wasm_module, which is invoked as part of the CMake build process to embed the .wasm file in a .js file. The generated file (abbreviated) has the following form:

// This is the emscripten .js file
const libbiswasm_raw=require('libbiswasm');

// This is the bisweb genericio file 
const genericio=require('bis_genericio.js');

// Automatically encoded string from bis_create_wasm_module.js
// ------------------------------------------------------------
// This is abbreviated here ...
const base64str="H4sIAAAAAAAAA+y9CYBdRZUwfKvu/u59r2+v6SSdpO5NCB02E8SAiCYXsjQhBEb.... "

module.exports=function() {

    return new Promise( (resolve,reject) => {

        // Make String Binary and uncompress
        // ------------------------------------------------------------
        let binary=genericio.fromzbase64(base64str);

        // Load WASM and initialize libbiswasm_wrapper module
        // ------------------------------------------------------------
        libbiswasm_raw(resolve,"webpack_encapsulated_module",binary);
    });
};

This simplifies the task of finding the .wasm file at runtime as, by converting it to a .js file, webpack can embed it in the overall build.
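The embedding idea can be tried in isolation. The sketch below is only an illustration and is not the code in bis_create_wasm_module: it assumes Node.js (for the global Buffer), uses plain base64 without the compression step that fromzbase64 also undoes, and substitutes the smallest valid WebAssembly module (just the 8-byte header) for libbiswasm.wasm. The names wasmBytes, embedAsBase64 and decodeBase64 are hypothetical.

```javascript
// Stand-in for the .wasm file: magic number "\0asm" followed by version 1.
const wasmBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);

// Build time (hypothetical helper): binary -> base64 text that can live in a .js file.
function embedAsBase64(bytes) {
    return Buffer.from(bytes).toString('base64');
}

// Run time (hypothetical helper): base64 text -> a fresh Uint8Array.
function decodeBase64(str) {
    return Uint8Array.from(Buffer.from(str, 'base64'));
}

const base64str = embedAsBase64(wasmBytes);
const binary = decodeBase64(base64str);

// WebAssembly.validate reports whether the bytes form a well-formed module.
const isValid = WebAssembly.validate(binary);
console.log(base64str, binary.length, isValid);
```

Because the string is ordinary JavaScript source, a bundler treats it like any other module and no separate .wasm file has to be located at runtime.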

Finally, a custom script compiletools/bis_create_wrappers.js is run by CMake to create some JS-interface functions automatically (as well as the Python and Matlab wrapper functions, more below). This results in a final file build/libbiswasm_wrapper.js.

From JS to WASM and back

Functions compiled from C/C++ to WebAssembly that need to be accessed from JS must (unless you have another trick):

  1. Be declared as extern "C". This gives them C linkage (no C++ name mangling) and restricts them to simple C-style arguments.
  2. Be tagged in the source code as "KEEP" so that Emscripten does not discard them as useless.

Functions which take only numbers as arguments (float or int) can be called directly, whereas functions which pass pointers (strings or arrays of numbers) need to be called via special functions provided by Emscripten.

Simple Function call

Consider the simple function (see bisExportedFunctions.h/bisExportedFunctions.cpp)

// bisExportedFunctions.h
/** @returns Magic Code for Serialized Image */
BISEXPORT int getImageMagicCode();

// bisExportedFunctions.cpp
int getImageMagicCode() { return bisDataTypes::s_image;   }

This is a simple function that returns the code for an image (used during data serialization, more on this later). The function is declared as extern "C" in bisExportedFunctions.h and tagged with the label BISEXPORT. This is defined in bisDefinitions.h and for the purpose of WASM compilation it is simply defined as

#define BISEXPORT  __attribute__((used))

This signals to Emscripten that the function is needed and should not be eliminated as dead code during linking. The function itself is trivial and simply returns an integer. From JS it can be invoked directly on the Emscripten module:

Module._getImageMagicCode();

Please note the underscore ('_') before getImageMagicCode. We call this from a JS function in bis_wasmutils.js in exactly this way as shown in the code below:

/**
* @alias bisWasmUtils.get_image_magic_code
* @param {EmscriptenModule} Module - the emscripten Module object
* @returns {number} the Bis WebAssembly Magic Code for an image
*/
var get_image_magic_code=function(Module) { 
    return Module._getImageMagicCode(); 
};

Note: Emscripten functions are stored in a variable named Module; more on how this comes about later.

Complex Function call

Things get a little more complicated with functions that take in complex arguments (pointers and strings). These are called using the Emscripten-provided function Module.ccall as follows:

const wasm_output=Module.ccall('gaussianSmoothImageWASM','number',
    ['number', 'string', 'number'],
    [ image1_ptr, jsonstring, debug]);

(4/16/2018: Emscripten seems to have removed the export of the .ccall method in its latest version, but it can be added back via the libbiswasm_post.js file.)

The actual C++ function being called is contained in bisExportedFunctions.h and has the signature:

BISEXPORT unsigned char* gaussianSmoothImageWASM(unsigned char* input, const char* jsonstring, int debug);

The first argument, 'gaussianSmoothImageWASM', is the C function name without the leading underscore (ccall adds it internally). The next argument is the return type, which is either a number or a string. Pointers are numbers: their underlying value is a memory address. The third argument is an array containing the types of the function's arguments, and the last argument is another array containing the actual values to pass in.

The variables image1_ptr and wasm_output are pointers to arrays. These are numbers that point to the beginning of the storage of each array in WebAssembly memory space. WebAssembly has its own memory space, separate from JS, and all data that need to be passed into or out of Wasm functions must be copied to this separate memory space. There are different ways to do this. BioImage Suite code tends to do all memory management on the C++ side and call the relevant functions from JS. In particular, there are two functions in bisExportedFunctions.h that handle this; see the declarations below:

/** Called from JS code to allocate an array
* @param sz the size of the array in bytes
* @returns the pointer to the allocated data
*/
BISEXPORT unsigned char* allocate_js_array(int sz);

/** Called from JS code to delete a pointer 
* @param ptr the pointer to delete
*/
BISEXPORT int jsdel_array(unsigned char* ptr);

These are used as part of data-object serialization/deserialization, which will be discussed next. Memory can also be allocated and deallocated from JS directly by calling through to Emscripten functions; see the Emscripten documentation for a good discussion. For organization and code maintenance reasons, low-level memory management is centralized in the C++ codebase in BioImage Suite. The curious may refer to bisMemoryManagement.h/bisMemoryManagement.cpp.

Note: Passing strings into WASM functions exposed through the Emscripten Module is easy, as Emscripten automatically converts JS strings to C-style char* arrays. Getting strings out is a little more complex: in BioImage Suite they are returned as serialized vectors and then deserialized into JS strings.


Calling Computational Code in BisWeb

An Example function call -- gaussianSmoothImage

Given the limitation of only being able to pass numbers and arrays, an operation such as gaussianSmoothImage, which takes an input image and a set of parameters and returns an output image, requires some extra scaffolding. BisWeb handles this as follows:

  1. On the JS side, the image is stored as an object of type BisWebImage (see js/dataobjects/bisweb_image.js).
  2. An automatically generated JS-interface function is called to invoke the C++ code, see below. In this function the image is serialized to an array stored in WASM memory.
  3. A C-interface function is then invoked. This deserializes the array into a corresponding C++ image object, bisSimpleImage in this case.
  4. The actual computational C++ code is then called that takes a bisSimpleImage as input and returns a bisSimpleImage as output.
  5. The C-interface function takes this C++ object, serializes it to a new WASM-stored array, and returns it to the JS interface function
  6. The JS-interface function deserializes the WASM array to create a new BisWebImage JS-object, releases the memory allocated in both serialization and deserialization, and returns the JS object.

Consider the automatically generated JS Interface Function:

// C++:
/** Smooth image using \link bisImageAlgorithms::gaussianSmoothImage \endlink
* @param image1 - serialized input as unsigned char array
* @param paramobj - the parameter string for the algorithm { "sigma" : 1.0, "inmm" :  true, "radiusfactor" : 1.5 },
* @param debug - if > 0 print debug messages
* @returns a pointer to a serialized image
*/
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
// JS: {'gaussianSmoothImageWASM', 'bisImage', [ 'bisImage', 'ParamObj', 'debug' ]}
//      returns a bisImage
// - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
var gaussianSmoothImageWASM = function(image1, paramobj, debug) { 

    if (debug !== true && debug !=="true" && debug !==1 && debug !==2) 
        debug = 0; 
    else if (debug !== 2) 
        debug = 1;
    
    const jsonstring = JSON.stringify(paramobj || { } );

    // Serialize objects
    let image1_ptr = wrapperutil.serializeObject(Module, image1, 'bisImage')

    // Call WASM
    if (debug || debug === 'true') 
        console.log('++++\n++++ Calling WASM Function:gaussianSmoothImageWASM with ' + jsonstring + '\n++++');
    
    const wasm_output = Module.ccall('gaussianSmoothImageWASM', 'number', 
        [ 'number', 'string', 'number' ],
        [ image1_ptr, jsonstring, debug ]);

    // Deserialize Output
    const output = wrapperutil.deserializeAndDeleteObject(Module, wasm_output, 'bisImage', image1);
    
    // Cleanup
    if (image1_ptr !== image1)
        wasmutil.release_memory(Module, image1_ptr);

    // Return
    return output;
}

This code performs the following steps:

  1. Takes an input image and serializes it to create a WASM array using wrapperutil.serializeObject.

  2. Invokes Module.ccall to call the C++ function.

  3. Deserializes the WASM output and releases the WASM memory using wrapperutil.deserializeAndDeleteObject.

  4. Returns output, a BisWebImage.
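The list above skips one small detail at the top of the wrapper: the debug argument is first normalized to 0, 1 or 2 before it is handed to C++ as an int. A minimal sketch of that logic as a standalone function follows; normalizeDebug is a hypothetical name, since the generated wrapper inlines these checks.

```javascript
// Hypothetical standalone version of the debug normalization in the wrapper.
// Accepts true, "true", 1 or 2; everything else means "no debug output".
function normalizeDebug(debug) {
    if (debug !== true && debug !== "true" && debug !== 1 && debug !== 2)
        return 0;
    // 2 is kept as-is (more verbose); every other accepted value becomes 1.
    return debug === 2 ? 2 : 1;
}

const normalized = [true, "true", 1, 2, undefined, "yes", 3].map(normalizeDebug);
console.log(normalized);
```

Normalizing on the JS side means the C++ function only ever has to deal with a small integer.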

The whole build/libbiswasm_wrapper.js file including this function is generated automatically by looking for special comments in the C++ header file. In bisExportedFunctions.h the declaration of gaussianSmoothImageWASM is preceded by a special comment beginning with // BIS: as shown below:

// BIS: { 'gaussianSmoothImageWASM', 'bisImage', [ 'bisImage', 'ParamObj', 'debug' ] } 
BISEXPORT unsigned char*  gaussianSmoothImageWASM(unsigned char* input,const char* jsonstring,int debug);

These tags are parsed by the custom script compiletools/bis_create_wrappers.js to create the JS-interface functions as well as the Python and Matlab wrapper functions. This saves a manual step.
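To make the tag format concrete, here is a minimal sketch of how a single // BIS: line could be split into its three parts. It is not the parser in bis_create_wrappers.js; parseBisTag and the field names name, returnType and args are assumptions made for illustration, and only the one-line form shown above is handled.

```javascript
// Hypothetical parser for one "// BIS: { 'name', 'returnType', [ 'arg', ... ] }" line.
function parseBisTag(line) {
    const pattern = /\/\/\s*BIS:\s*\{\s*'([^']+)'\s*,\s*'([^']+)'\s*,\s*\[([^\]]*)\]\s*\}/;
    const m = line.match(pattern);
    if (!m)
        return null; // not a BIS tag

    // Split the bracketed list and strip whitespace and the surrounding quotes.
    const args = m[3].split(',')
        .map((s) => s.trim().replace(/^'|'$/g, ''))
        .filter((s) => s.length > 0);

    return { name: m[1], returnType: m[2], args: args };
}

const tagLine = "// BIS: { 'gaussianSmoothImageWASM', 'bisImage', [ 'bisImage', 'ParamObj', 'debug' ] } ";
const parsedTag = parseBisTag(tagLine);
console.log(parsedTag);
```

A result of this shape contains everything a generator needs to emit the serialize, ccall and deserialize steps seen in the wrapper above.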

Object serialization and deserialization

An OpenIGTLink-inspired structure is used to transfer data back and forth. Each array begins with a core header of four 32-bit integers as follows:

  • magic_type: The type of the object. There are six known objects as of now and their codes are defined in cpp/bisDataTypes.h. These are vector, matrix, image, grid_transform, combo_transform, and transformation_collection.

  • data_type: The data type of the actual data (e.g. short, float etc.). The codes for these are defined in cpp/bisDataTypes.h.

  • header_size: The size of the object-specific header in bytes. For example, a matrix has a header of size 8: a 4-byte integer for the number of rows and a 4-byte integer for the number of columns.

  • data_bytes_size: The size of the whole data array in bytes. For a float matrix this is 4 * Number of Rows * Number of Columns.

The core header is then followed by an object-specific header of size header_size and then the data array storing the actual object of size data_bytes_size bytes of type data_type.

On the JS side the core objects are defined in js/dataobjects. Each has a function serializeWasm, which serializes the object to this format, and a function deserializeWasm, which parses an array to recreate the object.

Consider a simple 4x3 matrix of type float. In this case the serialized object would look as follows:

  • bytes 0-3 store the integer 20002, the code for matrix
  • bytes 4-7 store the integer 16, the code for float (these codes match the ones used in the NIFTI header)
  • bytes 8-11 store the integer 8, the size of the object-specific header in bytes
  • bytes 12-15 store the integer 4 * 3 * 4 = 48, the size of the actual matrix in bytes
  • bytes 16-23 store the matrix-specific header. In particular these store two integers, the number of rows (4) and the number of cols (3)
  • bytes 24-71 store the actual values of the matrix one row at a time.
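The layout above can be reproduced with plain typed arrays. The following is a minimal sketch, assuming the codes quoted above (20002 for matrix, 16 for float); serializeFloatMatrix is a hypothetical helper, not the serializeWasm method of the matrix object in js/dataobjects, and it writes into an ordinary ArrayBuffer rather than into WASM memory.

```javascript
// Hypothetical helper: pack a row-major float matrix into the serialized layout.
function serializeFloatMatrix(rows, cols, values) {
    const MATRIX_MAGIC = 20002; // code for matrix, as quoted in the text
    const FLOAT_CODE = 16;      // NIFTI code for float
    const headerSize = 8;       // two 32-bit integers: rows and cols
    const dataBytes = 4 * rows * cols;
    const buffer = new ArrayBuffer(16 + headerSize + dataBytes);

    // Bytes 0-15: core header of four 32-bit integers.
    const core = new Int32Array(buffer, 0, 4);
    core[0] = MATRIX_MAGIC;
    core[1] = FLOAT_CODE;
    core[2] = headerSize;
    core[3] = dataBytes;

    // Bytes 16-23: matrix-specific header.
    const dims = new Int32Array(buffer, 16, 2);
    dims[0] = rows;
    dims[1] = cols;

    // Bytes 24 onwards: the values, one row at a time.
    new Float32Array(buffer, 24, rows * cols).set(values);

    return new Uint8Array(buffer);
}

const matrixValues = Float32Array.from({ length: 12 }, (_, i) => i);
const serializedMatrix = serializeFloatMatrix(4, 3, matrixValues);
console.log(serializedMatrix.length); // 72 bytes: 16 + 8 + 48
```

Typed arrays use the platform byte order, which is little-endian on all common platforms and matches WebAssembly's little-endian memory.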

For images the type specific header contains 5 integers and 5 floats which are the dimensions and spacing for a 5-dimensional image, which should be permissive enough for most cases moving forward.

Looking at the actual gaussianSmoothImageWASM code in more detail

This section will work through the six steps of smoothing an image using C++ code called from JS, beginning inside the wrapper function shown above.

Serializing an Image Object

First examine the call

    let image1_ptr = wrapperutil.serializeObject(Module, image1, 'bisImage')

This is where the image is serialized. Looking inside serializeObject, we see that for images this reduces to:

    return image1.serializeWasm(Module);

In turn this function is simply:

    let arr = this.getImageData();
    let dim = this.getDimensions();
    let spa = this.getSpacing();
    return biswasm.packStructure(Module, arr, dim, spa);

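Next, consider the low-level code in biswasm (js/core/bis_wasmutils.js). For images, packStructure reduces to the following, where data_array, dimensions and spacing are the function's arguments:

let magic_type = get_image_magic_code(Module);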
let headersize = 40;
let header_array = new Uint8Array(headersize);

// These two lines create views inside the actual header_array    
let dim = new Int32Array(header_array.buffer, 0, 5);
let spa = new Float32Array(header_array.buffer, 20, 5);
for (let k = 0; k < 5; k++) {
    if (k < dimensions.length) {
        dim[k] = dimensions[k];
        if (k < spacing.length)
            spa[k] = spacing[k];
        else
            spa[k] = 1.0;
    } else {
        dim[k] = 1;
        spa[k] = 1.0;
    }
}

return packRawStructure(Module, header_array, data_array, magic_type);

This last call gets us to the low level code:

var packRawStructure = function(Module, 
                                header_array,
                                data_array,
                                magic_type) {

    let headersize = 0;
    if (header_array !== 0)
        headersize = header_array.byteLength;
    let nDataBytes = data_array.byteLength + headersize + 16;

    // This allocates the raw memory
    let dataPtr = allocate_memory(Module, nDataBytes);

    // This packs the whole structure inside dataPtr
    packRawStructureInPlace(Module, dataPtr, header_array, data_array, magic_type);
    
    return dataPtr;
};

There are two calls inside this function worth exploring. The first is allocate_memory, which simply calls the C++ function allocate_js_array to allocate memory on the WASM side:

var allocate_memory = function(Module, nDataBytes) {
    //  JS-Style: return Module._malloc(nDataBytes);
    return Module._allocate_js_array(nDataBytes);
};

The second call is packRawStructureInPlace, which performs the actual serialization. There are three elements to this: (1) the 16-byte global header, (2) the object-specific header (here the image header), header_array, and (3) the raw image data, data_array.

var packRawStructureInPlace = function(Module,dataPtr,
                                    header_array,
                                    data_array,
                                    magic_type) {

    let headersize = 0;
    if (header_array !== 0)
        headersize = header_array.byteLength;
    const datatype = getCodeFromType(data_array);
    let nDataBytes = data_array.byteLength + headersize + 16;

This creates the global header

    // Create the global header
    let intheader = get_array_view(Module, Int32Array, dataPtr, 4);
    intheader[0] = magic_type;
    intheader[1] = datatype;
    intheader[2] = headersize;
    intheader[3] = data_array.byteLength;

If the object has an object-specific header (vectors, for example, do not), store it first:

    if (headersize > 0) {
        // Copy Header
        let headerView = get_array_view(Module, Uint8Array, dataPtr + 16, header_array.byteLength);
        let inputView = new Uint8Array(header_array.buffer);
        headerView.set(inputView);
    }

Store the actual data (voxel intensities in the case of an image)

    // Copy Data
    let dataView = get_array_view(Module, Uint8Array, dataPtr + headersize + 16, data_array.byteLength);
    let bisoffset = data_array.bisbyteoffset || 0;
    let inp = new Uint8Array(data_array.buffer, bisoffset, dataView.length);
    dataView.set(inp);

Return the number of bytes

    return nDataBytes;
};

Inside this there is one more function call worth looking into: get_array_view. Module.HEAPU8 is the entire WASM memory space viewed as an unsigned char array, dataPtr is the raw memory location (a byte offset into that space), and sz is the number of elements in the view. Finally, arraytypename is the typed-array constructor to use, e.g. Float32Array.

var get_array_view = function(Module, arraytypename, dataPtr, sz) {
    return new arraytypename(Module.HEAPU8.buffer, dataPtr, sz);
};
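The behaviour of get_array_view can be explored without a compiled module. The sketch below assumes a mock object, mockModule, that only provides HEAPU8 over a small ArrayBuffer; the address 16 stands in for a pointer that allocate_memory would return, and getArrayView simply repeats the one-line body shown above.

```javascript
// Mock of the only Module property needed here: a byte view of "WASM memory".
const mockModule = { HEAPU8: new Uint8Array(new ArrayBuffer(64)) };

// Same body as get_array_view above.
function getArrayView(Module, ArrayType, dataPtr, sz) {
    return new ArrayType(Module.HEAPU8.buffer, dataPtr, sz);
}

const mockPtr = 16; // pretend the allocator returned this address

// Write three floats through a Float32Array view (sz counts elements, not bytes).
const floatView = getArrayView(mockModule, Float32Array, mockPtr, 3);
floatView.set([1.5, 2.5, 3.5]);

// A byte view over the same region covers 3 * 4 = 12 bytes.
const byteView = getArrayView(mockModule, Uint8Array, mockPtr, 12);

// A second float view sees the same values: views share memory, nothing is copied.
const secondView = getArrayView(mockModule, Float32Array, mockPtr, 3);
console.log(Array.from(secondView), byteView.length);
```

Two practical consequences follow from the typed-array rules: dataPtr must be a multiple of the element size (4 for Float32Array) or the constructor throws a RangeError, and a view created before the WASM memory grows should not be reused afterwards, since growth replaces the underlying buffer.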

The C++ Side

The C++ function is below. It takes three arguments

  • input - The image in serialized form

  • jsonstring - The parameters as a JSON String

  • debug - A flag that enables printing of debug statements

    unsigned char* gaussianSmoothImageWASM(unsigned char* input, const char* jsonstring, int debug) {

    if (debug) std::cout << "In Smooth Image" << std::endl;

The first interesting thing is the use of bisJSONParameterList to parse the jsonstring and extract the actual arguments. Note that all memory allocation in bisweb, with one exception (the allocation of raw memory), is done using smart pointers, which enable automatic deletion of the underlying objects. Microsoft's documentation has a good introduction to C++ smart pointers.

std::unique_ptr<bisJSONParameterList> params(new bisJSONParameterList());
int ok = params->parseJSONString(jsonstring);
if (!ok) 
    return 0;

if (debug)
    params->print();

Next extract the actual parameters from the bisJSONParameterList object

float sigmas[3];
int ns = params->getNumComponents("sigmas");
if (ns == 3)  {
    for (int ia = 0; ia <= 2; ia++) 
        sigmas[ia] = params->getFloatValue("sigmas", 1.0, ia);
} else {
    float s = params->getFloatValue("sigmas", 1.0);
    sigmas[0] = s;
    sigmas[1] = s;
    sigmas[2] = s;
}
int inmm = params->getBooleanValue("inmm");
float radiusfactor = params->getFloatValue("radiusfactor", 1.5);

Next deserialize the actual image and cast it to type float. If the input image was already of type float, no new memory is allocated by this step. Other functions in bisExportedFunctions.h can use variable types and templated execution to preserve native types, but smoothing practically needs float precision, as integer smoothing can produce nonsensical results.

std::unique_ptr<bisSimpleImage<float>> in_image(new bisSimpleImage<float>("smooth_input_float"));
if (!in_image->linkIntoPointer(input))
    return 0;

Create the output image by cloning the input image to allocate an image of the same size. copyStructure does this in the code below.

std::unique_ptr<bisSimpleImage<float>> out_image(new bisSimpleImage<float>("smooth_output_float"));
out_image->copyStructure(in_image.get());

Call the "higher level" C++ code to perform the smoothing. Note that the raw pointer held by the smart pointer is passed rather than the smart pointer itself, i.e. in_image.get() instead of in_image.

float outsigmas[3];
bisImageAlgorithms::gaussianSmoothImage(in_image.get(), out_image.get(), sigmas,outsigmas, inmm, radiusfactor);
if (debug)
    std::cout << "outsigmas=" << outsigmas[0] << "," << outsigmas[1] << "," << outsigmas[2] << std::endl;

Finally serialize the image into a raw data array and release ownership so that the image returned to the JS side does not get deleted when out_image is automatically deleted at the end of this function.

return out_image->releaseAndReturnRawArray();
}

Back to JS

The returned array wasm_output is parsed to create a new object. This is done using:

  const output = wrapperutil.deserializeAndDeleteObject(Module, wasm_output, 'bisImage', image1);

Again for images this reduces to:

    let output = new BisWebImage();
    output.deserializeWasmAndDelete(Module, ptr, first_input);
    return output;

The input image image1 is used as part of this process, as it carries the NIFTI header information that is needed to create a full output image.

The actual operation happens with the call to output.deserializeWasmAndDelete. This has the form:

deserializeWasmAndDelete(Module, wasmarr, extra = 0) {
    const out = this.deserializeWasm(Module, wasmarr, extra);
    biswasm.release_memory_cpp(Module, wasmarr);
    return out;
}

The deserializeWasm function goes through the array and reconstructs the BisWebImage. The release_memory_cpp function calls C++ code to delete the raw WASM array on the C++ side; this is critical to avoid memory leaks. This is the array 'entrusted' to the JS side by the C++ code as part of the call to releaseAndReturnRawArray. The release part of that call essentially tells the smart pointer destructor not to delete this array, as it will be deleted later. Here we keep our end of the bargain.
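The order of operations (copy the data out first, release the WASM array second) matters because typed-array views do not own the memory they point at. The sketch below illustrates this on a mock heap; readCoreHeader is a hypothetical helper rather than the BisWebImage code, the serialized object is a small 1x2 float matrix using the codes quoted earlier (20002 and 16), and overwriting the heap with zeros stands in for the memory being released and reused.

```javascript
// Hypothetical reader for the 16-byte core header located at address ptr.
function readCoreHeader(heapU8, ptr) {
    const h = new Int32Array(heapU8.buffer, ptr, 4);
    return {
        magicType: h[0],
        dataType: h[1],
        headerSize: h[2],
        dataBytes: h[3],
        dataOffset: ptr + 16 + h[2],   // data follows both headers
        totalBytes: 16 + h[2] + h[3],
    };
}

// Mock WASM heap holding a serialized 1x2 float matrix at address 32.
const heap = new Uint8Array(128);
const wasmPtr = 32;
new Int32Array(heap.buffer, wasmPtr, 4).set([20002, 16, 8, 8]);
new Int32Array(heap.buffer, wasmPtr + 16, 2).set([1, 2]);
new Float32Array(heap.buffer, wasmPtr + 24, 2).set([0.25, 0.75]);

const coreHeader = readCoreHeader(heap, wasmPtr);

// slice() copies the values into JS-owned memory; a bare view would not.
const payload = new Float32Array(heap.buffer, coreHeader.dataOffset, coreHeader.dataBytes / 4).slice();

// Stand-in for release_memory_cpp followed by the allocator reusing the block.
heap.fill(0);

console.log(coreHeader.totalBytes, Array.from(payload));
```

Because the payload was copied before the release, it keeps its values even after the heap region has been overwritten.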


Developer Guide to adding new functions and datatypes

Adding new functions

Adding new C++ functions is relatively easy. These are the steps:

  1. Implement the C++ code.

  2. Create an interface function in one of bisExportedFunctions, bisExportedFunctions2, or bisTesting that accepts the inputs, deserializes them, and calls the actual C++ code. If the code has dependencies or requires other files, then add the needed files to the PARSE_HEADERS list in cpp/CMakeLists.txt.

     SET(PARSE_HEADERS
         ${CPP_SOURCE_DIR}/bisDefinitions.h
         ${CPP_SOURCE_DIR}/bisExportedFunctions.h
         ${CPP_SOURCE_DIR}/bisExportedFunctions2.h
         ${CPP_SOURCE_DIR}/bisTesting.h
     ) 
    
  3. Ensure that the interface function has the extra header tag specifying its inputs and output. The allowed values for this can be found in compiletools/bis_create_wrappers.js (for both JS and Python). The JS-spec is:

     const js_types = {
         'bisImage' : [ 'image', 'number', true, false ],
         'Matrix' : [ 'matrix' ,'number', true, false ],
         'Vector' : [ 'vector' , 'number', true, false ],
         'bisTransformation' : [ 'transformation' ,'number', true,false ],
         'bisLinearTransformation' : [ 'linearxform','number', true, false ],
         'bisGridTransformation'   : [ 'gridxform', 'number', true, false ],
         'bisComboTransformation'  : [ 'comboxform', 'number', true, false ],
         'String' : [ 'string', 'string', false, true ],
         'Int' : [ 'intval' , 'number', false, false ],
         'Float' : [ 'floatval', 'number', false, false ],
         'debug' : [ 'debug', 'number', false, false ],
         'ParamObj' : [ 'paramobj', 'string', false ,false ],
     };
    

This is a set of key-value pairs. Consider the first key, bisImage. Its values specify the variable name to be created (image), the equivalent Module.ccall type (number), whether it is a pointer (true), and whether it is a String (false).

The header tag for gaussianSmoothImageWASM has the form:
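As a rough illustration of how such a table can drive code generation, the sketch below looks up each argument key of a tag and derives the type array passed to Module.ccall. It uses a subset of the table above; ccallSignature and its result fields are hypothetical names, and the real generator in bis_create_wrappers.js does considerably more.

```javascript
// Subset of the js_types table: [ variable name, ccall type, isPointer, isString ].
const jsTypes = {
    'bisImage': ['image', 'number', true, false],
    'String':   ['string', 'string', false, true],
    'debug':    ['debug', 'number', false, false],
    'ParamObj': ['paramobj', 'string', false, false],
};

// Hypothetical helper: derive ccall argument types from the keys in a BIS tag.
function ccallSignature(argKeys) {
    const specs = argKeys.map((key) => {
        const spec = jsTypes[key];
        if (!spec)
            throw new Error('Unknown type in BIS tag: ' + key);
        return spec;
    });
    return {
        argTypes: specs.map((s) => s[1]),           // third argument of Module.ccall
        needsSerialization: specs.map((s) => s[2]), // pointer types are serialized first
    };
}

const signature = ccallSignature(['bisImage', 'ParamObj', 'debug']);
console.log(signature.argTypes, signature.needsSerialization);
```

For the gaussianSmoothImageWASM tag this yields the same [ 'number', 'string', 'number' ] array that appears in the generated wrapper shown earlier.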

    // BIS: { 'gaussianSmoothImageWASM', 'bisImage', [ 'bisImage', 'ParamObj', 'debug' ] } 

The type ParamObj is a dictionary (object in JS, dictionary in Python) that will be serialized to a JSON-string prior to calling the C++ function.

Adding new data object types for serialization and deserialization

One day we will add polygonal surfaces. Hopefully this section will be helpful when that time comes.

This is not for the faint-hearted. On the JS-side, create a new object deriving from BisDataObject (see js/dataobjects/bis_dataobject.js) that implements all its key functions, including serializeWasm and deserializeWasm.

Next, edit compiletools/bis_create_wrappers.js to add the new object type to the spec. So long as it derives from BisDataObject, this is all that is needed.

Finally, edit js/core/bis_wrapperutils.js, and in particular the function deserializeAndDeleteObject, to add the new object to what is effectively a factory function.

var deserializeAndDeleteObject=function(Module,ptr,datatype,first_input=0) {

...
    
    if (datatype==='Matrix') {
        let output=new BisWebMatrix();
        output.deserializeWasmAndDelete(Module,ptr);
        return output;
    }

...
}

On the C++ side, implement appropriate functionality — perhaps a class — that will perform the opposite serialization and deserialization. See for example, cpp/bisSimpleDataStructures.h and in particular bisSimpleMatrix.