Bois Schema Specs
TO BE COMPLETED.
The Bois binary format is pretty straightforward. BOIS stands for Binary Object Indexed Serializer. Even though the overall structure doesn't follow any specific rule, it can still be categorized as an indexed sequential data format, hence the word "indexed" in the name. Being indexed means that there is an index byte before every object. This index byte contains information about the data that comes after it. It can even contain the data itself; the sections below explain how.
There are several types of index byte; which one is used depends on the type of data being stored.
IB1 - Nullable: Generally used if the object/number is nullable.
index byte: [0_{null-flag}_0_0_0_0_0_0]
embeddable integer: none
IB2 - Embed-able Nullable: Generally used if the object/number is nullable and is small enough to be embedded.
index byte: [{embedded-flag}_{null-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63
IB3 - Embed-able Nullable Signed Number: Used for signed numbers that are nullable and small enough to be embedded.
index byte: [{embedded-flag}_{null-flag}_{negative-flag}_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..31
IB4 - Embed-able Not-Null Signed Number: Used for signed numbers that cannot be null and are small enough to be embedded.
index byte: [{embedded-flag}_{negative-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63
IB5 - Embed-able Nullable Unsigned Number: Used for unsigned numbers that can be null and are small enough to be embedded.
index byte: [{embedded-flag}_{null-flag}_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..63
IB6 - Embed-able Not-Null Unsigned Number: Used for unsigned numbers that cannot be null and are small enough to be embedded.
index byte: [{embedded-flag}_0_0_0_0_0_0_0]
followed by optional data: [0_0_0_0_0_0_0_0]
embeddable integer: 0..127
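The embeddable ranges listed above follow directly from how many bits each layout leaves free once the flags are accounted for. The sketch below illustrates that relationship; the dictionary and function names are hypothetical and are not taken from the Bois source code, and the masks assume the layouts are read left to right from the most significant bit.

```python
from typing import Optional

# Flag masks per index byte category, derived from the layouts listed above.
# Assumption: the leftmost position in each layout is the most significant bit.
INDEX_BYTE_FLAGS = {
    "IB1": {"null": 0x40},
    "IB2": {"embedded": 0x80, "null": 0x40},
    "IB3": {"embedded": 0x80, "null": 0x40, "negative": 0x20},
    "IB4": {"embedded": 0x80, "negative": 0x40},
    "IB5": {"embedded": 0x80, "null": 0x40},
    "IB6": {"embedded": 0x80},
}


def max_embeddable(category: str) -> Optional[int]:
    """Return the largest integer that fits in the non-flag bits, or None."""
    flags = INDEX_BYTE_FLAGS[category]
    if "embedded" not in flags:
        return None  # IB1 has no embedded flag, so nothing can be embedded
    flag_bits = 0
    for mask in flags.values():
        flag_bits |= mask
    # Every bit not reserved for a flag is available for the embedded value.
    return 0xFF & ~flag_bits


for name in INDEX_BYTE_FLAGS:
    print(name, max_embeddable(name))
```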
As you may have noticed, some of these index bytes have the same structure. I've done this to simplify writing the program. We still need one more piece of information about these bytes: the amount of data that can be embedded. Before that, let's see how to embed data in the index byte.
If the number to be stored is small enough, it can be stored in the index byte itself by merging the number with the flags.
The flags should be preserved at all times. Any misuse of the embedded flag may lead to invalid data.
First we have to know how much data can be stored. For example, Int32 is of type IB4, which can store any number in the 0..63 range.
As an example of an unsigned integer, imagine we want to store the number 50. Since the data type is uint and is not nullable, it falls into the IB6 category. Because 50 is within the IB6 embeddable range (0..127), it can be stored in the index byte. Finally, because the number is embedded, we have to set the embedded flag.
50 decimal = [00110010] byte
IB6
Embedded flag = [10000000]
Final byte = [10110010]
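The IB6 example above can be reproduced with a single bitwise OR. This is only a minimal sketch of the merge step; the constant and function names are made up for illustration and do not come from the Bois code base.

```python
IB6_EMBEDDED_FLAG = 0b1000_0000  # the only flag in an IB6 index byte
IB6_MAX_EMBEDDED = 0b0111_1111   # 127, the seven remaining bits


def embed_ib6(value: int) -> int:
    """Merge a small unsigned number with the IB6 embedded flag."""
    if not 0 <= value <= IB6_MAX_EMBEDDED:
        raise ValueError("value does not fit in the IB6 index byte")
    # The flag bit and the value bits never overlap, so OR preserves both.
    return IB6_EMBEDDED_FLAG | value


print(format(embed_ib6(50), "08b"))  # 10110010
```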
Now imagine that we want to save the same number 50, but this time the data type is a nullable signed integer, int?. This type falls into the IB3 category, whose largest embeddable number is 31, so we cannot embed 50 into the index byte. This is how it is stored:
50 decimal = [00110010] byte
IB3
Not-null, not-embedded signed number flags = [00000000]
Final bytes = [00000000][00110010]
Here the first byte is the index byte, with none of its flags set, and the second byte is the number itself.
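The following sketch shows one way the IB3 case could be written. It covers only what this page describes: a null value, an embedded value, and a non-embedded value that fits in the single data byte used in the example. Two points are assumptions rather than documented behaviour: that an embedded negative number is stored as its magnitude plus the negative flag, and that wider or negative non-embedded values need a layout this page does not specify (the sketch refuses them). All names are hypothetical.

```python
from typing import Optional

IB3_EMBEDDED = 0b1000_0000
IB3_NULL = 0b0100_0000
IB3_NEGATIVE = 0b0010_0000
IB3_MAX_EMBEDDED = 0b0001_1111  # 31


def encode_ib3(value: Optional[int]) -> bytes:
    """Encode a nullable signed integer using the IB3 index byte."""
    if value is None:
        return bytes([IB3_NULL])
    magnitude = abs(value)
    if magnitude <= IB3_MAX_EMBEDDED:
        index = IB3_EMBEDDED | magnitude
        if value < 0:
            index |= IB3_NEGATIVE  # assumption: sign-and-magnitude embedding
        return bytes([index])
    if 0 <= value <= 0xFF:
        # Matches the example above: an index byte with no flags set,
        # followed by one data byte holding the number.
        return bytes([0x00, value])
    raise NotImplementedError("layout for this value is not described here")


print(encode_ib3(50).hex())  # 0032
```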
The same process is followed when reading data. First we determine the data type from the schema, then decide which index byte category it belongs to, and finally check the flags, read the data and separate it from any flags.
This section describes the category and the structure of the simple data types supported by the serializer.
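Reading can be sketched as the mirror image of writing. The function below assumes the schema has already told us the value is a nullable signed integer (IB3), and that a non-embedded value occupies exactly one following byte as in the example on this page; wider values are outside the scope of this sketch. The names are hypothetical.

```python
from typing import Optional, Tuple

IB3_EMBEDDED = 0b1000_0000
IB3_NULL = 0b0100_0000
IB3_NEGATIVE = 0b0010_0000
IB3_VALUE_MASK = 0b0001_1111  # bits left over once the three flags are removed


def decode_ib3(buf: bytes, offset: int = 0) -> Tuple[Optional[int], int]:
    """Return (value, next_offset) for an IB3-encoded number at offset."""
    index = buf[offset]
    if index & IB3_NULL:
        return None, offset + 1
    if index & IB3_EMBEDDED:
        # Separate the embedded number from the flags.
        magnitude = index & IB3_VALUE_MASK
        value = -magnitude if index & IB3_NEGATIVE else magnitude
        return value, offset + 1
    # Not embedded: assume one data byte follows, as in the example above.
    return buf[offset + 1], offset + 2


print(decode_ib3(bytes([0b00000000, 0b00110010])))  # (50, 2)
```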
Category: None
Structure: None.
Category: IB5
Structure: None.
Category: None
Structure: None.
Category: IB3
Structure: None.
Category: IB4
Structure: None.
Category: None
Structure: byte.
Category: IB2
Structure: byte?.
Category: IB6
Structure: ushort.
Category: IB2
Structure: ushort?.
This section describes the types that require a simple structure in addition to the category.
Structure: [data-length : uint?][string-data-encoded : byte-array]
String Data: Byte-Array.
Note: Encoding the string to a byte-array is done through the UTF-8 encoder by default.
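As a rough illustration of the string structure, the sketch below writes a nullable length prefix followed by the UTF-8 bytes. It rests on several assumptions that this page does not confirm: that the uint? length uses the IB5 layout, that the length counts encoded bytes rather than characters, and that a null string is written as a null length with no data. Only lengths that fit in the index byte (0..63) are handled, because the layout of longer lengths is not described here. The names are hypothetical.

```python
from typing import Optional

IB5_EMBEDDED = 0b1000_0000
IB5_NULL = 0b0100_0000
IB5_MAX_EMBEDDED = 0b0011_1111  # 63


def encode_short_string(text: Optional[str]) -> bytes:
    """Encode a string as [data-length : uint?][string-data-encoded]."""
    if text is None:
        return bytes([IB5_NULL])  # assumption: null string = null length
    data = text.encode("utf-8")
    if len(data) > IB5_MAX_EMBEDDED:
        raise NotImplementedError("non-embedded lengths are not covered here")
    # The length is small enough to live inside the index byte itself.
    return bytes([IB5_EMBEDDED | len(data)]) + data


print(encode_short_string("hi"))  # b'\x82hi'
```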
Structure: [data-length : uint][double-variable-data : byte-array]
Data Format: The double value is converted to 16 bytes and only the low values with actual data are stored.
TODO: explain.
Structure: [data-length : uint?][double-variable-data : byte-array]
Data Format: Same as double
Structure: [data-length : uint][double-variable-data : byte-array]
Data Format: The double value is converted to 8 bytes and only the low values with actual data are stored.
TODO: explain.
Structure: [data-length : uint?][double-variable-data : byte-array]
Data Format: The double value is converted to 8 bytes and only the low values with actual data are stored.
TODO: explain.
TO BE COMPLETED