Status: Standard
This document describes the data format and directions for reading and writing bundled binary data. Bundled data is a way of writing multiple independent data transactions (referred to as DataItems in this document) into one top-level transaction. A DataItem shares many of the properties of a normal data transaction: it has an owner, data, tags, target, signature, and id. It differs in that it has no ability to transfer tokens and carries no reward, because the top-level transaction pays the reward for all bundled data.
Bundling multiple data transactions into one transaction provides a number of benefits:
- Allow delegation of payment for a DataItem to a 3rd party, while maintaining the identity and signature of the person who created the DataItem, without them needing to have a wallet with funds
- Allow multiple DataItems to be written as a group
- Increase the throughput of logically independent data-writes to the Arweave network
There is a TypeScript reference implementation for creating, signing, and verifying DataItems and for working with bundles.
A bundle of DataItems MUST have the following two tags present:
- Bundle-Format: a string describing the bundling format. The format for this standard is binary
- Bundle-Version: a version string. The version referred to in this standard is 2.0.0
Version changes may occur due to a change in encoding algorithm in the future
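As an illustration of the tag requirement above, the following minimal sketch checks whether a transaction's tag list carries both required tags. The `Tag` interface and the function name `isBundleTransaction` are hypothetical names chosen for this example, not part of any library.

```typescript
// Tag shape used by this sketch (illustrative, not taken from a library).
interface Tag {
  name: string;
  value: string;
}

// The two tags this standard requires on the top-level bundle transaction.
const REQUIRED_BUNDLE_TAGS: Tag[] = [
  { name: "Bundle-Format", value: "binary" },
  { name: "Bundle-Version", value: "2.0.0" },
];

// Returns true when every required tag is present with the expected value.
function isBundleTransaction(tags: Tag[]): boolean {
  return REQUIRED_BUNDLE_TAGS.every((required) =>
    tags.some((t) => t.name === required.name && t.value === required.value),
  );
}
```

A reader could run this check before attempting to parse a transaction body as a bundle.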
The format for the transaction body is binary data with the following byte layout:
N = number of DataItems
Bytes | Purpose |
---|---|
32 | Number of DataItems |
N x 64 | Pairs of offset and entry ID [offset (32 bytes), entry ID (32 bytes)] |
Remaining bytes | Binary encoded DataItems in bundle |
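The layout in the table can be read with a short header parser. This is a sketch under one loud assumption: the text does not state a byte order for the 32-byte integers, so little-endian is assumed here. The names `BundleEntry`, `readUintLE`, and `parseBundleHeader` are hypothetical.

```typescript
// One header entry: the raw 32-byte "offset" value and the 32-byte entry ID.
interface BundleEntry {
  offset: number;
  id: Uint8Array;
}

// Reads an unsigned integer of `length` bytes.
// ASSUMPTION: little-endian byte order (not stated in the text).
function readUintLE(bytes: Uint8Array, start: number, length: number): number {
  let value = 0;
  for (let i = length - 1; i >= 0; i--) {
    value = value * 256 + bytes[start + i];
    if (!Number.isSafeInteger(value)) {
      throw new RangeError("integer is missing or too large for this sketch");
    }
  }
  return value;
}

// Parses the item count and the N (offset, entry ID) pairs.
function parseBundleHeader(body: Uint8Array): { entries: BundleEntry[]; headerLength: number } {
  if (body.length < 32) throw new Error("bundle body is shorter than 32 bytes");
  const count = readUintLE(body, 0, 32);
  const headerLength = 32 + count * 64;
  if (body.length < headerLength) throw new Error("bundle header is truncated");
  const entries: BundleEntry[] = [];
  for (let i = 0; i < count; i++) {
    const pairStart = 32 + i * 64;
    entries.push({
      offset: readUintLE(body, pairStart, 32),
      id: body.slice(pairStart + 32, pairStart + 64),
    });
  }
  return { entries, headerLength };
}
```

The returned `headerLength` marks where the binary encoded DataItems begin.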
A DataItem is a binary encoded object that has similar properties to a transaction
Field | Description | Encoding | Length (in bytes) | Optional |
---|---|---|---|---|
signature type | Type of key format used for the signature | Binary | 2 | ❌ |
signature | A signature produced by owner | Binary | Depends on signature type | ❌ |
owner | The public key of the owner | Binary | 512 | ❌ |
target | An address that this DataItem is being sent to | Binary | 32 (+ presence byte) | ✔️ |
anchor | A value to prevent replay attacks | Binary | 32 (+ presence byte) | ✔️ |
number of tags | Number of tags | Binary | 8 | ❌ |
number of tag bytes | Number of bytes used for tags | Binary | 8 | ❌ |
tags | An array of tag objects | Binary | Variable | ❌ |
data | The data contents | Binary | Variable | ❌ |
All optional fields will have a leading byte which describes whether the field is present (1 for present, 0 for not present). Any other value for this byte makes the DataItem invalid.
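The presence-byte rule can be sketched as a pair of helpers. This assumes a present value occupies exactly 32 bytes, as the table lists for target and anchor; `encodeOptional32` and `decodeOptional32` are hypothetical names.

```typescript
// Encodes an optional 32-byte field: a presence byte, then the value if present.
function encodeOptional32(value?: Uint8Array): Uint8Array {
  if (value === undefined) return Uint8Array.of(0);
  if (value.length !== 32) throw new Error("optional field must be exactly 32 bytes");
  const out = new Uint8Array(33);
  out[0] = 1; // 1 = present
  out.set(value, 1);
  return out;
}

// Decodes an optional 32-byte field at `start`.
// Returns the value (if present) and the offset of the next field.
function decodeOptional32(
  bytes: Uint8Array,
  start: number,
): { value: Uint8Array | undefined; next: number } {
  const flag = bytes[start];
  if (flag === 0) return { value: undefined, next: start + 1 };
  // Any presence byte other than 0 or 1 makes the DataItem invalid.
  if (flag !== 1) throw new Error(`invalid presence byte: ${flag}`);
  if (bytes.length < start + 33) throw new Error("optional field is truncated");
  return { value: bytes.slice(start + 1, start + 33), next: start + 33 };
}
```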
A tag object is a binary object representing an object { name: string, value: string }.
The anchor and target fields in DataItem are optional. The anchor is an arbitrary value to allow bundling gateways to provide protection from replay attacks against them or their users.
Field | Description | Encoding | Length | Optional |
---|---|---|---|---|
name | Name of the tag | Binary | Variable | ❌ |
value | Value of the tag | Binary | Variable | ❌ |
The number of bytes used for tags is needed to know the start point of the data payload.
The fields on each DataItem will either be fixed-size or use run-length encoding to describe the field's length. This allows the parser to know which bytes belong to each field.
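A sketch of how a parser might walk the fields in table order follows. Several things here are assumptions, not statements of this document: integers are read as little-endian, signature type 1 is taken to mean a 512-byte signature, and the tag bytes are returned undecoded because the tag encoding is not detailed here. All names are hypothetical.

```typescript
interface DataItemLayout {
  signatureType: number;
  signature: Uint8Array;
  owner: Uint8Array;
  target: Uint8Array | undefined;
  anchor: Uint8Array | undefined;
  tagCount: number;
  tagBytes: Uint8Array; // still encoded
  data: Uint8Array;
}

// ASSUMPTION: signature type 1 uses a 512-byte signature.
const SIGNATURE_LENGTHS = new Map<number, number>([[1, 512]]);

// ASSUMPTION: integers are little-endian.
function readUintLE(bytes: Uint8Array): number {
  let value = 0;
  for (let i = bytes.length - 1; i >= 0; i--) {
    value = value * 256 + bytes[i];
    if (!Number.isSafeInteger(value)) throw new RangeError("integer too large for this sketch");
  }
  return value;
}

function parseDataItem(bytes: Uint8Array): DataItemLayout {
  let pos = 0;
  // Consumes the next n bytes, failing if the input is too short.
  const take = (n: number): Uint8Array => {
    if (pos + n > bytes.length) throw new Error("DataItem is truncated");
    const part = bytes.slice(pos, pos + n);
    pos += n;
    return part;
  };
  // Reads a presence byte followed by an optional 32-byte value.
  const takeOptional32 = (): Uint8Array | undefined => {
    const flag = take(1)[0];
    if (flag === 0) return undefined;
    if (flag !== 1) throw new Error("invalid presence byte");
    return take(32);
  };

  const signatureType = readUintLE(take(2));
  const signatureLength = SIGNATURE_LENGTHS.get(signatureType);
  if (signatureLength === undefined) throw new Error(`unknown signature type ${signatureType}`);
  const signature = take(signatureLength);
  const owner = take(512);
  const target = takeOptional32();
  const anchor = takeOptional32();
  const tagCount = readUintLE(take(8));
  // The tag byte count tells the parser where the data payload starts.
  const tagBytes = take(readUintLE(take(8)));
  const data = bytes.slice(pos);
  return { signatureType, signature, owner, target, anchor, tagCount, tagBytes, data };
}
```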
The signature and id for a DataItem are built in a manner similar to Arweave 2.0 transaction signing. It uses the Arweave 2.0 deep-hash algorithm. The 2.0 deep-hash algorithm operates on arbitrarily nested arrays of binary data, i.e., a recursive type of DeepHashChunk = Uint8Array | DeepHashChunk[].
There are reference implementations for the deep-hash algorithm in TypeScript and Erlang
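For orientation, here is a compact synchronous sketch of a deep-hash over the recursive type. The details (SHA-384, and the "blob" and "list" tags combined with the length) reflect the Arweave deep-hash as it is commonly implemented; this document does not restate them, so treat them as assumptions and defer to the reference implementations. It relies on Node's `node:crypto` module.

```typescript
import { createHash } from "node:crypto";

type DeepHashChunk = Uint8Array | DeepHashChunk[];

const utf8 = (text: string): Uint8Array => new TextEncoder().encode(text);

const sha384 = (data: Uint8Array): Uint8Array =>
  new Uint8Array(createHash("sha384").update(data).digest());

function concat(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// ASSUMPTION: tagging scheme and hash function follow the usual Arweave deep-hash.
function deepHash(chunk: DeepHashChunk): Uint8Array {
  if (chunk instanceof Uint8Array) {
    // Leaf: hash of (hash of "blob" + byte length) followed by (hash of the bytes).
    const tag = concat(utf8("blob"), utf8(chunk.byteLength.toString()));
    return sha384(concat(sha384(tag), sha384(chunk)));
  }
  // List: start from the hash of "list" + element count, then fold in each child.
  const tag = concat(utf8("list"), utf8(chunk.length.toString()));
  let acc = sha384(tag);
  for (const child of chunk) {
    acc = sha384(concat(acc, deepHash(child)));
  }
  return acc;
}
```

Because the list length and nesting are folded into the hash, `[a, b]` and `[[a], b]` produce different digests.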
To generate a valid signature for a DataItem, the contents of the DataItem and static version tags are passed to the deep-hash algorithm to obtain a message. This message is signed by the owner of the DataItem to produce the signature. The id of the DataItem is the SHA-256 digest of this signature.
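The id derivation is small enough to show directly. This sketch uses Node's `node:crypto`; the base64url text form is an assumption about how ids are usually displayed and is not specified in this document. Function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// The id is the SHA-256 digest of the signature bytes (32 bytes).
function dataItemId(signature: Uint8Array): Uint8Array {
  return new Uint8Array(createHash("sha256").update(signature).digest());
}

// ASSUMPTION: ids are rendered as unpadded base64url text.
function toBase64Url(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64url");
}
```

For example, `toBase64Url(dataItemId(signature))` yields a 43-character string for any signature.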
The exact structure and content passed into the deep-hash algorithm to obtain the message to sign is as follows:
[
  utf8Encoded("dataitem"),
  utf8Encoded("1"),
  owner,
  target,
  anchor,
  [
    ... [ tag.name, tag.value ],
    ... [ tag.name, tag.value ],
    ...
  ],
  data
]
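The structure above could be assembled as follows. Two points are assumptions of this sketch: an absent target or anchor is represented by an empty byte array, and tag names and values are already available as bytes. `UnsignedDataItem` and `signingPayload` are hypothetical names.

```typescript
type DeepHashChunk = Uint8Array | DeepHashChunk[];

interface UnsignedDataItem {
  owner: Uint8Array;
  target?: Uint8Array;
  anchor?: Uint8Array;
  tags: { name: Uint8Array; value: Uint8Array }[];
  data: Uint8Array;
}

const utf8 = (text: string): Uint8Array => new TextEncoder().encode(text);

// Builds the nested array that is handed to the deep-hash algorithm.
function signingPayload(item: UnsignedDataItem): DeepHashChunk[] {
  // ASSUMPTION: missing optional fields are passed as empty byte arrays.
  const empty = new Uint8Array(0);
  return [
    utf8("dataitem"),
    utf8("1"),
    item.owner,
    item.target ?? empty,
    item.anchor ?? empty,
    item.tags.map((tag) => [tag.name, tag.value]),
    item.data,
  ];
}
```

The result of deep-hashing this payload is the message the owner signs.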
DataItem verification is key to maintaining consistency within the bundle standard. A DataItem is valid iff¹:
- id matches the signature (via SHA-256 of the signature)
- signature matches the owner's public key
- tags are all valid
- an anchor isn't more than 32 bytes
The tags of a DataItem are valid iff:
- there are <= 128 tags
- each key is <= 1024 bytes
- each value is <= 3072 bytes
- each tag contains only a key and a value
- both the key and the value are non-empty strings
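The tag rules translate into a short validator. In this sketch the "key" of the rules is taken to be the `name` property of the tag object, and byte lengths are measured on the UTF-8 encoding, which is an assumption. `areTagsValid` is a hypothetical name.

```typescript
const MAX_TAGS = 128;
const MAX_NAME_BYTES = 1024;
const MAX_VALUE_BYTES = 3072;

// ASSUMPTION: size limits apply to the UTF-8 encoded length.
const byteLength = (text: string): number => new TextEncoder().encode(text).length;

function areTagsValid(tags: unknown): boolean {
  if (!Array.isArray(tags) || tags.length > MAX_TAGS) return false;
  return tags.every((tag) => {
    if (typeof tag !== "object" || tag === null) return false;
    // A tag may contain only a name (key) and a value.
    const keys = Object.keys(tag);
    if (keys.length !== 2 || !keys.includes("name") || !keys.includes("value")) return false;
    const { name, value } = tag as Record<string, unknown>;
    if (typeof name !== "string" || typeof value !== "string") return false;
    if (name.length === 0 || value.length === 0) return false;
    return byteLength(name) <= MAX_NAME_BYTES && byteLength(value) <= MAX_VALUE_BYTES;
  });
}
```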
To write a bundle of DataItems, each DataItem should be constructed, signed, encoded, and placed in a transaction with the transaction body format and transaction tags specified in Section 1.
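A sketch of assembling the transaction body from already signed and encoded DataItems follows. It assumes little-endian 32-byte integers and that the "offset" stored for each item is that item's byte length, with items written back to back after the header; neither detail is spelled out in this document. `EncodedItem`, `writeUintLE`, and `encodeBundle` are hypothetical names.

```typescript
interface EncodedItem {
  id: Uint8Array;    // 32-byte entry ID
  bytes: Uint8Array; // binary encoded DataItem
}

// ASSUMPTION: integers are written little-endian.
function writeUintLE(value: number, length: number): Uint8Array {
  if (!Number.isSafeInteger(value) || value < 0) throw new RangeError("value must be a non-negative safe integer");
  const out = new Uint8Array(length);
  let rest = value;
  for (let i = 0; i < length && rest > 0; i++) {
    out[i] = rest % 256;
    rest = Math.floor(rest / 256);
  }
  if (rest > 0) throw new RangeError("value does not fit in the requested length");
  return out;
}

function encodeBundle(items: EncodedItem[]): Uint8Array {
  const headerLength = 32 + items.length * 64;
  const total = items.reduce((sum, item) => sum + item.bytes.length, headerLength);
  const body = new Uint8Array(total);
  body.set(writeUintLE(items.length, 32), 0);
  let dataPos = headerLength;
  items.forEach((item, i) => {
    if (item.id.length !== 32) throw new Error("entry ID must be 32 bytes");
    const pairStart = 32 + i * 64;
    // ASSUMPTION: the stored "offset" is the item's byte length.
    body.set(writeUintLE(item.bytes.length, 32), pairStart);
    body.set(item.id, pairStart + 32);
    body.set(item.bytes, dataPos);
    dataPos += item.bytes.length;
  });
  return body;
}
```

The resulting bytes would become the data of the top-level transaction, which also carries the two bundle tags.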
To read a bundle of DataItems, the list of bytes representing the DataItems can be partitioned using the offsets in each pair. Subsequently, each partition can be parsed to a DataItem object (a struct in languages such as Rust or Go, or JSON in TypeScript).
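The partitioning step might look like the sketch below. It assumes little-endian integers and interprets each stored offset as the byte length of its item, with items laid out consecutively after the header; both are assumptions to check against the reference implementation. `splitBundle` is a hypothetical name.

```typescript
// ASSUMPTION: integers are little-endian.
function readUintLE(bytes: Uint8Array, start: number, length: number): number {
  let value = 0;
  for (let i = length - 1; i >= 0; i--) {
    value = value * 256 + bytes[start + i];
    if (!Number.isSafeInteger(value)) throw new RangeError("integer is missing or too large for this sketch");
  }
  return value;
}

// Splits a bundle body into (id, raw DataItem bytes) partitions.
function splitBundle(body: Uint8Array): { id: Uint8Array; bytes: Uint8Array }[] {
  const count = readUintLE(body, 0, 32);
  let itemStart = 32 + count * 64; // first byte after the header
  const items: { id: Uint8Array; bytes: Uint8Array }[] = [];
  for (let i = 0; i < count; i++) {
    const pairStart = 32 + i * 64;
    // ASSUMPTION: the stored "offset" is the item's byte length.
    const size = readUintLE(body, pairStart, 32);
    const id = body.slice(pairStart + 32, pairStart + 64);
    if (itemStart + size > body.length) throw new Error("DataItem extends past the end of the bundle");
    items.push({ id, bytes: body.slice(itemStart, itemStart + size) });
    itemStart += size;
  }
  return items;
}
```

Each returned `bytes` partition can then be parsed into a DataItem object.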
This allows for querying of a singleton or a bundle as a whole.
This format allows for indexing of specific fields in O(N) time. Some form of caching or indexing could be performed by gateways to improve lookup times.
1 - if and only if