diff --git a/docs/programmers_guide/witness_formal_spec.md b/docs/programmers_guide/witness_formal_spec.md deleted file mode 100644 index 66ad01fdcd3..00000000000 --- a/docs/programmers_guide/witness_formal_spec.md +++ /dev/null @@ -1,599 +0,0 @@ -# Block Witness Formal Specification - -The keywords "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119). - -## Data Types - -### Basic - -`nil` - an empty value. - -`Any` - any data type. MUST NOT be `nil`. - -`Int` - an integer value. We treat the domain of integers as infinite, -the overflow behaviour or mapping to the actual data types is undefined -in this spec and should be up to implementation. - -`Byte` - a single byte. - -`Hash` - 32 byte value, representing a result of Keccak256 hashing. - -### Composite - -`()` - an empty array of arbitrary type. - -`(Type...)` - an array of a type `Type`. MUST NOT be empty. - -`{field:Type}` - an object with a field `field` of type `Type`. - - - full notation: `type T = {field:Type}` - - - inline `type TBase = T1{field:Type}|T2{field2:Type2}` - -### Type Definitions - -The type definitions are a bit similar to [Haskell](https://en.wikibooks.org/wiki/Haskell/Type_declarations). -The key differences are how the arrays and type fields are defined. - -### Nodes - -``` -type Node = HashNode{raw_hash:Hash} - | ValueNode{raw_value:(Byte...)} - | AccountNode{nonce:Int balance:Int storage:nil|Node code:nil|CodeNode|HashNode} - | LeafNode{key:(Byte...) value:ValueNode|AccountNode} - | ExtensionNode{key:(Byte...) child:Node} - | BranchNode{child0:nil|Node child1:nil|Node child3:nil|Node ... child15:nil|Node} - | CodeNode{code:(Byte... )} -``` - -### Witness - -Witness MUST have at least 1 element. - -``` -type WitnessHeader = {version:Int} -type Instruction = {code:Int parameter:Any...} -type Witness = (Node|Instruction...) -``` - - -## Execution Environment - -The witness execution environment MUST contain the following 2 elements: - -- **WitnessHeader** -- a header containing the version of the witness. The `version` MUST be 1. - -- **Witness** -- a witness to be executed; - -- **Substitution Rules** -- a list of all possible substitution rules. - - -## Execution process - -Initially, the Witness MUST BE an array of `Instruction`s. - -Then, as substitution rules are applied to the witness, some elements of the -array are replaced with `Node`s. - -The execution continues until there are no substitution rules left to execute. - -Here is how the execution code might look like in Go for building a single trie. - -```go -witness := GetInitialWitness() -rules := GetSubstitutionRules() -numberOfRulesApplied := 1 // initial state - -for numberOfRulesApplied > 0 { - witness, numberOfRulesApplied := ApplyRules(witness, rules) -} - -if len(witness) == 1 { - trie.root = witness[0] -} else { - panic("witness execution failed") -} - -``` - - -And here is an example of the execution process (we will use the set of rules -from the **Substitution Rules** section of this document): - -* **Step 1**. Witness: `(HASH h1 HASH h2 BRANCH 0b101 HASH h3 BRANCH 0b11)` - -* **Step 2**. Apply `HASH` substitution rules. -Witness: `(HashNode{h1} HashNode{h2} BRANCH 0b101 HashNode{h3} BRANCH 0b11)` - -* **Step 3**. Apply `BRANCH` substitution rules (only once, because `BRANCH 0b11` -doesn't pass its `GUARD` statements just yet). -Witness: `(BranchNode{0: HashNode{h1} 2:HashNode{h2}} HashNode{h3} BRANCH 0b11)` - -* **Step 4**. Apply `BRANCH` substitution rules again. -Witness: `(BranchNode{0: BranchNode{0: HashNode{h1} 2:HashNode{h2}} 1:HashNode{h3}})` - -* **Step 5**. No more rules are applicable, the witness contains only one - element, the execution ends successfully. - - -## End Criteria - -The execution ends when there are no substitution rules applicable for this -witness. - -### Building a single trie from the witness - -If we are building a single trie from the witness, then the only SUCCESS -execution is when the following are true: - -- The execution state MUST match the End Criteria -- There MUST be only one item left in the witness -- This item MUST be of the type `Node` - -In that case, this last item will be the root of the built trie. - -Every other end state is considered a FAILURE. - - -### Building a Forest - -We also can build a forest of tries with this approach, by adding a new -Instruction `NEW_TRIE` and adjusting the success criteria a bit: - -- The execution state MUST match the End Criteria; -- The items that are left in the witness MUST follow this pattern: `(Node - NEW_TRIE ... Node)` -- Each `Node` element will be a root of a trie. - -Every other end state is considered a FAILURE. - - -## Instructions & Parameters - -A single instruction consists of substitution rules and parameters. - -Each instruction MAY have one or more parameters. -The parameters values MUST be encoded in the witness. - -That makes it different from the helper function parameters that MAY come from the stack or MAY come from the witness. - - -## Substitution rules - -A substitution rule consists of 3 parts: - -`[GUARD] PATTERN |=> RESULT` - -- to the left of the `|=>` sign: - - - optional `GUARD` statements; - - - the pattern to match against; - -- result, to the right of the `|=>` sign. - - -### `GUARD`s - -Each substitution rule can have zero, one or multiple `GUARD` statements. -Each `GUARD` statement looks like this: - -``` -GUARD -``` - -For a substitution rule to be applicable, the `` in its `GUARD` statement MUST be true. - -If a substitution rule has multiple `GUARD` statements, all of them MUST BE satisfied. - -If there are no `GUARD` statements, the substitution rule's applicability is -only defined by the PATTERN. - -Example: -``` - GUARD NBITSET(mask) == 2 -|---- GUARD STATEMENT ---| - - Node(n0) Node(n1) BRANCH(mask) |=> - BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)} -``` - -For the example rule to be applicable both facts MUST be true: - -1. `mask` contains only 2 bits set to 1 (the rest are set to 0); - -2. to the left of `BRANCH` instruction there is at least 2 `Node`s. - -Fact (1) comes from the `GUARD` statement. - - -### PATTERN - -`[NodeType(boundVar1)... NodeType(boundVarN)] Instruction[(param1... paramN)]` - -A pattern MUST contain a single instruction. -A pattern MAY contain one or more `Node`s to the left of the instruction to -match. -An instruction MAY have one or more parameters. - -Pattern matching is happening by the types. `Node` type means any node is -matched, some specific node type will require a specific match. - -Pattern can have bound variable names for both matched nodes and instruction -parameters (if present). - -Match: - -``` -HASH h0 HashNode{h1} HashNode{h2} BRANCH 0b11 - |------------------- MATCH -----------| - -HASH h0 BranchNode{0: HashNode{h1} 1: HashNode{h2}} - |----------- SUBSTITUTED -------------------| -``` - -No match (not enough nodes to the left of the instruction): - -``` -HASH h0 HASH h1 HashNode{h2} BRANCH 0b11 -``` - -### Result - -`NodeType(HELPER_FUNCTION(arguments))` - -The result is a single `Node` statement that replaces the pattern in the -witness if it matches and the guards are passed. - -The result MAY contain helper functions or might have in-line computation. -The result MUST have a specific node type. No generic `Node` is allowed. - -Helper functions or inline computations might use bound variables from the -pattern. - -Example - -``` - -Node(n0) Node(n1) BRANCH(mask) |=> -BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)} - ^ ^-- ^--- BOUND NODES - |---- BOUND INSTRUCTION PARAM - |------ HELPER CALL ------------| -|----------------- RESULT ------------------| - -``` - -### Bringing it all together - - -So the full syntax is this: - -``` -[GUARD ...] [ NodeType(bound_variable1)... ] INSTRUCTION[(param1 ...)] |=> -Node() -``` - -`NodeType` is one of the types of nodes to match. Can also be `Node` to match -any non-nil node. - -Substitution rules MUST be non-ambiguous. Even though, there can be multiple -substitution rules applicable to the whole witness at the same time, there MUST -be only one rule that is applicable to a certain position in the witness. - -So, the minimal substitution rule is the one for the `HASH` instruction that pushes one hash to the stack: -``` -HASH(hashValue) |=> HashNode{hashValue} -``` - - -## Helper functions - -Helper functions are functions that are used in GUARDs or substitution rules. - -Helper functions MUST be pure. -Helper functions MUST have at least one argument. -Helper functions MAY have variadic parameters: `HELPER_EXAMPLE(arg1, arg2, list...)`. -Helper functions MAY contain recursion. - -## Instructions - -### `LEAF key raw_value` - -**Substitution rules** - -Replaces the instruction with a `ValueNode` wrapped with a `LeafNode`. - -``` -LEAF(key, raw_value) |=> LeafNode{key, ValueNode(raw_value)} -``` - -### `EXTENSION key` - -Wraps a node to the left of the instruction with an `ExtensionNode`. - -**Substitution rules** - -``` -Node(node) EXTENSION(key) |=> ExtensionNode{key, node} -``` - -### `HASH raw_hash` - -Replaces the instruction with a `HashNode`. - -**Substitution rules** - -``` -HASH(hash_value) |=> HashNode{hash_value} -``` - -### `CODE raw_code` - -Replaces the instruction with a `CodeNode`. - -``` -CODE(raw_code) |=> CodeNode{raw_code} -``` - -### `ACCOUNT_LEAF key nonce balance has_code has_storage` - -Replaces the instruction and, optionally, up to 2 nodes to the left of the -instruction with a single `AccountNode` wrapped with a `LeafNode`. - -**Substitution rules** - -``` -GUARD has_code == true -GUARD has_storage == true - -CodeNode(code) Node(storage_hash_node) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, storage_root, code}} - ---- - -GUARD has_code == true -GUARD has_storage == true - -HashNode(code) Node(storage_hash_node) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, storage_root, code}} - ---- - -GUARD has_code == false -GUARD has_storage == true - -Node(storage_root) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, storage_root, nil}} - ---- - -GUARD has_code == true -GUARD has_storage == false - -CodeNode(code) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, nil, nil, code}} - ---- - -GUARD has_code == true -GUARD has_storage == false - -HashNode(code) ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, nil, nil, code}} - ---- - -GUARD has_code == false -GUARD has_storage == false - -ACCOUNT_LEAF(key, nonce, balance, has_code, has_storage) |=> -LeafNode{key, AccountNode{nonce, balance, nil, nil, nil}} - -``` - -### `NEW_TRIE` - -No substitution rules. This instruction is used as a "divider" when building -a forest of tries. - -### `BRANCH mask` - -Replaces `NBITSET(mask)` `Node`s to the left of the instruction with a single -`BranchNode` with these nodes as children according to `mask`. - -**Substitution rules** -``` - -GUARD NBITSET(mask) == 2 - -Node(n0) Node(n1) BRANCH(mask) |=> -BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1)} - ---- - -GUARD NBITSET(mask) == 3 - -Node(n0) Node(n1) Node(n2) BRANCH(mask) |=> -BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1, n2)} - ---- - -... - ---- - -GUARD NBITSET(mask) == 16 - -Node(n0) Node(n1) ... Node(n15) BRANCH(mask) |=> -BranchNode{MAKE_VALUES_ARRAY(mask, n0, n1, ..., n15)} -``` - -## Helper functions - -### `MAKE_VALUES_ARRAY` - -returns an array of 16 elements, where values from `values` are set to the indices where `mask` has bits set to 1. Every other place has `nil` value there. - -**Example**: `MAKE_VALUES_ARRAY(5, [a, b])` returns `[a, nil, b, nil, nil, ..., nil]` (binary representation of 5 is `0000000000000101`) - -``` -MAKE_VALUES_ARRAY(mask, values...) { - return MAKE_VALUES_ARRAY(mask, 0, values) -} - -MAKE_VALUES_ARRAY(mask, idx, values...) { - if idx > 16 { - return [] - } - - if BIT_TEST(mask, idx) { - return PREPEND(FIRST(values), (MAKE_VALUES_ARRAY mask, INC(idx), REST(values))) - } else { - return PREPEND(nil, (MAKE_VALUES_ARRAY mask, INC(idx), values)) - } -} -``` - - -### `RLP(value)` - -returns the RLP encoding of a value - - -### `NBITSET(number)` - -returns number of bits set in the binary representation of `number`. - -### `BIT_TEST(number, n)` - -`n` MUST NOT be negative. - -returns `true` if bit `n` in `number` is set, `false` otherwise. - -### `PREPEND(value, array)` - -returns a new array with the `value` at index 0 and `array` values starting from index 1 - -### `INC(value)` - -increments `value` by 1 - -### `FIRST(array)` - -returns the first value in the specified array - -### `REST(array)` - -returns the array w/o the first item - -### `KECCAK(bytes)` - -returns a keccak-256 hash of `bytes` - - -## Serialization - -The format for serialization of everything except hashes (that we know the -length of) is [CBOR](https://cbor.io). It is RFC-specified and concise. - -For hashes we use the optimization of knowing the lengths, so we just read 32 -bytes - -### Block Witness Format - -Each block witness consists of a header followed by a list of instructions. - -There is no length of witness specified anywhere, the code expects to just reach `EOF`. - -Serialized Witness: `(HEADER, OP1, OP2, ..., OPn-1, OPn, EOF)` - -#### Encoding - -##### CBOR - -The parts of the key that are encoded with CBOR are marked by the `CBOR` function. - -##### Keys - -Keys are also using custom encryption to make them more compact. - -The nibbles of a key are encoded in a following way `[FLAGS NIBBLE1+NIBBLE2 NIBBLE3+NIBBLE4 NIBBLE5... ]` - -*FLAGS* -* bit 0 -- 1 if the number of nibbles were odd -* bit 1 -- 1 if the nibbles end with 0x10 (the terminator byte) - -This is shown later as `ENCODE_KEY` function. - -#### Header - -format: `version:byte` - -encoded as `[ version ]` - -the current version is 1. - -#### Instructions - -Each instruction starts with an opcode (`uint`). - -Then it might contain some data. - -##### `LEAF` - -format: `LEAF key:[]byte value:[]byte` - -encoded as `[ 0x00 CBOR(ENCODE_KEY(key))... CBOR(value)... ]` - - -##### `EXTENSION` - -format: `EXTENSION key:[]byte` - -encoded as `[ 0x01 CBOR(ENCODE_KEY(key))... ]` - - -##### `BRANCH` - -format: `BRANCH mask:uint32` - -*mask* defines which children are present -(e.g. `0000000000001011` means that children 0, 1 and 3 are present and the other ones are not) - -encoded as `[ 0x02 CBOR(mask)...]` - - -##### `HASH` - -format: `HASH hash:[32]byte` - -encoded as `[ 0x03 hash_byte_1 ... hash_byte_32 ]` - - -##### `CODE` - -format: `CODE code:[]byte` - -encoded as `[ 0x04 CBOR(code)... ]` - - -##### `ACCOUNT_LEAF` - -format: `ACCOUNT_LEAF key:[]byte flags [nonce:uint64] [balance:[]byte]` - -encoded as `[ 0x05 CBOR(ENCODE_KEY(key))... flags /CBOR(nonce).../ /CBOR(balance).../ ]` - -*flags* is a bitset encoded in a single bit (see [`witness_operators_test.go`](https://github.com/erigontech/erigon/blob/adf52465e3fd67a76a52c7214b20dcbe15182b98/turbo/trie/witness_operators_test.go) to see flags in action). -* bit 0 defines if **code** is present; if set to 1, then `has_code=true`; -* bit 1 defines if **storage** is present; if set to 1, then `has_storage=true`; -* bit 2 defines if **nonce** is not 0; if set to 0, *nonce* field is not encoded; -* bit 3 defines if **balance** is not 0; if set to 0, *balance* field is not encoded; - -##### `NEW_TRIE` - -format: `NEW_TRIE` - -encoded as `[ 0xBB ]` - diff --git a/docs/programmers_guide/witness_format.md b/docs/programmers_guide/witness_format.md deleted file mode 100644 index fe0909f6898..00000000000 --- a/docs/programmers_guide/witness_format.md +++ /dev/null @@ -1,96 +0,0 @@ -# Block Witness Format - -Each block witness consists of a header followed by a list of operators. - -There is no length of witness specified anywhere, the code expects to just reach `EOF`. - -Witness: `[HEADER, OP1, OP2, ..., OPn-1, OPn, EOF]` - -## Encoding - -### Keys - -key nibbles are encoded in a following way `[FLAGS NIBBLE1+NIBBLE2 NIBBLE3+NIBBLE4 NIBBLE5... ]` - -*FLAGS* -* bit 0 -- 1 if the number of nibbles were odd -* bit 1 -- 1 if the nibbles end with 0x10 - -## Header - -format: `version:byte` - -encoded as `[ version ]` - -the current version is 1. - -## Operators - -Each operator starts with an opcode (see [`witness_operators.go`](../../trie/witness_operators.go) for exact values). - -Then it might contain some data. - -### `OpEmptyRoot` - -puts empty trie root to on the stack - -format: `OpEmptyRoot` - -encoded as `[ 0x06 ]` - -### `OpHash` - -puts a single hash on the stack - -format: `OpHash hash:[32]byte` - -encoded as `[ 0x03 hash_byte_1 ... hash_byte_32 ]` - -### `OpBranch` - -pops N values from the stack and adds them as the children; pushes the result to the stack; N is the number of 1's in the *mask* field. - -format: `OpBranch mask:uint32` - -*mask* defines which children are present -(e.g. `0000000000001011` means that children 0, 1 and 3 are present and the other ones are not) - -encoded as `[ 0x02 CBOR(mask)...]` - -### `OpCode` - -pushes a code to the stack - -format: `OpCode code:[]byte` - -encoded as `[ 0x04 CBOR(code)... ]` - -### `OpExtension` - -pushes an extension node with a specified key on the stack - -format: `OpExtension key:[]byte` - -encoded as `[ 0x01 CBOR(key)... ]` - -### `OpLeaf` - -pushes a leaf with specified key and value to the stack - -format: `OpLeaf key:[]byte value:[]byte` - -encoded as `[ 0x00 CBOR(key)... CBOR(value)... ]` - -### `OpAccountLeaf` - -pushes a leaf with specified parameters to the stack; if flags show, before that pops 1 or 2 values from the stack; - -format: `OpAccountLeaf key:[]byte flags [nonce:uint64] [balance:[]byte]` - -encoded as `[ 0x05 CBOR(key|[]byte)... flags /CBOR(nonce).../ /CBOR(balance).../ ]` - -*flags* is a bitset encoded in a single bit (see [`witness_operators_test.go`](../../trie/witness_operators_test.go) to see flags in action). -* bit 0 defines if **code** is present; if set to 1 it assumes that either `OpCode` or `OpHash` already put something on the stack; -* bit 1 defines if **storage** is present; if set to 1, the operators preceding `OpAccountLeaf` will reconstruct a storage trie; -* bit 2 defines if **nonce** is not 0; if set to 0, *nonce* field is not encoded; -* bit 3 defines if **balance** is not 0; if set to 0, *balance* field is not encoded;