docs: Update ADR-040 to store hash(value) in SMT leaf #9680

Merged 8 commits on Sep 15, 2021
@@ -38,14 +38,14 @@ The storage model presented here doesn't deal with data structure nor serializat

Separation of storage and commitment (by the SMT) will allow the optimization of different components according to their usage and access patterns.

- `SS` (SMT) is used to commit to a data and compute merkle proofs. `SC` is used to directly access data. To avoid collisions, both `SS` and `SC` will use a separate storage namespace (they could use the same database underneath). `SC` will store each `(key, value)` pair directly (map key -> value).
+ `SC` (SMT) is used to commit to a data and compute merkle proofs. `SS` is used to directly access data. To avoid collisions, both `SS` and `SC` will use a separate storage namespace (they could use the same database underneath). `SS` will store each `(key, value)` pair directly (map key -> value).

- SMT is a merkle tree structure: we don't store keys directly. For every `(key, value)` pair, `hash(key)` is stored in a path (we hash a key to evenly distribute keys in the tree) and `hash(key, value)` in a leaf. Since we don't know a structure of a value (in particular if it contains the key) we hash both the key and the value in the `SC` leaf.
+ SMT is a merkle tree structure: we don't store keys directly. For every `(key, value)` pair, `hash(key)` is stored in a path (we hash a key to evenly distribute keys in the tree) and `hash(value)` in a leaf.
@robert-zaremba (Collaborator), Jul 12, 2021:

Suggested change
- SMT is a merkle tree structure: we don't store keys directly. For every `(key, value)` pair, `hash(key)` is stored in a path (we hash a key to evenly distribute keys in the tree) and `hash(value)` in a leaf.
+ SMT is a merkle tree structure: we don't store keys directly. For every `(key, value)` pair, `hash(key)` is stored in a path (we hash a key to evenly distribute keys in the tree) and `0x00 || key || hash(value)` in a leaf.

Collaborator:

As discussed, the leaf needs to commit to the key as well (it's not enough that the key is in the path).

@i-norden (Contributor), Jul 12, 2021:

The hash(key)/path is stored in the leaf node, even though it is not part of the "leaf value" (i.e. the value returned by calling Get on the SMT).

leaf node == prefix || path || leaf_value == prefix || hash(key) || hash(value_provided_to_the_SMT)

When calling Set(key, value_provided_to_the_SMT) it hashes the key into the path and the value_provided_to_the_SMT into the leaf_value. When calling Get(key) it returns value_provided_to_the_SMT.

Note that value_provided_to_the_SMT currently is hash(key || value_in_the_StateStore) so that when we call Get we retrieve hash(key || value_in_the_StateStore) which is the key we need for the (current) inverse index.

So the SMT leaf node is, in current practice, prefix || hash(key) || hash(hash(key || value_in_the_StateStore))
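
The leaf layout described in this comment can be sketched as follows. This is only an illustration, not the SMT library's code: SHA-256 as the hash function, the one-byte `0x00` leaf prefix (borrowed from the suggested change above), and the helper names `h`, `leaf_node`, and `leaf_node_current_practice` are all assumptions.

```python
import hashlib

# Assumption: a one-byte leaf prefix, as in the `0x00 || ...` suggestion above.
LEAF_PREFIX = b"\x00"


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


def leaf_node(key: bytes, value_provided_to_smt: bytes) -> bytes:
    """prefix || hash(key) || hash(value_provided_to_the_SMT)."""
    return LEAF_PREFIX + h(key) + h(value_provided_to_smt)


def leaf_node_current_practice(key: bytes, state_store_value: bytes) -> bytes:
    """prefix || hash(key) || hash(hash(key || value_in_the_StateStore)).

    The caller pre-hashes key || value before calling Set, and the SMT
    then hashes that input once more to form the leaf value.
    """
    return leaf_node(key, h(key + state_store_value))
```

With these assumptions both layouts are 65 bytes long (1 prefix byte plus two 32-byte digests); they differ only in what is fed into the second hash.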

Collaborator:

Right, so we can add `prefix +` to my suggestion. We can use the || operator instead of + if you prefer.

> Note that value_provided_to_the_SMT currently is hash(key || value_in_the_StateStore)

Why is that? We should provide key and obj_value without modifying it. The SMT will do all the necessary operations.

@roysc (Contributor, Author), Jul 13, 2021:

That was just a misunderstanding of the ADR language, I think. We didn't think it was describing the SMT's internal structure, since the hashed values are not exposed by the SMT interface (so we assumed the hashed value should be passed in). But we can add methods for that and fork the code if necessary.

Contributor:

> Why is that? We should provide key and obj_value without modifying it. SMT will do all necessary operations.

As Roy said, it is because the current implementation does not expose the hashed values.

We did it this way so that a Get from the SMT would return the value we needed for the old inverse index, hash(key || value).

  1. Set takes a key and a value, and Get returns only the value provided to Set, not some internal transformation of the key and/or value; returning anything else would be really odd behavior for a Setter and Getter interface.
  2. If the value we provided to Set were the unhashed "value" (key || obj_value), then a Get from the SMT would return that unhashed value (again, not some internal, hashed transformation of the value we provided), and we would have to hash it again at the level above the SMT before we could use it in the inverse index. This would have worked, but it would mean duplicating hashing efforts at the two levels.
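
The two options in the list above can be compared with a toy stand-in for the SMT's Set/Get interface. This is a sketch under stated assumptions: `ToySMT` is a hypothetical dict-backed class with no tree and no proofs, SHA-256 is an assumed hash, and the method names only mirror the Set/Get behavior described in this thread.

```python
import hashlib
from typing import Dict


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


class ToySMT:
    """Hypothetical stand-in for the Set/Get interface (not a real Merkle tree)."""

    def __init__(self) -> None:
        self._values: Dict[bytes, bytes] = {}       # path -> value as given to set()
        self._leaf_hashes: Dict[bytes, bytes] = {}  # path -> hash(value), never exposed

    def set(self, key: bytes, value: bytes) -> None:
        path = h(key)
        self._values[path] = value
        self._leaf_hashes[path] = h(value)

    def get(self, key: bytes) -> bytes:
        # Returns exactly what was passed to set(), not the internal hash.
        return self._values[h(key)]


key, obj_value = b"account/1", b"balance=10"

# Option A (what was implemented): the caller passes hash(key || obj_value).
smt_a = ToySMT()
smt_a.set(key, h(key + obj_value))
index_key_a = smt_a.get(key)        # usable as the old inverse-index key as-is

# Option B: the caller passes the unhashed key || obj_value.
smt_b = ToySMT()
smt_b.set(key, key + obj_value)
index_key_b = h(smt_b.get(key))     # must be hashed again above the SMT
```

Both options end up with the same inverse-index key; option B simply computes one hash inside the toy SMT and a second one in the layer above it, which is the duplication the comment refers to.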

Collaborator:

I think we are on the same page. In my suggestion I added hash(key) to the leaf (I'm using the + operator rather than ||). It seems we need to update it to the following (as you noted in the comment above):

prefix + hash(key) + hash(value_provided_to_the_SMT)

Collaborator:

Ping @roysc, @i-norden: let's update the paragraph to match what's in the SMT leaf and merge this PR.


- For data access we propose 2 additional KV buckets (namespaces for the key-value pairs, sometimes called [column family](https://github.com/facebook/rocksdb/wiki/Terminology)):
+ For data access we propose 2 additional KV buckets (implemented as namespaces for the key-value pairs, sometimes called [column family](https://github.com/facebook/rocksdb/wiki/Terminology)):

1. B1: `key → value`: the principal object storage, used by a state machine, behind the SDK `KVStore` interface: provides direct access by key and allows prefix iteration (KV DB backend must support it).
- 2. B2: `hash(key, value) → key`: a reverse index to get a key from an SMT path. Recall that SMT will store `(k, v)` as `(hash(k), hash(key, value))`. So, we can get an object value by composing `SMT_path → B2 → B1`.
+ 2. B2: `hash(key) → key`: a reverse index to get a key from an SMT path. Recall that SMT will store `(k, v)` as `(hash(k), hash(value))`. So, we can get an object value by composing `SMT_path → B2 → B1`.
3. we could use more buckets to optimize the app usage if needed.
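
The `SMT_path → B2 → B1` composition, using the updated B2 definition (`hash(key) → key`), might look roughly like the sketch below. The `ToyStore` class, its method names, the plain dicts standing in for KV buckets and SMT leaves, and the choice of SHA-256 are all assumptions made for illustration; none of them come from the ADR or the SDK.

```python
import hashlib
from typing import Dict


def h(data: bytes) -> bytes:
    """Hash helper; SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(data).digest()


class ToyStore:
    """Hypothetical in-memory model of buckets B1 and B2 plus SMT leaves."""

    def __init__(self) -> None:
        self.b1: Dict[bytes, bytes] = {}   # B1: key -> value
        self.b2: Dict[bytes, bytes] = {}   # B2: hash(key) -> key
        self.smt: Dict[bytes, bytes] = {}  # SMT stand-in: hash(key) -> hash(value)

    def set(self, key: bytes, value: bytes) -> None:
        path = h(key)
        self.b1[key] = value
        self.b2[path] = key
        self.smt[path] = h(value)

    def get(self, key: bytes) -> bytes:
        # Direct access by key, as the state machine would do through B1.
        return self.b1[key]

    def get_by_smt_path(self, path: bytes) -> bytes:
        # SMT_path -> B2 -> B1: recover the key, then read the object value.
        return self.b1[self.b2[path]]


store = ToyStore()
store.set(b"account/1", b"balance=10")
value_from_path = store.get_by_smt_path(h(b"account/1"))
```

A real backend would also need prefix iteration over B1, which a plain dict does not model.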

Above, we propose to use a KV DB. However, for the state machine, we could use an RDBMS, which we discuss below.