
ADR-040: Storage and SMT State Commitments #8430

Merged: 24 commits into master, May 11, 2021
Conversation

@robert-zaremba (Collaborator)

Description

In ADR-040 we propose a change of storage management without changing the SDK store API. At the core of the proposed solution, we decouple state commitments from state storage and propose to use the LazyLedger SMT.

closes: #7100


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer
  • Review Codecov Report in the comment section below once CI passes

@robert-zaremba robert-zaremba added the C:Store and T:ADR (an issue or PR relating to an architectural decision record) labels Jan 25, 2021
@robert-zaremba robert-zaremba marked this pull request as draft January 25, 2021 22:32

### Snapshots

Among the Stargate core features are snapshots and fast sync. Currently this feature is implemented through IAVL.
Contributor

I can't see a way to use snapshots/versions with the BoltDB API. BoltDB provides "snapshot isolation", but does not support explicit creation/usage of snapshots.

Contributor

That's a good point. We should double-check that the dbs mentioned below actually provide the desired snapshotting mechanism.

Collaborator Author

Thanks for checking. I'm not completely sure. We can check with the hashicorp team. I found this:

  • "If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction. " source
  • This solved issue could be relevant, because Waypoint uses BoltDB and it supports snapshots: Server Snapshot/Restore hashicorp/waypoint#682

Collaborator Author

I'm not sure if we should spend time here discussing the backend. I mean - we need to verify the proposals, but we should have a separate discussion about the "recommended" DB backend.
So, what do you think about shifting the snapshot functionality to the DB?


I'm wondering if using database snapshots to access old state roots is overkill, or even necessary for SMT.

By default, you can already access the old state roots in the LazyLedger SMT implementation, because the tree isn't garbage collected currently. Once garbage collection is added, it could be configured to only garbage collect state roots older than a certain version, which would be equivalent to snapshots, no?

Contributor

"If a long running read transaction (for example, a snapshot transaction) is needed, you might want to set DB.InitialMmapSize to a large enough value to avoid potential blocking of write transaction. "

I saw this before, and I'm pretty sure it's about app-level logic executing app-level transaction (iteration over entire KV-store) to generate app-level snapshots.

I'm wondering if using database snapshots to access old state roots is overkill, or even necessary for SMT.

I think we discuss snapshots because of the "storage" part of proposal not "state commitment" part. For SMT, I also don't see a reason to use DB-level snapshots.

So, what do you think about shifting the snapshot functionality to DB?

I agree that it is more than reasonable to use the capabilities of the DB to ensure we can access & manage previous versions of state.

Collaborator Author

Good point.
We need to have old versions of SMT for snapshots. So, I see two options here:

  1. Whenever we schedule a snapshot, we do it in the same way as for state snapshots: using the DB's built-in feature. If we need to create a new node and orphan some other node, then we immediately remove the orphaned nodes. So, effectively, we won't have old roots because we remove them immediately.
  2. Extend the current LL SMT and add pruning.

PS: I assume that in your current implementation you don't update nodes - instead you create new ones, right? This allows you to keep all old versions. Why do you do that instead of updating nodes?

@musalbas commented Jan 27, 2021

We have to add pruning to LL SMT anyway, otherwise the state will keep growing.

PS: I assume in your current implementation, you don't update nodes - instead you create a new one, right? This allows you to keep all old versions. Why do you do that, instead of updating nodes?

Nodes are stored in a key => value store where the key is a hash of the node, and the value is the preimage of the hash (i.e. the hashes of the children of the node). Assuming the hashes are collision resistant, you can't "update" a node since the tree is immutable - you can only "create" a new tree with a new root.
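The storage model described above can be sketched as a content-addressed key-value store. This is a minimal Python illustration under stated assumptions, not the actual LazyLedger SMT code; all names and the hash tagging scheme are invented for the example:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    """Collision-resistant digest of the concatenated parts."""
    return hashlib.sha256(b"".join(parts)).digest()

# Content-addressed node store: key = hash(node), value = the hash
# preimage (for an inner node, the concatenated child hashes).
store = {}

def put_leaf(value: bytes) -> bytes:
    key = h(b"leaf:", value)
    store[key] = value
    return key

def put_inner(left: bytes, right: bytes) -> bytes:
    key = h(b"inner:", left, right)
    store[key] = left + right  # preimage: the children's hashes
    return key

# "Updating" a leaf never mutates an existing node; it creates a new
# leaf and a new root, so every old root stays readable.
l1, l2 = put_leaf(b"a"), put_leaf(b"b")
root_v1 = put_inner(l1, l2)
root_v2 = put_inner(l1, put_leaf(b"b-updated"))
assert root_v1 in store and root_v2 in store  # both versions coexist
```

Because the key is a hash of the content, two different trees that share a subtree automatically share its nodes, which is why the structure is immutable rather than updatable.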

Collaborator Author

But you have a path to the updated leaf, so you can use it to remove old nodes:

  1. Given a key, we traverse the tree with path = hash(key).
  2. While going down to a node, we remember all nodes on the path.
  3. When going back up and updating internal nodes (or creating new ones), we can remove each old node using the path we constructed in step 2.
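The steps above can be sketched on a toy two-leaf tree in a content-addressed store. This is an illustrative Python sketch, not the LL SMT implementation; the node layout and tag lengths are assumptions of the example:

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

store = {}

def put(*parts: bytes) -> bytes:
    key = h(*parts)
    store[key] = b"".join(parts)
    return key

# Toy tree: root -> (leaf_a, leaf_b), nodes keyed by their hash.
leaf_a = put(b"leaf", b"a")
leaf_b = put(b"leaf", b"b")
root = put(b"inner", leaf_a, leaf_b)

def update_right_leaf(root: bytes, new_value: bytes) -> bytes:
    """Steps 1-2: walking down to the leaf, remember the nodes on the
    path (here just [root, old_right_leaf]). Step 3: after writing the
    replacement nodes on the way back up, delete the remembered ones."""
    preimage = store[root]
    left, old_right = preimage[5:37], preimage[37:]  # skip the b"inner" tag
    new_leaf = put(b"leaf", new_value)
    new_root = put(b"inner", left, new_leaf)
    for orphan in (root, old_right):  # prune the old path immediately
        store.pop(orphan, None)
    return new_root

new_root = update_right_leaf(root, b"b2")
assert root not in store and leaf_b not in store  # old path pruned
assert leaf_a in store and new_root in store      # shared node survives
```

Note this implements option 1 from the comment above (immediate orphan removal), which trades away the ability to read old roots in exchange for bounded storage.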

@liamsi (Contributor) left a comment

Great start! Left some feedback while reading through it.




## Abstract

Sparse Merkle Tree (SMT) is a version of a Merkle Tree with various storage and performance optimizations. This ADR defines a separation of state commitments from data storage and the SDK transition from IAVL to SMT.
Contributor

I'm wondering if this shouldn't in fact be two ADRs instead? One for separating storage and commitments and one about the SMT.

Collaborator Author

I was also thinking about it. But they are highly related - one cannot be done without the other. Hence, I'm proposing a general design here and leaving space for a future ADR on RDBMS, which will introduce SDK breaking changes.

Member

Well we could separate the two with IAVL right? We don't need SMT for that AFAIK...

Collaborator Author

@aaronc, we could describe only the SMT here, but it would be a half-baked idea without a working solution:

  • keeping IAVL (in its current implementation) alongside anything else doesn't make sense, because we would double the data.
  • the main value proposition here is to not store objects in the SMT (we store only hashes).

Do you have something else in mind?



@jackzampolin (Member)

Just found out about this work. It's very exciting. Where is the best place to find the current state of this implementation and how that work is going?

@robert-zaremba (Collaborator Author)

@jackzampolin - here and in the #8297 discussion.

Coding has started and will be pushed to the store/ll-smt branch.

@tzdybal (Contributor) commented Feb 4, 2021

First SMT store PR is ready: #8507

@tzdybal tzdybal mentioned this pull request Feb 4, 2021
@hxrts (Contributor) commented Feb 4, 2021

attn @jlandrews

Comment on lines 119 to 124
### Backwards Compatibility

This ADR doesn't introduce any SDK level API changes.

We change the storage layout of the state machine; a storage migration and network upgrade are required to incorporate these changes.

Contributor

a storage migration and network upgrade is required to incorporate these changes.

Is this changing key paths as well?

Can you discuss backwards compatibility with respect to ics23? It sounds like SMT will be responsible for maintaining compatibility? This is a little beyond my area of expertise, but I guess my question is:

  • will SMT support ICS 23 (or does it already)?
  • will SMT require a different proof spec from the existing one?

I ask these questions because changing how proofs are verified or the proof paths can possibly break IBC, resulting in the best case in an asynchronous network upgrade and in the worst case a synchronous network upgrade (all IBC chains need to synchronously upgrade to continue communication).

cc @AdityaSripal

Collaborator Author

Is this changing key paths as well?

Yes, it's essentially a hard fork.

will SMT support ICS 23?

It's not done yet - it's in the scope of work we defined a few weeks ago. I will clarify it here.

will SMT require a different proof spec from the existing one?

No, it's a merkle tree after all.

Collaborator Author

proof paths can possibly break IBC

This sounds important. The proofs will change (because it will require a hard fork), however the verification will be the same. Shall we discuss the network upgrade mechanism in the ADR, or is it a separate discussion?
I think it should be the following:

  • disable IBC
  • export chain
  • update binary and import genesis
  • enable IBC

@colin-axner (Contributor) commented Apr 30, 2021

Thanks for the response.

I'll defer to @AdityaSripal and @cwgoes to correct me, but I believe we need to consider the following cases:

IBC increases the complexity of upgrades because the counterparty needs to be informed of any changes in how the communication is being conducted.

Let's say we send a packet and construct a packet commitment under the path K. Normally the counterparty light client can verify a proof of the packet commitment, P, at some block height (i.e. VerifyProof(K, P)). In the code, our packet commitment proof path looks like this; a store prefix is also added before performing verification of this path. Until given further notice (unsupported atm), the counterparty will continue to assume the packet commitment will only ever exist under key path K.

If the key paths are changing, then let's say K' represents the key path after the hard fork and P' represents the proof after the hard fork. My understanding is that K != K', but that P may equal P' (if the same value were proved at the same height, just with different proof paths). K' would always fail on counterparties which are incapable of knowing the key path should be K'.

Let's say we do the hard fork at height X, i.e. at height X we are on the current code, and at height X + 1 we use SMT. The problem is that all our counterparties need to be aware that at height X + 1 all key paths should use the new format. Not only do they need to be aware of that, they need to be capable of using the new key paths for any proof heights >= X + 1. If the upgrade can be somewhat asynchronous, then chains need to be capable of supporting both the old and new proof paths.
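The height-gated key-path selection described here could take roughly the following shape. This is a hypothetical Python sketch, not ibc-go code; `UPGRADE_HEIGHT` and the post-fork path format are invented for illustration (only the pre-fork ICS-24-style packet commitment path is real):

```python
# Hypothetical fork height X; everything below is illustrative only.
UPGRADE_HEIGHT = 100

def packet_commitment_key(port: str, channel: str, seq: int,
                          proof_height: int) -> str:
    """Return the key path a counterparty should use when verifying a
    packet-commitment proof at `proof_height`. Heights <= X use the old
    path K; heights > X use the (invented) post-fork path K'."""
    old = f"commitments/ports/{port}/channels/{channel}/sequences/{seq}"
    if proof_height <= UPGRADE_HEIGHT:
        return old
    return "v2/" + old  # placeholder for a hypothetical SMT-era format

assert packet_commitment_key("transfer", "channel-0", 1, 100).startswith("commitments/")
assert packet_commitment_key("transfer", "channel-0", 1, 101).startswith("v2/")
```

The point of the sketch is the asynchrony problem: a light client must keep both branches alive for as long as it may be asked to verify proofs on either side of X.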

Re: disable/enable IBC

Unfortunately, there is no disabling/enabling IBC. We can disable/enable IBC transfers, but IBC itself is constantly moving forward. Clients may become expired if they are not successfully updated within a finite amount of time. This would result in funds essentially being "stuck". We do have recovery mechanisms, but ideally we coordinate a solution to avoid such a situation.

Currently, changing key paths/migrating IBC store is an unsupported upgrade, meaning existing IBC channels would be incapable of being used in the upgraded version. I believe it is entirely possible to add support for this, but we will need some time to specify the changes and implement/test them. In addition, all existing channels which cannot be lost between upgrades require all the counterparties to upgrade to the new IBC code which supports this upgrade route. Only then can the upgrade be performed without losing access to the channels.

Re: proof spec changes

I believe ics23 is a generic merkle proof format which uses proof specs for specific merkle tree representations. Changing the proof specs is an easier upgrade mechanism (still requiring some coordination) than the above (it is supported), but this would only be needed if it changed things like the hash function.

This is a lot to unpack in a GitHub comment and I think it will require more coordination/clarification - maybe best done in a live call. The primary concern is if a proof key path used in the new code cannot be verified using the old code (and vice versa). This requires the most difficult IBC upgrade path, but there may be clever tricks we can do to alleviate the difficulty over a period of time, if we are aware this is something that should be addressed in the near future.

With regards to the ADR, if any of the areas above will be affected, then it should be mentioned that these upgrade paths will need to be pursued and ibc-go may need to add more functionality, but the specific upgrade path itself is out of scope.

Collaborator Author

Thanks for the comment @colin-axner. I think we should move this to the ADR-40 discussion page, so let me answer it there: #8297 (comment)

TL;DR: we need to design a mechanism for IBC which will support hard forks anyway. They are inevitable.

In my opinion this should not block ADR-40; for me this is a related but more general problem: how to handle hard forks. That being said - it will block a release; we need to have a general solution before doing any hard fork.

@robert-zaremba (Collaborator Author)

It may be good to clarify how the state migration is expected to work.

We will need to do a hard fork and recreate the SMT when loading data. BTW: this is how all major updates have been done in the past.

@roysc (Contributor) left a comment

I've suggested some edits to try to disambiguate references to historical versioning (which some DB engines call snapshots) vs. the snapshots used for ABCI state sync. Please correct me if I'm interpreting anything wrong, but I think it's important to be specific with the terminology, especially since the two concerns are related here.

I've also made comments and corrected some typos.

Comment on lines +81 to +82
Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.
Contributor

Suggested change (original):

Database snapshot is a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a snapshot mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support snapshotting. Hence, we propose to reuse that functionality for the state sync and versioning (described below). It will the supported DB engines to ones which efficiently implement snapshots. In a final section we will discuss evaluated DBs.

Suggested change (replacement):

Database versioning provides a view of DB state at a certain time or transaction. It's not a full copy of a database (it would be too big), usually a versioning mechanism is based on a _copy on write_ and it allows to efficiently deliver DB state at a certain stage.
Some DB engines support viewing past versions. Hence, we propose to reuse that functionality for the state sync snapshots and versioning (described below). It will limit the supported DB engines to ones which efficiently implement versioning. In a final section we will discuss DBs evaluated for this feature.

Collaborator Author

I'm defining database snapshot here, so I prefer to keep the original "snapshot" language.

Contributor

Okay I may have misunderstood. This is what some DBs call snapshots, and distinct from state sync snapshots as used in the ABCI, right? (although it can be used to implement ABCI snapshots)

Collaborator Author

Yes, here we are talking about a database engine mechanism.


New snapshot will be created in every `EndBlocker`. The `rootmulti.Store` keeps track of the version number and implements the `MultiStore` interface. `MultiStore` encapsulates a `Commiter` interface, which has the `Commit`, `SetPruning`, `GetPruning` functions which will be used for creating and removing snapshots. The `Store.Commit` function increments the version on each call, and checks if it needs to remove old versions. We will need to update the SMT interface to implement the `Commiter` interface.
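The Commit/pruning behaviour this hunk describes can be sketched as follows. This is an illustrative Python sketch under assumptions, not the SDK's Go implementation of the `Commiter` interface mentioned above; `keep_recent` is an assumed pruning knob:

```python
from dataclasses import dataclass, field

@dataclass
class PruningOptions:
    keep_recent: int = 2  # how many recent versions to retain (illustrative)

@dataclass
class Store:
    """Every commit bumps the version, records a DB-level snapshot for it,
    and checks whether old versions need to be removed."""
    pruning: PruningOptions = field(default_factory=PruningOptions)
    version: int = 0
    snapshots: list = field(default_factory=list)

    def set_pruning(self, opts: PruningOptions) -> None:
        self.pruning = opts

    def commit(self) -> int:
        self.version += 1
        self.snapshots.append(self.version)  # snapshot at EndBlocker time
        cutoff = self.version - self.pruning.keep_recent
        self.snapshots = [v for v in self.snapshots if v > cutoff]
        return self.version

s = Store()
for _ in range(5):
    s.commit()
assert s.version == 5 and s.snapshots == [4, 5]  # only recent versions kept
```

The sketch shows why pruning belongs in `Commit`: the version counter and the snapshot set must move in lockstep, or a snapshot could survive its own retention window.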
Contributor

For clarification, copy-on-write is used to maintain historical versions, but the state sync snapshot still involves copying the entire state store at the time of creation (at least, that is how it's currently implemented).

@roysc (Contributor) commented May 7, 2021

Thanks for the clarifications @robert-zaremba.

My only follow-up questions: do snapshots and versioning refer to distinct feature requirements for the state storage backend DB, and if so, how do they differ? And is there any desire or intention to use DB backup/export (as @tzdybal mentioned here) as part of the implementation of SDK sync snapshots?

@aaronc (Member) left a comment

I imagine we will want to refine certain things as we go along but this is a great start. Good work @robert-zaremba et al!

@aaronc aaronc merged commit f1de92f into master May 11, 2021
@aaronc aaronc deleted the robert/adr-040 branch May 11, 2021 20:45
@aaronc (Member) commented May 12, 2021

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

@alexanderbez (Contributor)

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

Yes please! Just use namespaces or something.

@aaronc (Member) commented May 12, 2021

One thing that I think needs to happen when we transition to SMT which isn't covered in this ADR is removing the multi-store. IMHO the multi-store just has to go (see #6370).

Yes please! Just use namespaces or something.

So probably it should work something like this:

  • given a store named foo
  • in the SC layer we store hash("foo", key) for the key
  • in the SS layer we store concat(prefix("foo"), key) for the key where prefix("foo") is a single byte deterministically generated prefix

What do you think @robert-zaremba ?
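The two-layer key scheme above can be sketched like this. Illustrative Python only; the hash choice, the prefix table, and its one-byte assignments are assumptions for the example, not the SDK implementation:

```python
import hashlib

def sc_key(store_name: str, key: bytes) -> bytes:
    """State-commitment (SC) layer: the tree key is hash(store_name, key)."""
    return hashlib.sha256(store_name.encode() + key).digest()

# Hypothetical deterministic one-byte prefixes per store name.
PREFIXES = {"foo": b"\x01", "bank": b"\x02"}

def ss_key(store_name: str, key: bytes) -> bytes:
    """State-storage (SS) layer: concat(prefix(store_name), key)."""
    return PREFIXES[store_name] + key

assert ss_key("foo", b"abc") == b"\x01abc"   # cheap, iterable by prefix
assert len(sc_key("foo", b"abc")) == 32      # fixed-width, uniform tree keys
```

Note how the two layers optimize differently: the SS key preserves ordering and prefix iteration, while the SC key is uniformly distributed for the sparse tree, which is the tension behind Robert's follow-up about prefix exhaustion.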

@robert-zaremba (Collaborator Author)

Correct, multistore is not needed. A prefix store is a cleaner solution.

Re: @aaronc's example, I prefer to be consistent and use the same keys in both SC and SS - so in both cases the key that goes into the API call to SC and SS will be concat("foo", key). NOTE: SC will always hash keys internally when inserting a record into the tree.

prefix("foo") is a single byte deterministically generated prefix

I'm a bit skeptical about this. What if we have more than 255 "sub-stores" under the same "parent" (e.g. 300 modules)?
We can use Huffman coding instead if we want to compress the prefix keys.

I will create an update to this ADR to discuss this.

@robert-zaremba (Collaborator Author)

PR: adr-40: use prefix store instead of multistore #9355

@zmanian (Member) commented May 19, 2021

Death to the multistore!

@robert-zaremba (Collaborator Author)

Let's continue the discussion about multistore in Replace IAVL and decouple state commitment from storage #8297 discussion.

Labels
C:Store, T:ADR (an issue or PR relating to an architectural decision record), T:Docs (changes and features related to documentation)
Development

Successfully merging this pull request may close these issues.

What to do about IAVL?