Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement support for child tries #166

Closed
tomaka opened this issue Feb 11, 2023 · 62 comments · Fixed by #722
Closed

Implement support for child tries #166

tomaka opened this issue Feb 11, 2023 · 62 comments · Fixed by #722

Comments

@tomaka
Copy link
Contributor

tomaka commented Feb 11, 2023

No description provided.

@tomaka
Copy link
Contributor Author

tomaka commented Feb 14, 2023

As commented, the design of child tries is a bit confusing. An issue should be opened in the spec repo to ask for some of the behaviors to be specified. See the TODOs in host.rs for a list of unclear behaviors.

@cmichi
Copy link

cmichi commented Mar 20, 2023

While working on use-ink/cargo-contract#988 we ran into the issue of missing child tries in smoldot. More specifically we tried to get https://github.com/AcalaNetwork/chopsticks working with ink! contracts.

Missing child tries is a blocker for anything to do with pallet-contracts on smoldot and it would be greatly appreciated if it would happen at some point. Currently we unfortunately can't use smoldot for smart contracts.

@tomaka
Copy link
Contributor Author

tomaka commented Mar 20, 2023

The next step in adding support would be to update runtime_host.rs.

This basically contains the logic of the pending changes that have been made to the storage by the runtime but not saved yet.
It also contains the logic of "transactions", where the runtime can potentially revert some changes that it has made to the storage.

Adding support for child tries in an efficient way (I'm not looking for the most optimal implementation, but at least the best possible O-notation complexity) seems extremely complicated, although I haven't precisely looked into it yet.

@tomaka
Copy link
Contributor Author

tomaka commented Mar 20, 2023

After runtime_host.rs is updated, there's also the database code to update (i.e. making the database store the child tries). Other than that, the rest of the changes should be mostly just API changes.

@cmichi
Copy link

cmichi commented Mar 21, 2023

@tomaka Would you be open to writing out a W3F grant for someone to implement this?

For context: We need this not only for using light clients with smart contract Dapps, but also for some more advanced testing tooling and some features that auditors of ink! contracts requested.

@tomaka
Copy link
Contributor Author

tomaka commented Mar 21, 2023

Writing a grant would probably take me way more time than doing it myself

@cmichi
Copy link

cmichi commented Mar 21, 2023

I was honestly hoping for this response 😜 .

@cmichi
Copy link

cmichi commented Mar 21, 2023

So do you think there's a chance of implementing it some time in the next weeks?

@tomaka
Copy link
Contributor Author

tomaka commented Mar 21, 2023

Honestly the biggest obstacle is that the way child tries work is a bit of an illogical shit show, and I have the biggest trouble in the world finding someone who is able to explain me how they work in Substrate.

My previous attempts at shaking things up a bit (https://github.com/w3f/polkadot-spec/issues?q=is%3Aissue+is%3Aopen+child+tries) haven't been very successful.

This is quite demotivating. Everything would be a degree of magnitude more simple if we had a document explaining clearly how they work.

@tomaka
Copy link
Contributor Author

tomaka commented Mar 23, 2023

#272 should to be tackled first in order to avoid having to maintain a cache of each child trie. Removing the caches would considerably simplify the implementation.
An alternative could be to just not have a cache for child tries, but that would obviously not be a long term solution. Having no cache first would just make #272 more difficult to do.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 3, 2023

I've noticed a bunch of people follow this issue, so to keep you updated:
#639 was a major change blocking this. Now that it has landed, it should be possible to add proper support for child tries in runtime_host.rs.

Note that when it comes to using Chopsticks with child tries, Chopsticks will need a non-trivial update for #639.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 3, 2023

If someone is willing to help, having test cases would be extremely appreciated. Since I'm unfamiliar with runtime development and Substrate APIs in general, just creating a contracts chain would probably take me ten times the time necessary to implement the feature. And given that the way child tries work is very poorly documented, these test cases would be extremely useful.

Ideally, I would need as test fixture the header and body of a block that uses child tries in some way (e.g. creating, executing, or deleting a contract) plus the storage of its parent. If the parent is the genesis block, then the storage can be in the chain spec.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 6, 2023

Remains to do in order to have something potentially working:

  • Accept proofs that contain child tries. Right now they'd be rejected because they contain unused entries.
  • Add a child tries diff system to runtime_host.rs.
  • Adjust the public APIs of StorageGet/NextKey/ClosestDescendantMerkleValue to indicate from which trie to read from.

@cmichi
Copy link

cmichi commented Jun 6, 2023

If someone is willing to help, having test cases would be extremely appreciated. Since I'm unfamiliar with runtime development and Substrate APIs in general, just creating a contracts chain would probably take me ten times the time necessary to implement the feature. And given that the way child tries work is very poorly documented, these test cases would be extremely useful.

Ideally, I would need as test fixture the header and body of a block that uses child tries in some way (e.g. creating, executing, or deleting a contract) plus the storage of its parent. If the parent is the genesis block, then the storage can be in the chain spec.

@pgherveou Could you help here? The overall context for this issue is that it's needed to enable light client support of pallet-contracts.

@pgherveou
Copy link
Contributor

pgherveou commented Jun 6, 2023

@tomaka looking into your request, if you want to give me more info happy to jump on an element (@pg:parity.io) / google meet call

@pgherveou
Copy link
Contributor

following up here,
We have test cases in pallet-contracts that does things that are similar to what you described above, for example in this lazy_removal tests, we compile a basic contract from a .wat file, put some data in it's child_trie storage and check if things works as expected upon termination.

https://github.com/paritytech/substrate/blob/55bb6298e74d86be12732fd0f120185ee8fbfe97/frame/contracts/src/tests.rs#L2337-L2415

These tests relies on sp_io::TestExternalities (see https://docs.substrate.io/test/unit-testing/#test-storage-in-a-mock-runtime) to test storage in a mock environment.

With this context, let me know whether or not we could adapt a similar setup in smol-dot to provide the test cases that you mentioned above.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 7, 2023

Hi,

I have zero knowledge or motivation to figure out how Substrate's APIs work.
What I'm looking for is raw data plus a written (non-code) explanation of what this data is.

One way could be to dump the entire content of the storage of block N, plus the header and body of block N+1. This is only useful if block N+1 modifies child tries in some way, preferably in the most interesting way possible (such as creating or deleting one).

@tomaka
Copy link
Contributor Author

tomaka commented Jun 7, 2023

After #684, child tries support is supposed to work in the light client. In other words, if I was sure that the code didn't contain any bug, I would close this issue as there's nothing left to do.
However, given that everything was implemented through guesswork, I'll keep this issue open before there's reasonable confidence that things work.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 8, 2023

there might be something missing from my naive db extract here : https://github.com/pgherveou/rocksdb_export_json/blob/main/src/main.rs

It's what I mentioned above: there are tons of layers of abstractions on top of RocksDB. Probably that 32b090ba46cc0bac21ab1b180c4f1649242c7a7db2b199881880a8a6d4466ad9 is the block hash or state root or something.

@pgherveou
Copy link
Contributor

fyi this is the storage hash
Screenshot 2023-06-08 at 17 28 22

What would be the next steps to make these dataset useful for your test cases?

@tomaka
Copy link
Contributor Author

tomaka commented Jun 9, 2023

What would be the next steps to make these dataset useful for your test cases?

Well, do you know why some entries have a storage hash but not all?
Without understanding this I can't really do if key.len() > 32 { key.truncate(key.len() - 32) }, as it might alter legitimate keys.

@pgherveou
Copy link
Contributor

I think only the leaves keys have the storage hash.
also as it is exported now, the db hold the previous and current keys, (ie old value are not pruned)
What would be the ideal export state for you to make this useful?

@tomaka
Copy link
Contributor Author

tomaka commented Jun 9, 2023

I would need either all the trie nodes with their value (both branch nodes and non-branch nodes), or just the trie nodes that have a value.
The former would be better, because if you give me the latter I calculate the former from the latter.
No storage hash or anything like that, just keys and values.
I also need only the current keys. I only need block 0, so I assume that this is not a problem because there's no previous value.

I don't know how to phrase that, but basically the simplest thing possible. I don't really care if Substrate likes to make things complicated and mix data in a weird way. I don't need storage hashes or previous values or the age of the mother of the developer who wrote the code. Just keys and values.

@pgherveou
Copy link
Contributor

pgherveou commented Jun 9, 2023

I am just trying to help here, for me "simplest" is giving you the raw output that I get from substrate, I understand that smoldot is working without the same abstractions, that's why I am asking what format works best for you.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 9, 2023

What I mean by "simplest" is "simple" in terms of purity, not what is simple for you to generate. For example just a value is more simple than a value plus its hash concatenated together. I am generally frustrated by Substrate being over-engineered and not being able to generate simple things.

To give a comparison, imagine if I needed to know the color of a specific pixel in a PNG image, but instead of just the color of the pixel I get the color plus the Huffman table used to decode it concatenated together. I don't care about the Huffman table, I just need the color, and to me "just the color" is more simple than "color and Huffman table".

I understand that generating this can be a frustrating time sink, it's precisely why I asked for help from potentially someone for which it is less a frustrating time sink than it is for me.

@pgherveou
Copy link
Contributor

pgherveou commented Jun 9, 2023

Ok looked into it a bit more this pm
It turned out that trying to dump the db as JSON was indeed not the best approach.
Luckily, Frame exposes some useful rpc state endpoints to get the raw data:

combining these into a small cli
I was able to export the db and the storage change set for each block

Archive.zip

@tomaka
Copy link
Contributor Author

tomaka commented Jun 10, 2023

Thanks! cc #722

There are a few changes to make:

I'm also using the block 1 from your comment a few days ago, is it still the same? I suggest that your script also retrieves block 1 so that we're sure that they correspond.

@pgherveou
Copy link
Contributor

On my phone now, will update these last few things, later today.

Regarding paging I think there are less than 100 keys so the Api call should pick all keys in one pass.

Regarding previous data, we should not use it, I tweaked the chain to remove transaction fees and deposits in an attempt to get fewer db changes so I could try to understand what's going on at a low level at bit better

I can put all of this data extraction in a shell script if that's useful at all. All is needed is the substrate-contract-node, cargo-contract to compile and execute a contract transaction and this little repo to query the data

@tomaka
Copy link
Contributor Author

tomaka commented Jun 10, 2023

👍 There's absolutely no rush, by the way

@pgherveou
Copy link
Contributor

Here you go: Archive.zip
database is exported at block 0
added block 1 in blocks.json

source if you want to check it https://github.com/pgherveou/contracts-query/blob/404079d270cd7c7e4ae98e4eb42ac49996af1634/src/main.rs#L50-L93

@tomaka
Copy link
Contributor Author

tomaka commented Jun 12, 2023

Thanks

The test is failing but in a way that lets me think there's a bug in smoldot.

Interestingly, the runtime doesn't call ext_default_child_storage_root like I thought it was supposed to do. This means that answer to question 9 is probably Some, and runtime_host.rs needs to manually commit the child trie root change to the main trie.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 14, 2023

The test is now passing after #743

Would it also be possible to generate three other test cases 🙏

  • One that creates two contracts in the same block, so that I test if multiple child tries at the same time work
  • One that accesses/calls a contract that already existed in the previous block.
  • One that destroys a contract that already existed in the previous block.

For the last two, you probably also have to do something like create a contract in block 1, then access/destroy it in block 2. In that case I need the storage of block 1. That storage must also include the child tries of the block, instead of only the main trie. It's unfortunately not really clear to me how to do that.

@pgherveou
Copy link
Contributor

That storage must also include the child tries of the block, instead of only the main trie. It's unfortunately not really clear to me how to do that.

don't the storage APIs mentioned above returned all storage keys, whether they are child trie keys or or not 🤔

@pgherveou
Copy link
Contributor

Added all of this in the repo https://github.com/pgherveou/contracts-query/tree/main
it's all generated by this script (https://github.com/pgherveou/contracts-query/blob/main/export.sh)

  • block 1 -> instantiate contract
  • block 2 -> instantiate contract x 2 (using same code but instantiating it twice)
  • block 3 -> call contract created at block 1
  • block 4 -> call contract terminate

db is exported at each block and all blocks are exported in blocks.json

@tomaka
Copy link
Contributor Author

tomaka commented Jun 16, 2023

The way child tries work is that every key that starts with 0x3a6368696c645f73746f726167653a64656661756c743a corresponds to a child trie whose identifier is the second part of the key. And I would need their content 🙏
I think you can retrieve the content of these child tries using childstate_getKeys, childstate_getStorage and similar.

So for example when looking at db-3.json, there's this:

{
    "key": "0x3a6368696c645f73746f726167653a64656661756c743a14e6ea499ccfe3241adbcf937cfb9d41cb1c33d4c33e5bbf72deb7701a286d53",
    "value": "0xedafed99882fa0a681b7d285180098a9fa0b757bcc2ff51a249c88a80f698079"
  },
  {
    "key": "0x3a6368696c645f73746f726167653a64656661756c743a9c32e5ae59a1f5bc78d625dedc7bce6e788152dee2a0b77837c3cd2e7c7ce1ed",
    "value": "0x233859be5ca485b609b339f61deb9104c9918c91f48dc78f3bc9f858cb304ad3"
  },
  {
    "key": "0x3a6368696c645f73746f726167653a64656661756c743aeb412d2bf2f48d26267fe76b4b9854cf31038b2b465e4ba845b264f4559ba5f6",
    "value": "0xedafed99882fa0a681b7d285180098a9fa0b757bcc2ff51a249c88a80f698079"
  },

Meaning that there are 3 child tries identified by 0x14e6ea499ccfe3241adbcf937cfb9d41cb1c33d4c33e5bbf72deb7701a286d53, 0x9c32e5ae59a1f5bc78d625dedc7bce6e788152dee2a0b77837c3cd2e7c7ce1ed, and 0xeb412d2bf2f48d26267fe76b4b9854cf31038b2b465e4ba845b264f4559ba5f6.
These are the 3 values that you would need to pass as the first parameter of childstate_getKeys or childstate_getStorage

(note that this is not recursive, there's no such thing as a "grandchild trie", so if a child trie key starts with 0x3a6368696c645f73746f726167653a64656661756c743a it's nothing special)

@pgherveou
Copy link
Contributor

Ok out of curiosity this 0x3a6368696c645f73746f726167653a64656661756c743a is this something hard coded somewhere or it's just specific to our small dataset here ?

@pgherveou
Copy link
Contributor

so to be clear, you would like to fill the missing key, values in the db-x.json files to take into account all storage keys including the child-state keys that are currently missing

@tomaka
Copy link
Contributor Author

tomaka commented Jun 16, 2023

0x3a6368696c645f73746f726167653a64656661756c743a is hardcoded. It's ASCII for :child_storage:default:

so to be clear, you would like to fill the missing key, values in the db-x.json files to take into account all storage keys including the child-state keys that are currently missing

Yes!

You can for example add a new map that looks something like this:

"child_tries": {
    "14e6ea499ccfe3241adbcf937cfb9d41cb1c33d4c33e5bbf72deb7701a286d53": [
        {"key": ..., "value", ...},
        {"key": ..., "value", ...}
    ],
    "9c32e5ae59a1f5bc78d625dedc7bce6e788152dee2a0b77837c3cd2e7c7ce1ed": [
        {"key": ..., "value", ...},
        {"key": ..., "value", ...}
    ],
    ...
}

@pgherveou
Copy link
Contributor

updated the files in the repo with the child_keys
format is

  "child_tries": {
    "0x3a6368696c645f73746f726167653a64656661756c743aeb412d2bf2f48d26267fe76b4b9854cf31038b2b465e4ba845b264f4559ba5f6": [
      [
        "0x11d2df4e979aa105cf552e9544ebd2b500000000",
        "0x38466972737420636f6e7472616374"
      ]
    ],
    "0x3a6368696c645f73746f726167653a64656661756c743a14e6ea499ccfe3241adbcf937cfb9d41cb1c33d4c33e5bbf72deb7701a286d53": [
      [
        "0x11d2df4e979aa105cf552e9544ebd2b500000000",
        "0x38466972737420636f6e7472616374"
      ]
    ]
  }

the outer keys have the suffix and (key, values) are encoded as an array of 2 instead of an object, if this is inconvenient to consume we can map it to the suggested format.

@pgherveou
Copy link
Contributor

actually pushed another commit with { key, value } object

@tomaka
Copy link
Contributor Author

tomaka commented Jun 16, 2023

if this is inconvenient to consume we can map it to the suggested format.

Once we're happy with the tests I'll do some conversion to something a bit better. I want to do some changes such as turn the block headers into bytes.

@tomaka
Copy link
Contributor Author

tomaka commented Jun 16, 2023

Ok, with #763 all the tests are passing! Thanks for the help.

I think that there's reasonable confidence that smoldot supports child tries. There are probably issues in corner cases, but none of which can be reached through the paths that a runtime would normally take.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants