smoldot-light: GrandPa warp sync idle at block #592
Please run smoldot with … My immediate suspicion is that the PeerId of the chain spec is wrong, because you're running with …
Hi, I've managed to get this working by doing the following:
```
./polkadot-cbd5330173 --dev --node-key 0000000000000000000000000000000000000000000000000000000000000001 --alice --validator
```
This produces the same genesis hash and the same node identity.
```
curl -H "Content-Type: application/json" -d '{"id":1, "jsonrpc":"2.0", "method": "sync_state_genSyncSpec", "params":[true]}' http://localhost:9944/ | jq .result > res.spec
```
If I were to leave the … (https://github.com/smol-dot/smoldot/blob/main/lib/src/chain_spec.rs#L445-L446), would it be possible that the "freshly" spawned local node does not have 2 epochs yet? Is there a way to avoid this, but still benefit from the …?
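For anyone reproducing this, here is a minimal sketch to check whether the spec produced by `sync_state_genSyncSpec` actually contains a checkpoint. It assumes the checkpoint lives under a top-level `lightSyncState` field, as in Substrate-generated specs, and reads the `res.spec` file written by the curl command above; it also assumes the `serde_json` crate is available.

```rust
// Sketch: check whether the spec generated by `sync_state_genSyncSpec` contains a
// `lightSyncState` checkpoint. The field name and the `res.spec` file name are
// assumptions based on the commands above.
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = fs::read_to_string("res.spec")?;
    let spec: serde_json::Value = serde_json::from_str(&raw)?;

    match spec.get("lightSyncState") {
        // The checkpoint is present: the light client can start from it.
        Some(_) => println!("spec contains a lightSyncState checkpoint"),
        // A freshly spawned --dev chain may not have passed enough epochs yet,
        // in which case the field can simply be absent.
        None => println!("no lightSyncState checkpoint in this spec"),
    }
    Ok(())
}
```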
The warp sync time is proportional to the length of the chain. Every epoch (4096 blocks, I believe), there's one signature per validator to download and verify. When the chain is several million blocks long, this adds up and causes slowness. But for a local chain that has just started, it really shouldn't take more than a few dozen milliseconds.
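To make the proportionality concrete, here is a back-of-envelope sketch based on the numbers in the comment above; the epoch length is the one quoted there, and the validator count is an illustrative assumption.

```rust
// Rough estimate of the warp sync work described above: roughly one
// justification per epoch, each carrying one signature per validator.
// BLOCKS_PER_EPOCH and VALIDATORS are illustrative assumptions, not measured values.
const BLOCKS_PER_EPOCH: u64 = 4096;
const VALIDATORS: u64 = 300;

fn signatures_to_verify(chain_height: u64) -> u64 {
    (chain_height / BLOCKS_PER_EPOCH) * VALIDATORS
}

fn main() {
    // A freshly started --dev chain (a few hundred blocks): essentially nothing to verify.
    println!("local dev chain : ~{} signatures", signatures_to_verify(600));

    // A long-lived chain (millions of blocks): on the order of a million signatures
    // to download and verify, which is where the slowness comes from.
    println!("long-lived chain: ~{} signatures", signatures_to_verify(16_000_000));
}
```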
I'm a bit confused about whether you (still) have a problem. If you need help with the warp syncing being stuck, I would need the …
Hi, I managed to get this working; however, connecting to the local node seems to be slower than connecting to the Polkadot chain. I have attached the logs here:
```
2023-05-30T15:36:08.955641Z INFO smoldot: Chain initialization complete for dev.
2023-05-30T15:41:13.019192Z INFO runtime-dev: Successfully compiled runtime. Spec version: 9410. Size of `:code`: 1.4 MiB.
// TX submitted now
2023-05-30T15:41:13.769082Z DEBUG json-rpc-dev: PendingRequestsQueue <= {"jsonrpc":"2.0","id":"5", "method":"author_submitAndWatchExtrinsic","params":["0x2d028400d43593c715…
// TX included in chain
2023-05-30T15:41:42.700032Z DEBUG json-rpc-dev: JSON-RPC <= {"jsonrpc":"2.0","id":"7","result":"0x2400000000000000e2e8793a55170200000001000000000003c8ac2097c6f9…
Balance transfer success: Transfer { from: AccountId32([212, 53, 147, 199, 21, 253, 211, 28, 97, 20, 26, 189, 4, 169, 159, 214, 130, 44, 133, 88, 133, 76, 205, 227, 154, 86, 132, 231, 165, 109, 1 ..
```
Is there anything we could do for the local node (spawned as above) to make things a bit faster?
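For context, here is a sketch of the kind of subxt call being timed here, loosely following subxt's balance-transfer example of that era. The generated `polkadot` module, the `metadata.scale` path, the `transfer` call name, and the dev signer helpers are all assumptions that depend on the subxt and runtime versions in use, and the thread routes this through the embedded light client from paritytech/subxt#965 rather than a plain WebSocket URL.

```rust
// Sketch (not the exact code used in this thread): submit a balance transfer
// through subxt and measure how long it takes to land in a finalized block.
use std::time::Instant;
use subxt::{OnlineClient, PolkadotConfig};
use subxt_signer::sr25519::dev;

// Assumes metadata has been fetched into `metadata.scale` beforehand.
#[subxt::subxt(runtime_metadata_path = "metadata.scale")]
pub mod polkadot {}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Connect to the local --dev node spawned earlier in the thread; the actual
    // experiment replaces this with the light-client backend instead.
    let api = OnlineClient::<PolkadotConfig>::from_url("ws://127.0.0.1:9944").await?;

    let dest = dev::bob().public_key().into();
    let transfer_tx = polkadot::tx().balances().transfer(dest, 1_000_000_000_000);

    let started = Instant::now();
    let events = api
        .tx()
        .sign_and_submit_then_watch_default(&transfer_tx, &dev::alice())
        .await?
        .wait_for_finalized_success()
        .await?;
    println!("transfer finalized after {:?}", started.elapsed());

    // Print the Transfer event, similar to the "Balance transfer success" line above.
    if let Some(ev) = events.find_first::<polkadot::balances::events::Transfer>()? {
        println!("Balance transfer success: {ev:?}");
    }
    Ok(())
}
```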
Is your CPU by any chance extremely busy constantly up until the transaction has been submitted? Or does it have idle times?
The CPU consumption is at 100% for 1 CPU, while the others are idle all the time, verified with …
Another question is: did you truncate the logs between …?
Hey, I did enable tracing for my example run; however, the transaction gets dropped after a while, resulting in some 271 MB of logs. I expect this to be related to the large number of logs. I ended up running … Will retry the experiment after fetching those two PRs, thanks! 🙏
I've updated the full logs on my drive.
```
thread 'async-std/runtime' panicked at 'explicit panic', /home/lexnv/workspace/smoldot/lib/src/sync/all.rs:1791:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```
The tx was submitted successfully in around 5 minutes.
It seems that the sync service is completely frozen for two minutes. At the same time, the logs are filled at 99.9% with log lines from wasmtime and cranelift. If you remove all the wasmtime and cranelift lines from the 92.2 MiB log file, there's only 87 kiB left. So my hypothesis is that wasmtime indeed takes 2 whole minutes to compile the runtime, simply because it spends 99% of its CPU time writing logs. Please try again with …
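The exact flag suggested above did not survive the page formatting, but the idea can be sketched as a log filter that silences the compiler back-end targets. The target names below (`wasmtime`, `wasmtime_cranelift`, `cranelift_codegen`, `cranelift_frontend`) are assumptions to adjust to whatever prefixes actually appear in the log file.

```rust
// Sketch: keep the application's own logs verbose while turning off the
// wasmtime/cranelift targets that dominate the 92 MiB log file.
// Requires `tracing-subscriber` with the "env-filter" feature.
use tracing_subscriber::EnvFilter;

fn main() {
    let filter = EnvFilter::new(
        "trace,wasmtime=off,wasmtime_cranelift=off,cranelift_codegen=off,cranelift_frontend=off",
    );

    tracing_subscriber::fmt()
        .with_env_filter(filter)
        .init();

    // ... start the light client here; compiling the runtime no longer floods the logs.
}
```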
I've uploaded another file to the drive.
What happens is: smoldot connects, warp syncs, downloads the runtime, compiles it (which takes 221 seconds!), then tries to download other things from the peer, and the peer disconnects smoldot in return. Then smoldot reconnects, and the entire thing repeats. I don't know why Substrate disconnects smoldot; I would need the … However, my main hypothesis is still that your CPU is completely overloaded for some reason. Are you compiling in debug mode, maybe? If so, you really should try …
It is now possible to no longer use …
The light client with default features disabled seems to take a bit longer than before:
```
2023-06-13T14:43:33.413835Z TRACE light-client: Create light client
2023-06-13T14:53:08.541603Z TRACE async_io::driver: block_on()
2023-06-13T14:53:08.541681Z TRACE light-client: Submit request "{\"jsonrpc\":\"2.0\",\"id\":\"1\", \"method\":\"chain_getBlockHash\",\"params\":[0]}"
2023-06-13T14:53:08.541714Z TRACE light-client-background: Received register message Request { id: "1", sender: Sender { inner: Some(Inner { state: State { is_complete: false, is_closed: false, is_rx_task_set: false, is_tx_task_set: false } }) } }
2023-06-13T14:53:08.541727Z DEBUG json-rpc-polkadot: PendingRequestsQueue <= {"jsonrpc":"2.0","id":"1", "method":"chain_getBlockHash","params":[0]}
2023-06-13T14:53:08.542210Z TRACE polling::epoll: add: epoll_fd=6, fd=8, ev=Event { key: 18446744073709551615, readable: false, writable: false }
2023-06-13T14:53:08.542227Z TRACE polling::epoll: add: epoll_fd=6, fd=7, ev=Event { key: 18446744073709551615, readable: true, writable: false }
2023-06-13T14:53:08.542234Z TRACE polling::epoll: new: epoll_fd=6, event_fd=7, timer_fd=Some(8)
2023-06-13T14:53:08.542289Z TRACE polling: Poller::notify()
2023-06-13T14:53:08.542296Z TRACE polling::epoll: notify: epoll_fd=6, event_fd=7
2023-06-13T14:53:08.542317Z TRACE async_io::driver: main_loop: waiting on I/O
2023-06-13T14:53:08.542343Z TRACE async_io::reactor: process_timers: 0 ready wakers
2023-06-13T14:53:08.542352Z TRACE polling: Poller::wait(_, Some(4.999827661s))
2023-06-13T14:53:08.542360Z TRACE polling::epoll: wait: epoll_fd=6, timeout=Some(4.999827661s)
2023-06-13T14:53:08.542367Z TRACE polling::epoll: modify: epoll_fd=6, fd=8, ev=Event { key: 18446744073709551615, readable: true, writable: false }
2023-06-13T14:53:08.542375Z TRACE polling::epoll: new events: epoll_fd=6, res=1
2023-06-13T14:53:08.542381Z TRACE polling::epoll: modify: epoll_fd=6, fd=7, ev=Event { key: 18446744073709551615, readable: true, writable: false }
2023-06-13T14:53:08.542390Z TRACE async_io::reactor: process_timers: 0 ready wakers
2023-06-13T14:53:08.542395Z TRACE async_io::reactor: react: 0 ready wakers
2023-06-13T14:53:08.542403Z TRACE async_io::driver: main_loop: sleeping for 50 us
2023-06-13T14:53:08.542423Z TRACE polling: Poller::notify()
2023-06-13T14:53:08.542432Z TRACE polling::epoll: notify: epoll_fd=6, event_fd=7
2023-06-13T14:53:08.542515Z TRACE async_io::driver: main_loop: waiting on I/O
2023-06-13T14:53:08.542523Z TRACE async_io::reactor: process_timers: 0 ready wakers
2023-06-13T14:53:08.542529Z TRACE polling: Poller::wait(_, Some(4.999642489s))
2023-06-13T14:53:08.542534Z TRACE polling::epoll: wait: epoll_fd=6, timeout=Some(4.999642489s)
2023-06-13T14:53:08.542540Z TRACE polling::epoll: modify: epoll_fd=6, fd=8, ev=Event { key: 18446744073709551615, readable: true, writable: false }
..
2023-06-13T14:53:15.623893Z INFO sync-service-polkadot: GrandPa warp sync finished to #15951164 (0xc1e5…7054)
..
2023-06-13T14:53:15.624945Z INFO runtime-polkadot: Finalized block runtime ready. Spec version: 9370. Size of `:code`: 1.3 MiB.
```
There is a ~10 minute gap in the logs; might it be related to #661? 🤔
I don't really have any hypothesis except that you're reaching the computation limits of your machine.
It seems to work just fine on an ARM Mac M1; I will close this for now, as it might be a limitation of AMD/my box. Thanks for the help in debugging this and improving the code!
Hi, I'm trying to integrate smoldot-light in paritytech/subxt#965. Locally on my Mac M1 it works perfectly; on AMD x86 the light client seems stuck on: …
I'm starting my polkadot nodes the same way, and I'm using the same chain spec.
I'm using polkadot version cbd5330173 on x86, compared to 6dc9e84dde2 on Mac.
Do you know what might cause this warning?