This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

RPC performance decimated #906

Closed
emielsebastiaan opened this issue Mar 17, 2020 · 11 comments
Labels
I9-footprint An enhancement to provide a smaller (system load, memory, network or disk) footprint.

Comments

@emielsebastiaan

For our Polkascan use-case we extensively query the Substrate RPC endpoints.
We notice a very significant difference in performance between v0.7.20 and v0.7.25.
Kusama v0.7.20 is approximately 8 times faster than v0.7.25.

We have run v0.7.25 with various options:

--state-cache-size 8192000000
--max-runtime-instances 256
--db-cache=8192 

This does not make a significant difference.
In general our harvester requests the following RPCs for each and every block:

  • chain_getBlock
  • state_getStorage (various calls)
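
For readers unfamiliar with the call pattern, the per-block requests listed above can be sketched roughly as follows. This is only an illustration, not the Polkascan harvester code: the helper names (`build_request`, `rpc_call`, `fetch_block_data`) and the `storage_keys` argument are hypothetical, and the use of `chain_getBlockHash` to resolve a block number to a hash is an assumption about how a client would obtain the hash.

```python
import json
import urllib.request


def build_request(method, params, request_id=1):
    """Return a JSON-RPC 2.0 request body as a dict."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def rpc_call(url, method, params):
    """POST one JSON-RPC request over HTTP and return its 'result' field."""
    body = json.dumps(build_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def fetch_block_data(url, block_number, storage_keys):
    """Fetch one block plus a set of storage entries at that block.

    storage_keys is a hypothetical list of hex-encoded storage keys;
    the real harvester chooses its own keys.
    """
    # Assumption: resolve the block number to a hash first.
    block_hash = rpc_call(url, "chain_getBlockHash", [block_number])
    block = rpc_call(url, "chain_getBlock", [block_hash])
    storage = {
        key: rpc_call(url, "state_getStorage", [key, block_hash])
        for key in storage_keys
    }
    return block, storage
```

Each block therefore costs one `chain_getBlock` round trip plus one `state_getStorage` round trip per key, so any per-request slowdown in the node is multiplied by the number of calls per block.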

We would like to get back to the performance we had when we were using v0.7.20.
Please advise.

@bkchr
Member

bkchr commented Mar 17, 2020

@tomusdrw did we have any big changes to the rpc crate lately?

@bkchr added the I9-footprint label on Mar 17, 2020

@tomusdrw
Contributor

I presume it's the HTTP server, right?

Just checked and we bumped jsonrpc from 14.0.3 to 14.0.5. There is a change that may well affect that:
https://github.com/paritytech/jsonrpc/pull/518/files#diff-b40889955acdf574fe9db43bfc17372dL541

It should still utilise all 4 threads of the runtime spawned by the rpc-servers crate (part of substrate), though.

@emielvanderhoek would you mind checking:

  1. What is the load distribution between cores in these two different versions (i.e. are we using more cores on 0.7.20)?
  2. If you are ok compiling the client, could you check this branch:
    https://github.com/paritytech/polkadot/tree/td-old-http to see if it brings the performance back to satisfactory levels?

Also, what metric do you use to measure performance? Is it average response time? Or do you mean resource utilisation (as in 0.7.20 using 8x fewer resources)?

@emielsebastiaan
Author

Yes we are using RPC over HTTP (not WS).
I have built the branched release (https://github.com/paritytech/polkadot/tree/td-old-http) and ran it. It is performing slightly better but unfortunately not anywhere near the v0.7.20 level.

I currently have a very coarse-grained performance metric (I am aware of that). I'll follow up with a better definition soon.

@tomusdrw
Contributor

@emielvanderhoek Thanks a lot for running the branch, it suggests that the issue is not fully caused by jsonrpc change, so we need to dig deeper.
It would be best to collect some performance data, but I'm not entirely sure what the best format would be.

@arkpar is it possible that some DB/cache options were changed and could be causing that? It seems that --db-cache 8G does not really fix the issue; was there anything else? Can you suggest how best to collect performance metrics? Should we try valgrind --tool=callgrind, or do you have better ideas?

@emielsebastiaan
Author

> What is the load distribution between cores in these two different versions (i.e. are we using more cores on 0.7.20)?

I will check the difference between v0.7.20 and v0.7.25. I know the machine I am running on has eight cores/threads available.

@arkpar
Member

arkpar commented Mar 17, 2020

We need a benchmark for this.

> chain_getBlock
> state_getStorage

These don't invoke the runtime, so it can't be wasm execution.
It would be nice to get some kind of profiling report.

> @arkpar is it possible that some DB/cache options were changed and could be causing that? It seems that --db-cache 8G does not really fix the issue; was there anything else? Can you suggest how best to collect performance metrics? Should we try valgrind --tool=callgrind, or do you have better ideas?

Either valgrind or perf would work.
Here are some instructions for the latter:
https://rust-lang.github.io/packed_simd/perf-guide/prof/linux.html

@emielsebastiaan
Author

emielsebastiaan commented Mar 17, 2020

OK, new info...
When I ran the branched version (https://github.com/paritytech/polkadot/tree/td-old-http) I did not run it with the --db-cache 8192 option; I omitted this option altogether. Just now I ran it with this option added, and it did increase overall performance.

I will see what I can do to get real numbers (benchmark).
For now an indication of performance is:

  1. v0.7.20: "fast"
  2. v0.7.25: "slow" (~1/10 * fast)
  3. v0.7.25 with --db-cache 8192: "slow" (~1/8 * fast)
  4. v0.7.25-90bf7bc (td-old-http): "slow" (~1/6 * fast)
  5. v0.7.25-90bf7bc (td-old-http) with --db-cache 8192: "slow" (~1/3 * fast)

As I said, we currently have only a coarse-grained benchmark.
Our harvester works in batches of 10 blocks and fetches 'chain_getBlock' and various 'state_getStorage' results over HTTP-RPC. Then it processes the data and stores it in our relational database.
Fast is defined as what we were used to (~10 blocks per second).
Slow goes all the way down to less than one block per second.
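
To make the rough ratios above concrete, they can be converted into approximate blocks per second. This is only back-of-the-envelope arithmetic on the estimates quoted in this comment (a "fast" baseline of about 10 blocks per second); the numbers are not measurements, and the names in the snippet are made up for the illustration.

```python
from fractions import Fraction

# Approximate "fast" baseline quoted above (v0.7.20), in blocks per second.
FAST_BLOCKS_PER_SECOND = 10

# Relative speeds as estimated in the list above (rough, not measured).
RELATIVE_SPEED = {
    "v0.7.20": Fraction(1),
    "v0.7.25": Fraction(1, 10),
    "v0.7.25 --db-cache 8192": Fraction(1, 8),
    "td-old-http": Fraction(1, 6),
    "td-old-http --db-cache 8192": Fraction(1, 3),
}


def estimated_throughput(relative_speed, baseline=FAST_BLOCKS_PER_SECOND):
    """Convert a relative speed into an approximate blocks-per-second figure."""
    return float(relative_speed * baseline)


if __name__ == "__main__":
    for name, relative in RELATIVE_SPEED.items():
        print(f"{name}: ~{estimated_throughput(relative):.2f} blocks/s")
```

On these estimates the slowest configuration lands at roughly one block per second, which matches the "slow" description above.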

@arkpar
Member

arkpar commented Mar 17, 2020

@emielvanderhoek Could you provide instructions to run the harvester?
Are the ones at https://github.com/polkascan/polkascan-pre up to date?

@emielsebastiaan
Author

@arkpar I would need to check. We are currently wrapping up some grant work, so most of that is pending a big refactor.

@arjanz

arjanz commented Mar 31, 2020

I included a simple script that basically loops through 1000 blocks the same way the harvester would (90% of our RPC calls are for extrinsics and events). I noticed that just after a clean sync the performance of both versions is roughly the same. I suspect the performance drops when the database grows bigger, but I couldn't confirm that yet.

In a Python 3.6+ environment, run:

pip install substrate-interface
python rpc_perf_test.py http://[ip-address]:9933

rpc_perf_test.py.zip
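
For anyone who cannot open the attachment, a timing loop of this general kind might look like the sketch below. This is not the content of rpc_perf_test.py (which uses substrate-interface); it is a generic harness under the assumption that a caller supplies some `fetch_block` function performing the per-block RPC calls. The names `measure` and `fetch_block` are hypothetical.

```python
import statistics
import time


def measure(fetch_block, block_numbers):
    """Call fetch_block(n) for each block number and time every call.

    fetch_block is a caller-supplied function (hypothetical here) that
    performs the RPC requests for one block.
    """
    latencies = []
    started = time.perf_counter()
    for number in block_numbers:
        t0 = time.perf_counter()
        fetch_block(number)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    return {
        "blocks": len(latencies),
        "blocks_per_second": len(latencies) / elapsed if elapsed > 0 else float("inf"),
        "mean_latency_s": statistics.mean(latencies) if latencies else 0.0,
        "max_latency_s": max(latencies, default=0.0),
    }


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a node.
    report = measure(lambda n: time.sleep(0.001), range(10))
    print(report)
```

Reporting both blocks per second and per-call latency would cover the two metrics asked about earlier in the thread (throughput versus average response time), and running the same loop against both node versions at a similar database size would help test the suspicion about database growth.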

@emielsebastiaan
Author

Unfortunately, I cannot get v0.7.20 to sync anymore, so it is hard to get an objective performance baseline with this script...


6 participants