
Resolve performance bottleneck for querying epoch ownership #5948

Closed
octol opened this issue May 7, 2020 · 2 comments · Fixed by #5962
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@octol
Contributor

octol commented May 7, 2020

The babe_epochAuthorship RPC call can potentially take a long time to complete. The code, originally designed for block authoring, is sub-optimal for querying all the slots of an epoch at once, and this causes noticeable performance problems.

Idea:

  • We should break apart authorship::claim_slot in Babe so that it can take a specific key to try (see the sketch below)

This should be resolved before we can merge paritytech/polkadot#1065
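
A rough sketch, in Rust, of what the split could look like. The types (Epoch, AuthorityPair, SlotClaim) and both function names are placeholders for illustration, not the actual Substrate API:

// Placeholder types standing in for the real BABE types.
struct Epoch;
struct AuthorityPair;
struct SlotClaim;

// Per-key work: try to claim `slot` with one already-fetched key.
fn claim_slot_using_key(
    _slot: u64,
    _epoch: &Epoch,
    _key: &AuthorityPair,
) -> Option<SlotClaim> {
    // VRF evaluation and threshold check against this single key.
    None
}

// The existing entry point becomes a thin wrapper that block authoring
// keeps using: try each locally held key in turn.
fn claim_slot(slot: u64, epoch: &Epoch, keys: &[AuthorityPair]) -> Option<SlotClaim> {
    keys.iter().find_map(|key| claim_slot_using_key(slot, epoch, key))
}

The RPC path could then call claim_slot_using_key directly with keys it fetched from the keystore once up front.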

@octol octol added I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. M4-core labels May 7, 2020
@octol octol self-assigned this May 7, 2020
@burdges

burdges commented May 7, 2020

Why do we loop over keys? À la this flat_map?

How does it identify the BABE key from the keystore there? It should find only the current one, yes? We've no reason to sign under inactive keys.

Are we rotating BABE keys often? We need not rotate them often really, but maybe some people prefer to do so.

vrf_sign_after_check costs roughly 20 microseconds to fail, or roughly 35 microseconds to succeed, roughly twice as long as doing an ed25519 signature. You can reduce the succeed case if you need only the output, à la #5876, not the signature. It should not really be called in a loop over keys though, well not unless you've one substrate instance running many validators, which sucks for other reasons.

There is another performance hit in how we compute the randomness for each BABE epoch. We currently accumulate a Vec and hash it all at once. We could iteratively do the hashing, but at the cost of increasing the header size by 32 bytes, so maybe this Vec actually makes the most sense. But if you recompute it in a loop then it'll suck, and if you must pull it from slow storage in a loop then it'll suck even more.
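
For illustration, a minimal sketch of the two strategies, using the sha2 crate's SHA-256 as a stand-in for the actual hash function used by BABE:

use sha2::{Digest, Sha256};

// Accumulate-then-hash: keep every per-block VRF output for the epoch
// (a Vec bounded by epoch length) and hash them all in one pass.
fn randomness_accumulated(vrf_outputs: &[[u8; 32]]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    for output in vrf_outputs {
        hasher.update(output);
    }
    hasher.finalize().into()
}

// Iterative: fold each VRF output into a running 32-byte state as blocks
// arrive, so no Vec is stored -- at the cost of carrying the extra
// 32-byte intermediate state forward (e.g. in the header).
fn randomness_iterative(state: [u8; 32], vrf_output: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(state);
    hasher.update(vrf_output);
    hasher.finalize().into()
}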

@andresilva
Contributor

Currently the BABE implementation supports authoring under multiple keys on the same node. So we check the current authority set against all matching keys in the keystore (usually just one or none, since people don't normally run multiple BABE authorities on the same node).

The problem here is that for this API that calculates slot assignments, we are fetching (and parsing) the keys from the keystore for each slot we try; the bottleneck is actually parsing the keys. We're essentially doing:

for slot in epoch {
  all_local_keys = fetch_all_matching_keys_from_keystore();
  for key in all_local_keys {
    try_to_claim_slot(slot, key)
  }
}

Whereas we should do:

all_local_keys = fetch_all_matching_keys_from_keystore();
for slot in epoch {
  for key in all_local_keys {
    try_to_claim_slot(slot, key)
  }
}

After we do this change I suspect we will no longer have any performance bottleneck.
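
A minimal Rust sketch of the fixed query path; the stubbed helpers mirror the pseudocode above and are placeholders, not the real Substrate functions:

// Stubs mirroring the pseudocode above.
struct Key;
struct SlotClaim;

fn fetch_all_matching_keys_from_keystore() -> Vec<Key> {
    // One keystore read + parse, amortized over the whole epoch.
    Vec::new()
}

fn try_to_claim_slot(_slot: u64, _key: &Key) -> Option<SlotClaim> {
    // Per-key VRF threshold check for this slot.
    None
}

fn epoch_authorship(epoch_slots: std::ops::Range<u64>) -> Vec<(u64, SlotClaim)> {
    // Keys are fetched and parsed exactly once, outside the per-slot loop.
    let keys = fetch_all_matching_keys_from_keystore();
    epoch_slots
        .filter_map(|slot| {
            keys.iter()
                .find_map(|key| try_to_claim_slot(slot, key))
                .map(|claim| (slot, claim))
        })
        .collect()
}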

> There is another performance hit in how we compute the randomness for each BABE epoch. We currently accumulate a Vec and hash it all at once. We could iteratively do the hashing, but at the cost of increasing the header size by 32 bytes, so maybe this Vec actually makes the most sense. But if you recompute it in a loop then it'll suck, and if you must pull it from slow storage in a loop then it'll suck even more.

I guess that would be advantageous since we wouldn't have to maintain a large pool of data in runtime storage, although this Vec is bounded by epoch length so I think it's working out OK.
The current epoch randomness is always pulled from memory IIRC (since it is announced 1 epoch in advance), so here we just fetch the epoch randomness and then check all epoch slot assignments against it.
