
bank vat updates for hundreds of accounts in a block is slow #3629

Closed
dckc opened this issue Aug 9, 2021 · 6 comments
Labels: bug (Something isn't working), cosmic-swingset (package: cosmic-swingset), needs-design, performance (Performance related issues), testnet-problem (problem found during incentivized testnet)

dckc (Member) commented Aug 9, 2021

Describe the bug

With a simulated threshold of 8M computrons per block, we project that deliveries from agorictest-16 may have been delayed by up to 72 blocks (#3459 (comment)). Looking into what was going on in those blocks, @michaelfig and I saw traffic with a few hundred account updates per block.

Without a limit on compute per block (#3459), this caused block times of up to 32 seconds.

To Reproduce

Presumably the agorictest-16 behavior is reproducible.

Expected behavior

???

Should there be back-pressure that causes the transactions to sit longer in the mempool or get dropped?

Or can we optimize this situation?

Platform Environment

agorictest-16

Additional context

#3459

cc @warner

@dckc added the bug label on Aug 9, 2021
@dckc added the performance and testnet-problem labels on Aug 9, 2021
michaelfig (Member) commented:
> Should there be back-pressure that causes the transactions to sit longer in the mempool or get dropped?

Backpressure is a useful goal, but this particular kind of traffic, Cosmos bank sends resulting in balance notifications to the JS layer, is probably not going to be reduced, especially on a production chain.

> Or can we optimize this situation?

@dckc and I thought of a way to optimise this that involves polling from JS to the Cosmos vbank module. This is probably a useful idea since our @agoric/notifier JS module is optimised for the case when the client is not always active.

If we informed the Cosmos layer when the JS layer is waiting on a fresh balance update, and dropped updates whose JS clients are not keeping up, we could reduce the amount of traffic, especially when the JS layer is not waiting (i.e. there is no JS client because it hasn't been provisioned, or the JS client is not currently running).

vat-bank.js could wrap the notifier's getUpdateSince method for each Cosmos address so that it polls, and possibly subscribes to, the Golang vbank module (a rough sketch follows the list):

  1. synchronously poll once for the current balances with vbank.getCurrentBalance(address). vat-bank could compare the returned values against the current information when deciding whether to update the account's notifier.
  2. if the notifier is not yet updated, subscribe to only the very next change in that address's balances with vbank.getNextBalance(address)
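
A minimal sketch of what such a wrapper might look like, not a definitive implementation: it assumes the proposed (not yet existing) vbank.getCurrentBalance and vbank.getNextBalance calls, that balances arrive as plain strings, and a hypothetical makePollingBalanceNotifier helper in vat-bank.js.

```js
// Sketch only: wrap a per-address notifier so it pulls balance data from the
// (proposed) vbank polling calls instead of receiving every Cosmos update.
import { makeNotifierKit } from '@agoric/notifier';

const makePollingBalanceNotifier = (vbank, address) => {
  const { notifier, updater } = makeNotifierKit();
  let lastBalance;

  const pollOnce = async () => {
    // Step 1: poll once for the current balance and only push an update to
    // the notifier if the value actually changed.
    const balance = await vbank.getCurrentBalance(address);
    if (balance !== lastBalance) {
      lastBalance = balance;
      updater.updateState(balance);
      return true;
    }
    return false;
  };

  const getUpdateSince = async updateCount => {
    const changed = await pollOnce();
    if (!changed) {
      // Step 2: nothing new from polling, so subscribe to only the very next
      // change for this address rather than to every update the chain emits.
      const next = await vbank.getNextBalance(address);
      lastBalance = next;
      updater.updateState(next);
    }
    return notifier.getUpdateSince(updateCount);
  };

  return { getUpdateSince };
};
```

The point of this shape is that the Cosmos side only needs to answer a query or deliver one pending change per waiting client, rather than streaming hundreds of per-account updates into the bank vat each block.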

@JimLarson, @warner, @erights, any thoughts on this solution?

dckc (Member, Author) commented Aug 11, 2021

Up to 5000 accounts updated per delivery

A bit more data; see also #3459 (comment).

In just the first 100 deliveries in blocks 68817 to 69707:


count      44.000000
mean      887.272727
std      1071.550981
min       221.000000
25%       483.000000
50%       507.000000
75%       770.000000
max      5000.000000
  • 5fbc01c feat: bank_trace etc

Tartuffo (Contributor) commented:
Decision: change the vBank notifier to use a smaller (than unlimited!) batch size. Aim for a few KB, 10 KB max; a KB is about 200 accounts being updated. We want to do about 10 to 100 pieces of work per message.
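
A rough sketch of the kind of batching this decision describes, under the assumptions that updates are serialized as JSON, that a ~10 KB cap is used, and that a hypothetical batchBalanceUpdates helper sits in front of the bank vat delivery path:

```js
// Sketch only: split pending per-address balance updates into batches whose
// serialized size stays under an assumed ~10 KB cap, so each message to the
// bank vat carries a bounded amount of work.
const MAX_BATCH_BYTES = 10 * 1024;

const batchBalanceUpdates = updates => {
  const batches = [];
  let current = [];
  let currentBytes = 0;
  for (const update of updates) {
    const bytes = JSON.stringify(update).length; // rough size estimate
    if (current.length > 0 && currentBytes + bytes > MAX_BATCH_BYTES) {
      // Current batch is full; start a new one.
      batches.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(update);
    currentBytes += bytes;
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
};
```

Each resulting batch would then be delivered as its own message, keeping any single delivery to roughly the 10 to 100 pieces of work mentioned above.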

dckc (Member, Author) commented Aug 24, 2022

Test scenario: do a bunch of cosmos-level sends (tx bank send A B).

@dckc assigned arirubinstein and unassigned JimLarson on Aug 24, 2022
@Tartuffo assigned JimLarson and unassigned arirubinstein on Aug 31, 2022
JimLarson (Contributor) commented:
Did some testing: created a single Tx that sent to 5K destination addresses on ollinet. At the time, the typical time between blocks was 5-6 seconds. One or two blocks after one of these big Txs, there was a single block where the block time went up to almost 20 seconds, then it went back to normal. The transaction was over 1 MB (in JSON) and racked up 100 million in gas.

The command-line tools balked at creating a Tx with 10k sends in it.

My thought is that this is a whole lot of effort for a pretty mild disruption, and a nominal Tx size fee might be enough to dissuade such an attack - or at least make it less attractive than other avenues for causing grief.

dckc (Member, Author) commented Sep 8, 2022

Thanks for testing, @JimLarson.

@Tartuffo, @dtribble, and @arirubinstein concur that this level of performance is OK.
