Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator time is exponential in number of Cache wrappings #10310

Closed
4 tasks
yihuang opened this issue Oct 6, 2021 · 12 comments · Fixed by #13881
Closed
4 tasks

Iterator time is exponential in number of Cache wrappings #10310

yihuang opened this issue Oct 6, 2021 · 12 comments · Fixed by #13881
Labels
C:Store T: Performance Performance improvements

Comments

@yihuang
Copy link
Collaborator

yihuang commented Oct 6, 2021

Summary of Bug

evmos/ethermint#626
benchmark: evmos/ethermint#627
code of concern:

func (iter *cacheMergeIterator) skipUntilExistsOrInvalid() bool {

In ethermint we use a nested cache context stack to support the Snapshot and RevertToSnapshot APIs for EVM.
When EVM call Snapshot, we push a new cache context based on the top one, when RevertToSnapshot, we discard the cache contexts, after the execution, we commit all the cache contexts.
What we found is when the context stack is deep, it's extremely slow to create an iterator on it.

Version

v0.44.1

Steps to Reproduce


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ValarDragon
Copy link
Contributor

ValarDragon commented Oct 7, 2021

We should get a pprof output out, and analyze whats going on!

Tried getting a pprof, but I can't build the ethermint library on machine...

But also, sounds like a misdesign that should be fixed to just be wrapping the store 34 times as mentioned in that PR. (Theres lots of overheads in doing so, outside of anything in the CacheKVStore)

Can you either get a pprof output or make a simplified PR here and I can try to get one? (Just add -cpuprofile cpu.out to your benchmark invocation, and then upload that along with the generated binary that ends in .test. You can investigate teh output with go tool pprof cpu.out)

@ValarDragon
Copy link
Contributor

Ah in the code, this is definitely exponential time wrt to CacheKVStore wrapping depth. https://github.com/cosmos/cosmos-sdk/blob/master/store/cachekv/store.go#L168-L184

Its creating its own iterator over its own cache, and over its parents. Its parent will then create an iterator over its own cache, and its parents. This causes a branch factor of 2 at every depth, hence time will follow 2^n.

Can you explain more about why your nesting the caches?

For a nested cache, you basically want one iterator at every level, but implementing that would require a native concept of 'depth' in the stores, and ways of distinguishing thin store layers (gasKV) and 'thick' ones e.g. CacheKVStore.

@ValarDragon ValarDragon changed the title Problem: cacheMergeIterator complexity curve is extremely steep against the depth of nested cache contexts Iterator time is exponential in number of Cache wrappings Oct 9, 2021
@ValarDragon ValarDragon added C:Store T: Performance Performance improvements labels Oct 9, 2021
@yihuang
Copy link
Collaborator Author

yihuang commented Oct 11, 2021

Ah in the code, this is definitely exponential time wrt to CacheKVStore wrapping depth. https://github.com/cosmos/cosmos-sdk/blob/master/store/cachekv/store.go#L168-L184

Its creating its own iterator over its own cache, and over its parents. Its parent will then create an iterator over its own cache, and its parents. This causes a branch factor of 2 at every depth, hence time will follow 2^n.

👍

Can you explain more about why your nesting the caches?

Nested caches are convenient to implement the nested exception revert, which is needed when we embed the EVM. There are other ways to do it, but nested caches are the simplest one.
And I think the complexity of get/set behavior on deeply nested caches are predictable, set is constant, get is O(N) where N is the depth of the cache stack.
The issue here is just we accidentally do an iteration without being aware of the computational complexity. In our case, we can flatten the stack before doing the iteration to fix the issue at hand.

For a nested cache, you basically want one iterator at every level, but implementing that would require a native concept of 'depth' in the stores, and ways of distinguishing thin store layers (gasKV) and 'thick' ones e.g. CacheKVStore.

@tomtau
Copy link
Contributor

tomtau commented Nov 9, 2021

@ValarDragon for reference, pprof from a related issue:
evmos/ethermint#710 (comment)

@robert-zaremba
Copy link
Collaborator

@yihuang will you be able to handle this issue?

@ValarDragon
Copy link
Contributor

ValarDragon commented Dec 3, 2021

I feel like this is something we should emit a warning for, but otherwise get downstream things to just not do this access pattern.

Also IAVL cache increases significantly helps here.

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 4, 2021

@yihuang will you be able to handle this issue?

In ethermint we already changed our approach to avoid a deep cache context stack.
I haven't looked into the root cause, I'm not sure if it's worthwhile to put too much effort into this, maybe we just need to let people aware of the issue and avoid deep cache context stack.

@robert-zaremba
Copy link
Collaborator

So, normally this is not an issue in the core SDK. I'm not sure how we could log a warning without tracking the nested level: we would need to add one more parameter to cache pass it's increment when we create a new cache.

@yihuang
Copy link
Collaborator Author

yihuang commented Dec 17, 2021

So, normally this is not an issue in the core SDK. I'm not sure how we could log a warning without tracking the nested level: we would need to add one more parameter to cache pass it's increment when we create a new cache.

maybe just put sth in the comments for now?

@ValarDragon
Copy link
Contributor

I think my prior comment was incorrect, I don't see how this is exponential time in the number of cache wrappings.
Each layer does two iterations, one at the structs local cache, and one in the parent.

The parent would then cause two iterations. The local cache would cause no further branching factors, as its just over this layers local sorted cache.

@yihuang
Copy link
Collaborator Author

yihuang commented Apr 21, 2022

I think my prior comment was incorrect, I don't see how this is exponential time in the number of cache wrappings. Each layer does two iterations, one at the structs local cache, and one in the parent.

The parent would then cause two iterations. The local cache would cause no further branching factors, as its just over this layers local sorted cache.

I have wrote a benchmark for this before: https://github.com/tharsis/ethermint/blob/release/v0.7.x/x/evm/keeper/benchmark_test.go#L100
The result:

$ go test -v ./x/evm/keeper -v -run="^$" -bench="BenchmarkDeepContextStack"
BenchmarkDeepContextStack1
BenchmarkDeepContextStack1-16     	  236114	      5047 ns/op
BenchmarkDeepContextStack10
BenchmarkDeepContextStack10-16    	      30	  34824724 ns/op
BenchmarkDeepContextStack13
BenchmarkDeepContextStack13-16    	       1	2250838178 ns/op

not sure if it's exponential, but the curve is steep.

@yihuang
Copy link
Collaborator Author

yihuang commented Nov 16, 2022

I did some profile and optimization.

before:

BenchmarkDeepContextStack1-12     	  118875	      9095 ns/op
BenchmarkDeepContextStack10-12    	      21	  55358983 ns/op
BenchmarkDeepContextStack13-12    	       1	3444975408 ns/op

after:

BenchmarkDeepContextStack1-12     	  127264	      8550 ns/op
BenchmarkDeepContextStack10-12    	   25220	     48912 ns/op
BenchmarkDeepContextStack13-12    	   19500	     65458 ns/op

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue Nov 16, 2022
Closes: cosmos#10310
Solution:
- cache the valid status
tac0turtle pushed a commit that referenced this issue Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310
mergify bot pushed a commit that referenced this issue Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	go.sum
#	simapp/go.mod
#	simapp/go.sum
#	store/cachekv/store.go
#	store/cachekv/store_test.go
#	tests/go.mod
#	tests/go.sum
mergify bot pushed a commit that referenced this issue Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	store/cachekv/store.go
mergify bot pushed a commit that referenced this issue Dec 16, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes #10310

(cherry picked from commit cbee1b3)

# Conflicts:
#	CHANGELOG.md
#	go.mod
#	go.sum
#	simapp/go.mod
#	simapp/go.sum
#	store/cachekv/memiterator.go
#	store/cachekv/store.go
#	store/cachekv/store_test.go
#	tests/go.mod
#	tests/go.sum
mmsqe pushed a commit to mmsqe/cosmos-sdk that referenced this issue Dec 28, 2022
Co-authored-by: Aleksandr Bezobchuk <[email protected]>
Co-authored-by: Marko <[email protected]>
Closes cosmos#10310

(cherry picked from commit cbee1b3)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C:Store T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants