Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: bech32 library, implement a decode algorithm that doesn't check the checksum #10025

Closed
4 tasks
ValarDragon opened this issue Aug 29, 2021 · 19 comments · Fixed by cosmos/btcutil#1
Closed
4 tasks
Labels
T: Performance Performance improvements

Comments

@ValarDragon
Copy link
Contributor

Summary

Now that we are using bech32 strings in several locations within the state machine (e.g. all the proto structs for staking), we should be mindful of the efficiency. Bech32 decoding for checksum checks with the current implementation checks isn't negligible time at scale, especially if your doing a large operation over state. (E.g. in epoch based models)

Any bech32 address that makes it into state should be a correct bech32 address. Therefore, we do not need to check its checksum upon decoding. We should implement a method in our bech32 library for DecodeAssumingValidChecksum

Problem Definition

This feature helps improve performance of going to and fro 'native' data representation, and bech32 strings, for objects within the state machine.

The downside of this feature is it may add some additional thinking to relevant parts of the logic, for something that in the "standard" path of the SDK (e.g. not a large batch computation), will not be high impact.

Proposal

In the bech32 library we currently use, add a method for DecodeAssumingValidChecksum that simply omits the bech32verifychecksum call


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@ValarDragon ValarDragon added the T: Performance Performance improvements label Aug 30, 2021
@conr2d
Copy link
Contributor

conr2d commented Sep 10, 2021

goos: darwin
goarch: amd64
pkg: github.com/cosmos/cosmos-sdk/types/bech32
cpu: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
name iteration time
BenchmarkDecode-16 1506727 807.4 ns/op
BenchmarkDecodeAssumingValidChecksum-16 2391042 494.4 ns/op
BenchmarkDecodeUnsafe-16 2923123 410.4 ns/op

Tested three cases:

  • Decode: Normalization + ChecksumValidation
  • DecodeAssumingValidChecksum: Normalization
  • DecodeUnsafe: -

Normalization here is checking and changing address to have only lower-case letters. If we normalize address when it is stored within state machine, normalization can be omitted too.

@robert-zaremba
Copy link
Collaborator

DecodeAssumingValidChecksum. So now we need someone to update SDK to use that function when we deserialize things from state. Anything else?

@ValarDragon
Copy link
Contributor Author

Yup! Also @conr2d can you post a benchmarks with the memallocs as well? Add -benchmem to the benchmark command (That was what the bottleneck was in our osmosis benchmarks)

@conr2d
Copy link
Contributor

conr2d commented Sep 30, 2021

@ValarDragon

In original version (enigmampc/btcutil), memory allocation always happens multiple times due to string copy.
https://github.com/enigmampc/btcutil/blob/e2fb6adb2a25f868283c4f243c873ad6b4894974/bech32/bech32.go#L35-L36

By merging the latest version (btcsuite/btcutil), memory allocation only happens when a given bech32 string consists of upper case letters.
https://github.com/cosmos/btcutil/blob/a68c44d216624107f23b6c8e66704ff4ecee879a/bech32/bech32.go#L176-L179

Now all tests have 1 allocs/op.

@ValarDragon
Copy link
Contributor Author

Thats amazing! Thanks

@tac0turtle
Copy link
Member

@conr2d would you like to update the sdk?

@conr2d
Copy link
Contributor

conr2d commented Nov 17, 2021

@marbar3778 Yes, if possible. Do you mean changing decoding address from state to unsafe decode?

@tac0turtle
Copy link
Member

@marbar3778 Yes, if possible. Do you mean changing decoding address from state to unsafe decode?

yes! would you make the required changes?

@robert-zaremba
Copy link
Collaborator

@conr2d
Copy link
Contributor

conr2d commented Jan 14, 2022

@marbar3778 @robert-zaremba OK, I will check it. Thank you.

@conr2d
Copy link
Contributor

conr2d commented Feb 8, 2022

Hmm, I can add UnsafeDecodeAndConvert() to types/bech32/bech32.go, but its main usage is in types/address.go.

AccAddressFromBech32(), ValAddressFromBech32(), ConsAddressFromBech32() calls GetFromBech32() and GetFromBech32() will call DecodeAndConvert().

We cannot know in advance whether the module will call AccAddressFromBech32() to convert a bech32 string for the message (untrusted source) or from the state (must be verified already before being stored). I guess we need two methods for each conversion like AccAddressFromBech32() and UnsafeAccAddressFromBech32() so that module can decide what method to be called. This refactoring also seems to be related to #10838.

@robert-zaremba
Copy link
Collaborator

In many places we know if we deserialize address from store or from a request. So we can have functions like:

  • UnsafeAccAddressFromBech32
  • UnsafeValAddressFromBech32
  • ...

BTW: there is always a dilemma, half of my brain would prefer to put Unsafe at the end to keep the prefix by object name (AccAddress...)

@robert-zaremba
Copy link
Collaborator

Personally I prefer to keep address as bytes in the store, and don't do any verification. That would require to have 2 type of structures: one for store and one for response. I think that this is the right way to go, instead constraining ourselves (in order to use the same struct for store and response) and doing potentially bad decisions. But that's a separate topic ;)

@elias-orijtech
Copy link
Contributor

What's the status of this issue?

@tac0turtle
Copy link
Member

this was partially implemented. There is a fork but the function to skip the checksum is not being used

@elias-orijtech
Copy link
Contributor

I shall close this as implemented, under the assumption that the optimized function is used on a per-case basis.

@tac0turtle
Copy link
Member

this isnt completed, i dont believe we are using the function that doesnt check checksums? want to work on this?

@tac0turtle tac0turtle reopened this May 19, 2023
@elias-orijtech
Copy link
Contributor

I would, but surely the checksum-skipping variant shall only be used where it matters for performance? My reasoning for closing this was because that is determined on a case-by-case basis.

@tac0turtle
Copy link
Member

i have removed a few of these calls in staking and we plan on removing them more when we migrate state for now we can close this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: Performance Performance improvements
Projects
No open projects
Development

Successfully merging a pull request may close this issue.

5 participants