Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken optimised wasmd build #536

Closed
maurolacy opened this issue May 10, 2024 · 6 comments · Fixed by #541
Closed

Broken optimised wasmd build #536

maurolacy opened this issue May 10, 2024 · 6 comments · Fixed by #541
Assignees
Milestone

Comments

@maurolacy
Copy link
Contributor

maurolacy commented May 10, 2024

When compiling wasmd with optimisation (via CGO_CFLAGS="-O" go build) the resulting binary crashes with a panic when iterating over a range of map entries from a smart contract.

This is part of the stack trace:

goroutine 459 [running, locked to thread]:
runtime.throw({0x3335f99?, 0xc0057bc178?})
  runtime/panic.go:1077 +0x5c fp=0xc0057bc128 sp=0xc0057bc0f8 pc=0x446fbc
runtime.sigpanic()
  runtime/signal_unix.go:875 +0x285 fp=0xc0057bc188 sp=0xc0057bc128 pc=0x45e1c5
github.com/CosmWasm/wasmvm/internal/api.cNext({0x7f5f167fabd8?, 0x0?}, 0x7f5f00008f70?, 0x7f5f167fab88?, 0x7f5f167fabe0?, 0x413665?, 0xc004b86340?)
  [github.com/CosmWasm/[email protected]/internal/api/callbacks.go:291](mailto:github.com/CosmWasm/[email protected]/internal/api/callbacks.go:291) +0xb0 fp=0xc0057bc228 sp=0xc0057bc188 pc=0x1fafc90
_cgoexp_5bd3f86fb1b2_cNext(0x7f5f167faae0)
...

A SEGV when iterating over and calling the "next" method. Only happens on Linux machines AFAIK.

The offending line turns out to be https://github.com/CosmWasm/wasmvm/blob/v1.5.2/internal/api/callbacks.go#L291

where accessing a uninitialised / invalid errOut vector causes the crash.

Funnily, removing the check over errOut fixes / prevents the bug from triggering, as this is the only instance the variable is being used in that code path.
This seems to be caused by optimisations removing some initialisation. Perhaps because the variable is not really being used, except for the check. Perhaps because of an optimiser bug under Linux.
Needless to say, there may be other places this issue may be present as well.

Oddly, compiling wasmd with optimisation level 2 (CGO_CFLAGS="-O2") doesn't trigger the bug; which may point to an optimiser bug or similar.

Reporting this so that it is known, and can perhaps be fixed.

For reproducing, it's enough to build wasmd with CGO_CFLAGS="-O", deploy a smart contract that iterates over a range, and call that code, either through a query or execute call.
In https://github.com/maurolacy/iterator-contract there's A PoC contract to help with reproducing / triggering this under Linux.

Tested with wasmd v0.50.0 and wasmvm v1.5.2.

@maurolacy maurolacy changed the title Broken optimised build Broken optimised wasmd build May 10, 2024
@maurolacy
Copy link
Contributor Author

Related: babylonchain/babylon#642.

@webmaster128
Copy link
Member

webmaster128 commented May 10, 2024

Thank you for digging into this!

You are using a standard wasmvm build, right? No custom compilation of the library. Do you know how CGO_CFLAGS is used? As the code in question is already compiled at that point, it can only affect the linking step. Nevermind, this is Go code

@webmaster128 webmaster128 transferred this issue from CosmWasm/wasmd May 10, 2024
@webmaster128
Copy link
Member

Very strange indeed. Smells like a bug in the linker or something like that as the Go compiler alone cannot know if !(*errOut).is_none or not. That might also explain a different behaviour on Linux vs. Mac.

errOut *C.UnmanagedVector is unused (except for the .is_none check) in all of those

  • func cGet
  • func cSet
  • func cDelete
  • func cNext
  • func nextPart

I think there are two ways out here

  1. Just mark the errOut function argument as unused in the function implementations and not perform any checks
  2. Trick the optimizer into keeping the variable untouched

@maurolacy
Copy link
Contributor Author

maurolacy commented May 10, 2024

Thanks for the prompt response!

Yes, I think checking an unused parameter doesn't make much sense. And, let's hope that's the only source of these errors.

In any case, making that change looks like a safe bet.

@chipshort
Copy link
Collaborator

chipshort commented Jun 6, 2024

I have created a PR removing the is_none check, but this is a bit of a scary bug.
When printing out the errOut fields on the rust side and the go side, I get completely crazy values:
rust inside next:
UnmanagedVector { is_none: true, ptr: 0x0, len: 0, cap: 0 }
Go inside cNext:

errOut.is_none: false
errOut.ptr: 0x1c03dc8
errOut.len: 29375936
errOut.cap: 140736667359256

@maurolacy
Copy link
Contributor Author

Agreed, this looks like a bug at another level (linker, optimiser, library code).

Let's keep an eye on it; perhaps this will appear in a different form. We can re-enable optimisations on our side, to try and see if this triggers with the new version by the way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants