Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrading to v0.44 breaks vested accounts #10591

Closed
4 tasks
RiccardoM opened this issue Nov 22, 2021 · 11 comments · Fixed by #10608
Closed
4 tasks

Upgrading to v0.44 breaks vested accounts #10591

RiccardoM opened this issue Nov 22, 2021 · 11 comments · Fixed by #10608
Assignees

Comments

@RiccardoM
Copy link
Contributor

RiccardoM commented Nov 22, 2021

Summary of Bug

Upgrading a chain to v0.44 breaks all the vested accounts that delegated some tokens.

Version

v0.44.3

Steps to Reproduce

  1. Run a chain using v0.42.9 that has some vested accounts
  2. Delegate some tokens using the vested accounts.
    3 Perform an on-chain upgrade to v0.44.3
  3. Try performing an operation using the vested accounts

All the vested accounts will now be unusable due to the now empty delegated_vesting field.

Breakdown

After our mainnet upgrade, some people from the community came to us saying that all their operations were failing with the following error:

panic message redacted to hide potentially sensitive system info: panic

They also noticed that their vesting accounts had an empty delegated_vesting field even though, previous to the upgrade, they used to have some of their vesting tokens delegated.

After digging things for a while, I think I was able to reconstruct the whole history of this bug.

With PR #8865 the v0.44 migration script for vesting accounts was added to the codebase . This migration script runs through all the vesting accounts and, for each account, it reads all the delegations and unbonding delegations for such account. The pecularity here is that this is done using the gRPC querier of the x/staking module (see here and here).

Now, when upgrading to v0.44 one other thing that is migrated is the x/staking state. Particularly, as we can see here, we migrate the delegation, redelegation and unbonding delegations keys.

This causes a major problem: if the x/auth module is migrated before the x/staking module, during the migration process we will not be able to query the delegations and unbonding delegations due to the fact that they would be still stored using the old keys. But if the x/auth is upgraded after the x/staking module, then everything would work properly.

Initially I thought that the bug could then be caused by non-determinism while iterating over the module upgrades map. But then I found out that some work on the subject had already been done inside PR #10189. Reading though the PR, I noticed that this same problem related to the x/auth and x/staking migrations already came up inside a comment (#10189 (comment)):

I just tried to debug-log the order of migrations and what's being passed to TrackDelegation during the auth migration:

  • if "staking" is migrated before "auth", then balance and delegations will have values;
  • if "auth" is migrated before "staking", then balance and delegations will be empty.

Unfortunately, during that discussion, it was decided that this was not important even though it was clearly stated that it could represent a problem (#10189 (comment)):

auth is more fundamental to me too, so should be first. But unfortunately in #8865 auth has vesting accounts which tracks delegations. So auth also depends on staking. Anyways, it's messy, and should be cleaned up by #9958

Let's put automerge then for this PR.

This has lead to the current situation: all chains upgrading to v0.44 will break their state and have no vesting account working anymore.

The solution for chains to fix this is to have a second on-chain upgrade (after the first one) that iterates all the vesting accounts and runs the MigrateAccount method once again. This would theoretically fix the problems as the x/staking state is already migrated and thus the delegations and unbonding delegations can be queried properly.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@aaronc
Copy link
Member

aaronc commented Nov 22, 2021

@AmauryM can you take a look at this?

@alexanderbez
Copy link
Contributor

The summary sounds correct. Good work @RiccardoM. It's not clear to me why we didn't address the concern raised in #10189 (comment)? Do you know why?

@RiccardoM
Copy link
Contributor Author

Do you know why?

If you are asking me, honestly I have no clue. Maybe it was too complicated or overlooked by error?
Anyway this could be also solved by forcing the x/staking module to be set before the x/auth module inside the OrderInitGenesis of the application so that before performing the upgrade:

  1. search the x/staking index inside the OrderInitGenesis slice
  2. search the x/auth index inside the OrderInitGenesis slice
  3. swap the two modules' order
  4. perform the upgrade

@alexanderbez
Copy link
Contributor

We've already addressed the fix for chains that are about to upgrade. E.g.: https://github.com/cosmos/gaia/blob/0dd70958a10b24703c09c43745611779f0e7970f/app/app.go#L572-L593

However, this is still alarming due to the nature of how fragile these upgrades are in terms of order of operations. A trivial approach would be to allow developers to state what order they want the migrations to happen in (per module).

cc @AmauryM @robert-zaremba

@amaury1093
Copy link
Contributor

amaury1093 commented Nov 24, 2021

I remember having no strong intuition/understanding on which one between auth and staking should go first. The correct solution here IMO is to refactor x/vesting to remove the cyclic dependency between staking and auth (and bank).

Anyways, for a SDK-side quick fix, I can think of 2 solutions:

  1. always run auth last. It can be done here.
  2. as Bez proposed, let the user decide the migration order. Like a OrderUpgrade array.

While 2 seems more flexible, I think we would need to rely on the fact that all app developers knows that auth should go last, which is risky. For now, I prefer 1.

Note: this fix is not state-machine breaking, and can be backported.

@robert-zaremba robert-zaremba self-assigned this Nov 24, 2021
@robert-zaremba
Copy link
Collaborator

On Monday I was discussing to add change the migration map and add a priority parameter. Map is good because it allows easily to self register the modules.

@robert-zaremba
Copy link
Collaborator

Here is a list of tasks we are planning to do now:

  1. (this issue) see if we need to recover the vested accounts. Do we need to provide any support to this? @RiccardoM this bug happened in your testnet only?
  2. fix: In place migration: auth must be migrated after staking #10606
  3. Support migration ordering for in place migration #10604

@RiccardoM
Copy link
Contributor Author

  1. (this issue) see if we need to recover the vested accounts. Do we need to provide any support to this? @RiccardoM this bug happened in your testnet only?

No, it happened inside our mainnet which caused 289 investors having their accounts locked. I have already found out how to solve it though.

@robert-zaremba
Copy link
Collaborator

@RiccardoM ouch, sorry for that :(
Could you share the solution?

I was thinking about a migration function which will query an old state and recreate the accounts.

@robert-zaremba
Copy link
Collaborator

Closed by: #10608

@RiccardoM please reopen if you need support to this issue.

@RiccardoM
Copy link
Contributor Author

RiccardoM commented Nov 26, 2021

@robert-zaremba Recovery it's as simple as re-running the v043.MigrateAccount method (see here). Since the x/staking state has already been migrated, re-running the x/auth migration after that will allow the auth module to get the correct staking data, de-facto setting the correct delegated vesting amounts back. No need to manual nor old-state queries.

I think you should change the release notes from:

Recovery, unfortunately, is complicated and either involves manually overwriting it or querying an old state.

to

Recovery, fortunately, can be done by simply re-running the v0.43 auth migration process after the state has been corrupted. This will properly recover the delegated vesting amounts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants