Expected v2 deprecation behavior in HA clusters #17009

Open
serathius opened this issue Nov 23, 2023 · 9 comments

Comments

@serathius
Member

serathius commented Nov 23, 2023

What would you like to be added?

I would like to discuss how etcd v3.6 should behave with regard to the v2 store API.

Background

The v2 API was deprecated in etcd v3.4, but could still be used as long as you provided the --enable-v2 flag. This didn't change in v3.5; for v3.6, however, we are planning total removal. The expected behavior is that when upgrading to v3.6, etcd will panic if there is any v2 data still left. More in #12913.

The user can then do one of two things:

  • Halt the upgrade and reconsider what to do with v2 state.
  • Run etcd with --v2-deprecation=write-only-drop-data which is expected to delete the local v2 data.
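
The two choices above can be modeled as a small decision function. This is only a sketch of the behavior described in this issue, not etcd's actual startup code: `startupAction`, `hasV2Data`, `dropData`, and `errV2DataPresent` are hypothetical names, with `dropData` standing in for running with `--v2-deprecation=write-only-drop-data`.

```go
package main

import (
	"errors"
	"fmt"
)

// errV2DataPresent stands in for the panic a v3.6 member is expected to raise.
// Hypothetical name; the real panic message may differ.
var errV2DataPresent = errors.New("v2 data present: migrate it or drop it before running v3.6")

// startupAction models the expected v3.6 startup behavior described above.
// hasV2Data: the member's local state still contains v2 data.
// dropData:  the member runs with --v2-deprecation=write-only-drop-data.
func startupAction(hasV2Data, dropData bool) (string, error) {
	switch {
	case !hasV2Data:
		return "start", nil
	case dropData:
		return "drop local v2 data, then start", nil
	default:
		return "", errV2DataPresent
	}
}

func main() {
	cases := []struct{ hasV2Data, dropData bool }{
		{false, false},
		{true, true},
		{true, false},
	}
	for _, c := range cases {
		action, err := startupAction(c.hasV2Data, c.dropData)
		fmt.Printf("hasV2Data=%v dropData=%v -> action=%q err=%v\n",
			c.hasV2Data, c.dropData, action, err)
	}
}
```

The third case is the one that matters for the rest of this issue: the member refuses to start and the operator has to decide what to do with the v2 state.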

Problem

What happens in an HA cluster if, during an upgrade or downgrade (v3.6 supports downgrade to v3.5), the user forgets that an etcd v3.5 member still runs with --enable-v2 and introduces a v2 change to the cluster? This is worrying, as a single member could take down the whole cluster. Fixing this would require reconfiguring the whole cluster to run with --v2-deprecation=write-only-drop-data.

Options:

  • Check the snapshot and WAL for v2 data only on bootstrap, and skip v2 entries afterwards. This will lead to inconsistent v2 state.
  • Have v3.6 members reject v2 proposals. Not sure this is possible, as a v3.5 member can still become leader. I would be careful about changing the logic for leader eligibility.
  • Do nothing.
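
To make the first option concrete, here is a rough sketch of "check on bootstrap, skip later". It is a toy model under stated assumptions, not etcd code: the `entry` type, `checkOnBootstrap`, and `applyAfterBootstrap` are hypothetical, and the snapshot plus WAL are flattened into a plain slice of entries.

```go
package main

import "fmt"

// entry is a simplified stand-in for a replicated log entry.
type entry struct {
	Index uint64
	IsV2  bool // true if the entry carries a v2 store request
}

// checkOnBootstrap scans the persisted entries (snapshot plus WAL in the
// real system) and reports the first v2 entry found, if any.
func checkOnBootstrap(persisted []entry) error {
	for _, e := range persisted {
		if e.IsV2 {
			return fmt.Errorf("v2 entry at index %d: refusing to start", e.Index)
		}
	}
	return nil
}

// applyAfterBootstrap treats v2 entries as no-ops once the member is running.
// It returns whether the entry was applied to local state.
func applyAfterBootstrap(e entry) bool {
	return !e.IsV2
}

func main() {
	persisted := []entry{{Index: 1}, {Index: 2}}
	fmt.Println("bootstrap error:", checkOnBootstrap(persisted))

	// A v2 write arrives later from a v3.5 member still running with --enable-v2.
	incoming := []entry{{Index: 3, IsV2: true}, {Index: 4}}
	for _, e := range incoming {
		fmt.Printf("index=%d applied=%v\n", e.Index, applyAfterBootstrap(e))
	}
}
```

In this model the running member never panics on a late v2 write, which is the point of the option, but the skipped entry is exactly where the v2 state starts to diverge.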

Options rejected:

  • Require enabling --v2-deprecation=write-only-drop-data for downgrades. Doesn't solve the upgrade case.
  • Document that users should double and triple check that they are not using --enable-v2. Doesn't seem user friendly.

Why is this needed?

I want to make sure this is properly discussed, understood, and documented.

@serathius changed the title from "Expected v2 deprecation behavior" to "Expected v2 deprecation behavior in HA clusters" on Nov 23, 2023
@serathius
Member Author

cc @ahrtr @jmhbnz @wenjiaswe

@wenjiaswe
Contributor

@chaochn47
Member

> Have v3.6 members reject v2 proposals. Not sure this is possible, we as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.

Another option: proposals could be accepted but rejected at apply time, just as the no-space applier and the corruption applier do.

@siyuanfoundation
Contributor

> > Have v3.6 members reject v2 proposals. Not sure this is possible, we as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.
>
> Another option, proposals could be accepted but apply is rejected just like no space applier or corruption applier.

Rejecting apply would not stop the server from acknowledging commit index progress; this would give the client a false sense of HA if a v3.5 member is the leader.

@serathius
Member Author

> > Have v3.6 members reject v2 proposals. Not sure this is possible, we as v3.5 can still become a leader. I would be careful about changing logic for leader eligibility.
>
> Another option, proposals could be accepted but apply is rejected just like no space applier or corruption applier.

This is the "Check the snapshot and WAL for v2 data only on bootstrap, skip it later. It will lead to inconsistency on v2 state." option. I used the word "skip" instead of "reject" but meant the same thing: the entries are just treated as no-ops.

@serathius
Member Author

> rejecting apply would not stop the server ack commit index progress, this would give the client a false sense of HA if v3.5 is the leader.

It's less about HA and more about inconsistency if the user ever aborts the upgrade. etcd v3.6 already doesn't expose the v2 API, so there is no HA for it. The inconsistency arises if the user reverts the upgrade: the member that was temporarily on v3.6 would then not have the same v2 data as one that stayed on v3.5 the whole time.
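
The aborted-upgrade scenario can be illustrated with a toy model of two members. This is a sketch under simplifying assumptions, not how etcd represents its state: the `member` and `entry` types and the `skipV2` field are hypothetical, and the v2 store is reduced to a map.

```go
package main

import "fmt"

// entry is a simplified v2 write in the replicated log.
type entry struct {
	Index      uint64
	Key, Value string
}

// member is a toy model of one cluster member's view of the v2 store.
type member struct {
	skipV2       bool // true while the member runs v3.6 and treats v2 entries as no-ops
	appliedIndex uint64
	v2Store      map[string]string
}

func newMember(skipV2 bool) *member {
	return &member{skipV2: skipV2, v2Store: map[string]string{}}
}

// apply advances appliedIndex for every committed entry, but only writes
// to the v2 store when the member is not skipping v2 entries.
func (m *member) apply(e entry) {
	m.appliedIndex = e.Index
	if !m.skipV2 {
		m.v2Store[e.Key] = e.Value
	}
}

func main() {
	stayed := newMember(false)  // stayed on v3.5 the whole time
	upgraded := newMember(true) // temporarily running v3.6

	e := entry{Index: 10, Key: "/foo", Value: "bar"}
	stayed.apply(e)
	upgraded.apply(e)

	// The upgrade is aborted: the member goes back to v3.5 behavior.
	upgraded.skipV2 = false

	_, ok := upgraded.v2Store["/foo"]
	fmt.Println("applied index equal:", stayed.appliedIndex == upgraded.appliedIndex)
	fmt.Println("stayed member value:", stayed.v2Store["/foo"])
	fmt.Println("reverted member has key:", ok)
}
```

Both members report the same applied index, yet only the member that stayed on v3.5 holds the v2 key, which is the inconsistency in question.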

cc @ahrtr

@ahrtr
Member

ahrtr commented Dec 4, 2023

  • We just need to document this clearly (users must migrate their data to v3 and remove --enable-v2 before upgrading to 3.6), and highlight it in particular in the "Upgrade etcd from 3.5 to 3.6" guide.
  • As far as I know, the only project still using v2 was flannel, and it upgraded its etcd client to v3 in v0.18.0 more than a year ago.
  • In the worst case, if a member is still using --enable-v2 or has v2 data, etcd will panic on bootstrap and the data files will remain unchanged. Users can take care of it separately.

stale bot commented Mar 17, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@ahrtr
Member

ahrtr commented Dec 4, 2024

We will cover this with documentation (tracked in etcd-io/website#926) and a v2store data checking tool (tracked in #18993).

@serathius do you have any comments? If not, can we close this ticket?


5 participants