Implement raft-wal #21460

Merged
merged 47 commits into main from raft-wal
Jan 25, 2024

Conversation

raskchanky
Contributor

@raskchanky raskchanky commented Jun 26, 2023

This provides a config option called raft_wal, which can be set to "true" or "false" (raft backend config is map[string]string) for optionally enabling the use of https://github.com/hashicorp/raft-wal for raft storage instead of BoltDB.

If Vault is configured to use raft-wal but it detects raft.db in the normal spot, it continues to use raft.db and logs a warning that it's ignoring the raft-wal config, similar to what Consul does.
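
The selection rule described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `chooseBackend`, `fileExists`, the `"boltdb"`/`"wal"` labels, and the exact location of `raft.db` are all assumptions made for the example. Only the `raft_wal` key and the "existing raft.db wins, with a warning" behaviour come from the description.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strconv"
)

// fileExists reports whether path exists. Errors other than "not exist"
// are returned so the caller does not mistake them for a missing file.
func fileExists(path string) (bool, error) {
	_, err := os.Stat(path)
	if err == nil {
		return true, nil
	}
	if errors.Is(err, fs.ErrNotExist) {
		return false, nil
	}
	return false, err
}

// chooseBackend is a hypothetical helper: it picks a log store name from the
// string-valued raft config and returns a warning when raft_wal is ignored
// because a BoltDB file is already present.
func chooseBackend(conf map[string]string, boltExists bool) (backend, warning string, err error) {
	wantWAL := false
	if raw, ok := conf["raft_wal"]; ok {
		wantWAL, err = strconv.ParseBool(raw)
		if err != nil {
			return "", "", fmt.Errorf("invalid raft_wal value %q: %w", raw, err)
		}
	}
	switch {
	case wantWAL && boltExists:
		return "boltdb", "raft_wal is set but raft.db exists; ignoring raft_wal", nil
	case wantWAL:
		return "wal", "", nil
	default:
		return "boltdb", "", nil
	}
}

func main() {
	dir, err := os.MkdirTemp("", "raft-sketch")
	if err != nil {
		fmt.Println("tempdir:", err)
		return
	}
	defer os.RemoveAll(dir)

	conf := map[string]string{"raft_wal": "true"}
	dbPath := filepath.Join(dir, "raft.db") // assumed location for this sketch

	// First pass: no raft.db yet. Second pass: raft.db has been created.
	for _, create := range []bool{false, true} {
		if create {
			if err := os.WriteFile(dbPath, nil, 0o600); err != nil {
				fmt.Println("write:", err)
				return
			}
		}
		exists, err := fileExists(dbPath)
		if err != nil {
			fmt.Println("stat:", err)
			return
		}
		backend, warning, err := chooseBackend(conf, exists)
		if err != nil {
			fmt.Println("config:", err)
			return
		}
		fmt.Printf("backend=%s warning=%q\n", backend, warning)
	}
}
```

Keeping the existence check separate from the decision makes the fallback easy to test without touching the filesystem.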

I modified the raft test helpers to allow a config map to be passed in, and then made most of the raft tests table driven, so they can exercise both boltdb and raft-wal. Most of the changes in that test file are mechanical, just moving existing test code into a table driven setup and adding a bit of error checking.
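
As a rough picture of the table-driven shape described above, the sketch below runs one check against a config map per backend. The names `backendCase`, `backendCases`, `forEachBackend`, and `checkHasWALKey` are invented for this example. In a real `_test.go` file each row would more likely run under `t.Run(tc.name, ...)`; a plain function is used here only so the sketch is runnable on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// backendCase is one row of the table: a label plus the raft config map
// that would be handed to the (hypothetical) test helper.
type backendCase struct {
	name string
	conf map[string]string
}

// backendCases lists the two storage engines every check should exercise.
func backendCases() []backendCase {
	return []backendCase{
		{name: "boltdb", conf: map[string]string{"raft_wal": "false"}},
		{name: "raft-wal", conf: map[string]string{"raft_wal": "true"}},
	}
}

// forEachBackend runs check once per row and collects failures by row name.
func forEachBackend(check func(conf map[string]string) error) map[string]error {
	failures := make(map[string]error)
	for _, tc := range backendCases() {
		if err := check(tc.conf); err != nil {
			failures[tc.name] = err
		}
	}
	return failures
}

// checkHasWALKey is a trivial stand-in for a real test body.
func checkHasWALKey(conf map[string]string) error {
	if _, ok := conf["raft_wal"]; !ok {
		return errors.New("raft_wal missing from config")
	}
	return nil
}

func main() {
	failures := forEachBackend(checkHasWALKey)
	fmt.Println("failures:", len(failures))
}
```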

On the subject of "why are we doing this":
There are 2 main motivations for using raft-wal instead of raft-boltdb for raft storage: performance and stability. In microbenchmarks I've done, raft-wal is roughly 10% faster than raft-boltdb for normal operations. That's nice. On the stability front, since it is a data store that's designed specifically for raft storage, there's no freelist to contend with. Which means that, unlike raft-boltdb, there's no cruft to build up over time as the raft log is repeatedly truncated. When using raft-boltdb, eventually you're going to need to rotate your nodes out in order to compact boltdb, otherwise the freelist will grow large and negatively impact write performance, which will negatively impact the stability of your cluster.

@VioletHynes VioletHynes added the hashicorp-contributed-pr If the PR is HashiCorp (i.e. not-community) contributed label Jul 6, 2023
@raskchanky
Contributor Author

@banks If you get a few free moments, I wonder if you could take a quick peek at this and see if it looks reasonable. I know you also did some work on a verifier as part of integrating raft-wal into Consul, but I wasn't sure 1) if that was also appropriate here and 2) if so, how to actually implement that.

Member

@banks banks left a comment

Looking great Josh. Couple of comments inline including the file existence check which I don't think is quite right.

For the verifier, yes it's just as applicable to us as it is to Consul. It basically provides some confidence that the LogStore is not corrupting data (whether it is BoltDB or raft-wal), which otherwise is extremely hard to detect.

Implementing requires a few things:

  1. config to enable and setup how often to write checkpoints (see consul docs)
  2. The current active node (raft leader) needs to use that config to periodically apply a special Raft entry. This part could be a little more complex in Vault: the code that runs on the "active node" is generally not aware of rafty things, and the raft storage is not directly aware of whether it is the leader, or at least doesn't run different operations when it is the leader right now, IIRC. We should be able to figure out some way to make this work, I hope without breaking too many abstractions (ideally it would be confined to the Raft backend, not spread through Vault's HA code when it's raft specific, but we can see how that goes). That code needs to do something like this: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/leader_log_verification.go
  3. We need code that defines how to classify a checkpoint operation in Raft vs any other one and how to report success or failures to logs. This should probably live in the raft backend package and look something like this: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/server_log_verification.go
  4. plumbing to wire that all up: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/server.go#L1076-L1083
  5. This is the most subtle bit: the way we record checksums on the checkpoints may or may not play nicely with Vault's existing usage of chunking and/or FSM. In Consul I had to add a Shim to the FSM because of the way raft-chunking expects the raft extra data field to be used. Since we didn't fix that yet, and since we have to be compatible with older Vault builds to avoid crashes during upgrade anyway, we may need a similar shim and some careful testing of the mixed-version as well as mixed-enabled vs disabled configs. See https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/fsm/log_verification_chunking_shim.go#L18
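
Item 2 in the list above (the leader periodically applying a checkpoint entry) could look roughly like the loop below. This is only a sketch under stated assumptions: `runCheckpointer` and its callback parameters are hypothetical names, not Vault or Consul APIs. The function is generic over the tick channel's element type purely so it can be driven by a `time.Ticker` in production and by a plain channel in tests.

```go
package main

import (
	"fmt"
	"time"
)

// runCheckpointer applies one checkpoint per tick while this node is leader.
// It returns when stop is closed or the ticks channel is closed.
// isLeader, applyCheckpoint and report are hypothetical hooks that the raft
// backend would supply.
func runCheckpointer[T any](
	ticks <-chan T,
	stop <-chan struct{},
	isLeader func() bool,
	applyCheckpoint func() error,
	report func(error),
) {
	for {
		select {
		case <-stop:
			return
		case _, ok := <-ticks:
			if !ok {
				return
			}
			if !isLeader() {
				continue // followers never write checkpoints
			}
			if err := applyCheckpoint(); err != nil {
				report(err)
			}
		}
	}
}

func main() {
	ticker := time.NewTicker(10 * time.Millisecond)
	defer ticker.Stop()

	stop := make(chan struct{})
	done := make(chan struct{})
	applied := 0

	go func() {
		defer close(done)
		runCheckpointer(ticker.C, stop,
			func() bool { return true }, // pretend we are always the leader
			func() error { applied++; return nil },
			func(err error) { fmt.Println("checkpoint failed:", err) },
		)
	}()

	time.Sleep(55 * time.Millisecond)
	close(stop)
	<-done // after this point it is safe to read applied
	fmt.Println("applied at least one checkpoint:", applied > 0)
}
```

Passing leadership in as a callback keeps the loop inside the raft backend, which matches the hope above of not spreading raft-specific logic through the HA code.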

@github-actions

github-actions bot commented Aug 23, 2023

CI Results:
All Go tests succeeded! ✅

@raskchanky
Contributor Author

@banks Thanks for all the tips. I think I got items 1-4 on your list implemented (modulo some better acceptance tests). Item 5 on your list has me a bit puzzled still, in terms of where it goes.

@banks
Member

banks commented Aug 29, 2023

@raskchanky item 5 may not be needed though I suspect it will. The way you tell is:

  1. Start up a Vault cluster where the leader is using this branch with verifier enabled but at least one follower is using current HEAD or a previous release version
  2. See if it crashes

😄

The problem is that the verifier relies on being able to write additional data into the Extensions field in each raft log. go-raftchunking was the original user of this field and even tried to design itself in a way that other extensions can also use it, because it re-encodes existing Extension data into its own and vice versa.

The problem though is that the log verifier needs to be able to write to the Extension field at a lower level than go-raftchunking. go-raftchunking is middleware that wraps the entire Raft Commit -> FSM Apply flow from the outside, while the log verifier needs to have the checksums computed and written to the log from inside the leader's LogStore abstraction at the lowest level of the Raft stack.

The good news is that the verifier only cares about Extension on Checkpoint log entries, which by definition will never need to be chunked by go-raftchunking, so the two things can operate independently.

The bad news is that go-raftchunking's FSM layer assumes that if any log has a non-nil Extensions field, it must have been encoded by raftchunking.Apply on the other side, and so tries to decode it as its own protobuf state and errors if it can't. An error during apply causes a panic since Raft can't really make progress if it can't apply a log.

Long term it would be nicer to change raftchunking so that it had its own magic prefix and ignored others, but I didn't do that for Consul because even if we did, we'd also need some sort of migration process, since logs on disk would have been persisted the old way. It was simpler to just add the FSM shim I linked to before, which uses a heuristic that I could guarantee was always safe to "do the right thing" and intercepts the new checkpoint log entries before they reach raftchunking.FSM.Apply and cause an error. Actually they should be a no-op at the FSM layer since all the validation happens at the VerifyingLogStore layer, so we just need to intercept them to prevent the error.
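
To make the shim idea concrete, here is a minimal, self-contained sketch. Everything in it is a stand-in: `Log` and `FSM` are reduced imitations of the raft types, `strictFSM` only mimics the "non-nil Extensions must decode as my own state" behaviour described above, and `hasCheckpointMarker` with its `ckpt:` prefix is an invented placeholder, not the heuristic Consul actually uses.

```go
package main

import (
	"bytes"
	"fmt"
)

// Log is a stand-in for a raft log entry, reduced to what this sketch needs.
type Log struct {
	Index      uint64
	Data       []byte
	Extensions []byte
}

// FSM is a stand-in for the Apply half of a raft FSM.
type FSM interface {
	Apply(l *Log) interface{}
}

// checkpointShim sits in front of a chunking-style FSM and swallows
// checkpoint entries so the inner FSM never tries to decode them.
type checkpointShim struct {
	inner        FSM
	isCheckpoint func(*Log) bool
}

func (s *checkpointShim) Apply(l *Log) interface{} {
	if s.isCheckpoint(l) {
		return nil // no-op here; verification already happened in the log store
	}
	return s.inner.Apply(l)
}

// hasCheckpointMarker is a hypothetical predicate using a made-up prefix.
func hasCheckpointMarker(l *Log) bool {
	return bytes.HasPrefix(l.Extensions, []byte("ckpt:"))
}

// strictFSM imitates an FSM that rejects any Extensions it did not write.
type strictFSM struct {
	applied []uint64
}

func (f *strictFSM) Apply(l *Log) interface{} {
	if l.Extensions != nil && !bytes.Equal(l.Extensions, []byte("chunk-state")) {
		return fmt.Errorf("log %d: cannot decode extensions", l.Index)
	}
	f.applied = append(f.applied, l.Index)
	return nil
}

func main() {
	inner := &strictFSM{}
	shim := &checkpointShim{inner: inner, isCheckpoint: hasCheckpointMarker}

	logs := []*Log{
		{Index: 1, Data: []byte("put a=1")},
		{Index: 2, Extensions: []byte("ckpt:deadbeef")}, // checkpoint entry
		{Index: 3, Data: []byte("put b=2"), Extensions: []byte("chunk-state")},
	}
	for _, l := range logs {
		fmt.Printf("index=%d result=%v\n", l.Index, shim.Apply(l))
	}
	fmt.Println("inner FSM saw:", inner.applied)
}
```

The important property is that the predicate can never match a log the inner FSM needs to see, which is why the real heuristic has to be chosen with care.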

Does that make sense?

Not reviewed the other changes here yet, happy to chat about this if you want to figure out the best approach together.

Member

@banks banks left a comment

This looks super close @raskchanky! 🎉

I think the one thing we should review is the encoding of checkpoints so that we can avoid the overhead of double-parsing every single log on all servers at different FSM layers! I don't think that's too gross from a quick look but we can talk about it. It would mean bypassing or slightly refactoring applyLogs to let us write the raw raft.Log somehow.

@raskchanky raskchanky marked this pull request as ready for review September 8, 2023 17:55
Member

@banks banks left a comment

Looking awesome @raskchanky!

I'll submit this now although I have noted a couple things I'm going to come back to later. Mostly minor nits but one or two places where what you have works but in theory could possibly hit edge cases now or in the future so probably best to tighten those up!

@raskchanky raskchanky added this to the 1.16.0-rc1 milestone Jan 12, 2024
hghaf099 and others added 2 commits January 16, 2024 16:27
* adding a migration test from boltdb to raftwal and back
adding a migration test using snapshot restore

* feedback
Member

@banks banks left a comment

Lots of nits in here that really don't matter, so I'll leave them to you if you think they are worth fiddling with.

The one thing I think we should look at before merge is the BatchApply handling of the empty log. It seems to work now but seems alarmingly brittle: it near-misses panics and bad violations of assumptions by pure chance in several places that have no explicit intention to allow this behavior! More inline. I don't think this should be too hard, and at worst I'd consider accepting it by just adding inline comments to all those places where we happen to do the "right" thing by chance now, so it's at least harder to accidentally break later!

Comment on lines +732 to +736
    if boltStore, ok := b.stableStore.(*raftboltdb.BoltStore); ok {
        bss := boltStore.Stats()
        logStoreStats = &bss
    }

Member

I wonder if we have tests that cover these metrics being produced? I don't see any usages of CollectMetrics in raft_test.go at least. If not it's probably worth (at least) adding to a TODO list for manual testing to be sure we don't regress those here.

Contributor Author

I'm not sure I'm following what kind of test you're looking for. CollectMetrics is called here as part of the main metrics collection loop. Can you be more specific?

Member

We talked offline but for the sake of GH history, I just meant that I don't think we have unit tests that assert that boltdb metrics are actually reported when this is called either locally or in general in Vault.

So it would be possible to typo this change and not fail any tests but break our actual metrics around boltdb.

Ideally we'd have some sort of unit test there but since there is no precedent we can plan to add one later, but it would be wise to at least manually verify that when using BoltDB on this branch we still get stats output as before!
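
A unit test of the kind discussed here might take roughly the following shape. This is a sketch with stand-in types only: `Stats`, `boltLikeStore`, `walLikeStore`, `collectLogStoreStats`, `emitGauges`, and the gauge name `logstore.free_pages` are all hypothetical and do not correspond to Vault's or raft-boltdb's real identifiers. The point is only that asserting on the emitted gauges would catch a mistake in the type assertion.

```go
package main

import "fmt"

// Stats stands in for the stats struct a BoltDB-backed store exposes.
type Stats struct {
	FreePageN int
}

// boltLikeStore is a fake store that can report stats.
type boltLikeStore struct {
	free int
}

func (b *boltLikeStore) Stats() Stats { return Stats{FreePageN: b.free} }

// walLikeStore is a fake store with no BoltDB-style stats.
type walLikeStore struct{}

// collectLogStoreStats mirrors the shape of the snippet under review:
// stats are only produced when the store is the bolt-like concrete type.
func collectLogStoreStats(store interface{}) *Stats {
	if bs, ok := store.(*boltLikeStore); ok {
		s := bs.Stats()
		return &s
	}
	return nil
}

// emitGauges writes gauges into sink; it is a no-op when stats is nil.
func emitGauges(stats *Stats, sink map[string]float64) {
	if stats == nil {
		return
	}
	sink["logstore.free_pages"] = float64(stats.FreePageN) // hypothetical name
}

func main() {
	for _, store := range []interface{}{&boltLikeStore{free: 7}, &walLikeStore{}} {
		sink := map[string]float64{}
		emitGauges(collectLogStoreStats(store), sink)
		fmt.Printf("%T -> %d gauge(s)\n", store, len(sink))
	}
}
```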

Member

@banks banks left a comment

We talked offline but here's the other thing we should fix to get the verifier working again.

Contributor Author

@raskchanky raskchanky left a comment

@banks 293f027 looks good to me

Member

@banks banks left a comment

Amazing job @raskchanky

This looks good to go!

@raskchanky raskchanky merged commit ef26498 into main Jan 25, 2024
110 checks passed
@raskchanky raskchanky deleted the raft-wal branch January 25, 2024 18:08
@@ -0,0 +1,3 @@
```release-note:feature
storage/raft: Add experimental support for raft-wal, a new backend engine for integrated storage.
```
Collaborator

@raskchanky next time please use the correct new feature formatting for new features in the changelog.

Contributor Author

Thanks for the reminder. I did correct this in a subsequent PR.

Contributor Author

This seems like a good candidate for a new CI check, so that we don't have to rely on humans remembering to always do the right thing in several different scenarios.

Collaborator

@raskchanky I've added an agenda item to discuss new requirements for the changelog checking tooling.

5 participants