-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement raft-wal #21460
Implement raft-wal #21460
Conversation
@banks If you get a few free moments, I wonder if you could take a quick peek at this and see if it looks reasonable. I know you also did some work on a verifier as part of integrating raft-wal into Consul, but I wasn't sure 1) if that was also appropriate here and 2) if so, how to actually implement that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great Josh. Couple of comments inline including the file existence check which I don't think is quite right.
For the verifier, yes it's just as applicable to us as it is to Consul. It basically provides some confidence that the LogStore is not corrupting data (whether it it BoltDB or raft-wal) which otherwise is extremely hard to detect.
Implementing requires a few things:
- config to enable and setup how often to write checkpoints (see consul docs)
- The current active node (raft leader) needs to use that config to periodically apply a special Raft entry. This part could be a little more complex in Vault because the code that runs on the "active node" in general is not aware of rafty things and the raft storage is not directly aware of whether it is the leader or not or at least doesn't run different operations if it is the leader right now IIRC. We should be able to figure out some way to make this work and I hope without breaking too many abstractions (ideally it would be confined to the Raft backend not spread through Vault' HA code when it's raft specific but we can see how that goes). That code needs to do something like this: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/leader_log_verification.go
- We need code that defines how to classify a checkpoint operation in Raft vs any other one and how to report success or failures to logs. This should probably live in the raft backend package and look something like this: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/server_log_verification.go
- plumbing to wire that all up: https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/server.go#L1076-L1083
- This is the most subtle bit: the way we record checksums on the checkpoints may or may not play nicely with Vault's existing usage of chunking and/or FSM. In Consul I had to add a Shim to the FSM because of the way raft-chunking expects the raft extra data field to be used. Since we didn't fix that yet, and since we have to be compatible with older Vault builds to avoid crashed during upgrade anyway, we may need a similar shim and some careful testing of the mixed-version as well as mixed-enabled vs disabled configs. See https://github.com/hashicorp/consul/blob/e235c8be3c67ed1389af017a76b29a8452b86453/agent/consul/fsm/log_verification_chunking_shim.go#L18
CI Results: |
@banks Thanks for all the tips. I think I got items 1-4 on your list implemented (modulo some better acceptance tests). Item 5 on your list has me a bit puzzled still, in terms of where it goes. |
@raskchanky item 5 may not be needed though I suspect it will. The way you tell is:
😄 The problem is that the verifier relies on being able to write additional data into the The problem though is that the log verified needs to be able to write to the Extension field at a lower level than go-raftchunking. The good news is that verifier only cares about Extension on Checkpoint log entries which by definition will never need to be chunked by go-raftchunking so the two things can operate indepently. The bad news is that go-raftchunking's FSM layer assumes that if any log has a non-nil Long term it would be nicer to change Does that make sense? Not reviewed the other changes here yet, happy to chat about this if you want to figure out the best approach together. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super close @raskchanky! 🎉
I think the one thing we should review is the encoding of checkpoints so that we can avoid the overhead of double-parsing every single log on all servers at different FSM layers! I don't think that's too gross from a quick look but we can talk about it. It would mean bypassing or slightly refactoring applyLogs
to let us write the raw raft.Log
somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking awesome @raskchanky!
I'll submit this now although I have noted a couple things I'm going to come back to later. Mostly minor nits but one or two places where what you have works but in theory could possibly hit edge cases now or in the future so probably best to tighten those up!
* adding a migration test from boltdb to raftwal and back adding a migration test using snapshot restore * feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots if nits in here that really don't matter so leave them to you if you think they are worth fiddling with.
The one thing I think we should look at before merge is the BatchApply
handling of the empty log - it seems to work now but seems alarmingly brittle, near-missing panics and bad violations of assumptions by pure chance in several places that have no explicit intention to allow this behavior! More inline. I don't think this should be too hard and at worst I'd consider accepting it by just adding inline comments to all those places where we happen to to do the "right" thing by chance now so it's at least harder to accidentaly break later!
if boltStore, ok := b.stableStore.(*raftboltdb.BoltStore); ok { | ||
bss := boltStore.Stats() | ||
logStoreStats = &bss | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we have tests that cover these metrics being produced? I don't see any usages of CollectMetrics
in raft_test.go at least. If not it's probably worth (at least) adding to a TODO list for manual testing to be sure we don't regress those here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I'm following what kind of test you're looking for. CollectMetrics
is called here as part of the main metrics collection loop. Can you be more specific?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked offline but for the sake of GH history, I just meant that I don't think we have unit tests that assert that boltdb metrics are actually reported when this is called either locally or in general in Vault.
So it would be possible to typo this change and not fail any tests but break our actual metrics around boltdb.
Ideally we'd have some sort of unit test there but since there is no precedent we can plan to add one later, but it would be wise to at least manually verify that when using BoltDB on this branch we still get stats output as before!
Co-authored-by: Paul Banks <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked offline but here's the other thing we should fix to get the verifier working again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing job @raskchanky
This looks good to go!
@@ -0,0 +1,3 @@ | |||
```release-note:feature | |||
storage/raft: Add experimental support for raft-wal, a new backend engine for integrated storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raskchanky next time please use the correct new feature formatting for new features in the changelog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reminder. I did correct this in a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good candidate for a new CI check, so that we don't have to rely on humans remembering to always do the right thing in several different scenarios.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@raskchanky I've added an agenda item to discuss new requirements for the changelog checking tooling.
This provides a config option called
raft_wal
, which can be set to"true"
or"false"
(raft backend config ismap[string]string
) for optionally enabling the use of https://github.com/hashicorp/raft-wal for raft storage instead of BoltDB.If Vault is configured to use raft-wal but it detects raft.db in the normal spot, it continues to use raft.db and logs a warning that it's ignoring the raft-wal config, similar to what Consul does.
I modified the raft test helpers to allow a config map to be passed in, and then made most of the raft tests table driven, so they can exercise both boltdb and raft-wal. Most of the changes in that test file are mechanical, just moving existing test code into a table driven setup and adding a bit of error checking.
On the subject of "why are we doing this":
There are 2 main motivations for using raft-wal instead of raft-boltdb for raft storage: performance and stability. In microbenchmarks I've done, raft-wal is roughly 10% faster than raft-boltdb for normal operations. That's nice. On the stability front, since it is a data store that's designed specifically for raft storage, there's no freelist to contend with. Which means that, unlike raft-boltdb, there's no cruft to build up over time as the raft log is repeatedly truncated. When using raft-boltdb, eventually you're going to need to rotate your nodes out in order to compact boltdb, otherwise the freelist will grow large and negatively impact write performance, which will negatively impact the stability of your cluster.