kvserver: implement closed ts side-transport publisher #61137
fa41557 to 47fbe89
This is looking good. I just pushed three new commits. The first two serve as partial code reviews, making a few bug fixes, cleaning up some code (apologies for a few of the renames), getting it in a better place to wrap unit tests around, and adding a few TODOs that we'll want to address in this PR or in a follow-on PR. The third commit fixes two of the more severe issues I found when reading over this change, relating to when it's safe for a leaseholder to publish on the side-transport.
I wasn't able to get to unit tests yet, but that's still on my radar. Let's talk tomorrow about how we want to split those up. Once we're happy with everything, we can squash it all down and go through a polishing code review pass.
Needed for cockroachdb#61137.

This commit updates the manner through which lease transfers (through `LeaseTransferRequest`) and range merges (through `SubsumeRequest`) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of `AdminTransferLease` and into the evaluation of `LeaseTransferRequest` through a new `RevokeLease` method on the `EvalContext`. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by cockroachdb#59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from the `LocalResult` returned by `SubsumeRequest` and replaces it with a `WatchForMerge` method on the `EvalContext`. This allows the `SubsumeRequest` to launch the range merge watcher goroutine during its evaluation. The side-transport checks for this goroutine to see whether a leaseholder can advance its closed timestamp. In doing so, the `SubsumeRequest` is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp _after_ the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in cockroachdb#50265 to make correct.

With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.
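The ordering described in this commit message (first stop the closed timestamp from advancing, then read its maximum) can be illustrated with a minimal sketch. All names here (`replicaState`, `canAdvanceClosedTS`, `maybeAdvance`, `subsume`) are hypothetical stand-ins and not the actual kvserver types; timestamps are plain integers for brevity.

```go
package main

import "fmt"

// replicaState is a hypothetical, simplified stand-in for the per-replica
// state that a side-transport publisher would consult. It is an assumption
// for illustration, not the real kvserver Replica.
type replicaState struct {
	leaseRevoked    bool  // would be set by a RevokeLease-style call during lease transfer evaluation
	mergeInProgress bool  // would be set by a WatchForMerge-style call during Subsume evaluation
	closedTS        int64 // highest closed timestamp published so far
}

// canAdvanceClosedTS reports whether this leaseholder may still close
// timestamps: not after its lease was revoked, and not while a merge is pending.
func (r *replicaState) canAdvanceClosedTS() bool {
	return !r.leaseRevoked && !r.mergeInProgress
}

// maybeAdvance moves the closed timestamp forward to target if that is
// permitted and would not regress it. It reports whether it advanced.
func (r *replicaState) maybeAdvance(target int64) bool {
	if !r.canAdvanceClosedTS() || target <= r.closedTS {
		return false
	}
	r.closedTS = target
	return true
}

// subsume freezes the closed timestamp first and only then reads it, so the
// returned value is the maximum that will ever be published by this replica.
func (r *replicaState) subsume() int64 {
	r.mergeInProgress = true
	return r.closedTS
}

func main() {
	r := &replicaState{}
	fmt.Println("advanced:", r.maybeAdvance(10))
	frozen := r.subsume()
	fmt.Println("frozen at:", frozen)
	fmt.Println("advanced after subsume:", r.maybeAdvance(20))
}
```

The point of the sketch is only the ordering: because the freeze happens during evaluation, the value read afterwards cannot be overtaken by a later publish.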
61221: kv: sync lease transfers and range merges with closed timestamp side-transport r=nvanbenschoten a=nvanbenschoten

Needed for the safety of #61137.

This commit updates the manner through which lease transfers (through `LeaseTransferRequest`) and range merges (through `SubsumeRequest`) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of `AdminTransferLease` and into the evaluation of `LeaseTransferRequest` through a new `RevokeLease` method on the `EvalContext`. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by #59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from the `LocalResult` returned by `SubsumeRequest` and replaces it with a `WatchForMerge` method on the `EvalContext`. This allows the `SubsumeRequest` to launch the range merge watcher goroutine during its evaluation. The side-transport checks for this goroutine to see whether a leaseholder can advance its closed timestamp. In doing so, the `SubsumeRequest` is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp _after_ the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in #50265 to make correct.

With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.

61237: util/log: ensure that all channel logs are displayed with `-show-logs` r=tbg a=knz

When `-show-logs` is specified, the `log.Scope` becomes a no-op and the default configuration in the `log` package is used. This is the only time the default configuration is used. Prior to this patch, only the logging for the DEV channel would make its way to the standard error (and the test output) in that case. This was unfortunate, since the intent (as spelled out in a comment already) was to display everything. This patch fixes that.

Release justification: non-production code changes
Release note: None

Co-authored-by: Nathan VanBenschoten
Co-authored-by: Raphael 'kena' Poss
9f51044 to b9133d9
c8272f4 to f2058f5
f2058f5 to db77cd8
db77cd8 to 91a35c7
9cb67df to 1e39f1b
once the `msg.Snapshot = msg.SeqNum == 1` code I broke gets fixed and CI is happy.
This was a surprisingly enjoyable way to work through a complex code review. The reviewable comments will be lost to the sands of time, but the memories of force pushes and merge skew will last forever.
Reviewed 2 of 2 files at r1, 10 of 10 files at r2, 4 of 4 files at r3, 15 of 15 files at r4, 1 of 1 files at r5, 17 of 17 files at r6, 4 of 4 files at r7, 2 of 2 files at r8, 25 of 28 files at r9, 1 of 1 files at r10, 4 of 4 files at r11, 2 of 2 files at r12.
Reviewable status: complete! 1 of 0 LGTMs obtained
Rename this replica state field to reflect the fact that it only covers the timestamps closed through Raft, not through the upcoming side-transport. There will also be a sidetransport_closed_timestamp state field. Release note: None
Move a function out of the proposal buffer, so it can be shared between the proposal buffer and the side transport. Release note: None
Release note: None
489a9c1 to 01cf85e
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
Ugh
bors r-
Canceled.
The side-transport is a component that runs on each node and periodically publishes closed timestamps for the ranges whose lease is held on that node. This complements the closing of timestamps through Raft commands, so that inactive ranges still have their closed timestamp advanced.

This commit introduces only the publishing side (the consumer is coming separately): the component that opens streaming connections to all other nodes holding follower replicas for ranges with local leases, and periodically publishes closed timestamp updates for many ranges at once.

Care has been taken to make the communication protocol efficient. Each stream is stateful and the information in every message is nicely compressed.

See [the RFC](cockroachdb#56675) for details.

Release justification: Needed for global tables.
Release note: None
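To make the "stateful stream" idea concrete, here is a minimal sketch, not the PR's implementation. It assumes each stream keeps a sequence number and the set of ranges it last advertised, so that only the first message (`SeqNum == 1`, as in the `msg.Snapshot = msg.SeqNum == 1` line mentioned in review) carries the full state and later messages carry deltas. The types `update` and `stream` and their fields are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

type rangeID int64

// update is a hypothetical wire message for one publishing tick.
type update struct {
	SeqNum   uint64
	Snapshot bool      // true only for the first message on a stream
	ClosedTS int64     // timestamp being closed for all tracked ranges
	Added    []rangeID // ranges newly covered by this stream
	Removed  []rangeID // ranges no longer covered by this stream
}

// stream holds the sender-side state for one connection to a follower node.
type stream struct {
	seqNum  uint64
	tracked map[rangeID]struct{}
}

func newStream() *stream {
	return &stream{tracked: map[rangeID]struct{}{}}
}

// next builds the message for this tick from the ranges currently leased
// locally, sending only the difference from what was sent before.
func (s *stream) next(closedTS int64, leased []rangeID) update {
	s.seqNum++
	msg := update{SeqNum: s.seqNum, ClosedTS: closedTS}
	msg.Snapshot = msg.SeqNum == 1

	current := make(map[rangeID]struct{}, len(leased))
	for _, id := range leased {
		if _, dup := current[id]; dup {
			continue // ignore duplicate input entries
		}
		current[id] = struct{}{}
		if _, ok := s.tracked[id]; !ok {
			msg.Added = append(msg.Added, id)
		}
	}
	for id := range s.tracked {
		if _, ok := current[id]; !ok {
			msg.Removed = append(msg.Removed, id)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Slice(msg.Added, func(i, j int) bool { return msg.Added[i] < msg.Added[j] })
	sort.Slice(msg.Removed, func(i, j int) bool { return msg.Removed[i] < msg.Removed[j] })
	s.tracked = current
	return msg
}

func main() {
	s := newStream()
	fmt.Printf("%+v\n", s.next(100, []rangeID{1, 2}))
	fmt.Printf("%+v\n", s.next(200, []rangeID{2, 3}))
}
```

Ranges that stay on the stream between ticks cost nothing in the second message; only the new closed timestamp and the delta are sent, which is one way a stateful stream keeps messages small.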
01cf85e to b60602c
bors r+
Build succeeded:
@andreimatei I have been investigating some troubles with race tests in various packages. The one I was looking at was I have added some instrumentation to the test to panic if we observe more than 4000 goroutines (diff below). Since this PR, the panic is hit reliably within 30s (with
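The diff referred to in this comment is not preserved on this page. As a rough sketch of that kind of instrumentation, assuming a plain check on `runtime.NumGoroutine()` against the 4000 limit mentioned above (the helper name `checkGoroutines` is hypothetical and not taken from the original diff):

```go
package main

import (
	"fmt"
	"runtime"
)

// maxGoroutines mirrors the threshold mentioned in the comment.
const maxGoroutines = 4000

// checkGoroutines returns an error if the number of live goroutines
// exceeds limit. A test could call this periodically and panic on error.
func checkGoroutines(limit int) error {
	if n := runtime.NumGoroutine(); n > limit {
		return fmt.Errorf("observed %d goroutines, limit is %d", n, limit)
	}
	return nil
}

func main() {
	if err := checkGoroutines(maxGoroutines); err != nil {
		panic(err)
	}
	fmt.Println("goroutine count within limit")
}
```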