Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: implement closed ts side-transport publisher #61137

Merged

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Feb 25, 2021

The side-transport is a component running on each node and periodically
publishing closed timestamps on ranges with the lease on the respective
node. This complements the closing of timestamps through Raft commands
such that inactive ranges still have their timestamp advanced.

This commit introduces only the publishing side (the consumer is coming
separately) - the guy opening streaming connections to all other nodes
with follower replicas for some ranges with local leases and
periodically publishing closed timestamp updates on a bunch of ranges at
once.

Care has been taken to make the communication protocol efficient. Each
stream is stateful and the information in every message is nicely
compressed.

See the RFC for details.

Release justification: Needed for global tables.
Release note: None

@andreimatei andreimatei requested a review from a team as a code owner February 25, 2021 18:55
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andreimatei andreimatei force-pushed the closedts.side-transport-sender branch 3 times, most recently from fa41557 to 47fbe89 Compare February 25, 2021 19:56
@nvanbenschoten nvanbenschoten requested a review from a team February 26, 2021 06:23
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. I just pushed three new commits. The first two serve as partial code reviews, making a few bug fixes, cleaning up some code (apologies for a few of the renames), getting it in a better place to wrap unit tests around, and adding a few TODOs that we'll want to address in this PR or in a follow-on PR. The third commit fixes two of the more severe issues I found when reading over this change, relating to when it's safe for a leaseholder to publish on the side-transport.

I wasn't able to get to unit tests yet, but that's still on my radar. Let's talk tomorrow about how we want to split those up. Once we're happy with everything, we can squash it all down and go through a polishing code review pass.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 27, 2021
Needed for cockroachdb#61137.

This commit updates the manner through which lease transfers (through
`LeaseTransferRequest`) and range merges (through `SubsumeRequest`)
handle the "transfer of power" from their outgoing leaseholder to their
incoming leaseholder. Specifically, it updates the handling of these
requests in order to rationalize the interaction between their
evaluation and the closing of timestamps through the closed timestamp
side-transport. It is now clearer when and how these requests instruct
the outgoing leaseholder to relinquish its ability to advance the closed
timestamp and, as a result, now possible for the requests to query and
operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO
to push the revocation of the outgoing lease out of `AdminTransferLease`
and into the evaluation of `LeaseTransferRequest` through a new
`RevokeLease` method on the `EvalContext`. Once a lease is revoked, the
side-transport will no longer be able to advance the closed timestamp
under it. This was made possible by cockroachdb#59086 and was suggested by @tbg
during the code review. We generally like to keep replica state changes
out of "admin" requests themselves, which are intended to coordinate
changes through individual non-admin requests. Admin requests generally
don't even need to evaluate on a leaseholder (though they try to), so
having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from
the `LocalResult` returned by `SubsumeRequest` and replaces it with a
`WatchForMerge` method on the `EvalContext`. This allows the
`SubsumeRequest` to launch the range merge watcher goroutine during it
evaluation, which the side-transport checks for to see whether a
leaseholder can advance its closed timestamp. In doing so, the
`SubsumeRequest` is able to pause closed timestamps when it wants and is
also able to observe and return the maximum closed timestamp _after_ the
closed timestamp has stopped advancing. This is a stark improvement over
the approach with the original closed timestamp system, which required a
herculean effort in cockroachdb#50265 to make correct.

With these refactors complete, the closed timestamp side-transport
should have a much easier and safer time checking whether a given
leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Mar 1, 2021
Needed for cockroachdb#61137.

This commit updates the manner through which lease transfers (through
`LeaseTransferRequest`) and range merges (through `SubsumeRequest`)
handle the "transfer of power" from their outgoing leaseholder to their
incoming leaseholder. Specifically, it updates the handling of these
requests in order to rationalize the interaction between their
evaluation and the closing of timestamps through the closed timestamp
side-transport. It is now clearer when and how these requests instruct
the outgoing leaseholder to relinquish its ability to advance the closed
timestamp and, as a result, now possible for the requests to query and
operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO
to push the revocation of the outgoing lease out of `AdminTransferLease`
and into the evaluation of `LeaseTransferRequest` through a new
`RevokeLease` method on the `EvalContext`. Once a lease is revoked, the
side-transport will no longer be able to advance the closed timestamp
under it. This was made possible by cockroachdb#59086 and was suggested by @tbg
during the code review. We generally like to keep replica state changes
out of "admin" requests themselves, which are intended to coordinate
changes through individual non-admin requests. Admin requests generally
don't even need to evaluate on a leaseholder (though they try to), so
having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from
the `LocalResult` returned by `SubsumeRequest` and replaces it with a
`WatchForMerge` method on the `EvalContext`. This allows the
`SubsumeRequest` to launch the range merge watcher goroutine during it
evaluation, which the side-transport checks for to see whether a
leaseholder can advance its closed timestamp. In doing so, the
`SubsumeRequest` is able to pause closed timestamps when it wants and is
also able to observe and return the maximum closed timestamp _after_ the
closed timestamp has stopped advancing. This is a stark improvement over
the approach with the original closed timestamp system, which required a
herculean effort in cockroachdb#50265 to make correct.

With these refactors complete, the closed timestamp side-transport
should have a much easier and safer time checking whether a given
leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.
craig bot pushed a commit that referenced this pull request Mar 1, 2021
61221: kv: sync lease transfers and range merges with closed timestamp side-transport r=nvanbenschoten a=nvanbenschoten

Needed for the safety of #61137.

This commit updates the manner through which lease transfers (through `LeaseTransferRequest`) and range merges (through `SubsumeRequest`) handle the "transfer of power" from their outgoing leaseholder to their incoming leaseholder. Specifically, it updates the handling of these requests in order to rationalize the interaction between their evaluation and the closing of timestamps through the closed timestamp side-transport. It is now clearer when and how these requests instruct the outgoing leaseholder to relinquish its ability to advance the closed timestamp and, as a result, now possible for the requests to query and operate on the maximum closed timestamp published by the outgoing leaseholder.

For lease transfers, this commit begins by addressing an existing TODO to push the revocation of the outgoing lease out of `AdminTransferLease` and into the evaluation of `LeaseTransferRequest` through a new `RevokeLease` method on the `EvalContext`. Once a lease is revoked, the side-transport will no longer be able to advance the closed timestamp under it. This was made possible by #59086 and was suggested by @tbg during the code review. We generally like to keep replica state changes out of "admin" requests themselves, which are intended to coordinate changes through individual non-admin requests. Admin requests generally don't even need to evaluate on a leaseholder (though they try to), so having them perform any state changes is fragile.

For range merges, this commit removes the `MaybeWatchForMerge` flag from the `LocalResult` returned by `SubsumeRequest` and replaces it with a `WatchForMerge` method on the `EvalContext`. This allows the `SubsumeRequest` to launch the range merge watcher goroutine during it evaluation, which the side-transport checks for to see whether a leaseholder can advance its closed timestamp. In doing so, the `SubsumeRequest` is able to pause closed timestamps when it wants and is also able to observe and return the maximum closed timestamp _after_ the closed timestamp has stopped advancing. This is a stark improvement over the approach with the original closed timestamp system, which required a herculean effort in #50265 to make correct.

With these refactors complete, the closed timestamp side-transport should have a much easier and safer time checking whether a given leaseholder is able to advance its closed timestamp.

Release justification: Necessary for the safety of new functionality.

61237: util/log: ensure that all channel logs are displayed with `-show-logs` r=tbg a=knz

When `-show-logs` is specified, the `log.Scope` becomes a no-op and
the default configuration in the `log` package is used. This is the
only time ever when the default configuration is used.

Prior to this patch, only the logging for the DEV channel would
make its way to the standard error (and the test output) in that
case. This was unfortunate, since the intent (as spelled out in a
comment already) was to display everything.

This patch fixes that.

Release justification: non-production code changes

Release note: None

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Raphael 'kena' Poss <[email protected]>
@andreimatei andreimatei force-pushed the closedts.side-transport-sender branch from 9f51044 to b9133d9 Compare March 1, 2021 19:07
@nvanbenschoten nvanbenschoten force-pushed the closedts.side-transport-sender branch 2 times, most recently from c8272f4 to f2058f5 Compare March 2, 2021 03:00
@andreimatei andreimatei force-pushed the closedts.side-transport-sender branch from f2058f5 to db77cd8 Compare March 2, 2021 03:21
@nvanbenschoten nvanbenschoten force-pushed the closedts.side-transport-sender branch from db77cd8 to 91a35c7 Compare March 2, 2021 03:29
@andreimatei andreimatei force-pushed the closedts.side-transport-sender branch 4 times, most recently from 9cb67df to 1e39f1b Compare March 2, 2021 04:21
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: once the msg.Snapshot = msg.SeqNum == 1 code I broke gets fixed and CI is happy.

This was a surprisingly enjoyable way to work through a complex code review. The reviewable comments will be lost to the sands of time, but the memories of force pushes and merge skew will last forever.

Reviewed 2 of 2 files at r1, 10 of 10 files at r2, 4 of 4 files at r3, 15 of 15 files at r4, 1 of 1 files at r5, 17 of 17 files at r6, 4 of 4 files at r7, 2 of 2 files at r8, 25 of 28 files at r9, 1 of 1 files at r10, 4 of 4 files at r11, 2 of 2 files at r12.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Rename this replica state field to reflect that fact that it only talks
about the timestamps closed through Raft, not through the upcoming
side-transport. There's going to be also a
sidetransport_closed_timestamp state field.

Release note: None
Move a function out of the proposal buffer, so it can be shared between
the proposal buffer and the side transport.

Release note: None
Release note: None
@andreimatei andreimatei force-pushed the closedts.side-transport-sender branch 2 times, most recently from 489a9c1 to 01cf85e Compare March 2, 2021 05:05
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

@nvanbenschoten
Copy link
Member

Ugh

show_test.go:936: kv.closed_timestamp.side_transport_interval: description "the interval at which the closed-timestamp side-transport attempts to advance each range's closed timestamp; set to 0 to disable the side-transport." must end with period if and only if it contains a secondary sentence

bors r-

@craig
Copy link
Contributor

craig bot commented Mar 2, 2021

Canceled.

The side-transport is a component running on each node and periodically
publishing closed timestamps on ranges with the lease on the respective
node. This complements the closing of timestamps through Raft commands
such that inactive ranges still have their timestamp advanced.

This commit introduces only the publishing side (the consumer is coming
separately) - the guy opening streaming connections to all other nodes
with follower replicas for some ranges with local leases and
periodically publishing closed timestamp updates on a bunch of ranges at
once.

Care has been taken to make the communication protocol efficient. Each
stream is stateful and the information in every message is nicely
compressed.

See [the RFC](cockroachdb#56675) for details.

Release justification: Needed for global tables.
Release note: None
@nvanbenschoten nvanbenschoten force-pushed the closedts.side-transport-sender branch from 01cf85e to b60602c Compare March 2, 2021 05:45
@nvanbenschoten
Copy link
Member

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 2, 2021

Build succeeded:

@craig craig bot merged commit 1b18fc1 into cockroachdb:master Mar 2, 2021
@andreimatei andreimatei deleted the closedts.side-transport-sender branch March 2, 2021 13:00
@RaduBerinde
Copy link
Member

@andreimatei I have been investigating some troubles with race tests in various packages. The one I was looking at was race: limit on 8128 simultaneously alive goroutines is exceeded, dying during importccl.

I have added some instrumentation to the test to panic if we observe more than 4000 goroutines (diff below). Since this PR, the panic is hit reliably within 30s (with make testrace PKG=./pkg/ccl/importccl/). Before this PR, it is never hit. I don't think any of the importccl tests do anything crazy, and 4000 goroutines is a lot. Most of them are goroutines set up by grpc.newClientStream. Can you investigate?

diff --git a/pkg/ccl/importccl/main_test.go b/pkg/ccl/importccl/main_test.go
index 9387b5e4da..a95a3edc4d 100644
--- a/pkg/ccl/importccl/main_test.go
+++ b/pkg/ccl/importccl/main_test.go
@@ -10,7 +10,9 @@ package importccl
 
 import (
 	"os"
+	"runtime"
 	"testing"
+	"time"
 
 	_ "github.com/cockroachdb/cockroach/pkg/ccl/storageccl"
 	"github.com/cockroachdb/cockroach/pkg/ccl/utilccl"
@@ -28,6 +30,22 @@ func TestMain(m *testing.M) {
 	randutil.SeedForTests()
 	serverutils.InitTestServerFactory(server.TestServerFactory)
 	serverutils.InitTestClusterFactory(testcluster.TestClusterFactory)
+
+	go func() {
+		ticker := time.NewTicker(1 * time.Millisecond)
+		for {
+			<-ticker.C
+			n := runtime.NumGoroutine()
+			if n > 4000 {
+				//buf := make([]byte, 1024*1024*40)
+				//n := runtime.Stack(buf, true)
+				//fmt.Println(string(buf[:n]))
+				panic("blargle")
+			}
+		}
+	}()
+
 	os.Exit(m.Run())
 }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants