-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
kvserver/rangefeed: remove future package #126490
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_rangefeed.go
line 516 at r1 (raw file):
scanner, err := rangefeed.NewSeparatedIntentScanner(ctx, r.store.TODOEngine(), desc.RSpan()) if err != nil { stream.Disconnect(kvpb.NewError(err))
I attempted to change this to construct the IntentScanner
earlier during registerWithRangefeedRaftMuLocked
to avoid using stream.Disconnect. But I got the same test failure as seen in #52844 which is why the callback was initially implemented.
pkg/kv/kvclient/rangefeed/rangefeed_external_test.go
line 1630 at r1 (raw file):
ctx context.Context ch chan<- *kvpb.RangeFeedEvent done chan *kvpb.Error
Instead of using a channel for all sink types, we could embed the real StreamMuxer
and check for disconnect errors in StreamMuxer.ServerStreamSender
. I chose this approach because it aligns more closely with our approach prior to #96756 and also seems much easier. And I plan to change tests in processor_test.go
to use a real StreamMuxer in the next PR to expand our test coverage on StreamMuxer. What’s your preference?
d3ecdc3
to
a818e9b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 16 of 16 files at r2, 16 of 16 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/replica_rangefeed.go
line 516 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
I attempted to change this to construct the
IntentScanner
earlier duringregisterWithRangefeedRaftMuLocked
to avoid using stream.Disconnect. But I got the same test failure as seen in #52844 which is why the callback was initially implemented.
It's kind of strange that a processor-wide (multiple rangefeeds can attach to the same rangefeed processor) dependency is in charge of sending an error signal to a single stream. What would it look like to add an error return value to IntentScannerConstructor
and handle the error case inside of the processor?
pkg/server/node.go
line 1867 at r3 (raw file):
// Disconnect implements the rangefeed.Stream interface. It disconnects the // stream from the StreamMuxer. The StreamMuxer is then responsible for // disconnecting the stream and cleaning up.
This comment says that the function " disconnects the stream from the StreamMuxer" and that "The StreamMuxer is then responsible for disconnecting the stream". These two sound contradictory. Are both true? Can the responsibility of each be clarified?
pkg/server/node.go
line 1938 at r3 (raw file):
streamMuxer.AddStream(req.StreamID, cancel) if err := n.stores.RangeFeed(req, streamSink); err != nil {
Can we add a comment here that this call registers a RangeFeed and returns an error if registration fails, but that it returns without blocking and without an error if the registration succeeds?
pkg/kv/kvclient/rangefeed/rangefeed_external_test.go
line 1630 at r1 (raw file):
Previously, wenyihu6 (Wenyi Hu) wrote…
Instead of using a channel for all sink types, we could embed the real
StreamMuxer
and check for disconnect errors inStreamMuxer.ServerStreamSender
. I chose this approach because it aligns more closely with our approach prior to #96756. And I plan to change tests inprocessor_test.go
to use a real StreamMuxer in the next PR to expand our test coverage on StreamMuxer. What’s your preference?
This approach seems fine to me. It's nice to avoid the dependency on StreamMuxer
.
pkg/kv/kvserver/rangefeed/processor_test.go
line 1812 at r3 (raw file):
case err := <-c.done: return err.GoError() case <-time.After(30 * time.Second):
Let's replace all of these 30 * time.Second
with testutils.DefaultSucceedsSoonDuration
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/replica_rangefeed.go
line 516 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It's kind of strange that a processor-wide (multiple rangefeeds can attach to the same rangefeed processor) dependency is in charge of sending an error signal to a single stream. What would it look like to add an error return value to
IntentScannerConstructor
and handle the error case inside of the processor?
Thanks for raising this! I looked into this more. There are two things that I'm not quite understanding.
This comment says
cockroach/pkg/kv/kvserver/rangefeed/processor.go
Lines 166 to 167 in 04a6ba0
// If the iterator is nil then no initialization scan will be performed and | |
// the resolved timestamp will immediately be considered initialized. |
However, the nil check is performed on the intent scanner constructor but not on the intent scanner itself
if rtsIterFunc != nil { |
cockroach/pkg/kv/kvserver/rangefeed/task.go
Line 135 in 91f53ea
func (s *SeparatedIntentScanner) ConsumeIntents( |
I'm trying to distinguish when the intent scanner constructor returns nil without an error V.S. returns nil with an error. The first case seems only possible in tests. Based on the code here
cockroach/pkg/kv/kvserver/rangefeed/task.go
Line 110 in 91f53ea
func NewSeparatedIntentScanner( |
- Additionally, we only disconnect the first registration here during
IntentScannerConstructor
. I think this is currently true because we make sure to call the intent scanner constructor under the same lock as the first registration (It just happens that we need this to avoid missing events). No otherp.Register
could happen beforeIntentScannerConstructor
is called but it feels like a brittle assumption. Should we callp.StopProcessor()
whenIntentScannerConstructor
returns an error?
I'm trying to fix these in #127204 if the concerns look right to you.
pkg/server/node.go
line 1867 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This comment says that the function " disconnects the stream from the StreamMuxer" and that "The StreamMuxer is then responsible for disconnecting the stream". These two sound contradictory. Are both true? Can the responsibility of each be clarified?
Done.
pkg/server/node.go
line 1938 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we add a comment here that this call registers a RangeFeed and returns an error if registration fails, but that it returns without blocking and without an error if the registration succeeds?
Done.
pkg/kv/kvclient/rangefeed/rangefeed_external_test.go
line 1630 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This approach seems fine to me. It's nice to avoid the dependency on
StreamMuxer
.
Ack, cool.
pkg/kv/kvserver/rangefeed/processor_test.go
line 1812 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's replace all of these
30 * time.Second
withtestutils.DefaultSucceedsSoonDuration
.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed turnaround. I was looking into a support issue.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/server/node.go
line 1961 at r5 (raw file):
// StreamMuxer to detach the stream. The StreamMuxer is then responsible for // handling the actual disconnection and additional cleanup. Note that Caller // should not rely on immediate disconnection as cleanup take place async.
"takes"
Previously, to reduce O(ranges) goroutines and avoid blocking on rangefeed completion, we introduced the future package to manage rangefeed completion by invoking a future callback when an error occurred. The future package makes it difficult to justify which goroutine the callback runs on and which locks are being held. This patch introduces a new approach, replacing the future usage. Each registration now uses the embedded StreamMuxer.DisconnectStreamWithError method to signal rangefeed completion. StreamMuxer will manage the shutdown logic and additional stream cleanup. Part of: cockroachdb#126561 Release note: none
TFTR! bors r=nvanbenschoten |
Previously, when `IntentScannerConstructor` returned an error, only the first rangefeed registration on the replica was disconnected as part of the error handling. This worked because `IntentScannerConstructor` was called before the first `Register` function finished to avoid missing events. But this feels like a brittle assumption. Additionally, we should not continue running `initialSan.Run` when the constructor returns an error as this could lead to nil pointer panics. Comments also incorrectly stated that the resolved timestamp is considered immediately initialized if the provided iterator is nil. We actually check for `IntentScannerConstructor`, not the iterator itself. Passing a nil `IntentScannerConstructor` should only be possible in tests, and it shouldn't be possible for `IntentScannerConstructor` to return a nil `IntentScanner` without an error. This patch fixes these issues by updating the comments and stopping the entire processor when the constructor returns an error. Related: cockroachdb#126490 Release note: none Epic: none
Previously, when `IntentScannerConstructor` returned an error, only the first rangefeed registration on the replica was disconnected as part of the error handling. This worked because `IntentScannerConstructor` was called before the first `Register` function finished to avoid missing events. But this feels like a brittle assumption. Additionally, we should not continue running `initialSan.Run` when the constructor returns an error as this could lead to nil pointer panics. Comments also incorrectly stated that the resolved timestamp is considered immediately initialized if the provided iterator is nil. We actually check for `IntentScannerConstructor`, not the iterator itself. Passing a nil `IntentScannerConstructor` should only be possible in tests, and it shouldn't be possible for `IntentScannerConstructor` to return a nil `IntentScanner` without an error. This patch fixes these issues by updating the comments and stopping the entire processor when the constructor returns an error. Related: cockroachdb#126490 Release note: none Epic: none
Previously, to reduce O(ranges) goroutines and avoid blocking on rangefeed
completion, we introduced the future package to manage rangefeed completion by
invoking a future callback when an error occurred. The future package makes it
difficult to justify which goroutine the callback runs on and which locks are
being held. This patch introduces a new approach, replacing the future usage.
Each registration now uses the embedded StreamMuxer.DisconnectStreamWithError
method to signal rangefeed completion. StreamMuxer will manage the shutdown
logic and additional stream cleanup.
Part of: #126561
Release note: none