rangefeed: create catchup iterators eagerly #111045
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
This is actually quite nice, small and focused. I just have a few nits, but @erikgrinaker should have final approval.
iter, err := rangefeed.NewCatchUpIterator(r.store.TODOEngine(), rSpan.AsRawSpanWithNoLocals(),
	args.Timestamp, iterSemRelease, pacer)
if err != nil {
	r.raftMu.Unlock()
do we want to keep AssertHeld?
I don't think we do. The sole reason for it was that we passed a closure to a method that executes it within its worker, but all that work must be done while we are still holding the lock (in this method above). Now everything is done within the method body under the lock, so there's no point in keeping the assert.
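For illustration, a toy sketch of the two shapes described above; the assertableMutex, its TryLock-based AssertHeld, and catchUpIter are hypothetical stand-ins, not the CockroachDB syncutil or rangefeed types:

```go
package main

import (
	"fmt"
	"sync"
)

// assertableMutex is a toy stand-in for a mutex with an AssertHeld method
// (not the real syncutil.Mutex): if TryLock succeeds, nobody was holding the
// lock, which is exactly the misuse AssertHeld is meant to catch.
type assertableMutex struct{ sync.Mutex }

func (m *assertableMutex) AssertHeld() {
	if m.TryLock() {
		m.Unlock()
		panic("mutex was not held")
	}
}

type catchUpIter struct{}

func main() {
	var raftMu assertableMutex
	raftMu.Lock()

	// Old shape: a constructor closure is handed to another component that may
	// run it later from its own worker, so the closure asserts the lock.
	makeIter := func() *catchUpIter {
		raftMu.AssertHeld()
		return &catchUpIter{}
	}
	iter := makeIter()

	// New shape: the iterator is constructed inline, in the same method body
	// that visibly holds the lock, so the assertion no longer buys anything.
	iter = &catchUpIter{}

	raftMu.Unlock()
	fmt.Printf("built %T\n", iter)
}
```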
// out of container, it is safe to close it regardless of startup success or
// failure.
type CatchupIteratorContainer struct {
	syncutil.Mutex
I'm not sure who we are trying to serialize with... presumably registration/filter stuff running and racing with some other error encountered elsewhere.
(Totally optional:) maybe type CatchupIteratorContainer atomic.Pointer[CatchUpIterator]. The Get method is then simply a Swap.
I think it can happen if the context is cancelled while the processor is handling the registration request, so I'd be uneasy mutating it from two places. I also think the linter will eventually flake on a data race even if it's safe to do.
I love the idea of atomic pointer + swap. I'll change that.
// Get moves iterator out of container. Calling Close on container won't close
// the iterator after that. Safe to call on empty container.
func (c *CatchupIteratorContainer) Get() (iter *CatchUpIterator) {
nit: Maybe rename to Detach() or Move() or Release()?
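Putting the two nits above together (atomic pointer + Swap, and a more explicit name than Get), a minimal sketch of what such a container might look like; the CatchUpIterator stub here is a hypothetical placeholder rather than the real rangefeed type:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CatchUpIterator is a hypothetical stub; only Close matters for the
// container's ownership semantics.
type CatchUpIterator struct{ closed bool }

func (i *CatchUpIterator) Close() { i.closed = true }

// CatchupIteratorContainer owns an iterator until either the registration
// takes it over (Detach) or startup fails and Close releases it. Using
// atomic.Pointer.Swap makes both paths race-free without a mutex.
type CatchupIteratorContainer struct {
	ptr atomic.Pointer[CatchUpIterator]
}

func NewCatchupIteratorContainer(iter *CatchUpIterator) *CatchupIteratorContainer {
	c := &CatchupIteratorContainer{}
	c.ptr.Store(iter)
	return c
}

// Detach moves the iterator out of the container; a later Close on the
// container becomes a no-op. Safe to call on an empty container.
func (c *CatchupIteratorContainer) Detach() *CatchUpIterator {
	return c.ptr.Swap(nil)
}

// Close closes the iterator if the container still owns it.
func (c *CatchupIteratorContainer) Close() {
	if iter := c.ptr.Swap(nil); iter != nil {
		iter.Close()
	}
}

func main() {
	c := NewCatchupIteratorContainer(&CatchUpIterator{})
	iter := c.Detach() // registration takes ownership
	c.Close()          // no-op: container no longer owns the iterator
	fmt.Println("closed by container:", iter.closed) // false
}
```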
Force-pushed from 136c17d to d0bad64.
Let me address the nits real quick!
Force-pushed from d0bad64 to 483b6aa.
Force-pushed from 420ad31 to a5d7730.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)
do we have an epic for this?
(quoting the email notification: on Thu, Sep 21, 2023, Yevgeniy Miretskiy approved this pull request. Reviewable status: complete! 0 of 0 LGTMs obtained, waiting on @aliher1911 and @erikgrinaker.)
Force-pushed from f0cfab1 to 3257904.
Did you look at the original motivation for plumbing through the constructor in #69613? I think this is fine, because all the lifecycle events appear to synchronize with raftMu, but I wonder if we're missing something.
Force-pushed from 3257904 to cf35678.
The original PR you mentioned tried to move iterator creation out of the registration work loop, which is guaranteed to run after the current raft lock was released, so some data might get through. But it did so by passing a function instead of constructing the iterator upfront. Since all possible locks are held while we are waiting for registration, I don't see why we can't create the iterator upfront, other than trying to avoid its creation on the failure path. The container was trying to do the same, but just handling it on error paths might be better; the only difference is that it was handled implicitly by defer when someone didn't start using the iterator. Moving construction to registerWithRangefeedRaftMuLocked() seems ugly as we will have to pull limiter callbacks down the stack because they are attached to the iterator. So passing the iterator down and letting registerWithRangefeedRaftMuLocked() handle closing seems no uglier to me than passing a callback for the sake of iterator creation.
LGTM, thanks for the simplification.
Moving construction to registerWithRangefeedRaftMuLocked() seems ugly as we will have to pull limiter callbacks down the stack because they are attached to the iterator. So passing the iterator down and letting registerWithRangefeedRaftMuLocked() handle closing seems no uglier to me than passing a callback for the sake of iterator creation.
Yeah, this is better, thanks.
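For context, a stripped-down sketch of the trade-off settled here: passing a constructor callback down versus constructing the iterator eagerly and letting the registration code close it on failure. All names and signatures below are simplified assumptions, not the actual kvserver code:

```go
package main

import (
	"errors"
	"fmt"
)

// CatchUpIterator is a hypothetical stub carrying a release callback (think
// semaphore/limiter release) that must fire exactly once, on Close.
type CatchUpIterator struct{ onClose func() }

func NewCatchUpIterator(onClose func()) *CatchUpIterator {
	return &CatchUpIterator{onClose: onClose}
}

func (i *CatchUpIterator) Close() {
	if i.onClose != nil {
		i.onClose()
		i.onClose = nil
	}
}

// Old shape: a constructor callback is plumbed down so the iterator can be
// built inside the registration code; the limiter callback therefore has to
// travel down the stack with it.
func registerWithConstructor(newIter func() (*CatchUpIterator, error)) error {
	iter, err := newIter()
	if err != nil {
		return err
	}
	_ = iter // ... hand iter to the processor, which takes over closing it ...
	return nil
}

// New shape: the caller builds the iterator eagerly, under its own locks and
// with its callbacks already attached; this function only has to close it on
// the paths where the processor never takes ownership.
func registerWithIterator(iter *CatchUpIterator, fail bool) error {
	if fail {
		iter.Close() // error path: nobody else will release the limiter
		return errors.New("registration failed")
	}
	_ = iter // ... hand iter to the processor, which takes over closing it ...
	return nil
}

func main() {
	release := func() { fmt.Println("limiter released") }

	// Callback shape: release must reach the construction site down the stack.
	_ = registerWithConstructor(func() (*CatchUpIterator, error) {
		return NewCatchUpIterator(release), nil
	})

	// Eager shape: release stays at the caller; the error path still fires it.
	_ = registerWithIterator(NewCatchUpIterator(release), true)
}
```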
Force-pushed from cf35678 to b50a6f0.
Previously, catchup iterators were created in the main rangefeed processor work loop. This was negatively affecting scheduler-based processors, as this operation could be slow. This commit makes iterator creation eager, simplifying error handling and lowering rangefeed delays.
Epic: none
Release note: None
Force-pushed from b50a6f0 to 54b8a73.
bors r=erikgrinaker
Build succeeded.
Previously, catchup iterators were created in the main rangefeed processor work loop. This was negatively affecting scheduler-based processors, as this operation could be slow.
This commit makes iterator creation eager, simplifying error handling and lowering rangefeed delays.
Epic: CRDB-26372
Fixes: #111060
Fixes: #111040
Release note: None