rangefeed: Ensure Close is safe even if Start failed #110942

Merged
merged 1 commit into from
Sep 21, 2023

miretskiy
Contributor

RangeFeed Start may fail if the attempt to start the async task (the rangefeed) fails due to server shutdown. If that happens, a subsequent Close call would block indefinitely, waiting for a rangefeed task that was never started to terminate.

Fixes #110350

Release note: None

@miretskiy miretskiy requested a review from a team September 19, 2023 23:23
@blathers-crl
blathers-crl bot commented Sep 19, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@miretskiy
Contributor Author

The change was verified by running the unit test linked in the issue under stress.

@@ -228,6 +224,7 @@ func (f *RangeFeed) Start(ctx context.Context, spans []roachpb.Span) error {
defer pprof.SetGoroutineLabels(ctx)
ctx = pprof.WithLabels(ctx, pprof.Labels(append(f.extraPProfLabels, "rangefeed", f.name)...))
pprof.SetGoroutineLabels(ctx)
f.running.Add(1)

There is still a small race here where the caller calls Close() before the goroutine is spawned (or the counter is incremented). This is probably fine, since we'll reap the goroutine shortly anyway due to the cancelled context, but we could avoid it by incrementing this on the main goroutine and decrementing it both in the RunAsyncTask error case and when the goroutine terminates.

Actually, we may have to do this, because the API contract says that it's invalid to call Add() on a zero-valued counter after calling Wait().

https://pkg.go.dev/sync#WaitGroup.Add
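
The suggestion in this thread can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `feed` type and the `spawn` helper are hypothetical stand-ins for RangeFeed and the stopper's RunAsyncTask, and Close is reduced to waiting on the counter (the real Close also cancels the context). The point it shows is that Add(1) runs on the caller's goroutine before the task is spawned, and Done() runs on exactly one of two paths: the spawn-failure path or the goroutine's exit.

```go
package main

import (
	"errors"
	"sync"
)

// feed is a hypothetical stand-in for RangeFeed; only the counter
// bookkeeping discussed in the review is modeled.
type feed struct {
	running sync.WaitGroup
}

// spawn is a hypothetical substitute for RunAsyncTask. When fail is true it
// returns an error without ever running fn, as can happen during shutdown.
func spawn(fail bool, fn func()) error {
	if fail {
		return errors.New("server is shutting down")
	}
	go fn()
	return nil
}

// Start increments the counter on the calling goroutine, before the task is
// spawned, so any Close that follows Start is guaranteed to observe it.
func (f *feed) Start(fail bool, work func()) error {
	f.running.Add(1)
	if err := spawn(fail, func() {
		// Normal path: the goroutine decrements when it terminates.
		defer f.running.Done()
		work()
	}); err != nil {
		// Error path: the task never ran, so undo the increment here.
		f.running.Done()
		return err
	}
	return nil
}

// Close waits for the task, if any, to terminate. It returns immediately
// when Start failed, because the counter is already back at zero.
func (f *feed) Close() {
	f.running.Wait()
}
```

With this ordering, Add() is never called after Wait() may have started, which is the constraint from the WaitGroup documentation linked above.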

@erikgrinaker erikgrinaker self-requested a review September 20, 2023 08:46

@miretskiy miretskiy left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)


pkg/kv/kvclient/rangefeed/rangefeed.go line 227 at r1 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Actually, we may have to do this, because the API contract says that it's invalid to call Add() on a zero-valued counter after calling Wait().

https://pkg.go.dev/sync#WaitGroup.Add

Yeah; on the one hand, you shouldn't call Close if Start errored out; on the other hand, better safe than sorry.

@erikgrinaker erikgrinaker left a comment

Reviewed 1 of 1 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @miretskiy)


pkg/kv/kvclient/rangefeed/rangefeed.go line 227 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

Yeah; on the one hand, you shouldn't call Close if Start errored out; on the other hand, better safe than sorry.

This can race even if Start() succeeds: the caller may call Close() before the goroutine gets scheduled.

@miretskiy
Contributor Author

Bors r+

@craig
Contributor

craig bot commented Sep 21, 2023

Build succeeded:

@craig craig bot merged commit f3497de into cockroachdb:master Sep 21, 2023
Successfully merging this pull request may close these issues.

c2c: slow quiesce failure TestDataDriven/initial_scan_spanconfigs : rangefeed client hangs on close
3 participants