Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

singleflight: spruce up singleflight #91038

Merged
merged 5 commits into from
Dec 15, 2022

Conversation

andreimatei
Copy link
Contributor

@andreimatei andreimatei commented Nov 1, 2022

This patch improves singleflight with a couple of features that were
done manually by most callers. The singleflight now optionally makes a flight's
context not respond to the cancelation of the caller's ctx, the flights
now run in stopper tasks, and stopper quiescence cancels the flights'
contexts. The callers were doing some of these things, but
inconsistently.

Also, the flights now get a tracing span and callers that join an
existing span get log messages demarcating the wait time. We've wanted
this multiple times when debugging, for example in the context of table
descriptor lease acquisitions and range descriptor resolving. To
accomodate for this, the contract of singleflight.DoChan() gets a bit
more complicated: the singleflight captures the tracing spans of all
non-leaders, and these callers have to call a new ReaderClosed() method
on the DoChan() result if they're not going to wait for the flight
result.

Fixes #90854

Release note: None
Epic: None

@andreimatei andreimatei requested review from a team as code owners November 1, 2022 00:15
@andreimatei andreimatei requested a review from a team November 1, 2022 00:15
@andreimatei andreimatei changed the title kvserver: fix quiescing of closedts nudger singleflight: spruce up singleflight Nov 1, 2022
@andreimatei andreimatei force-pushed the small.trace-singleflight branch from 84b59f2 to 446ffb8 Compare November 1, 2022 00:18
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 7 of 23 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)


pkg/util/syncutil/singleflight/singleflight.go line 87 at r2 (raw file):

}

// Do executes and returns the results of the given function, making

One thing that's not clear is what will happen if ctx is canceled while the function is running, in particular, while it's running on behalf of another caller. I know you're not changing the behavior, but, by adding a context, I think you're making it more confusing.

Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/util/syncutil/singleflight/singleflight.go line 87 at r2 (raw file):

Previously, ajwerner wrote…

One thing that's not clear is what will happen if ctx is canceled while the function is running, in particular, while it's running on behalf of another caller. I know you're not changing the behavior, but, by adding a context, I think you're making it more confusing.

See now; I've added a note about it.
I would argue that I'm hardly making things worse - virtually every closure passed to these functions was already capturing a ctx. And even if I am making it marginally worse, I think it's worth it for the new logging.
Btw, this function has a single caller; everybody else uses DoChan. If you want, we can delete it. We could also have it take DoOpts, if you think.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 22 of 23 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/kv/kvserver/replica_rangefeed.go line 712 at r3 (raw file):

				select {
				case m.RangeFeedSlowClosedTimestampNudgeSem <- struct{}{}:
				case <-r.store.stopper.ShouldQuiesce():

Should we use the ctx again?


pkg/kv/kvserver/protectedts/ptcache/cache.go line 215 at r3 (raw file):

			lastReset = timeutil.Now()
		case <-c.stopper.ShouldQuiesce():
			if doneCh != nil {

Should we defer this in a closure up above where we declare var doneCh singleflight.Future?


pkg/util/syncutil/singleflight/singleflight.go line 49 at r3 (raw file):

}

type waiter struct {

Is this struct used? Was it replaced by futureImpl?


pkg/util/syncutil/singleflight/singleflight.go line 143 at r3 (raw file):

// Future represents the result of the DoChan() call.
type Future interface {
	// ReaderClose indicates that the reader is no longer waiting on this Future.

nit: Why ReaderClose instead of just Close? Was Close ambiguous from the external interface of this package? Would Abandon be?


pkg/util/syncutil/singleflight/singleflight.go line 196 at r3 (raw file):

	f.mu.Lock()
	if f.mu.sp != nil {
		f.mu.sp.Recordf("finished waiting on singleflight %s:%s. err: %v", groupName, key, r.Err)

Span.Recordf is different than the call to log.Eventf above. Will that be observable? Will we include the filename and line number in this event?


pkg/util/syncutil/singleflight/singleflight.go line 247 at r3 (raw file):

		return waiter, false
	}
	waiter := newFutureImpl(nil /* sp - the leader does not keep track of the span because it doesn't need to log*/)

nit: s/log*/log */


pkg/util/syncutil/singleflight/singleflight.go line 275 at r3 (raw file):

		// Copy the log tags and the span.
		ctx = logtags.AddTags(context.Background(), logtags.FromContext(ctx))
		ctx = tracing.ContextWithSpan(ctx, sp)

What is the effect of this? Do the trace events make it to the caller's span? What if the caller's context is canceled but the cancelation is not inherited?


pkg/util/syncutil/singleflight/singleflight.go line 283 at r3 (raw file):

	}

	if opts.Stop != nil {

nit: two if opts.Stop != nil branches in a row. Collapse them?


pkg/util/syncutil/singleflight/singleflight.go line 298 at r3 (raw file):

	res := Result{c.val, c.err, c.dups > 0}
	for _, waiter := range c.waiters {
		waiter.writerClose(res, g.opName, key)

Did you consider an approach that propagated the tracing events from the leader's span to the followers' so that everyone who waited on the singleflight ended up with it in their trace? Does that run into complications with singleflight operations that started before one of its followers?

@andreimatei andreimatei force-pushed the small.trace-singleflight branch from cf42b22 to 305a38c Compare November 3, 2022 21:52
@andreimatei andreimatei requested a review from a team November 3, 2022 21:52
@andreimatei andreimatei requested a review from a team as a code owner November 3, 2022 21:52
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/kv/kvserver/replica_rangefeed.go line 712 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we use the ctx again?

yes, done


pkg/kv/kvserver/protectedts/ptcache/cache.go line 215 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we defer this in a closure up above where we declare var doneCh singleflight.Future?

done; please see now


pkg/util/syncutil/singleflight/singleflight.go line 49 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this struct used? Was it replaced by futureImpl?

yes, removed, thanks


pkg/util/syncutil/singleflight/singleflight.go line 143 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: Why ReaderClose instead of just Close? Was Close ambiguous from the external interface of this package? Would Abandon be?

This all changed.


pkg/util/syncutil/singleflight/singleflight.go line 196 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Span.Recordf is different than the call to log.Eventf above. Will that be observable? Will we include the filename and line number in this event?

this all changed


pkg/util/syncutil/singleflight/singleflight.go line 247 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: s/log*/log */

done


pkg/util/syncutil/singleflight/singleflight.go line 275 at r3 (raw file):

Do the trace events make it to the caller's span?

The events make it to a child span opened at the beginning of this function.

What if the caller's context is canceled but the cancelation is not inherited?

That's why we're creating a child span - to allow it to outlive the caller's span. Added a note about it.


pkg/util/syncutil/singleflight/singleflight.go line 283 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: two if opts.Stop != nil branches in a row. Collapse them?

done


pkg/util/syncutil/singleflight/singleflight.go line 298 at r3 (raw file):

Did you consider...

I have. But I can consider it again.
I considered it again, and implemented.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r5, 27 of 27 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/util/syncutil/singleflight/singleflight.go line 58 at r7 (raw file):

	/////////////////////////////////////////////////////////////////////////////
	// These fields are read and written with the singleflight mutex held before
	// the channel is closed, and are read but not written after the chanel is

s/chanel/channel/


pkg/util/syncutil/singleflight/singleflight.go line 208 at r7 (raw file):

// then the recording of the flight will be ingested into ctx even if this
// caller was not the flight's leader.
func (f Future) Result(ctx context.Context) Result {

Consider renaming to use a verb like WaitForResult or ResolveResult. I did not find it clear at the callers of this method that this was blocking and would return early on context cancellation.


pkg/util/syncutil/singleflight/singleflight.go line 227 at r7 (raw file):

				log.Eventf(ctx, "waiting for singleflight interrupted: %v", ctx.Err())
			}
			return Result{

Did you consider having Result return an error, separate from the result, in cases of context cancellation? Is it useful to callers to distinguish between cases where the signleflight call failed and cases where its own context was canceled and so it did not wait for the future to be resolved?


pkg/util/tracing/span.go line 370 at r7 (raw file):

	}
	var rec Trace
	if recType != tracingpb.RecordingOff {

Why do we need this condition here but not above in GetTraceRecording?


pkg/util/tracing/span_inner.go line 96 at r7 (raw file):

	recType tracingpb.RecordingType, finishing bool,
) tracingpb.Recording {
	if s.isNoop() {

Should this call GetTraceRecording() to avoid redundancy?

@andreimatei andreimatei force-pushed the small.trace-singleflight branch from 305a38c to c9e87d8 Compare November 23, 2022 17:38
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/util/syncutil/singleflight/singleflight.go line 58 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/chanel/channel/

done


pkg/util/syncutil/singleflight/singleflight.go line 208 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Consider renaming to use a verb like WaitForResult or ResolveResult. I did not find it clear at the callers of this method that this was blocking and would return early on context cancellation.

renamed to WaitForResult


pkg/util/syncutil/singleflight/singleflight.go line 227 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you consider having Result return an error, separate from the result, in cases of context cancellation? Is it useful to callers to distinguish between cases where the signleflight call failed and cases where its own context was canceled and so it did not wait for the future to be resolved?

That doesn't appear to be useful to any current callers. I can't very clearly imagine one needing to distinguish... But I can do it if you think so.
FWIW, the error message that a leader or follower gets on ctx cancelation says something like "interrupted during singleflight".


pkg/util/tracing/span.go line 370 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why do we need this condition here but not above in GetTraceRecording?

it's not needed here either; removed


pkg/util/tracing/span_inner.go line 96 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should this call GetTraceRecording() to avoid redundancy?

done

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

@ajwerner do you want to give this another pass as well?

Reviewed 2 of 3 files at r9, 28 of 28 files at r11, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @andreimatei)


pkg/sql/catalog/lease/lease.go line 537 at r11 (raw file):

	if result.Err != nil {
		return false, result.Err

nit: stray line


pkg/util/tracing/crdbspan.go line 251 at r11 (raw file):

}

func (t Trace) String() string {

Trace looks huge. Are you sure we want this defined on a value receiver? We also then call Flatten, which has a ptr receiver.

@andreimatei andreimatei force-pushed the small.trace-singleflight branch from c9e87d8 to cc35de9 Compare December 1, 2022 23:04
Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: thanks for doing it

Reviewed 1 of 3 files at r9, 2 of 28 files at r11.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @andreimatei and @nvanbenschoten)

@andreimatei andreimatei force-pushed the small.trace-singleflight branch from cc35de9 to 5915f5b Compare December 1, 2022 23:22
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @nvanbenschoten)


pkg/sql/catalog/lease/lease.go line 537 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: stray line

removed


pkg/util/tracing/crdbspan.go line 251 at r11 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Trace looks huge. Are you sure we want this defined on a value receiver? We also then call Flatten, which has a ptr receiver.

reverted

@andreimatei andreimatei force-pushed the small.trace-singleflight branch 5 times, most recently from c2c8554 to fe6dd36 Compare December 7, 2022 16:21
@andreimatei andreimatei force-pushed the small.trace-singleflight branch from fe6dd36 to 477e714 Compare December 7, 2022 18:43
@andreimatei andreimatei requested a review from a team December 7, 2022 18:43
@andreimatei andreimatei force-pushed the small.trace-singleflight branch 2 times, most recently from 34a52d3 to f543dfb Compare December 8, 2022 20:48
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ajwerner check out the last commit, pls

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @nvanbenschoten)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last commit LGTM

Clarify what part of the span.getStructuredRecording() result is shared
with the span.

Release note: None
AddStructuredRecord() did not have a good reason to take a pointer (the
event struct is small enough). Taking a ptr was confusing - it suggested
sharing where the function was actually making a copy internally.

Release note: None
The nudger tasks were looking at context cancelation, but their context
is never canceled. This patch makes them look at stopper quiescence,
which is probaly what was always intended.

Release note: None
This patch improves singleflight with a couple of features that were
done manually by most callers. The singleflight now optionally makes a flight's
context not respond to the cancelation of the caller's ctx, the flights
now run in stopper tasks, and stopper quiescence cancels the flights'
contexts. The callers were doing some of these things, but
inconsistently.

Also, the flights now get a tracing span and callers that join an
existing span get a copy of the recording of the flight leader. We've
wanted this multiple times when debugging, for example in the context of
table descriptor lease acquisitions and range descriptor resolving. The
interface for getting the results out of DoChan() had to change a bit as
a result.

Release note: None
Since the previous commit, descriptor leasing is correctly included in
statement traces. As such, a bunch of rtt tests changed their counters.
For these tests, leasing is not actually interesting; it's more
representative to benchmark these tests against nodes that already have
the respective leases. This patch does that by ensuring that various
objects are leased before the test.
For the has_<foo>_privilege() tests, the rtt number dropped to 0. These
tests had flavors that were working across multiple objects. Since
we're now asserting 0 requests for one object, testing across multiple
objects is no longer interesting. I've deleted these tests.

Release note: None
@andreimatei andreimatei force-pushed the small.trace-singleflight branch from f543dfb to 68cd03a Compare December 14, 2022 22:27
Copy link
Contributor Author

@andreimatei andreimatei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajwerner and @nvanbenschoten)

@craig
Copy link
Contributor

craig bot commented Dec 15, 2022

Build succeeded:

@craig craig bot merged commit 9c8ac7b into cockroachdb:master Dec 15, 2022
@andreimatei andreimatei deleted the small.trace-singleflight branch December 17, 2022 21:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql/catalog/lease: log lease acquisition in traces
4 participants