Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add RevertSpans and RevertSpansFanout #97845

Merged
merged 4 commits into from
Mar 19, 2023

Conversation

stevendanna
Copy link
Collaborator

@stevendanna stevendanna commented Mar 1, 2023

This PR adds RevertSpans and RevertSpansFanout.

RevertSpans de-duplicates some code between import rollback and streaming cutover.

Along the way, it fixes a small bug that existed in RevertTables: Only a single RevertRangeRequest is permitted in a batch since RevertRangeRequest has the isAlone flag set. As a result, a future caller of RevertTables would have encountered a fatal error from KV.

RevertSpansFanout uses DistSQL's PartitionSpans to manually fan out multiple RevertRange requests. Since all users of revert range request currently set a limit on the number of keys touched, dist sender doesn't fanout such request.

Epic: none

Release note: None

@stevendanna stevendanna requested review from a team as code owners March 1, 2023 14:24
@stevendanna stevendanna requested review from rhu713 and rytaft and removed request for a team March 1, 2023 14:24
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna requested a review from msbutler March 1, 2023 14:26
@stevendanna stevendanna force-pushed the parallel-revert-range branch from 6af38f9 to 40db612 Compare March 1, 2023 14:57
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msbutler, @rhu713, @rytaft, and @stevendanna)


pkg/sql/revert.go line 49 at r1 (raw file):

var maxRevertSpanNumWorkers = settings.RegisterIntSetting(
	settings.TenantWritable,
	"max_revert_span_parallelism",

nit: we usually have several parts in a cluster setting name, delimited by periods, with the first part being about the layer, so perhaps name it something like sql.revert.max_span_parallelism.


pkg/sql/revert.go line 51 at r1 (raw file):

	"max_revert_span_parallelism",
	"the maximum number of workers used to issue RevertRange request",
	8)

Do we want to add NonNegativeInt or PositiveInt validation?


pkg/sql/revert.go line 140 at r1 (raw file):

		workerPartitions = spanPartitions
	} else {
		workerPartitions = make([]SpanPartition, maxWorkerCount)

If we allow the setting to be set to 0, then we need to handle zero case differently.


pkg/sql/revert.go line 150 at r1 (raw file):

	progressWorker := func(ctx context.Context) error {
		for sp := range completedSpanCh {
			if onCompletedCallback != nil {

nit: perhaps take the if out of the loop and perhaps just short-circuit if the calback is nil.

@rytaft rytaft removed their request for review March 1, 2023 22:12
@stevendanna stevendanna force-pushed the parallel-revert-range branch from 40db612 to ff6c3a7 Compare March 2, 2023 11:57
Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msbutler, @rhu713, and @yuzefovich)


pkg/sql/revert.go line 51 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do we want to add NonNegativeInt or PositiveInt validation?

Thanks! PositiveInt should do.


pkg/sql/revert.go line 140 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

If we allow the setting to be set to 0, then we need to handle zero case differently.

Added the PositiveInt validation so the zero case should be OK now.


pkg/sql/revert.go line 150 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: perhaps take the if out of the loop and perhaps just short-circuit if the calback is nil.

While addressing this comment, I noticed that I wasn't really handling the error return from this callback correctly. So, perhaps I've gone a bit overboard, but I've done a few things:

  1. short-circuit the 1 worker case to avoid all planning and just call RevertSpans
  2. short-circuit the case of a nil onCompletedCallback to avoid the extra layer of goroutines altogether
  3. Added an errCh that workers get the last onCompletedCallback error back from.

Copy link
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is awesome! eager to see benchmarks!

// the cutover process.
type cutoverProgressTracker struct {
minProgressUpdateInterval time.Duration
progMetric *metric.Gauge
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename this rangesToDoMetric or something a bit more descriptive? It seems hardcoded to update the number of remaining ranges.

span = resp.ResumeSpan

if resp.ResumeSpan != nil {
if !resp.ResumeSpan.Valid() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps we add some logging around resume key spans and the number of attempts to revert the original span?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a VEventf above that can be added on each revert range call. Note that a resume span is absolutely expected.

pkg/sql/revert_test.go Show resolved Hide resolved
pkg/sql/revert_test.go Show resolved Hide resolved
pkg/sql/revert_test.go Show resolved Hide resolved
pkg/sql/revert.go Show resolved Hide resolved
settings.TenantWritable,
"sql.revert.max_span_parallelism",
"the maximum number of workers used to issue RevertRange request",
8,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this scale by the number of nodes in the cluster?

@stevendanna stevendanna force-pushed the parallel-revert-range branch from ff6c3a7 to fbc3106 Compare March 3, 2023 17:54
@stevendanna stevendanna force-pushed the parallel-revert-range branch from fbc3106 to 2ade877 Compare March 17, 2023 22:46
Under the race detector, kvclient uses a transport that disallows
mutations of of a request send to the transport. This change ensures
we create a new request for each resume span rather than re-using the
old request.

Epic: none

Release note: None
This PR adds RevertSpans and RevertSpansFanout.

RevertSpans de-duplicates some code between import rollback and
streaming cutover.

Along the way, it fixes a small bug that existed in RevertTables: Only
a single RevertRangeRequest is permitted in a batch since
RevertRangeRequest has the isAlone flag set. As a result, a future
caller of RevertTables would have encountered a fatal error from KV.

RevertSpansFanout uses DistSQL's PartitionSpans to manually fan out
multiple RevertRange requests. Since all users of revert range request
currently set a limit on the number of keys touched, dist sender
doesn't fanout such request.

Release note: None
@stevendanna stevendanna force-pushed the parallel-revert-range branch from 2ade877 to 5adc4f2 Compare March 18, 2023 17:38
@stevendanna stevendanna requested a review from a team as a code owner March 18, 2023 17:38
@stevendanna stevendanna requested review from a team, herkolategan and srosenberg and removed request for a team March 18, 2023 17:38
@@ -7568,12 +7568,6 @@ expires until the statement bundle is collected`,
filter = kvpb.MVCCFilter_All
}

req := &kvpb.ExportRequest{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit merged elsewhere, correct?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh did it? Let me go check, perhaps I hadn't git-pulled before working on this.

pkg/sql/revert.go Show resolved Hide resolved
pkg/sql/revert.go Outdated Show resolved Hide resolved
pkg/sql/revert_test.go Show resolved Hide resolved
@stevendanna
Copy link
Collaborator Author

bors r=msbutler

@craig
Copy link
Contributor

craig bot commented Mar 19, 2023

Build failed:

@stevendanna
Copy link
Collaborator Author

bors retry

@craig
Copy link
Contributor

craig bot commented Mar 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 19, 2023

Build failed (retrying...):

@craig
Copy link
Contributor

craig bot commented Mar 19, 2023

Build succeeded:

@craig craig bot merged commit 9aa4f34 into cockroachdb:master Mar 19, 2023
msbutler added a commit to msbutler/cockroach that referenced this pull request Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants