Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ttl: improve row-level TTL performance using DistSQL #84728

Merged
merged 1 commit into from
Aug 8, 2022
Merged

ttl: improve row-level TTL performance using DistSQL #84728

merged 1 commit into from
Aug 8, 2022

Conversation

ecwall
Copy link
Contributor

@ecwall ecwall commented Jul 20, 2022

fixes #76914

Release note (performance improvement): The row-level TTL job has been modified
to distribute work using DistSQL. This usually results in the leaseholder nodes
managing deleting of the spans they own.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ecwall ecwall changed the title sql: ttl dist sql ttl: improve row-level TTL performance using DistSQL Jul 20, 2022
@ecwall ecwall marked this pull request as ready for review July 28, 2022 13:08
@ecwall ecwall requested a review from a team July 28, 2022 13:08
@ecwall ecwall requested a review from a team as a code owner July 28, 2022 13:08
@ecwall ecwall requested review from otan, rafiss and yuzefovich and removed request for a team July 28, 2022 13:08
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! I have some nits as well as a suggestion - it'd be good to check that the physical planning of DistSQL processors is as expected. You could do so in a "execbuilder" test - e.g. take a look at pkg/sql/opt/exec/execbuilder/distsql_join for inspiration (or any other distsql_* file in that folder).

Reviewed 5 of 11 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @otan, and @rafiss)


pkg/sql/distsql_running.go line 752 at r3 (raw file):

}

// NewErrorOnlyMetadataCallbackWriter creates a new MetadataCallbackWriter that supports receiving an error.

Given the name of the function, I'd expect that this would take the callback that should be called whenever a metadata is received.

nit: wrap at 80 characters.


pkg/sql/distsql_running.go line 755 at r3 (raw file):

func NewErrorOnlyMetadataCallbackWriter() *MetadataCallbackWriter {
	return NewMetadataCallbackWriter(
		NewRowResultWriter(nil),

nit: using errOnlyResultWriter here seems cleaner.


pkg/sql/exec_util.go line 1524 at r3 (raw file):

	// ReturnStatsError causes stats errors to be returned instead of logged as warnings.
	ReturnStatsError bool
	// ChangeTableDescriptorVersionDuringDelete flag to change the table descriptor

nit: a verb seems to be missing in the comment.


pkg/sql/execinfrapb/flow_diagram.go line 585 at r3 (raw file):

// summary implements the diagramCellType interface.
func (s *TTLSpec) summary() (string, []string) {
	return "TTL", []string{}

nit: this determines what is shown on DistSQL diagrams - do we not want to include some details? Up to you.


pkg/sql/execinfrapb/processors_ttl.proto line 1 at r3 (raw file):

// Copyright 2019 The Cockroach Authors.

nit: s/2019/2022/.


pkg/sql/execinfrapb/processors_ttl.proto line 27 at r3 (raw file):

import "jobs/jobspb/jobs.proto";

message TTLSpec {

nit: it'd be good to add some comments to the fields in this proto.


pkg/sql/ttl/ttljob/ttljob.go line 251 at r3 (raw file):

		physicalPlan := planCtx.NewPhysicalPlan()
		// All of the progress information is sent through the metadata stream, so we

Do we want to expose that progress information somehow?

Update: I see that each processor updates the job progress, it'd probably be worth to mention that in the comment here.


pkg/sql/ttl/ttljob/ttljob_processor.go line 50 at r3 (raw file):

func (t *ttlProcessor) Start(ctx context.Context) {
	ctx = t.StartInternal(ctx, "t")

nit: probably s/"t"/"ttl"/ or something like that.


pkg/sql/ttl/ttljob/ttljob_processor.go line 271 at r3 (raw file):

	cutoff := details.Cutoff
	selectBuilder := makeSelectQueryBuilder(
		details.TableID,

nit: use tableID.


pkg/sql/ttl/ttljob/ttljob_query_builder.go line 74 at r3 (raw file):

	cachedArgs := make([]interface{}, 0, 1+len(pkColumns)*2)
	cachedArgs = append(cachedArgs, cutoff)
	endPK := rangeToProcess.endPK

nit: it'd be good to be consistent - either define startPK as well or don't define endPK.

@ecwall ecwall requested a review from a team as a code owner August 1, 2022 15:12
Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)


pkg/sql/distsql_running.go line 752 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Given the name of the function, I'd expect that this would take the callback that should be called whenever a metadata is received.

nit: wrap at 80 characters.

I renamed it to NewMetadataOnlyMetadataCallbackWriter since it should never receive row results.

Wrapped line at 80 chars.

Code quote:

NewErrorOnlyMetadataCallbackWriter

pkg/sql/execinfrapb/flow_diagram.go line 585 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: this determines what is shown on DistSQL diagrams - do we not want to include some details? Up to you.

Added.


pkg/sql/ttl/ttljob/ttljob.go line 251 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Do we want to expose that progress information somehow?

Update: I see that each processor updates the job progress, it'd probably be worth to mention that in the comment here.

Done.

Code quote:

physicalPlan.AddNoInputStage(

pkg/sql/ttl/ttljob/ttljob_query_builder.go line 74 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

nit: it'd be good to be consistent - either define startPK as well or don't define endPK.

endPK is used multiple times so I defined a variable for it, but I can define one for startPK too for consistency even though it is only used once.

Code quote:

endPK := rangeToProcess.endPK

@ecwall ecwall requested a review from yuzefovich August 2, 2022 18:45
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this look good! just small nits. i focused my review only on the ttljob parts, so anything changing the distsql framework should get signoff from yahor/sql-queries

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @otan, and @yuzefovich)


pkg/sql/ttl/ttljob/ttljob_processor.go line 447 at r5 (raw file):

		processorID,
		output,
		nil, /* memMonitor */

should the job do memory monitoring? if not, it would be useful for a comment to explain why


pkg/sql/ttl/ttljob/ttljob_test.go line 558 at r5 (raw file):

					AOSTDuration:                  &zeroDuration,
					ReturnStatsError:              true,
					RequireMultipleSpanPartitions: numNodes > 0,

based on the if numNodes == 0 condition above, won't this always be true?

Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo Rafi's comments.

Reviewed 5 of 10 files at r4, 5 of 5 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ecwall and @otan)


pkg/sql/execinfrapb/flow_diagram.go line 587 at r5 (raw file):

	details := s.RowLevelTTLDetails
	spans := s.Spans
	spanStrings := make([]string, len(spans))

This is currently ignored and not included into the summary - is this intentional? Also if you do include it, then it might makes sense to print out only a single span - take a look at TableReaderSpec.summary for an example (if we print out hundreds of spans, then the diagram will be unreadable).


pkg/sql/execinfrapb/processors_ttl.proto line 29 at r5 (raw file):

message TTLSpec {

  // JobID of the job that ran the ttlProcessor

nit: missing period.


pkg/sql/ttl/ttljob/ttljob_query_builder.go line 74 at r3 (raw file):

Previously, ecwall (Evan Wall) wrote…

endPK is used multiple times so I defined a variable for it, but I can define one for startPK too for consistency even though it is only used once.

Oh, I didn't see the multiple uses - up to you then.

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)


pkg/sql/execinfrapb/flow_diagram.go line 587 at r5 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

This is currently ignored and not included into the summary - is this intentional? Also if you do include it, then it might makes sense to print out only a single span - take a look at TableReaderSpec.summary for an example (if we print out hundreds of spans, then the diagram will be unreadable).

Removed


pkg/sql/ttl/ttljob/ttljob_processor.go line 447 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

should the job do memory monitoring? if not, it would be useful for a comment to explain why

I'm not sure if it should or not. I copied it from newBulkRowWriterProcessor where it was nil.


pkg/sql/ttl/ttljob/ttljob_test.go line 558 at r5 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

based on the if numNodes == 0 condition above, won't this always be true?

changed to tc.numNodes

Copy link
Contributor Author

@ecwall ecwall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)


pkg/sql/ttl/ttljob/ttljob_query_builder.go line 74 at r3 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Oh, I didn't see the multiple uses - up to you then.

I'll leave it in then.

@ecwall ecwall requested a review from rafiss August 3, 2022 00:47
fixes #76914

Release note (performance improvement): The row-level TTL job has been modified
to distribute work using DistSQL. This usually results in the leaseholder nodes
managing deleting of the spans they own.
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

bors r+

Reviewed 4 of 11 files at r2, 1 of 10 files at r3, 3 of 10 files at r4, 1 of 5 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @otan, @rafiss, and @yuzefovich)

@craig craig bot merged commit d58473e into cockroachdb:master Aug 8, 2022
@craig
Copy link
Contributor

craig bot commented Aug 8, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ttl: improve row-level TTL performance using DistSQL
4 participants