-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ttl: improve row-level TTL performance using DistSQL #84728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! I have some nits as well as a suggestion - it'd be good to check that the physical planning of DistSQL processors is as expected. You could do so in a "execbuilder" test - e.g. take a look at pkg/sql/opt/exec/execbuilder/distsql_join
for inspiration (or any other distsql_*
file in that folder).
Reviewed 5 of 11 files at r2, 10 of 10 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @otan, and @rafiss)
pkg/sql/distsql_running.go
line 752 at r3 (raw file):
} // NewErrorOnlyMetadataCallbackWriter creates a new MetadataCallbackWriter that supports receiving an error.
Given the name of the function, I'd expect that this would take the callback that should be called whenever a metadata is received.
nit: wrap at 80 characters.
pkg/sql/distsql_running.go
line 755 at r3 (raw file):
func NewErrorOnlyMetadataCallbackWriter() *MetadataCallbackWriter { return NewMetadataCallbackWriter( NewRowResultWriter(nil),
nit: using errOnlyResultWriter
here seems cleaner.
pkg/sql/exec_util.go
line 1524 at r3 (raw file):
// ReturnStatsError causes stats errors to be returned instead of logged as warnings. ReturnStatsError bool // ChangeTableDescriptorVersionDuringDelete flag to change the table descriptor
nit: a verb seems to be missing in the comment.
pkg/sql/execinfrapb/flow_diagram.go
line 585 at r3 (raw file):
// summary implements the diagramCellType interface. func (s *TTLSpec) summary() (string, []string) { return "TTL", []string{}
nit: this determines what is shown on DistSQL diagrams - do we not want to include some details? Up to you.
pkg/sql/execinfrapb/processors_ttl.proto
line 1 at r3 (raw file):
// Copyright 2019 The Cockroach Authors.
nit: s/2019/2022/
.
pkg/sql/execinfrapb/processors_ttl.proto
line 27 at r3 (raw file):
import "jobs/jobspb/jobs.proto"; message TTLSpec {
nit: it'd be good to add some comments to the fields in this proto.
pkg/sql/ttl/ttljob/ttljob.go
line 251 at r3 (raw file):
physicalPlan := planCtx.NewPhysicalPlan() // All of the progress information is sent through the metadata stream, so we
Do we want to expose that progress information somehow?
Update: I see that each processor updates the job progress, it'd probably be worth to mention that in the comment here.
pkg/sql/ttl/ttljob/ttljob_processor.go
line 50 at r3 (raw file):
func (t *ttlProcessor) Start(ctx context.Context) { ctx = t.StartInternal(ctx, "t")
nit: probably s/"t"/"ttl"/
or something like that.
pkg/sql/ttl/ttljob/ttljob_processor.go
line 271 at r3 (raw file):
cutoff := details.Cutoff selectBuilder := makeSelectQueryBuilder( details.TableID,
nit: use tableID
.
pkg/sql/ttl/ttljob/ttljob_query_builder.go
line 74 at r3 (raw file):
cachedArgs := make([]interface{}, 0, 1+len(pkColumns)*2) cachedArgs = append(cachedArgs, cutoff) endPK := rangeToProcess.endPK
nit: it'd be good to be consistent - either define startPK
as well or don't define endPK
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)
pkg/sql/distsql_running.go
line 752 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Given the name of the function, I'd expect that this would take the callback that should be called whenever a metadata is received.
nit: wrap at 80 characters.
I renamed it to NewMetadataOnlyMetadataCallbackWriter
since it should never receive row results.
Wrapped line at 80 chars.
Code quote:
NewErrorOnlyMetadataCallbackWriter
pkg/sql/execinfrapb/flow_diagram.go
line 585 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: this determines what is shown on DistSQL diagrams - do we not want to include some details? Up to you.
Added.
pkg/sql/ttl/ttljob/ttljob.go
line 251 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Do we want to expose that progress information somehow?
Update: I see that each processor updates the job progress, it'd probably be worth to mention that in the comment here.
Done.
Code quote:
physicalPlan.AddNoInputStage(
pkg/sql/ttl/ttljob/ttljob_query_builder.go
line 74 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: it'd be good to be consistent - either define
startPK
as well or don't defineendPK
.
endPK
is used multiple times so I defined a variable for it, but I can define one for startPK
too for consistency even though it is only used once.
Code quote:
endPK := rangeToProcess.endPK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this look good! just small nits. i focused my review only on the ttljob parts, so anything changing the distsql framework should get signoff from yahor/sql-queries
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall, @otan, and @yuzefovich)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 447 at r5 (raw file):
processorID, output, nil, /* memMonitor */
should the job do memory monitoring? if not, it would be useful for a comment to explain why
pkg/sql/ttl/ttljob/ttljob_test.go
line 558 at r5 (raw file):
AOSTDuration: &zeroDuration, ReturnStatsError: true, RequireMultipleSpanPartitions: numNodes > 0,
based on the if numNodes == 0
condition above, won't this always be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 10 files at r4, 5 of 5 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ecwall and @otan)
pkg/sql/execinfrapb/flow_diagram.go
line 587 at r5 (raw file):
details := s.RowLevelTTLDetails spans := s.Spans spanStrings := make([]string, len(spans))
This is currently ignored and not included into the summary - is this intentional? Also if you do include it, then it might makes sense to print out only a single span - take a look at TableReaderSpec.summary
for an example (if we print out hundreds of spans, then the diagram will be unreadable).
pkg/sql/execinfrapb/processors_ttl.proto
line 29 at r5 (raw file):
message TTLSpec { // JobID of the job that ran the ttlProcessor
nit: missing period.
pkg/sql/ttl/ttljob/ttljob_query_builder.go
line 74 at r3 (raw file):
Previously, ecwall (Evan Wall) wrote…
endPK
is used multiple times so I defined a variable for it, but I can define one forstartPK
too for consistency even though it is only used once.
Oh, I didn't see the multiple uses - up to you then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)
pkg/sql/execinfrapb/flow_diagram.go
line 587 at r5 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
This is currently ignored and not included into the summary - is this intentional? Also if you do include it, then it might makes sense to print out only a single span - take a look at
TableReaderSpec.summary
for an example (if we print out hundreds of spans, then the diagram will be unreadable).
Removed
pkg/sql/ttl/ttljob/ttljob_processor.go
line 447 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
should the job do memory monitoring? if not, it would be useful for a comment to explain why
I'm not sure if it should or not. I copied it from newBulkRowWriterProcessor
where it was nil
.
pkg/sql/ttl/ttljob/ttljob_test.go
line 558 at r5 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
based on the
if numNodes == 0
condition above, won't this always be true?
changed to tc.numNodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @otan, @rafiss, and @yuzefovich)
pkg/sql/ttl/ttljob/ttljob_query_builder.go
line 74 at r3 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Oh, I didn't see the multiple uses - up to you then.
I'll leave it in then.
fixes #76914 Release note (performance improvement): The row-level TTL job has been modified to distribute work using DistSQL. This usually results in the leaseholder nodes managing deleting of the spans they own.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
bors r+
Reviewed 4 of 11 files at r2, 1 of 10 files at r3, 3 of 10 files at r4, 1 of 5 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @otan, @rafiss, and @yuzefovich)
Build succeeded: |
fixes #76914
Release note (performance improvement): The row-level TTL job has been modified
to distribute work using DistSQL. This usually results in the leaseholder nodes
managing deleting of the spans they own.