ttl: ensure each DistSQL processor only deletes its own data #86057
Conversation
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 220 at r1 (raw file):
progress := md.Progress
existingRowCount := progress.Details.(*jobspb.Progress_RowLevelTTL).RowLevelTTL.RowCount
progress.Details.(*jobspb.Progress_RowLevelTTL).RowLevelTTL.RowCount += processorRowCount
should this use atomic.LoadInt64 first to access the processorRowCount?
pkg/sql/ttl/ttljob/ttljob_processor.go
line 222 at r1 (raw file):
progress.Details.(*jobspb.Progress_RowLevelTTL).RowLevelTTL.RowCount += processorRowCount
ju.UpdateProgress(progress)
log.Infof(
how frequently will this log? it feels like it could be something that should only be shown with log.Vinfof(ctx, 2, ...)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 233 at r1 (raw file):
// rangeRowCount should be checked even if the function returns an error because it may have partially succeeded
func (ttl *ttlProcessor) runTTLOnRange(
nit: function receivers are usually one character by convention. otherwise this is easily confused with the ttl package. (and use the same for all the functions that have this receiver.) i'd say use either t or p.
pkg/sql/ttl/ttljob/ttljob_processor.go
line 302 at r1 (raw file):
Previously, ecwall (Evan Wall) wrote…
It looks like the nodes can select outside of the range they are leaseholders for so I need to fix that, but I'll do it separately.
this should be a TODO in the code, not just a comment on the PR
pkg/sql/ttl/ttljob/ttljob_processor.go
line 303 at r1 (raw file):
expiredRowsPKs, err := selectBuilder.run(ctx, ie)
log.Dev.Infof(
    ctx,
nit: use log.Vinfof(ctx, 2, ...)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 305 at r1 (raw file):
    ctx,
    "TTL selected rows jobID=%d processorID=%d tableID=%d expiredRowsPKs=%s",
    ttlSpec.JobID, ttl.ProcessorID, details.TableID, expiredRowsPKs,
how big is expiredRowsPKs? the log won't be useful if this prints out hundreds/thousands of entries.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 220 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
should this use atomic.LoadInt64 first to access the processorRowCount?
processorRowCount isn't being modified anymore by this point, so there isn't a race between updates and reads.
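The race-freedom argument in this reply can be illustrated with a minimal sketch (hypothetical names, not the actual ttljob code): concurrent goroutines must use atomic.AddInt64 on the shared counter, but once sync.WaitGroup.Wait has returned there are no remaining writers, so a plain read of the counter does not race.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countRows simulates several worker goroutines incrementing a shared
// row counter, then reading the total after all workers have finished.
func countRows(workers, perWorker int) int64 {
	var processorRowCount int64 // shared across goroutines
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				// Concurrent writers must use atomic operations.
				atomic.AddInt64(&processorRowCount, 1)
			}
		}()
	}
	wg.Wait()
	// After Wait, no goroutine is writing anymore, so a plain read is
	// race-free -- the same reasoning as the review reply above.
	return processorRowCount
}

func main() {
	fmt.Println(countRows(4, 1000)) // 4000
}
```

Running this under `go run -race` confirms no data race is reported for the final plain read.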
pkg/sql/ttl/ttljob/ttljob_processor.go
line 222 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
how frequently will this log? it feels like it could be something that should only be shown with log.Vinfof(ctx, 2, ...)
It will log once per leaseholder per TTL run. I can change it to that, though.
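For readers unfamiliar with verbosity-gated logging, here is a toy stand-in for the behavior being requested (this is a hedged sketch, not CockroachDB's actual log package): a message with level 2 is suppressed unless the configured verbosity is at least 2.

```go
package main

import "fmt"

// verbosity is a toy analogue of the vmodule/-v verbosity setting.
var verbosity = 0

// vinfof is a toy stand-in for log.Vinfof: it only emits the message
// when the configured verbosity is at or above the requested level.
func vinfof(level int, format string, args ...interface{}) (emitted bool) {
	if verbosity < level {
		return false
	}
	fmt.Printf(format+"\n", args...)
	return true
}

func main() {
	fmt.Println(vinfof(2, "TTL progress rowCount=%d", 42)) // suppressed at default verbosity
	verbosity = 2
	fmt.Println(vinfof(2, "TTL progress rowCount=%d", 42)) // emitted once verbosity >= 2
}
```

This is why once-per-leaseholder-per-run messages are good Vinfof candidates: they are useful when debugging but noisy at the default level.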
pkg/sql/ttl/ttljob/ttljob_processor.go
line 233 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: function receivers are usually one character by convention. otherwise this is easily confused with the ttl package. (and use the same for all the functions that have this receiver.) i'd say use either t or p.
Changed to t.
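The receiver-naming nit above follows standard Go convention; a minimal sketch (type name taken from the PR, method invented for illustration) shows the agreed-on single-character receiver:

```go
package main

import "fmt"

// ttlProcessor stands in for the processor type discussed in this review.
type ttlProcessor struct {
	id int
}

// A one-character receiver ("t") avoids confusion with the ttl package
// name; the same receiver name should be used on every method of the type.
func (t *ttlProcessor) String() string {
	return fmt.Sprintf("processor %d", t.id)
}

func main() {
	fmt.Println((&ttlProcessor{id: 7}).String())
}
```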
pkg/sql/ttl/ttljob/ttljob_processor.go
line 302 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
this should be a TODO in the code, not just a comment on the PR
Addressed in prior PR
pkg/sql/ttl/ttljob/ttljob_processor.go
line 303 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
nit: use log.Vinfof(ctx, 2, ...)
Addressed in prior PR.
pkg/sql/ttl/ttljob/ttljob_processor.go
line 305 at r1 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
how big is expiredRowsPKs? the log won't be useful if this prints out hundreds/thousands of entries.
Removed log.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 157 at r4 (raw file):
ri := kvcoord.MakeRangeIterator(serverCfg.DistSender)
for _, span := range ttlSpec.Spans {
    spanStartKey := span.Key
This logic works correctly in the multi-node test, but the single node tests are failing. I don't quite understand the relationship between a span and a range and need help with this logic.
I think we need to backport fixes to the key decoding to 22.1 if the TTL feature is enabled for tenants in serverless.
nice, i just had questions since i don't fully understand the fix
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ecwall and @rafiss)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 153 at r6 (raw file):
// Iterate over every range to feed work for the goroutine processors.
var alloc tree.DatumAlloc
for _, span := range ttlSpec.Spans {
interesting fix -- why don't we need the rangeIterator any more? could you explain in the commit message?
pkg/sql/ttl/ttljob/ttljob_processor.go
line 374 at r6 (raw file):
rKey := key.AsRawKey()
// If any of these errors, that means we reached an "empty" key, which
i didn't follow why these two checks should be removed. could you explain in the commit message?
pkg/sql/ttl/ttljob/ttljob_test.go
line 198 at r6 (raw file):
}
// todo(ewall): migrate usages to verifyExpiredRows and switch SPLIT AT usage to SplitTable
i thought we already did the SplitTable change?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan and @rafiss)
pkg/sql/ttl/ttljob/ttljob_processor.go
line 153 at r6 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
interesting fix -- why don't we need the rangeIterator any more? could you explain in the commit message?
I think the range iterator was to group the batches by range to make them more efficient, but ttlSpec.Spans (sourced from DistSQLPlanner.PartitionSpans) already only has the spans relevant to this node.
I think in the future we can remove the parallel Span processing and increase the batch size to reduce the number of config params, but I need to test this in the workload.
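The fix's key idea (each processor only touches keys inside its own assigned spans) can be sketched with simplified types — `span` and `ownedByProcessor` are hypothetical stand-ins for `roachpb.Span` and the filtering that `DistSQLPlanner.PartitionSpans` already provides, not the actual ttljob code:

```go
package main

import "fmt"

// span is a simplified half-open [start, end) key range, standing in
// for roachpb.Span.
type span struct {
	start, end string
}

func (s span) contains(key string) bool {
	return key >= s.start && key < s.end
}

// ownedByProcessor reports whether a key falls inside any of the spans
// assigned to this processor. Because ttlSpec.Spans is already
// partitioned per node, a processor never deletes another node's rows.
func ownedByProcessor(spans []span, key string) bool {
	for _, s := range spans {
		if s.contains(key) {
			return true
		}
	}
	return false
}

func main() {
	spans := []span{{"a", "f"}, {"m", "q"}}
	fmt.Println(ownedByProcessor(spans, "c")) // true: inside [a, f)
	fmt.Println(ownedByProcessor(spans, "h")) // false: owned by another node
}
```

Under this model, the separate RangeIterator grouping becomes unnecessary: the per-node span list is the source of truth for ownership.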
pkg/sql/ttl/ttljob/ttljob_processor.go
line 374 at r6 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i didn't follow why these two checks should be removed. could you explain in the commit message?
I don't think these branches were being hit anymore. @otan said tests were previously failing without them, but now the tests pass without them: https://cockroachlabs.slack.com/archives/C01RX2G8LT1/p1661281166206439
I'll update the message.
pkg/sql/ttl/ttljob/ttljob_test.go
line 198 at r6 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i thought we already did the SplitTable change?
I did for the multi-node tests, but other tests were failing because of some Voter error so I need to look into it more.
refs #85800

Fix span/range calculation to filter out rows owned by other nodes.
Rows are grouped by node via DistSQLPlanner.PartitionSpans instead
of RangeIterator. Therefore, rows in the same range but a different table
or index no longer need to be filtered out.

Release note: None

Release justification: TTL performance bug fix.
nice, thanks for the explanations
Reviewed 3 of 4 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan and @rafiss)
bors r=rafiss
Build succeeded: