-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: periodic pts record updates #76605
Conversation
0f87fba
to
519906f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice.
Reviewed 1 of 5 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)
-- commits, line 14 at r1:
This definitely needs a release note. It's an internal change, and probably nobody cares about this
internal implementation detail; but it's a release note worty.
At the very least mentioning that changefeeds can run reliably on tables w/ low ttl.
Definitely need to mention the setting to control the frequency of such updates because this frequency effectively
overrides gcttl set on a table.
pkg/ccl/changefeedccl/changefeed.go, line 46 at r1 (raw file):
} func shouldProtectTimestamps(codec keys.SQLCodec) bool {
pls add a todo to remove this restriction once tenant based pts is in...
pkg/ccl/changefeedccl/changefeed_processors.go, line 1172 at r1 (raw file):
} if err := cf.flowCtx.Stopper().RunAsyncTask(ctx, "changefeed-pts-manager", func(ctx context.Context) {
not sure if we need this. Why not do this while trying to checkpoint job progress?
b60c090
to
b12ddd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy)
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
This definitely needs a release note. It's an internal change, and probably nobody cares about this
internal implementation detail; but it's a release note worty.
At the very least mentioning that changefeeds can run reliably on tables w/ low ttl.
Definitely need to mention the setting to control the frequency of such updates because this frequency effectively
overrides gcttl set on a table.
Done.
pkg/ccl/changefeedccl/changefeed.go, line 46 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
pls add a todo to remove this restriction once tenant based pts is in...
Done.
pkg/ccl/changefeedccl/changefeed_processors.go, line 1172 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
not sure if we need this. Why not do this while trying to checkpoint job progress?
I just had it this way since you mentioned it being a separate loop on a timer initially. I wasn't sure whether it'd be considered more or less complex to either have a piece of functionality be more distinct from other functionality versus the timing of events being more deterministic where recurring events happen in lockstep with resolved events.
I do like that keeping it with checkpoints keeps table changes in one place and having it there lets it share the "min checkpoint frequency" option. I made the changes to move it to resolved 👍.
b12ddd9
to
acd1cc2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @samiskin)
-- commits, line 16 at r3:
i don't think it's a bug fix -- it's an enterprise change.
acd1cc2
to
0b5e2b0
Compare
Previously changefeeds only laid down protected timestamp records to protect against either an ongoing backfill or the changefeed lagging behind. This is insufficient in cases such as if the gcttl is very short, recurring errors retry the changefeed for too long, or in upcoming work to enable serverless to shut down idle changefeeds. This PR removes the manual PTS protection on backfills and begins an async routine on the changeFrontier that updates the protected timestamp record to the current highwater mark. Fixes cockroachdb#76247 Release note (enterprise change): changefeeds running on tables with a low gcttl will function more reliably due to protected timestamps being maintained for the changefeed targets at the resolved timestamp of the changefeed. The frequency at which the protected timestamp is updated to the resolved timestamp can be configured through the `changefeed.protect_timestamp_interval` cluster setting. If the changefeed lags too far behind such that storage of old data becomes an issue, cancelling the changefeed will release the protected timestamps and allow garbage collection to resume. If `protect_data_from_gc_on_pause` is unset, pausing the changefeed will release the existing protected timestamp record.
0b5e2b0
to
54203d6
Compare
bors r+ |
Build failed (retrying...): |
Build succeeded: |
@@ -46,32 +43,30 @@ func emitResolvedTimestamp( | |||
return nil | |||
} | |||
|
|||
func shouldProtectTimestamps(codec keys.SQLCodec) bool { | |||
// TODO(smiskin): Remove this restriction once tenant based pts are enabled |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we TODO this @samiskin @adityamaru, given #73727 (comment) ?
Previously changefeeds only laid down protected timestamp records to
protect against either an ongoing backfill or the changefeed lagging
behind. This is insufficient in cases such as if the gcttl is very
short, recurring errors retry the changefeed for too long, or in
upcoming work to enable serverless to shut down idle changefeeds.
This PR removes the manual PTS protection on backfills and begins an
async routine on the changeFrontier that updates the protected timestamp
record to the current highwater mark.
Fixes #76247
Release note (enterprise change): changefeeds running on tables with a
low gcttl will function more reliably due to protected timestamps being
maintained for the changefeed targets at the resolved timestamp of the
changefeed. The frequency at which the protected timestamp is updated
to the resolved timestamp can be configured through the
changefeed.protect_timestamp_interval
cluster setting. If thechangefeed lags too far behind such that storage of old data becomes an
issue, cancelling the changefeed will release the protected timestamps
and allow garbage collection to resume. If
protect_data_from_gc_on_pause
is unset, pausing the changefeed willrelease the existing protected timestamp record.