-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cdc: Lagging span checkpoint with end_time option might be broken #108464
Comments
Would it not do so because the ranges are never sufficiently "lagging" in terms of their timestamp with respect to the frontier? |
I think we start from non-zero frontier (cursor). We then never emit a single checkpoint until we reach end time. |
Ah that makes sense, I guess ideally we send resolved timestamps at the end time instead (at least once) |
Improve blocking buffer observability by adding: * `changefeed.buffer_entries.allocated_memory` -- gauge keeping track of currently allocated memory for the events added to the blocking buffer. * `changefeed.buffer_entries.<event_type>` -- counters keeping track of the number of event type events (flush, KV, resolved) added. Informs cockroachdb#108464 Release note: None
Changefeed supports a mode where the user wants to emit all events that occurred since some time in the past (`cursor`), and end the changefeed (`end_time) at the time in the near future. In this mode, the rangefeed catchup scan starting from `cursor` position could take some time -- maybe even a lot of time -- and in this case, the very first checkpoint kvfeed will observe will be after `end_time`. All of the events, including checkpoints after `end_time` are skipped, as they should. However, this meant that no changefeed checkpoint records could be produced until entire changefeed completes. This PR ensures that once the `end_time` is reached, we will emit 1 "resolved event" for that span, so that changefeed can produce span based checkpoint if needed. Fixes cockroachdb#108464 Release note: None
Improve blocking buffer observability by adding: * `changefeed.buffer_entries.allocated_memory` -- gauge keeping track of currently allocated memory for the events added to the blocking buffer. * `changefeed.buffer_entries.<event_type>` -- counters keeping track of the number of event type events (flush, KV, resolved) added. Informs cockroachdb#108464 Release note: None
Changefeed supports a mode where the user wants to emit all events that occurred since some time in the past (`cursor`), and end the changefeed (`end_time) at the time in the near future. In this mode, the rangefeed catchup scan starting from `cursor` position could take some time -- maybe even a lot of time -- and in this case, the very first checkpoint kvfeed will observe will be after `end_time`. All of the events, including checkpoints after `end_time` are skipped, as they should. However, this meant that no changefeed checkpoint records could be produced until entire changefeed completes. This PR ensures that once the `end_time` is reached, we will emit 1 "resolved event" for that span, so that changefeed can produce span based checkpoint if needed. Fixes cockroachdb#108464 Release note: None
Improve blocking buffer observability by adding: * `changefeed.buffer_entries.allocated_memory` -- gauge keeping track of currently allocated memory for the events added to the blocking buffer. * `changefeed.buffer_entries.<event_type>` -- counters keeping track of the number of event type events (flush, KV, resolved) added. Informs cockroachdb#108464 Release note: None
Changefeed supports a mode where the user wants to emit all events that occurred since some time in the past (`cursor`), and end the changefeed (`end_time) at the time in the near future. In this mode, the rangefeed catchup scan starting from `cursor` position could take some time -- maybe even a lot of time -- and in this case, the very first checkpoint kvfeed will observe will be after `end_time`. All of the events, including checkpoints after `end_time` are skipped, as they should. However, this meant that no changefeed checkpoint records could be produced until entire changefeed completes. This PR ensures that once the `end_time` is reached, we will emit 1 "resolved event" for that span, so that changefeed can produce span based checkpoint if needed. Fixes cockroachdb#108464 Release note: None
Improve blocking buffer observability by adding: * `changefeed.buffer_entries.allocated_memory` -- gauge keeping track of currently allocated memory for the events added to the blocking buffer. * `changefeed.buffer_entries.<event_type>` -- counters keeping track of the number of event type events (flush, KV, resolved) added. Informs cockroachdb#108464 Release note: None
Changefeed supports a mode where the user wants to emit all events that occurred since some time in the past (`cursor`), and end the changefeed (`end_time) at the time in the near future. In this mode, the rangefeed catchup scan starting from `cursor` position could take some time -- maybe even a lot of time -- and in this case, the very first checkpoint kvfeed will observe will be after `end_time`. All of the events, including checkpoints after `end_time` are skipped, as they should. However, this meant that no changefeed checkpoint records could be produced until entire changefeed completes. This PR ensures that once the `end_time` is reached, we will emit 1 "resolved event" for that span, so that changefeed can produce span based checkpoint if needed. Fixes cockroachdb#108464 Release note: None
Improve blocking buffer observability by adding: * `changefeed.buffer_entries.allocated_memory` -- gauge keeping track of currently allocated memory for the events added to the blocking buffer. * `changefeed.buffer_entries.<event_type>` -- counters keeping track of the number of event type events (flush, KV, resolved) added. Informs cockroachdb#108464 Release note: None
109439: changefeedccl: Emit span resolved event when end time reached r=miretskiy a=miretskiy Changefeed supports a mode where the user wants to emit all events that occurred since some time in the past (`cursor`), and end the changefeed (`end_time) at the time in the near future. In this mode, the rangefeed catchup scan starting from `cursor` position could take some time -- maybe even a lot of time -- and in this case, the very first checkpoint kvfeed will observe will be after `end_time`. All of the events, including checkpoints after `end_time` are skipped, as they should. However, this meant that no changefeed checkpoint records could be produced until entire changefeed completes. This PR ensures that once the `end_time` is reached, we will emit 1 "resolved event" for that span, so that changefeed can produce span based checkpoint if needed. Fixes #108464 Release note: None 110267: roachtest: during c2c/shutdown, shutdown main driver if shutdown executor fails r=stevendanna a=msbutler During #110166, the c2c/shutdown test fataled while the job shutdown executor was running, yet the test kept running for quite a while because the goroutine that manages the c2c job had not realized the test failed. This patch refactors the c2c/shutdown tests such that when the job shutdown executor detects a failure, it cancels the context used by the goroutine managing the c2c job. Informs #110166 Release note: none 110329: rangefeed: reuse annotated context in `ScheduledProcessor.process()` r=erikgrinaker a=erikgrinaker Context construction is expensive enough to show up in CPU profiles. With 20k rangefeeds/node on an idle cluster, this made up 1% of overall CPU usage, or 4% of rangefeed scheduler CPU usage. Epic: none Release note: None Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Erik Grinaker <[email protected]>
Lagging span checkpointing supposed to address a case when
changefeed executes rangefeeds against large table, and some of
the spans might be lagging behind the rest of the spans (due
to catchup scan throttling, or any other reasons).
In those cases, changefeed suppose to produce span based checkpoint
that is meant to ensure that catchup scan completes.
Consider the case where changefeed executes against large table, but
has both cursor and end_time options specified, with end_time in the past
(something like: I want to re-emit all events for the past hour, and then exit).
In this case, changefeed will start the catchup scan from cursor position, and
is supposed to terminate at end_time.
It appears (but needs to be verified) that in those cases, as soon. as the event
for the end_time is reached, all subsequent events are skipped -- which is fine.
What's not fine is that in this case we will never emit range based checkpoint, and that
will render lagging span checkpointing ineffective.
(Granted, this is a bit of a corner case, and was discovered via #108157).
Jira issue: CRDB-30482
The text was updated successfully, but these errors were encountered: