
cdc: max_behind_nanos does not update when jobs are stuck during initial scans #97043

Open
jayshrivastava opened this issue Feb 13, 2023 · 2 comments
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc

Comments

@jayshrivastava
Contributor

jayshrivastava commented Feb 13, 2023

See https://github.com/cockroachlabs/support/issues/2053#issuecomment-1427839384.

We only update this metric once a changefeed starts emitting resolved events. If a changefeed gets stuck during its initial scan (no resolved events are emitted until the initial scan finishes), we never update max_behind_nanos for it, so customers have no way of knowing that the changefeed is stuck.

One option is to seed changefeed jobs that run an initial scan with an initial highwater / resolved timestamp, so the metric has a value to measure from before the scan completes.

Also related: #93919

Jira issue: CRDB-24482

Epic CRDB-8669

@jayshrivastava jayshrivastava added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture labels Feb 13, 2023
@blathers-crl blathers-crl bot added the T-cdc label Feb 13, 2023
@blathers-crl

blathers-crl bot commented Feb 13, 2023

cc @cockroachdb/cdc

@jayshrivastava jayshrivastava changed the title cdc: max_behind_nanos does not update without checkpoints cdc: max_behind_nanos does not update when jobs are stuck Feb 13, 2023
@jayshrivastava jayshrivastava changed the title cdc: max_behind_nanos does not update when jobs are stuck cdc: max_behind_nanos does not update when jobs are stuck during initial scans Feb 15, 2023
@Leeeeeeeroy-Jenkins

A quick note on this ticket. The metric max_behind_nanos also does not seem to get updated. Could we consider this as part of the scope of work?

jayshrivastava added a commit to jayshrivastava/cockroach that referenced this issue Apr 11, 2023
Previously, this node-level metric would measure the maximum time between
the present and the oldest checkpoint seen by a change aggregator. Since
this metric was updated by in-memory checkpoints, it was prone to odd
behavior. For example:
- When a node restarts and a changefeed immediately begins a catchup scan,
  there are no checkpoints for this changefeed available to calculate the value
  of this metric. It's possible that an "infinite" catchup scan could trigger
  where the metric would never get updated (

 had the description "Largest commit-to-emit duration of any running feed",

Informs: cockroachdb#97931
Closes: cockroachdb#97043
Closes: cockroachdb#99409