-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cdc: Stuck changefeed #108040
Comments
miretskiy
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-cdc
Change Data Capture
T-cdc
labels
Aug 2, 2023
cc @cockroachdb/cdc |
Reproduces readily with the following roachtest from #107722:
It splits off 100.000 ranges, ingests 1 billion rows, then creates the following changefeed: CREATE CHANGEFEED FOR kv.kv INTO 'null://' WITH cursor = <before_inserts>, end_time = <now>, format = 'json', min_checkpoint_frequency = '10s'; The changefeed will stop making progress after a few minutes, with catchup scans blocked sending to the stream, because the stream processors are blocked on quota acquisition as shown above. This happens across all nodes. |
This was referenced Aug 2, 2023
craig bot
pushed a commit
that referenced
this issue
Aug 2, 2023
104228: dev: replace `stress` with `--runs_per_test` r=rail a=rickystewart `--runs_per_test` does the same thing that `stress` does (runs a test many times to weed out flakes), but better, in that: 1. Better Bazel support -- we had to hack Bazel support into `stress` to keep it working 2. Bazel gives us statistics and control over how the tests are scheduled in a way that is standardized as opposed to the ad-hoc arguments one can pass to `stress` 3. Bazel can collect logs and artifacts for different test runs and expose them to the user in a way that `stress` cannot 4. Drop-in support for remote execution when we have access to it, obviating the need for `roachprod-stress` For now, the default implementation of `dev test --stress` is to run the test 1,000 times, stopping the build if any test fails. This is meant to replicate how people use `stress` (i.e., just start it running and stop if there are any failures). The 1,000 figure is arbitrary and meant to just be a "very high number" of times to run unit tests. The default figure of 1,000 can be adjusted with `--count`. Also update documentation with some new recipes and add extra logging to warn about the change in behavior. Closes #102879 Epic: none Release note: None 107765: sql: remove calls to CreateTestServerParams r=yuzefovich a=yuzefovich This PR removes calls to `tests.CreateTestServerParams` outside of `sql_test` tests as well as does some miscellaneous cleanups along the way. The rationale for doing this is that vast majority of callers didn't actually need the setup performed by the helper but inherited the disabling of test tenant. It seems more prudent for each test to be explicit about the kind of testing knobs and settings it needs. Informs: #76378. Epic: CRDB-18499. Release note: None 107889: concurrency: enable joint claims on the request scan path r=nvanbenschoten a=arulajmani This patch addresses a TODO that was blocking joint claims on a request's scan path and adds a test to show joint claims work. In particular, previously the lock mode of a queued request was hardcoded to use locking strength `lock.Intent`. We now use the correct lock mode by snapshotting an immutable copy of the request's lock strength on to the queuedGuard when inserting a it into a lock's wait queue. Informs #102275 Release note: None 108052: changefeedccl: Release allocation when skipping events r=miretskiy a=miretskiy The changefeed (or KV feed to be precise) may skip some events when "scan boundary" is reached. Scan boundary is a timestamp when certain event occurs -- usually a schema change. But, it may also occur when the `end_time` option is set. The KV feed ignores events that have MVCC timestamp greater or equal to the scan boundary event. Unfortunately, due to a long outstanding bug, the memory allocation associated with the event would not be released when KV feed decides to skip the event. Because of this, allocated memory was "leaked" and not reclaimed. If enough additional events arrive, those leaked events may account for all of the memory budget, thus leading to inability for additional events to be added. This bug impacts any changefeeds running with the `end_time` option set. It might also impact changefeeds that observe normal schema change event, though this situation is highly unlikely(the same transaction that perform schema change had to have modified sufficient number of rows in the table to fill up all of the memory budget). Fixes #108040 Release note (enterprise change): Fix a potential "deadlock" when running changefeed with `end_time` option set. 108066: dev: set `COCKROACH_DEV_LICENSE` for `compose` r=rail a=rickystewart This test won't run without the license key set. Epic: CRDB-17171 Release note: None Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Yahor Yuzefovich <[email protected]> Co-authored-by: Arul Ajmani <[email protected]> Co-authored-by: Yevgeniy Miretskiy <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Aug 2, 2023
The changefeed (or KV feed to be precise) may skip some events when "scan boundary" is reached. Scan boundary is a timestamp when certain event occurs -- usually a schema change. But, it may also occur when the `end_time` option is set. The KV feed ignores events that have MVCC timestamp greater or equal to the scan boundary event. Unfortunately, due to a long outstanding bug, the memory allocation associated with the event would not be released when KV feed decides to skip the event. Because of this, allocated memory was "leaked" and not reclaimed. If enough additional events arrive, those leaked events may account for all of the memory budget, thus leading to inability for additional events to be added. This bug impacts any changefeeds running with the `end_time` option set. It might also impact changefeeds that observe normal schema change event, though this situation is highly unlikely (the same transaction that perform schema change had to have modified sufficient number of rows in the table to fill up all of the memory budget). Fixes #108040 Release note (enterprise change): Fix a potential "deadlock" when running changefeed with `end_time` option set.
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Aug 2, 2023
The changefeed (or KV feed to be precise) may skip some events when "scan boundary" is reached. Scan boundary is a timestamp when certain event occurs -- usually a schema change. But, it may also occur when the `end_time` option is set. The KV feed ignores events that have MVCC timestamp greater or equal to the scan boundary event. Unfortunately, due to a long outstanding bug, the memory allocation associated with the event would not be released when KV feed decides to skip the event. Because of this, allocated memory was "leaked" and not reclaimed. If enough additional events arrive, those leaked events may account for all of the memory budget, thus leading to inability for additional events to be added. This bug impacts any changefeeds running with the `end_time` option set. It might also impact changefeeds that observe normal schema change event, though this situation is highly unlikely (the same transaction that perform schema change had to have modified sufficient number of rows in the table to fill up all of the memory budget). Fixes cockroachdb#108040 Release note (enterprise change): Fix a potential "deadlock" when running changefeed with `end_time` option set.
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Aug 2, 2023
The changefeed (or KV feed to be precise) may skip some events when "scan boundary" is reached. Scan boundary is a timestamp when certain event occurs -- usually a schema change. But, it may also occur when the `end_time` option is set. The KV feed ignores events that have MVCC timestamp greater or equal to the scan boundary event. Unfortunately, due to a long outstanding bug, the memory allocation associated with the event would not be released when KV feed decides to skip the event. Because of this, allocated memory was "leaked" and not reclaimed. If enough additional events arrive, those leaked events may account for all of the memory budget, thus leading to inability for additional events to be added. This bug impacts any changefeeds running with the `end_time` option set. It might also impact changefeeds that observe normal schema change event, though this situation is highly unlikely (the same transaction that perform schema change had to have modified sufficient number of rows in the table to fill up all of the memory budget). Fixes cockroachdb#108040 Release note (enterprise change): Fix a potential "deadlock" when running changefeed with `end_time` option set.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
As reported by @erikgrinaker , running experiment emitting into null:// sink deadlocked changefeed.
Relevant stacks below:
Jira issue: CRDB-30296
Epic CRDB-11783
The text was updated successfully, but these errors were encountered: