streamingccl: don't complete producer job on cutover #117515

msbutler · 2024-01-08T20:50:57Z

Previously on cutover, the producer job would complete, removing the producer
side protected timestamp (pts). To ensure a smooth, fast failback to any
timestamp greater than or equal to the cutover timestamp, however, the original
producer side should keep the pts around.

This patch retains the producer side protected timestamp by delaying the
completion of the producer job until `stream_replication.job_liveness_timeout`
(default 3 days) elapses after the replication job's final heartbeat. A future
PR will deprecate this setting and allow users to directly set this producer
side retention period.

Epic: none

Release note: none

blathers-crl · 2024-01-08T20:51:00Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

dt · 2024-01-09T15:38:39Z

> default 3 days

This seems like a long time to me to keep accumulating history on a source cluster if the dev/test/other cluster decides it is done replicating and goes off on its way (it also seems like a long time if it just vanishes, but I don't need to litigate the default here).

Should we, in this success case, move the expiration to t+12h instead?

msbutler · 2024-01-09T16:20:24Z

The timeout is currently set by the cluster setting `stream_replication.job_liveness_timeout`. I was planning to punt on changing the default timeout until I deprecate this setting, as described in point 3 here: https://cockroachlabs.slack.com/archives/C03JCUUSCD6/p1704745263201289?thread_ts=1704733221.079619&cid=C03JCUUSCD6

msbutler · 2024-01-09T16:31:43Z

TFTR!

bors r=stevendanna

craig · 2024-01-09T19:12:21Z

Build succeeded:

Bazel Essential CI (Cockroach)

A previous PR, cockroachdb#117515, lowered the producer job liveness timeout in the test to allow the producer job to succeed quickly on completion. That PR lowered it to 100ms, which is too low and causes the job to fail on liveness timeouts. This patch bumps the timeout to 1s, preventing the liveness timeout. Fixes cockroachdb#117605. Release note: none

117258: sql: use high priority for populating RoleMemberCache r=Xiang-Gu a=Xiang-Gu

Previously, when the RoleMemberCache is invalid, it launches a new txn in a singleflight to read from `system.role_members` table to populate the cache. If, however, the original txn has previously laid a write intent on the same system table, then we end up having a deadlock: original txn waits for this new txn; this new txn waits for original txn.

Fixes #117144

Release note (bug fix): Fixed a bug where concurrent GRANTs can cause deadlocks.

117571: build,bazci: add `pkg/build/engflow` package r=rail a=rickystewart

... and extract logic for extracting test results into a helper function.

Release note: None

Epic: CRDB-8308

117687: streamingccl: deflake TestPartitionedStreamClient r=dt a=msbutler

A previous PR, #117515, lowered the producer job liveness timeout in the test to allow the producer job to succeed quickly on completion. That PR lowered it to 100ms, which is too low and causes the job to fail on liveness timeouts. This patch bumps the timeout to 1s, preventing the liveness timeout.

Fixes #117605

Release note: none

117691: roachtest: pull tpc-e image from GAR r=srosenberg,renatolabs a=rail

Previously, we pulled the tpc-e image from Docker Hub. Now that we move most of our CI images to GAR, this image will be pulled from GAR as well.

Epic: RE-539

Release note: None

Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Michael Butler <[email protected]>
Co-authored-by: Rail Aliiev <[email protected]>

msbutler added the T-disaster-recovery label Jan 8, 2024

msbutler self-assigned this Jan 8, 2024

msbutler force-pushed the butler-pts-c2c branch 9 times, most recently from 8c3a7f2 to 4440ff8 January 9, 2024 00:06

msbutler force-pushed the butler-pts-c2c branch from 4440ff8 to 6cc2185 January 9, 2024 14:07

msbutler marked this pull request as ready for review January 9, 2024 14:08

msbutler requested a review from a team as a code owner January 9, 2024 14:08

msbutler requested review from adityamaru, dt and stevendanna and removed request for a team and adityamaru January 9, 2024 14:08

stevendanna approved these changes Jan 9, 2024

dt approved these changes Jan 9, 2024

craig bot merged commit 2c3b07e into cockroachdb:master Jan 9, 2024
8 of 10 checks passed

msbutler mentioned this pull request Jan 10, 2024

streamingccl: introduce syntax to manage original destination side data protection after cutover #117616

Closed

msbutler mentioned this pull request Jan 11, 2024

streamingccl: deflake TestPartitionedStreamClient #117687

Merged
