-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamingccl: don't complete producer job on cutover #117515
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
8c3a7f2
to
4440ff8
Compare
Previously on cutover, the producer job would complete, removing the producer side protected timestamp. To ensure a smooth fast failback to any timestamp greater or equal to the cutover timestamp, however, the original producer side should keep the pts around. This patch retains the producer side timestamp by delaying the completion of the producer job until `stream_replication.job_liveness_timeout` (default 3 days) elapses after the final heartbeat during the replication job. A future PR will deprecate this setting and allow users to directly set this producer side retention period. Epic: none Release note: none
4440ff8
to
6cc2185
Compare
This seems like a long time to me to keep accumulating history on a source cluster if the dev/test/other cluster decides it is done replicating and goes off on its way (it also seems like a long time if it just vanishes, but I don't need to litigate the default here). Should we, in this success case, move the expiration to t+12h instead? |
The timeout is currently set by the cluster setting |
TFTR! bors r=stevendanna |
Build succeeded: |
A previous pr cockroachdb#117515 lowered the producer job liveness timout in the test to allow the producer job to succeed quickly on completion. The PR lowered it to 100ms which is too low, causing the job to fail on liveness timeouts. This patch bumps the timout to 1s, preventing the liveness timeout. Fixes cockroachdb#117605 Release note: none
117258: sql: use high priority for populating RoleMemberCache r=Xiang-Gu a=Xiang-Gu Previously, when the RoleMemberCache is invalid, it launches a new txn in a singleflight to read from `system.role_members` table to populate the cache. If, however, the original txn has previously laid a write intent on the same system table, then we end up having a deadlock: original txn waits for this new txn; this new txn waits for original txn. Fixes #117144 Release note (bug fix): Fixed a bug where concurrent GRANTs can cause deadlocks. 117571: build,bazci: add `pkg/build/engflow` package r=rail a=rickystewart ... and extract logic for extracting test results into a helper function. Release note: None Epic: CRDB-8308 117687: streamingccl: deflake TestPartitionedStreamClient r=dt a=msbutler A previous pr #117515 lowered the producer job liveness timout in the test to allow the producer job to succeed quickly on completion. The PR lowered it to 100ms which is too low, causing the job to fail on liveness timeouts. This patch bumps the timout to 1s, preventing the liveness timeout. Fixes #117605 Release note: none 117691: roachtest: pull tpc-e image from GAR r=srosenberg,renatolabs a=rail Previously, we pulled the tpc-e image from Docker Hub. Now that we move most of our CI images to GAR, this image will be pulled from GAR as well Epic: RE-539 Release note: None Co-authored-by: Xiang Gu <[email protected]> Co-authored-by: Ricky Stewart <[email protected]> Co-authored-by: Michael Butler <[email protected]> Co-authored-by: Rail Aliiev <[email protected]>
Previously on cutover, the producer job would complete, removing the producer
side protected timestamp. To ensure a smooth fast failback to any timestamp
greater or equal to the cutover timestamp, however, the original producer side
should keep the pts around.
This patch retains the producer side timestamp by delaying the completion of
the producer job until
stream_replication.job_liveness_timeout
(default 3days) elapses after the final heartbeat during the replication job. A future PR
will deprecate this setting and allow users to directly set this producer side
retention period.
Epic: none
Release note: none