
streamingccl: tighten replication timestamp semantics #92788

Merged
1 commit merged into cockroachdb:master from start-time-fixup on Dec 13, 2022

Conversation

adityamaru
Contributor

Previously, each partition would reach out to the source cluster and pick its own timestamp from which it would start ingesting MVCC versions. This timestamp was used by the rangefeed set up by the partition to run its initial scan. Eventually, all the partitions would replicate up to a certain timestamp and cause the frontier to be bumped, but it was possible for different partitions to begin ingesting at different timestamps.

This change makes it so that during replication planning, when we create the producer job on the source cluster, we return a timestamp along with the StreamID. This becomes the timestamp at which each ingestion partition sets up the initial scan of its rangefeed, and consequently the initial timestamp at which all data is ingested. We stash this timestamp in the replication job details and never update its value. On future resumptions of the replication job, if there is a progress high water, we do not run an initial rangefeed scan but instead start the rangefeed from the previous progress high water.
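
For illustration, a minimal Go sketch of the planning-time contract described above. This is not the code in this PR; the names producerSpec, ReplicationStartTime, and startReplicationStream are assumptions for this sketch.

package producer

import "github.com/cockroachdb/cockroach/pkg/util/hlc"

// producerSpec is an illustrative stand-in for what the producer job hands
// back to the destination cluster at planning time; the real message and
// field names in the PR may differ.
type producerSpec struct {
	StreamID             uint64
	ReplicationStartTime hlc.Timestamp
}

// startReplicationStream sketches the contract: the source cluster picks a
// single timestamp when the producer job is created and returns it alongside
// the StreamID, so every ingestion partition begins its initial scan at the
// same MVCC timestamp. The destination stashes this timestamp in the
// replication job details and never updates it.
func startReplicationStream(clock *hlc.Clock, streamID uint64) producerSpec {
	return producerSpec{
		StreamID:             streamID,
		ReplicationStartTime: clock.Now(),
	}
}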

The motivation for this change was to know the lower bound, on both the source and destination clusters, of the MVCC versions that have been streamed. This is necessary to bound the fingerprinting on both clusters to ensure a match.

Release note: None

Fixes: #92742

@cockroach-teamcity
Member

This change is Reviewable

@adityamaru adityamaru marked this pull request as ready for review December 1, 2022 16:02
@adityamaru adityamaru requested review from a team as code owners December 1, 2022 16:02
@adityamaru adityamaru requested a review from a team December 1, 2022 16:02
@adityamaru adityamaru requested a review from a team as a code owner December 1, 2022 16:02
@adityamaru adityamaru requested review from benbardin, stevendanna, lidorcarmel and a team and removed request for a team and benbardin December 1, 2022 16:02
Contributor

@lidorcarmel lidorcarmel left a comment

lgtm

@adityamaru adityamaru force-pushed the start-time-fixup branch 3 times, most recently from 4a364f5 to 8fafe9f on December 11, 2022 16:34
@adityamaru adityamaru requested a review from a team December 11, 2022 20:36
@adityamaru
Contributor Author

Both failures are flakes:

TFTR!

bors r=lidorcarmel

@craig
Contributor

craig bot commented Dec 12, 2022

Build failed:

@adityamaru
Contributor Author

Failed on TestClusterRestoreFailCleanup for a seemingly unrelated reason. Investigating.

// start ingesting data in the replication job. This timestamp is empty unless
// the replication job resumes after a progress checkpoint has been recorded.
// While it is empty we use the InitialScanTimestamp described below.
optional util.hlc.Timestamp previous_high_water_timestamp = 2 [(gogoproto.nullable) = false];
Collaborator

I like the new name.
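
For context, a minimal Go sketch of how an ingestion partition could pick its rangefeed start point from this field together with the InitialScanTimestamp mentioned in the comment; the helper rangefeedStart is invented for illustration and is not the ingestion code in this PR.

package ingest

import "github.com/cockroachdb/cockroach/pkg/util/hlc"

// rangefeedStart decides where a partition's rangefeed begins. If a previous
// high water has been recorded, resume from it and skip the initial scan;
// otherwise run the initial scan at the planning-time initial scan timestamp.
func rangefeedStart(previousHighWater, initialScanTS hlc.Timestamp) (start hlc.Timestamp, withInitialScan bool) {
	if !previousHighWater.IsEmpty() {
		return previousHighWater, false
	}
	return initialScanTS, true
}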

@adityamaru
Contributor Author

bors retry

@craig
Contributor

craig bot commented Dec 12, 2022

Build failed:

@adityamaru
Contributor Author

adityamaru commented Dec 12, 2022

Now it's TestComposeGSS; third time is the charm. I'll file something for TestComposeGSS.

bors retry

@craig
Contributor

craig bot commented Dec 12, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 12, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 12, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 13, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 13, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 13, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Dec 13, 2022

Build succeeded:

@craig craig bot merged commit bdfde49 into cockroachdb:master Dec 13, 2022
@adityamaru adityamaru deleted the start-time-fixup branch December 13, 2022 14:46
Successfully merging this pull request may close these issues.

c2c: pick a uniform start timestamp across partitions when creating a tenant replication stream