c2c: pick a uniform start timestamp across partitions when creating a tenant replication stream #92742

Closed
adityamaru opened this issue Nov 30, 2022 · 2 comments · Fixed by #92788
Assignees
Labels
A-disaster-recovery C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-disaster-recovery

Comments

adityamaru commented Nov 30, 2022

Previously, each partition would reach out to the source cluster and pick its own timestamp from which to start ingesting MVCC versions. The rangefeed set up by the partition used this timestamp to run its initial scan. Eventually, all the partitions would replicate up to a certain timestamp and cause the frontier to be bumped, but different partitions could begin ingesting at different timestamps.

This change makes it such that during replication planning when we create the producer job on the source cluster, we return a timestamp along with the StreamID. This becomes the timestamp at which each ingestion partition sets up the initial scan of the rangefeed, and consequently becomes the initial timestamp at which all data is ingested. We stash this timestamp in the replication job details and never update its value.
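To make the planning flow concrete, here is a minimal Go sketch of the idea, not the actual `streamingccl` code. Every name in it (`Timestamp`, `ReplicationSpec`, `JobDetails`, `PartitionSpec`, `startProducer`, `planPartitions`) is hypothetical, and a plain integer stands in for an HLC timestamp.

```go
package main

import "fmt"

// Timestamp is a hypothetical stand-in for an HLC timestamp.
type Timestamp int64

// ReplicationSpec models what creating the producer job returns:
// the stream ID plus a single start timestamp chosen on the source.
type ReplicationSpec struct {
	StreamID       int64
	StartTimestamp Timestamp
}

// JobDetails models the replication job details on the destination.
// ReplicationStartTime is written once at planning time and never updated.
type JobDetails struct {
	StreamID             int64
	ReplicationStartTime Timestamp
}

// PartitionSpec models the per-partition ingestion plan.
type PartitionSpec struct {
	PartitionID          string
	InitialScanTimestamp Timestamp
}

// startProducer picks one timestamp for the whole stream (assumed helper).
func startProducer(streamID int64, now Timestamp) ReplicationSpec {
	return ReplicationSpec{StreamID: streamID, StartTimestamp: now}
}

// planPartitions hands the same start timestamp to every partition, so no
// partition picks its own.
func planPartitions(details JobDetails, partitionIDs []string) []PartitionSpec {
	specs := make([]PartitionSpec, 0, len(partitionIDs))
	for _, id := range partitionIDs {
		specs = append(specs, PartitionSpec{
			PartitionID:          id,
			InitialScanTimestamp: details.ReplicationStartTime,
		})
	}
	return specs
}

func main() {
	spec := startProducer(1, 100)
	details := JobDetails{StreamID: spec.StreamID, ReplicationStartTime: spec.StartTimestamp}
	for _, p := range planPartitions(details, []string{"p1", "p2", "p3"}) {
		fmt.Printf("%s scans at %d\n", p.PartitionID, p.InitialScanTimestamp)
	}
}
```

Because the timestamp lives in the job details rather than being chosen per partition, it also gives a single, stable lower bound that both clusters can refer to later.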

The motivation for this change was to know the lower bound on both the source and destination cluster for MVCC versions that have been streamed. This is necessary to bound the fingerprinting on both clusters to ensure a match.

Epic: CRDB-18749

Jira issue: CRDB-21946

@adityamaru adityamaru added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-disaster-recovery labels Nov 30, 2022

blathers-crl bot commented Nov 30, 2022

cc @cockroachdb/disaster-recovery

adityamaru added a commit to adityamaru/cockroach that referenced this issue Dec 1, 2022
Previously, each partition would reach out to the
source cluster and pick its own timestamp from which it
would start ingesting MVCC versions. This timestamp was
used by the rangefeed setup by the partition, to run its
initial scan. Eventually, all the partitions would replicate
up until a certain timestamp and cause the frontier to be
bumped but it was possible for different partitions to begin
ingesting at different timestamps.

This change makes it such that during replication planning when
we create the producer job on the source cluster, we return a timestamp
along with the StreamID. This becomes the timestamp at which each
ingestion partition sets up the initial scan of the rangefeed,
and consequently becomes the initial timestamp at which all data
is ingested. We stash this timestamp in the replication job
details and never update its value. On future resumptions of the
replication job, if there is a progress high water, we will not
run an initial rangefeed scan but instead start the rangefeed from
the previous progress high water.

The motivation for this change was to know the lower bound on
both the source and destination cluster for MVCC versions that
have been streamed. This is necessary to bound the fingerprinting
on both clusters to ensure a match.

Release note: None

Fixes: cockroachdb#92742
adityamaru added a commit to adityamaru/cockroach that referenced this issue Dec 11, 2022
craig bot pushed a commit that referenced this issue Dec 13, 2022
92788: streamingccl: tighten replication timestamp semantics r=lidorcarmel a=adityamaru

Previously, each partition would reach out to the source cluster and pick its own timestamp from which it would start ingesting MVCC versions. This timestamp was used by the rangefeed setup by the partition, to run its initial scan. Eventually, all the partitions would replicate up until a certain timestamp and cause the frontier to be bumped but it was possible for different partitions to begin ingesting at different timestamps.

This change makes it such that during replication planning, when we create the producer job on the source cluster, we return a timestamp along with the StreamID. This becomes the timestamp at which each ingestion partition sets up the initial scan of the rangefeed, and consequently becomes the initial timestamp at which all data is ingested. We stash this timestamp in the replication job details and never update its value. On future resumptions of the replication job, if there is a progress high water, we will not run an initial rangefeed scan but instead start the rangefeed from the previous progress high water.
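The resumption rule above can be sketched in Go as follows. This is an illustration under assumptions, not the merged implementation: `Timestamp`, `rangefeedStart`, and `chooseRangefeedStart` are hypothetical names, and a zero value is assumed to mean "no progress high water recorded yet".

```go
package main

import "fmt"

// Timestamp is a hypothetical stand-in for an HLC timestamp; zero means unset.
type Timestamp int64

// rangefeedStart describes how a partition should set up its rangefeed.
type rangefeedStart struct {
	From            Timestamp
	WithInitialScan bool
}

// chooseRangefeedStart applies the rule from the commit message: with a
// progress high water, resume from it without an initial scan; otherwise
// run the initial scan at the fixed replication start timestamp.
func chooseRangefeedStart(replicationStart, highWater Timestamp) rangefeedStart {
	if highWater != 0 {
		return rangefeedStart{From: highWater, WithInitialScan: false}
	}
	return rangefeedStart{From: replicationStart, WithInitialScan: true}
}

func main() {
	// First run: no high water yet, so scan at the replication start time.
	fmt.Printf("%+v\n", chooseRangefeedStart(100, 0))
	// Resumption: progress exists, so skip the scan and resume from it.
	fmt.Printf("%+v\n", chooseRangefeedStart(100, 250))
}
```

Note that the replication start timestamp itself is only read here, never written, which matches the "never update its value" requirement.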

The motivation for this change was to know the lower bound on both the source and destination cluster for MVCC versions that have been streamed. This is necessary to bound the fingerprinting on both clusters to ensure a match.

Release note: None

Fixes: #92742

Co-authored-by: adityamaru <[email protected]>
@craig craig bot closed this as completed in 523b79d Dec 13, 2022