-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamingccl: tighten replication timestamp semantics #92788
Conversation
d604467
to
6cf6ab4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pkg/ccl/streamingccl/streamingest/stream_replication_e2e_test.go
Outdated
Show resolved
Hide resolved
pkg/ccl/streamingccl/streamingest/stream_replication_e2e_test.go
Outdated
Show resolved
Hide resolved
4a364f5
to
8fafe9f
Compare
Previously, each partition would reach out to the source cluster and pick its own timestamp from which it would start ingesting MVCC versions. This timestamp was used by the rangefeed setup by the partition, to run its initial scan. Eventually, all the partitions would replicate up until a certain timestamp and cause the frontier to be bumped but it was possible for different partitions to begin ingesting at different timestamps. This change makes it such that during replication planning when we create the producer job on the source cluster, we return a timestamp alongwith the StreamID. This becomes the timestamp at which each ingestion partition sets up the inital scan of the rangefeed, and consequently become the inital timestamp at which all data is ingested. We stash this timestamp in the replication job details and never update its value. On future resumptions of the replication job, if there is a progress high water, we will not run an initial rangefeed scan but instead start the rangefeed from the previous progress highwater. The motivation for this change was to know the lower bound on both the source and destination cluster for MVCC versions that have been streamed. This is necessary to bound the fingerprinting on both clusters to ensure a match. Release note: None Fixes: cockroachdb#92742
f4225c6
to
523b79d
Compare
Both failures are flakes: TFTR! bors r=lidorcarmel |
Build failed: |
Failed on |
// start ingesting data in the replication job. This timestamp is empty unless | ||
// the replication job resumes after a progress checkpoint has been recorded. | ||
// While it is empty we use the InitialScanTimestamp described below. | ||
optional util.hlc.Timestamp previous_high_water_timestamp = 2 [(gogoproto.nullable) = false]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new name.
bors retry |
Build failed: |
Now its bors retry |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build failed (retrying...): |
Build succeeded: |
Previously, each partition would reach out to the
source cluster and pick its own timestamp from which it would start ingesting MVCC versions. This timestamp was used by the rangefeed setup by the partition, to run its initial scan. Eventually, all the partitions would replicate up until a certain timestamp and cause the frontier to be bumped but it was possible for different partitions to begin ingesting at different timestamps.
This change makes it such that during replication planning when we create the producer job on the source cluster, we return a timestamp alongwith the StreamID. This becomes the timestamp at which each ingestion partition sets up the inital scan of the rangefeed, and consequently become the inital timestamp at which all data is ingested. We stash this timestamp in the replication job details and never update its value. On future resumptions of the replication job, if there is a progress high water, we will not run an initial rangefeed scan but instead start the rangefeed from the previous progress highwater.
The motivation for this change was to know the lower bound on both the source and destination cluster for MVCC versions that have been streamed. This is necessary to bound the fingerprinting on both clusters to ensure a match.
Release note: None
Fixes: #92742