c2c: improve initial scan performance #97592
Labels: A-disaster-recovery, C-investigation, T-disaster-recovery
msbutler added the C-bug, C-test-failure, C-investigation, and T-disaster-recovery labels on Feb 23, 2023
cc @cockroachdb/disaster-recovery
exalate-issue-sync bot removed the C-bug label on Mar 2, 2023
exalate-issue-sync bot changed the title from "c2c: investigate lagging and uneven work distribution on nodes during initial scan" to "c2c: improve initial scan performance" on Mar 2, 2023
lidorcarmel added a commit to lidorcarmel/cockroach that referenced this issue on May 17, 2023
Before this change, c2c buffered an SST in memory and flushed it to KV; if the SST spanned multiple ranges (which is likely), the addSST failed, so the SST was split and rewritten to KV. We see about 20-30 retries per addSST in a roachtest. The code to split these SSTs on range boundaries already exists, but it is disabled when no RangeCache is provided. This PR passes a RangeCache to the SST batcher to enable flushing on range boundaries.

Epic: CRDB-24777
Informs: cockroachdb#97592
Release note: None
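To make the flushing change concrete, here is a minimal Go sketch, assuming a simplified model in which an SST is just a sorted run of keys and the range cache only knows the split keys between ranges. The names `rangeCache`, `rangeEnd`, and `batcher` are hypothetical and do not mirror CockroachDB's real SST batcher or RangeCache APIs. With no cache, the batcher flushes one SST that spans every range (the case the commit describes as failing and being split and retried); with a cache, it flushes early whenever the next key reaches the end of the current range, so each flushed SST stays within a single range.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// rangeCache is a hypothetical stand-in for a range cache. It only knows the
// sorted split keys, i.e. the start keys of every range except the first.
type rangeCache struct {
	splits [][]byte // sorted ascending
}

// rangeEnd returns the end key of the range containing key, or nil if key
// falls in the last (unbounded) range.
func (c *rangeCache) rangeEnd(key []byte) []byte {
	i := sort.Search(len(c.splits), func(i int) bool {
		return bytes.Compare(c.splits[i], key) > 0
	})
	if i == len(c.splits) {
		return nil
	}
	return c.splits[i]
}

// batcher is a hypothetical SST batcher. Keys must be added in sorted order.
// With a nil cache it only flushes on Close (one SST may span many ranges);
// with a cache it flushes as soon as the next key leaves the current range.
type batcher struct {
	cache   *rangeCache
	buf     [][]byte   // keys of the SST being built
	curEnd  []byte     // end key of the range holding buf[0]; nil = unbounded
	flushed [][][]byte // SSTs "sent to KV", in order
}

func (b *batcher) Add(key []byte) {
	// Flush before crossing a range boundary so no SST spans two ranges.
	if b.cache != nil && len(b.buf) > 0 && b.curEnd != nil && bytes.Compare(key, b.curEnd) >= 0 {
		b.flush()
	}
	// Starting a new SST: remember where its range ends.
	if len(b.buf) == 0 && b.cache != nil {
		b.curEnd = b.cache.rangeEnd(key)
	}
	b.buf = append(b.buf, key)
}

func (b *batcher) flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flushed = append(b.flushed, b.buf)
	b.buf = nil
	b.curEnd = nil
}

func (b *batcher) Close() { b.flush() }

func main() {
	keys := []string{"a", "c", "f", "k", "m", "x"}
	// Three ranges: [start, "e"), ["e", "l"), ["l", end).
	cache := &rangeCache{splits: [][]byte{[]byte("e"), []byte("l")}}
	for _, c := range []*rangeCache{nil, cache} {
		b := &batcher{cache: c}
		for _, k := range keys {
			b.Add([]byte(k))
		}
		b.Close()
		fmt.Println(len(b.flushed)) // number of SSTs flushed
	}
}
```

Running it prints 1 and then 3: without the cache the six keys form one SST covering all three ranges, and with split keys "e" and "l" they form three range-aligned SSTs ({a, c}, {f, k}, {m, x}).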
blathers-crl bot pushed a commit that referenced this issue on Jun 5, 2023
msbutler removed the C-test-failure label on Jun 14, 2023
There is always more to do here, but I'm going to close this for now: @adityamaru made nice improvements, and we can open new issues for specific additional improvements.
Occasionally, during an initial scan, a node on the destination cluster falls behind on its work, leading to poor initial scan performance. This issue tracks why this occurs.
This can be observed in the metrics of the upcoming `c2c/tpcc/warehouses=1000/duration=60/cutover=30` roachtest: node 6 (red in the graphs on the right) on the destination side fell far behind the other nodes during the first 10 minutes of ingestion, and then ended up with the bulk of the work! Specifically, between 14:30 and 14:40 the lagging node still ingested SSTs via replication, but not many through the stream ingestion processor. For some reason, after 14:40, node 6 picked up its work and began directly ingesting the bulk of the SSTs.
Jira issue: CRDB-24777
Epic: CRDB-19402