-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
backupccl: update restore/with-pause test #97800
Conversation
Pre frontier checkpointing metrics: this 80 GB Restore with three pauses + ~3 minutes of sleep takes around 40 minutes, a throughput around 18 mb/s/node. |
This patch replaces the restore2TB/nodes=10/with-pause test with the new restore/pause/tpce/80GB/aws/nodes=4/cpus=8 test. The test now uses the new 80GB tpce fixture, and always pauses the restore job after around 25%, 50%, 75% completion. This patch also increases the portability of the new restore roachtest framework, allowing the new test to use its facilities. Going forward, this test will make it easier to benchmark and test future changes to restore checkpointing. Fixes cockroachdb#94093 Informs cockroachdb#87843 Release note: None
92a31b4
to
493a875
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
PrometheusNameSpace, Subsystem: "restore", Name: "duration"}, []string{"test_name"}) | ||
// TODO(msbutler): to test the correctness of checkpointing, we should | ||
// restore the same fixture without pausing it and fingerprint both restored | ||
// databases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can hardcode the fingerprint.. (we still not get it from the unpaused run but it can be done manually).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i plan to do that once this fingerprint bug is addressed #97916
return sp.c.RunE(ctx, sp.c.Node(1), sp.restoreCmd(target, "")) | ||
} | ||
|
||
func (sp *restoreSpecs) runDetached(ctx context.Context, target string) (jobspb.JobID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can now delete the old runRestoreDetached?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, in a follow up PR, i plan delete all the old infra and bank tests :D
TFTR! bors r=lidorcarmel |
Build succeeded: |
This patch replaces the restore2TB/nodes=10/with-pause test with the new restore/pause/tpce/80GB/aws/nodes=4/cpus=8 test. The test now uses the new 80GB tpce fixture, and always pauses the restore job after around 25%, 50%, 75% completion.
This patch also increases the portability of the new restore roachtest framework, allowing the new test to use its facilities.
Going forward, this test will make it easier to benchmark and test future changes to restore checkpointing.
Fixes #94093
Informs #87843
Release note: None