backupccl: update restore/with-pause test #97800

Merged: 1 commit, Mar 3, 2023

msbutler
Collaborator

This patch replaces the restore2TB/nodes=10/with-pause test with the new restore/pause/tpce/80GB/aws/nodes=4/cpus=8 test. The test now uses the new 80GB tpce fixture and always pauses the restore job at around 25%, 50%, and 75% completion.
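
For illustration only, pausing at fixed progress fractions could be sketched as below. This is not the roachtest's actual code: `pauseSchedule`, `thresholds`, and `shouldPause` are hypothetical names, and the sketch assumes the caller polls the job's completed fraction and pauses the job itself whenever `shouldPause` returns true.

```go
package main

import "fmt"

// pauseSchedule is a hypothetical helper that tracks which progress
// thresholds (as fractions in [0, 1], ascending) have already triggered a pause.
type pauseSchedule struct {
	thresholds []float64
	next       int // index of the next threshold that has not fired yet
}

// shouldPause reports whether the job should be paused at the observed
// fraction completed. If one poll jumps past several thresholds, they are
// all consumed and only a single pause is requested.
func (s *pauseSchedule) shouldPause(fractionCompleted float64) bool {
	if s.next >= len(s.thresholds) || fractionCompleted < s.thresholds[s.next] {
		return false
	}
	for s.next < len(s.thresholds) && fractionCompleted >= s.thresholds[s.next] {
		s.next++
	}
	return true
}

func main() {
	s := &pauseSchedule{thresholds: []float64{0.25, 0.50, 0.75}}
	// Simulated progress readings; a real test would poll the job instead.
	for _, p := range []float64{0.10, 0.26, 0.30, 0.55, 0.80, 1.0} {
		if s.shouldPause(p) {
			fmt.Printf("pause at %.2f\n", p)
		}
	}
}
```

With the simulated readings above, the sketch requests pauses at 0.26, 0.55, and 0.80, which is why the description says "around" 25%, 50%, and 75%: the pause lands on the first poll at or past each threshold.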

This patch also increases the portability of the new restore roachtest framework, allowing the new test to use its facilities.

Going forward, this test will make it easier to benchmark and test future changes to restore checkpointing.

Fixes #94093
Informs #87843

Release note: None

@msbutler msbutler requested a review from a team as a code owner February 28, 2023 18:32
@msbutler msbutler self-assigned this Feb 28, 2023
@msbutler msbutler requested review from smg260 and renatolabs and removed request for a team February 28, 2023 18:32
@msbutler
Collaborator Author

Pre-frontier-checkpointing metrics: this 80 GB restore, with three pauses plus ~3 minutes of sleep, takes around 40 minutes, for a throughput of around 18 MB/s/node.

Contributor

@lidorcarmel lidorcarmel left a comment

lgtm.

PrometheusNameSpace, Subsystem: "restore", Name: "duration"}, []string{"test_name"})
// TODO(msbutler): to test the correctness of checkpointing, we should
// restore the same fixture without pausing it and fingerprint both restored
// databases.
Contributor

we can hardcode the fingerprint (the test still wouldn't get it from the unpaused run itself, but that can be done manually).

Collaborator Author

yeah, I plan to do that once this fingerprint bug is addressed: #97916
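
A rough sketch of the hardcoded-fingerprint check discussed in this thread, under stated assumptions: `diffFingerprints` and the table names and fingerprint values are hypothetical, and how fingerprints are collected from the restored database is left out entirely. The sketch only compares a hardcoded expected set against an observed set.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFingerprints is a hypothetical helper. It returns the sorted names of
// tables whose fingerprint differs between expected and actual, or that are
// present on only one side.
func diffFingerprints(expected, actual map[string]string) []string {
	var mismatched []string
	for table, want := range expected {
		if got, ok := actual[table]; !ok || got != want {
			mismatched = append(mismatched, table)
		}
	}
	for table := range actual {
		if _, ok := expected[table]; !ok {
			mismatched = append(mismatched, table)
		}
	}
	sort.Strings(mismatched)
	return mismatched
}

func main() {
	// Placeholder values: in practice these would be recorded manually from
	// an unpaused restore of the same fixture.
	expected := map[string]string{"trade": "aaa", "customer": "bbb"}
	// Placeholder values standing in for the paused-and-resumed restore.
	actual := map[string]string{"trade": "aaa", "customer": "ccc"}

	if bad := diffFingerprints(expected, actual); len(bad) > 0 {
		fmt.Println("fingerprint mismatch in tables:", bad)
	}
}
```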

return sp.c.RunE(ctx, sp.c.Node(1), sp.restoreCmd(target, ""))
}

func (sp *restoreSpecs) runDetached(ctx context.Context, target string) (jobspb.JobID, error) {
Contributor

I think you can now delete the old runRestoreDetached?

Collaborator Author

yup, in a follow-up PR, I plan to delete all the old infra and bank tests :D

@msbutler
Collaborator Author

msbutler commented Mar 3, 2023

TFTR!

bors r=lidorcarmel

@craig
Contributor

craig bot commented Mar 3, 2023

Build succeeded:

@craig craig bot merged commit bf7dcba into cockroachdb:master Mar 3, 2023
@msbutler msbutler deleted the butler-restore-pause-test branch March 4, 2023 00:46
Successfully merging this pull request may close these issues.

backupccl: in restore/with-pause roachtest, pause at 25/50/75% progress, not every 15 mins