Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

backupccl: introduce new restore roachtest framework #94143

Merged
merged 1 commit into from
Jan 8, 2023

Conversation

msbutler
Copy link
Collaborator

@msbutler msbutler commented Dec 22, 2022

This patch introduces a new framework for writing restore roachtests that
minimizes code reuse and leverages our new backup fixture organization. The
framework makes it easy to write a new test using a variety of knobs like:

  • hardware: cloud provider, disk volume, # of nodes, # of cpus
  • backup fixture: workload, workload scale

The patch is the first in an ongoing effort to redo our roachtests, and
introduces 3 new roachtests:

  • restore/tpce/400GB: the default configuration: 4 nodes, 8vcpus, 1000 GB EBS,
    restore a tpce backup fixture (25,000 customers, around 400 GB).
  • restore/tpce/400GB/gce: same config as above, run on gce.
  • restore/tpce/8TB/nodes=10: the big one!

Notice that this patch also introduces a new naming convention for restore
tests. The default test is named restore/tpce/400GB and only contains the
basic workload. Each other test name will contain the workload and any specs
which deviate from the default config. For example restore/tpce/400GB/gce
only switches the cloud provider and holds all other variables constant; thus
only the workload and 'gce' are needed in the name.

Future patches will add more tests that use this framework.

Informs #92699

Release note: None

@msbutler msbutler requested a review from a team December 22, 2022 16:03
@msbutler msbutler requested a review from a team as a code owner December 22, 2022 16:03
@msbutler msbutler self-assigned this Dec 22, 2022
@msbutler msbutler requested review from herkolategan and srosenberg and removed request for a team December 22, 2022 16:03
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @herkolategan, @msbutler, and @rhu713)


pkg/cmd/roachtest/tests/restore.go line 415 at r1 (raw file):

func registerRestore(r registry.Registry) {
	// TODO(msbutler): delete old tests

Nit: seems a bit vague; e.g., I wouldn't necessarily know what the "old tests" are.


pkg/cmd/roachtest/tests/restore.go line 716 at r1 (raw file):

				m.Go(hc.Runner)

				// TODO(msbutler): export to prom/grafana

Meanwhile, you could start node_exporter on each node (see prometheus.Init(...)); this way we'll get system metrics in [1]; crdb metrics are already scraped. Note, this only works for GCE.

[1] https://grafana.testeng.crdb.io/d/Rb2Vb7uZk/system

@msbutler msbutler requested a review from lidorcarmel January 3, 2023 16:59
Copy link
Contributor

@lidorcarmel lidorcarmel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, mostly small comments, the only meaningful one is about releases, not sure how this works when we introduce new releases. maybe you're leaving that for later which is fine.

pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-restore-revamp branch 5 times, most recently from 77a4979 to 72101b1 Compare January 6, 2023 18:34
pkg/cmd/roachtest/tests/restore.go Outdated Show resolved Hide resolved
@msbutler msbutler force-pushed the butler-restore-revamp branch from 72101b1 to 8dd0b44 Compare January 6, 2023 20:24
This patch introduces a new framework for writing restore roachtests that
minimizes code reuse and leverages our new backup fixture organization. The
framework makes it easy to write a new test using a variety of knobs like:
- hardware: cloud provider, disk volume, # of nodes, # of cpus
- backup fixture: workload, workload scale

The patch is the first in an ongoing effort to redo our roachtests, and
introduces 3 new roachtests:
- restore/tpce/400GB: the default configuration: 4 nodes, 8vcpus, 1000 GB EBS,
  restore a tpce backup fixture (25,000 customers, around 400 GB).
- restore/tpce/400GB/gce: same config as above, run on gce.
- restore/tpce/8TB/nodes=10: the big one!

Notice that this patch also introduces a new naming convention for restore
tests.  The default test is named `restore/tpce/400GB` and only contains the
basic workload. Each other test name will contain the workload and any specs
which deviate from the default config. For example `restore/tpce/400GB/gce`
only switches the cloud provider and holds all other variables constant; thus
only the workload and 'gce' are needed in the name.

Future patches will add more tests that use this framework.

Informs cockroachdb#92699

Release note: None

enforce naming convention
@msbutler msbutler force-pushed the butler-restore-revamp branch from 8dd0b44 to 4056275 Compare January 6, 2023 21:51
@msbutler
Copy link
Collaborator Author

msbutler commented Jan 7, 2023

TFTR!!

bors r=lidorcarmel

@craig
Copy link
Contributor

craig bot commented Jan 8, 2023

Build succeeded:

srosenberg added a commit to srosenberg/cockroach that referenced this pull request Jul 2, 2023
In [1], new restore tests have been added to load fixtures
from `s3://cockroach-fixtures`. Since the regional bucket
is located in us-east-1, and roachprod provisions in
us-east-2, egress during nightly and weekly runs amounted
to non-trivial cost.

This change replaces the bucket with the regional
`s3://cockroach-fixtures-us-east-2`, which was replicated
from us-east-1. As a precaution, other roachtest which
depend on `s3://cockroach-fixtures` but do not currently
run in AWS, are now guarded by a conditional skip.
A new issue (cockroachdb#105968) has been created to address
the problem of localizing fixtures under a given
cloud provider.

[1] cockroachdb#94143

Epic: none
Informs: cockroachdb#105968

Release note: None
craig bot pushed a commit that referenced this pull request Jul 3, 2023
105969: roachtest: load AWS fixtures from us-east-2 to avoid egress r=herkolategan a=srosenberg

In [1], new restore tests have been added to load fixtures from `s3://cockroach-fixtures`. Since the regional bucket is located in us-east-1, and roachprod provisions in us-east-2, egress during nightly and weekly runs amounted to non-trivial cost.

This change replaces the bucket with the regional
`s3://cockroach-fixtures-us-east-2`, which was replicated from us-east-1. As a precaution, other roachtest which depend on `s3://cockroach-fixtures` but do not currently run in AWS, are now guarded by a conditional skip. A new issue (#105968) has been created to address
the problem of localizing fixtures under a given
cloud provider.

[1] #94143

Epic: none
Informs: #105968

Release note: None

Co-authored-by: Stan Rosenberg <[email protected]>
blathers-crl bot pushed a commit that referenced this pull request Jul 3, 2023
In [1], new restore tests have been added to load fixtures
from `s3://cockroach-fixtures`. Since the regional bucket
is located in us-east-1, and roachprod provisions in
us-east-2, egress during nightly and weekly runs amounted
to non-trivial cost.

This change replaces the bucket with the regional
`s3://cockroach-fixtures-us-east-2`, which was replicated
from us-east-1. As a precaution, other roachtest which
depend on `s3://cockroach-fixtures` but do not currently
run in AWS, are now guarded by a conditional skip.
A new issue (#105968) has been created to address
the problem of localizing fixtures under a given
cloud provider.

[1] #94143

Epic: none
Informs: #105968

Release note: None
srosenberg added a commit to srosenberg/cockroach that referenced this pull request Jul 3, 2023
In [1], new restore tests have been added to load fixtures
from `s3://cockroach-fixtures`. Since the regional bucket
is located in us-east-1, and roachprod provisions in
us-east-2, egress during nightly and weekly runs amounted
to non-trivial cost.

This change replaces the bucket with the regional
`s3://cockroach-fixtures-us-east-2`, which was replicated
from us-east-1. As a precaution, other roachtest which
depend on `s3://cockroach-fixtures` but do not currently
run in AWS, are now guarded by a conditional skip.
A new issue (cockroachdb#105968) has been created to address
the problem of localizing fixtures under a given
cloud provider.

[1] cockroachdb#94143

Epic: none
Informs: cockroachdb#105968

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants