roach{prod,test}: add first-class support for disk snapshots #103757
Conversation
Pure code movement. We'll make use of it outside this file in subsequent commits. Release note: None
These tests have been stable for a few months now. Reduce to a weekly cadence. Release note: None
Force-pushed from 0a8ba7a to f13b01d
Nice! Doubtlessly will result in significant time (and cost) savings. Just minor observations/questions from me.
Reviewed 2 of 21 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/cluster/cluster_interface.go
line 144 at r1 (raw file):
```go
// prefix. These snapshots can later be retrieved, deleted or applied to
// already instantiated clusters.
CreateSnapshot(ctx context.Context, snapshotPrefix string) error
```
Seems like also returning a vm.VolumeSnapshot would make sense here.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go
line 51 at r1 (raw file):
```go
// TODO(irfansharif): Search by taking in the other parts of the
// snapshot fingerprint, i.e. the node count, the version, etc.
Name: snapshotPrefix,
```
Nit: Name would be clearer as NamePrefix.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go
line 59 at r1 (raw file):
```go
t.L().Printf("no existing snapshots found for %s (%s), doing pre-work", t.Name(), snapshotPrefix)
// TODO(irfansharif): Add validation that we're some released
// version, probably the predecessor one. Also ensure that any
```
This is where we'd do the "fixture" work to get the disk in a state to snapshot, right?
Doesn't it make sense to only c.ApplySnapshots(..) when not in this branch (as opposed to always, below)?
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
line 28 at r1 (raw file):
```go
// (GCE) let you store volume snapshots in buckets with a pre-configured TTL. So
// we use this nightly roachtest as a poor man's cron job.
func registerPruneDanglingSnapshotsAndDisks(r registry.Registry) {
```
A nice idea that ultimately should belong elsewhere as part of the framework - perhaps in the test_runner, during clean up?
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
line 51 at r1 (raw file):
```go
for _, snapshot := range snapshots {
	if err := c.DeleteSnapshots(ctx, snapshot); err != nil {
```
Why loop if the 2nd arg is variadic? A slice of err could be returned.
drive-by review -- admission_control*.go files look good.
Reviewed 21 of 21 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif and @srosenberg)
pkg/cmd/roachtest/cluster/cluster_interface.go
line 143 at r1 (raw file):
```go
// CreateSnapshot creates volume snapshots of the cluster using the given
// prefix. These snapshots can later be retrieved, deleted or applied to
// already instantiated clusters.
```
Maybe worth documenting the noop case for the local setup. The assumption this interface makes is that a roachtest that needed to do CreateSnapshot will then proceed to use the already-populated disk, and not rely on the rest of these snapshot methods actually doing something.
pkg/cmd/roachtest/cluster/cluster_interface.go
line 154 at r1 (raw file):
```go
// concerned - all already-attached volumes are detached and deleted to make
// room for new snapshot-derived volumes. The new volumes are created using
// the same specs (size, disk type, etc.) as the original cluster.
```
is this assumed to be one volume per node? Some of the nodes may not be running CockroachDB -- does that matter?
I see the implementation assumes equal nodes in the cluster. Worth adding to the code comment.
Force-pushed from b1c58e7 to 5dbd1bf
TFTR!
@smg260, PTAL -- there are a few more non-admission_control*.go changes (around the tpce.go harness, and exposing top-level subcommands in roachprod) in addition to addressing your comments below.
@sumeerbhola, @bananabrick I fleshed out admission_control_index_backfill.go into a fully fledged test. You could take another look; it's entirely optional for both y'all.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @smg260, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/cluster/cluster_interface.go
line 143 at r1 (raw file):
Previously, sumeerbhola wrote…
Maybe worth documenting the noop case for the local setup. The assumption this interface makes is that a roachtest that needed to do CreateSnapshot will then proceed to use the already-populated disk, and not rely on the rest of these snapshot methods actually doing something.
Done.
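For illustration, the local-mode caveat might read something like the following on the interface (a paraphrase of the discussion above, not the committed comment text):

```go
// CreateSnapshot creates volume snapshots of the cluster using the given
// prefix. These snapshots can later be retrieved, deleted or applied to
// already instantiated clusters.
//
// NB (paraphrased sketch): on local clusters these methods can be no-ops;
// a test that calls CreateSnapshot is expected to keep using the
// already-populated local disk rather than rely on the remaining snapshot
// methods doing anything.
CreateSnapshot(ctx context.Context, snapshotPrefix string) error
```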
pkg/cmd/roachtest/cluster/cluster_interface.go
line 144 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Seems like also returning a vm.VolumeSnapshot would make sense here.
Done, great suggestion!
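For reference, a minimal sketch of what the revised signature could look like with that change, assuming the created snapshots are returned alongside the error (not quoted from the actual diff):

```go
type Cluster interface {
	// CreateSnapshot creates volume snapshots of the cluster using the
	// given prefix and returns the created snapshots, so callers can hand
	// them to ApplySnapshots or DeleteSnapshots later without re-listing.
	// (Sketch of the suggested change; the merged signature may differ.)
	CreateSnapshot(ctx context.Context, snapshotPrefix string) ([]vm.VolumeSnapshot, error)
}
```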
pkg/cmd/roachtest/cluster/cluster_interface.go
line 154 at r1 (raw file):
Previously, sumeerbhola wrote…
is this assumed to be one volume per node? Some of the nodes may not be running CockroachDB -- does that matter?
I see the implementation assumes equal nodes in the cluster. Worth adding to the code comment.
It's assuming one volume per node but that can be changed, so I added a TODO. Ditto for different kinds of volumes. It doesn't matter whether CRDB is running or not - it falls back to using "unknown" as the fingerprint version. But that's not desirable -- I'll leave it to test authors to fiddle with it and improve safeguards. Also, CRDB can't be running when snapshotting, else we'd risk taking inconsistent disk snapshots cluster-wide and cause badness. So the CRDB version is retrieved by looking at the binary directly.
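Judging from the example snapshot name in the commit message (ac-index-backfill-0001-vunknown-1-n2-standard-8), the fingerprint appears to be folded into the snapshot name itself; a hypothetical sketch of that composition (the actual code may assemble it differently):

```go
// Hypothetical sketch only; the real naming code may differ.
// e.g. "ac-index-backfill-0001-vunknown-1-n2-standard-8": prefix, cluster
// suffix, crdb version ("unknown" when it can't be read from the binary),
// node index, and machine type.
func snapshotName(prefix string, clusterSuffix, nodeIdx int, crdbVersion, machineType string) string {
	return fmt.Sprintf("%s-%04d-v%s-%d-%s", prefix, clusterSuffix, crdbVersion, nodeIdx, machineType)
}
```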
pkg/cmd/roachtest/tests/admission_control_index_backfill.go
line 51 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Nit: Name would be clearer as NamePrefix.
Done.
pkg/cmd/roachtest/tests/admission_control_index_backfill.go
line 59 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
This is where we'd do the "fixture" work to get the disk in a state to snapshot, right?
Doesn't it make sense to only c.ApplySnapshots(..) when not in this branch (as opposed to always, below)?
Done. I'd written it that way originally to always exercise the apply-snapshot slow path, to increase robustness, but it just looks awkward.
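To make the change concrete, a minimal sketch of what the resulting flow might look like (method and field names are taken from the interface quoted elsewhere in this PR; skipping ApplySnapshots on the freshly-created path is an assumption based on this thread, not the exact merged code):

```go
snapshots, err := c.ListSnapshots(ctx, vm.VolumeSnapshotListOpts{
	NamePrefix: snapshotPrefix,
})
if err != nil {
	t.Fatal(err)
}
if len(snapshots) == 0 {
	t.L().Printf("no existing snapshots found for %s (%s), doing pre-work", t.Name(), snapshotPrefix)
	// Long-running fixture generation (importing the dataset, etc.) goes
	// here, with CockroachDB stopped before snapshotting so the disk
	// snapshots are consistent.
	if err := c.CreateSnapshot(ctx, snapshotPrefix); err != nil {
		t.Fatal(err)
	}
} else {
	// Restore the already-populated disks from the pre-existing snapshots.
	if err := c.ApplySnapshots(ctx, snapshots); err != nil {
		t.Fatal(err)
	}
}
```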
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
line 28 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
A nice idea that ultimately should belong elsewhere as part of the framework - perhaps in the test_runner, during clean up?
Unless you insist strongly otherwise, I feel like this is better. These things add up to some non-trivial latency and the test runner clean up is invoked too frequently for this checking. Also it's a bit nice to be able to run this manually through just:
roachtest run prune-dangling --cockroach ./cockroach --cluster irfansharif-prune
And get specific logging for it.
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
line 51 at r1 (raw file):
Previously, smg260 (Miral Gadani) wrote…
Why loop if the 2nd arg is variadic? A slice of
err
could be returned.
It's just to get per-pruning logging below, which I've found useful. A slice of errors is unconventional and difficult to work with - you've got to figure out which indexes have succeeded and which haven't.
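For illustration, the loop presumably looks roughly like the following sketch (the log message mirrors the prune-dangling output shown in the commit message; the exact field used for the snapshot name is an assumption):

```go
for _, snapshot := range snapshots {
	// Deleting one snapshot per iteration (rather than passing the whole
	// slice to the variadic call) lets each successful prune get its own
	// log line.
	if err := c.DeleteSnapshots(ctx, snapshot); err != nil {
		t.Fatal(err)
	}
	t.L().Printf("pruned old snapshot %s", snapshot.Name) // Name field assumed
}
```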
Force-pushed from 5dbd1bf to d6074df
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
line 28 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Unless you insist strongly otherwise, I feel like this is better. These things add up to some non-trivial latency and the test runner clean up is invoked too frequently for this checking. Also it's a bit nice to be able to run this manually through just:
roachtest run prune-dangling --cockroach ./cockroach --cluster irfansharif-prune
And get specific logging for it.
Fair.
Not a strong opinion; I was alluding to a clean up phase at the end of running all tests and it was more of a general comment of how we could improve roachtest - definitely not for this PR.
Force-pushed from d6074df to 5ddb79b
Spoke to @smg260 offline -- running the nightly roachtest suite manually showed that our TC agents need updated GCE permissions to be able to run these gcloud snapshot/volume related operations. So checking it in as skipped for now. Internal thread: https://cockroachlabs.atlassian.net/browse/DEVINF-768.
Long-lived disk snapshots can drastically reduce testing time for scale tests. Tests, whether run by hand or through CI, need only run the long-running fixture-generating code (importing some dataset, generating it organically through workload, etc.) when snapshot fingerprints change; the fingerprints incorporate the major crdb version that generated them.

Here's an example run that freshly generates disk snapshots:

```
=== RUN   admission-control/index-backfill
no existing snapshots found for admission-control/index-backfill (ac-index-backfill), doing pre-work
created volume snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 for volume irfansharif-snapshot-0001-1 on irfansharif-snapshot-0001-1/n1
using 1 newly created snapshot(s) with prefix "ac-index-backfill"
detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
created volume irfansharif-snapshot-0001-1
attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (79.14s)
```

Here's a subsequent run that makes use of the aforementioned disk snapshots:

```
=== RUN   admission-control/index-backfill
using 1 pre-existing snapshot(s) with prefix "ac-index-backfill"
detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
created volume irfansharif-snapshot-0001-1
attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
--- PASS: admission-control/index-backfill (43.47s)
```

We add the following APIs to the roachtest.Cluster interface, for tests to interact with disk snapshots. admission-control/index-backfill is an example test making use of these APIs.

```go
type Cluster interface {
	// ...

	// CreateSnapshot creates volume snapshots of the cluster using
	// the given prefix. These snapshots can later be retrieved,
	// deleted or applied to already instantiated clusters.
	CreateSnapshot(ctx context.Context, snapshotPrefix string) error

	// ListSnapshots lists the individual volume snapshots that
	// satisfy the search criteria.
	ListSnapshots(
		ctx context.Context, vslo vm.VolumeSnapshotListOpts,
	) ([]vm.VolumeSnapshot, error)

	// DeleteSnapshots permanently deletes the given snapshots.
	DeleteSnapshots(
		ctx context.Context, snapshots ...vm.VolumeSnapshot,
	) error

	// ApplySnapshots applies the given volume snapshots to the
	// underlying cluster. This is a destructive operation as far as
	// existing state is concerned - all already-attached volumes are
	// detached and deleted to make room for new snapshot-derived
	// volumes. The new volumes are created using the same specs
	// (size, disk type, etc.) as the original cluster.
	ApplySnapshots(
		ctx context.Context, snapshots []vm.VolumeSnapshot,
	) error
}
```

These Cluster APIs are in turn powered by the following additions to the vm.Provider interface, implemented by each cloud provider. GCE is the fully spec-ed out one for now.

```go
type Provider interface {
	// ...

	// CreateVolume creates a new volume using the given options.
	CreateVolume(l *logger.Logger, vco VolumeCreateOpts) (Volume, error)

	// ListVolumes lists all volumes already attached to the given VM.
	ListVolumes(l *logger.Logger, vm *VM) ([]Volume, error)

	// DeleteVolume detaches and deletes the given volume from the
	// given VM.
	DeleteVolume(l *logger.Logger, volume Volume, vm *VM) error

	// AttachVolume attaches the given volume to the given VM.
	AttachVolume(l *logger.Logger, volume Volume, vm *VM) (string, error)

	// CreateVolumeSnapshot creates a snapshot of the given volume,
	// using the given options.
	CreateVolumeSnapshot(
		l *logger.Logger, volume Volume, vsco VolumeSnapshotCreateOpts,
	) (VolumeSnapshot, error)

	// ListVolumeSnapshots lists the individual volume snapshots that
	// satisfy the search criteria.
	ListVolumeSnapshots(
		l *logger.Logger, vslo VolumeSnapshotListOpts,
	) ([]VolumeSnapshot, error)

	// DeleteVolumeSnapshot permanently deletes the given snapshot.
	DeleteVolumeSnapshot(l *logger.Logger, snapshot VolumeSnapshot) error
}
```

Since these snapshots necessarily outlive the tests, and we don't want them dangling perpetually, we introduce a prune-dangling roachtest that acts as a poor man's cron job, sifting through expired snapshots (>30 days) and deleting them. For GCE at least it's not obvious to me how to create these snapshots in cloud buckets with a TTL built in, hence this hack. It looks like this (with a change to the TTL):

```
=== RUN   prune-dangling
pruned old snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8
--- PASS: prune-dangling (8.59s)
```

---

We also expose some of these APIs through the roachprod binary directly.

```
$ roachprod snapshot --help
snapshot enables creating/listing/deleting/applying cluster snapshots

Usage:
  roachprod snapshot [command]

Available Commands:
  create      snapshot a named cluster, using the given snapshot name and description
  list        list all snapshots for the given cloud provider, optionally filtering by the given name
  delete      delete all snapshots for the given cloud provider optionally filtering by the given name
  apply       apply the named snapshots from the given cloud provider to the named cluster
```

---

About admission-control/index-backfill: it's a fully featured test that uses the TPC-C 100k dataset and runs a foreground load for 20k customers. It takes >4 hrs to import this dataset; with disk snapshots this step is skipped entirely and takes a few minutes. The actual test is trivial: we run the foreground load for 1 hr and run a large index backfill concurrently. Before cockroachdb#98308 this resulted in wild performance oscillations. It's still a bit wild after flow control, but less so.

We slightly extend the tpc-e harness to make this happen, adding a few smarts: exposing a 'during' helper to run backfills concurrently with foreground load, integration with --skip-init, estimated setup times, prometheus, and disk snapshots of course.

Release note: None
Force-pushed from 5ddb79b to 23110d7
For posterity: OK to merge as is. @irfansharif to revisit once required permissions for TC agents are granted.
Reviewed 28 of 28 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @srosenberg, and @sumeerbhola)
Thanks! bors r+
Build succeeded:
Support for disk snapshots was recently added in [1]. This change adds minor fixes that were necessary to run in a different environment from the one previously tested. Specifically, several commands were missing `--project`, which resulted in (silent) failures. Error propagation from `c.Parallel` was signaling an internal error instead of passing along the command's error. (Granted, the existing mechanism for error propagation is to be revisited in [2].)

[1] cockroachdb#103757
[2] cockroachdb#104312

Epic: none

Release note: None
104573: roachprod: minor fixes for snapshots r=srosenberg a=srosenberg

Support for disk snapshots was recently added in [1]. This change adds minor fixes that were necessary to run in a different environment from the one previously tested. Specifically, several commands were missing `--project`, which resulted in (silent) failures. Error propagation from `c.Parallel` was signaling an internal error instead of passing along the command's error. (Granted, the existing mechanism for error propagation is to be revisited in [2].)

[1] #103757
[2] #104312

Epic: none

Release note: None

Co-authored-by: Stan Rosenberg <[email protected]>
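To illustrate the nature of the fix (an assumed sketch, not the actual diff): roachprod shells out to gcloud for these snapshot/volume operations, and each such invocation needs the GCE project passed explicitly, roughly along these lines:

```go
// Sketch (assumed, not the actual code): thread the configured GCE project
// into every gcloud invocation; omitting --project made some commands run
// against the wrong (default) project and fail silently.
func listSnapshotsArgs(project string) []string {
	return []string{
		"compute", "snapshots", "list",
		"--project", project, // previously omitted
		"--format", "json",
	}
}
// ... later: exec.Command("gcloud", listSnapshotsArgs(project)...)
```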
Part of #89978. Pre-cursor to #83826. Part of #98703.
Long-lived disk snapshots can drastically reduce testing time for scale tests. Tests, whether run by hand or through CI, need only run the long-running fixture-generating code (importing some dataset, generating it organically through workload, etc.) when snapshot fingerprints change; the fingerprints incorporate the major crdb version that generated them.
Here's an example run that freshly generates disk snapshots:
Here's a subsequent run that makes use of the aforementioned disk snapshots:
We add the following APIs to the roachtest.Cluster interface, for tests to interact with disk snapshots. admission-control/index-backfill is an example test making use of these APIs.
These Cluster APIs are in turn powered by the following additions to the vm.Provider interface, implemented by each cloud provider. GCE is the fully spec-ed out one for now.
Since these snapshots necessarily outlive the tests, and we don't want them dangling perpetually, we introduce a prune-dangling roachtest that acts as a poor man's cron job, sifting through expired snapshots (>30 days) and deleting them. For GCE at least it's not obvious to me how to create these snapshots in cloud buckets with a TTL built in, hence this hack. It looks like this (with a change to the TTL):
We also expose some of these APIs through the roachprod binary directly.
About admission-control/index-backfill: it's a fully featured test that uses the TPC-C 100k dataset and runs a foreground load for 20k customers. It takes >4 hrs to import this dataset; with disk snapshots this step is skipped entirely and takes a few minutes. The actual test is trivial: we run the foreground load for 1 hr and run a large index backfill concurrently. Before #98308 this resulted in wild performance oscillations. It's still a bit wild after flow control, but less so.
We slightly extend the tpc-e harness to make this happen, adding a few smarts: exposing a 'during' helper to run backfills concurrently with foreground load, integration with --skip-init, estimated setup times, prometheus, and disk snapshots of course.
Release note: None