roach{prod,test}: add first-class support for disk snapshots

Long-lived disk snapshots can drastically reduce testing time for scale
tests. Tests, whether run by hand or through CI, need only run the long
running fixture generating code (importing some dataset, generating it
organically through workload, etc.) once snapshot fingerprints are
changed, fingerprints that incorporate the major crdb version that
generated them.

Here's an example run that freshly generates disk snapshots:

    === RUN   admission-control/index-backfill
    03:57:19 admission_control_index_backfill.go:53: no existing snapshots found for admission-control/index-backfill (ac-index-backfill), doing pre-work
    03:57:54 roachprod.go:1626: created volume snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 (id=6426236595187320652) for volume irfansharif-snapshot-0001-1 on irfansharif-snapshot-0001-1/n1
    03:57:55 admission_control_index_backfill.go:61: using 1 newly created snapshot(s) with prefix "ac-index-backfill"
    03:58:02 roachprod.go:1716: detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
    03:58:28 roachprod.go:1764: created volume irfansharif-snapshot-0001-1
    03:58:33 roachprod.go:1770: attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
    03:58:36 roachprod.go:1783: mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
    --- PASS: admission-control/index-backfill (79.14s)

Here's a subsequent run that makes use of the aforementioned disk
snapshot:

    === RUN   admission-control/index-backfill
    04:00:40 admission_control_index_backfill.go:63: using 1 pre-existing snapshot(s) with prefix "ac-index-backfill"
    04:00:47 roachprod.go:1716: detached and deleted volume irfansharif-snapshot-0001-1 from irfansharif-snapshot-0001
    04:01:14 roachprod.go:1763: created volume irfansharif-snapshot-0001-1
    04:01:19 roachprod.go:1769: attached volume irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
    04:01:22 roachprod.go:1782: mounted irfansharif-snapshot-0001-1 to irfansharif-snapshot-0001
    --- PASS: admission-control/index-backfill (43.47s)

We add the following APIs to the roachtest.Cluster interface, for tests
to interact with disk snapshots. admission-control/index-backfill is a
placeholder test making use of these APIs.

    type Cluster interface {
        // ...

        // CreateSnapshot creates volume snapshots of the cluster using
        // the given prefix. These snapshots can later be retrieved,
        // deleted or applied to already instantiated clusters.
        CreateSnapshot(ctx context.Context, snapshotPrefix string) error

        // ListSnapshots lists the individual volume snapshots that
        // satisfy the search criteria.
        ListSnapshots(
            ctx context.Context, vslo vm.VolumeSnapshotListOpts,
        ) ([]vm.VolumeSnapshot, error)

        // DeleteSnapshots permanently deletes the given snapshots.
        DeleteSnapshots(
            ctx context.Context, snapshots ...vm.VolumeSnapshot,
        ) error

        // ApplySnapshots applies the given volume snapshots to the
        // underlying cluster. This is a destructive operation as far as
        // existing state is concerned - all already-attached volumes are
        // detached and deleted to make room for new snapshot-derived
        // volumes. The new volumes are created using the same specs
        // (size, disk type, etc.) as the original cluster.
        ApplySnapshots(
            ctx context.Context, snapshots []vm.VolumeSnapshot,
        ) error
    }

This in turn is powered by the following additions to the vm.Provider
interface, implemented by each cloud provider.

    type Provider interface {
        // ...

        // CreateVolume creates a new volume using the given options.
        CreateVolume(l *logger.Logger, vco VolumeCreateOpts) (Volume, error)

        // ListVolumes lists all volumes already attached to the given VM.
        ListVolumes(l *logger.Logger, vm *VM) ([]Volume, error)

        // DeleteVolume detaches and deletes the given volume from the
        // given VM.
        DeleteVolume(l *logger.Logger, volume Volume, vm *VM) error

        // AttachVolume attaches the given volume to the given VM.
        AttachVolume(l *logger.Logger, volume Volume, vm *VM) (string, error)

        // CreateVolumeSnapshot creates a snapshot of the given volume,
        // using the given options.
        CreateVolumeSnapshot(
            l *logger.Logger, volume Volume, vsco VolumeSnapshotCreateOpts,
        ) (VolumeSnapshot, error)

        // ListVolumeSnapshots lists the individual volume snapshots that
        // satisfy the search criteria.
        ListVolumeSnapshots(
            l *logger.Logger, vslo VolumeSnapshotListOpts,
        ) ([]VolumeSnapshot, error)

        // DeleteVolumeSnapshot permanently deletes the given snapshot.
        DeleteVolumeSnapshot(l *logger.Logger, snapshot VolumeSnapshot) error
    }

Since these snapshots necessarily outlive the tests, and we don't want
them dangling perpetually, we introduce a prune-dangling roachtest that
acts as a poor man's cron job, sifting through expired snapshots
(>30days) and deleting them. For GCE at least it's not obvious to me how
to create these snapshots in cloud buckets with a TTL built in, hence
this hack. It looks like this (with change to the TTL):

    === RUN   prune-dangling
    06:22:48 prune_dangling_snapshots_and_disks.go:54: pruned old snapshot ac-index-backfill-0001-vunknown-1-n2-standard-8 (id=7962137245497025996)
    06:22:48 test_runner.go:1023: tearing down after success; see teardown.log
    --- PASS: prune-dangling (8.59s)

Subsequent commits will:
- [ ] Fill out admission-control/index-backfill, a non-trivial use of
      disk snapshots. It will cut down the test time from >4hrs to <25m.
- [ ] Expose top-level commands in roachprod to manipulate these
      snapshots.

Release note: None
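
The snapshot name in the log output above (`ac-index-backfill-0001-vunknown-1-n2-standard-8`) appears to combine the prefix with other parts of the fingerprint. The following is a minimal sketch of how such a name could be assembled, assuming the segments are the prefix, a zero-padded node index, the version, the node count, and the machine type. The `snapshotFingerprint` type, its field names, and the segment order are assumptions inferred from that single log line, not the actual roachprod implementation.

```go
package main

import "fmt"

// snapshotFingerprint is a hypothetical type holding the pieces that seem to
// make up a snapshot name. The meaning of each segment is inferred from one
// log line and may not match what roachprod really does.
type snapshotFingerprint struct {
	Prefix      string // e.g. "ac-index-backfill"
	NodeIndex   int    // 1-based node index, rendered zero-padded to 4 digits
	Version     string // version that generated the data, e.g. "vunknown"
	NodeCount   int    // number of nodes in the cluster
	MachineType string // e.g. "n2-standard-8"
}

// Name joins the fingerprint segments with dashes.
func (f snapshotFingerprint) Name() string {
	return fmt.Sprintf("%s-%04d-%s-%d-%s",
		f.Prefix, f.NodeIndex, f.Version, f.NodeCount, f.MachineType)
}

func main() {
	f := snapshotFingerprint{
		Prefix:      "ac-index-backfill",
		NodeIndex:   1,
		Version:     "vunknown",
		NodeCount:   1,
		MachineType: "n2-standard-8",
	}
	fmt.Println(f.Name())
}
```

Encoding the version and cluster shape in the name is what lets a changed fingerprint miss the existing snapshots and trigger fixture regeneration.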
1 parent f201f9e, commit f13b01d
Showing 19 changed files with 848 additions and 148 deletions.
83 changes: 83 additions & 0 deletions
pkg/cmd/roachtest/tests/admission_control_index_backfill.go
@@ -0,0 +1,83 @@
// Copyright 2023 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package tests

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/registry"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/spec"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
	"github.com/cockroachdb/cockroach/pkg/roachprod/vm"
)

func registerIndexBackfill(r registry.Registry) {
	clusterSpec := r.MakeClusterSpec(
		1, /* nodeCount */
		spec.CPU(8),
		spec.Zones("us-east1-b"),
		spec.VolumeSize(500),
		spec.Cloud(spec.GCE),
	)
	clusterSpec.InstanceType = "n2-standard-8"
	clusterSpec.GCEMinCPUPlatform = "Intel Ice Lake"
	clusterSpec.GCEVolumeType = "pd-ssd"

	r.Add(registry.TestSpec{
		Name:  "admission-control/index-backfill",
		Owner: registry.OwnerAdmissionControl,
		// TODO(irfansharif): Reduce to weekly cadence once stabilized.
		// Tags: registry.Tags(`weekly`),
		Cluster:         clusterSpec,
		RequiresLicense: true,
		Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
			// TODO(irfansharif): Make a registry of these prefix strings. It's
			// important no registered name is a prefix of another.
			const snapshotPrefix = "ac-index-backfill"

			var snapshots []vm.VolumeSnapshot
			snapshots, err := c.ListSnapshots(ctx, vm.VolumeSnapshotListOpts{
				// TODO(irfansharif): Search by taking in the other parts of the
				// snapshot fingerprint, i.e. the node count, the version, etc.
				Name: snapshotPrefix,
			})
			if err != nil {
				t.Fatal(err)
			}
			if len(snapshots) == 0 {
				t.L().Printf("no existing snapshots found for %s (%s), doing pre-work", t.Name(), snapshotPrefix)
				// TODO(irfansharif): Add validation that we're some released
				// version, probably the predecessor one. Also ensure that any
				// running CRDB processes have been stopped since we're taking
				// raw disk snapshots. Also later we'll be unmounting/mounting
				// attached volumes.
				if err := c.CreateSnapshot(ctx, snapshotPrefix); err != nil {
					t.Fatal(err)
				}
				snapshots, err = c.ListSnapshots(ctx, vm.VolumeSnapshotListOpts{Name: snapshotPrefix})
				if err != nil {
					t.Fatal(err)
				}
				t.L().Printf("using %d newly created snapshot(s) with prefix %q", len(snapshots), snapshotPrefix)
			} else {
				t.L().Printf("using %d pre-existing snapshot(s) with prefix %q", len(snapshots), snapshotPrefix)
			}

			if err := c.ApplySnapshots(ctx, snapshots); err != nil {
				t.Fatal(err)
			}

			// TODO(irfansharif): Actually do something using TPC-E, index
			// backfills and replication admission control.
		},
	})
}
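
The TODO in this file asks for a registry of snapshot prefixes in which no registered name is a prefix of another, presumably because snapshots are looked up by name prefix and an overlapping prefix would match another test's snapshots. A minimal sketch of such a check follows; `prefixRegistry` and `Register` are hypothetical names for illustration and do not exist in roachtest.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// prefixRegistry is a hypothetical registry of snapshot prefixes. It rejects
// any name that is a prefix of, or is prefixed by, an already registered name.
type prefixRegistry struct {
	names []string
}

// Register adds name to the registry, or returns an error if it overlaps
// with an existing entry (identical names also count as overlapping).
func (r *prefixRegistry) Register(name string) error {
	if name == "" {
		return errors.New("snapshot prefix must not be empty")
	}
	for _, existing := range r.names {
		if strings.HasPrefix(existing, name) || strings.HasPrefix(name, existing) {
			return fmt.Errorf("snapshot prefix %q overlaps with %q", name, existing)
		}
	}
	r.names = append(r.names, name)
	return nil
}

func main() {
	var r prefixRegistry
	for _, name := range []string{"ac-index-backfill", "ac-index", "tpce-import"} {
		if err := r.Register(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("registered:", name)
	}
}
```

The linear scan is adequate for a handful of test prefixes; a trie would only be worth it for a much larger set.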
63 changes: 63 additions & 0 deletions
pkg/cmd/roachtest/tests/prune_dangling_snapshots_and_disks.go
@@ -0,0 +1,63 @@
// Copyright 2023 The Cockroach Authors.
//
// Use of this software is governed by the Business Source License
// included in the file licenses/BSL.txt.
//
// As of the Change Date specified in that file, in accordance with
// the Business Source License, use of this software will be governed
// by the Apache License, Version 2.0, included in the file
// licenses/APL.txt.

package tests

import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/registry"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/spec"
	"github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test"
	"github.com/cockroachdb/cockroach/pkg/roachprod"
	"github.com/cockroachdb/cockroach/pkg/roachprod/vm"
	"github.com/cockroachdb/cockroach/pkg/util/timeutil"
)

// This test exists only to prune expired snapshots. Not all cloud providers
// (GCE) let you store volume snapshots in buckets with a pre-configured TTL. So
// we use this nightly roachtest as a poor man's cron job.
func registerPruneDanglingSnapshotsAndDisks(r registry.Registry) {
	clusterSpec := r.MakeClusterSpec(
		1, /* nodeCount */
		spec.Cloud(spec.GCE),
	)

	r.Add(registry.TestSpec{
		Name:            "prune-dangling",
		Owner:           registry.OwnerTestEng,
		Cluster:         clusterSpec,
		RequiresLicense: true,
		Run: func(ctx context.Context, t test.Test, c cluster.Cluster) {
			snapshots, err := c.ListSnapshots(ctx, vm.VolumeSnapshotListOpts{
				CreatedBefore: timeutil.Now().Add(-1 * roachprod.SnapshotTTL),
				Labels: map[string]string{
					vm.TagUsage: "roachtest", // only prune out snapshots created in tests
				},
			})
			if err != nil {
				t.Fatal(err)
			}

			for _, snapshot := range snapshots {
				if err := c.DeleteSnapshots(ctx, snapshot); err != nil {
					t.Fatal(err)
				}
				t.L().Printf("pruned old snapshot %s (id=%s)", snapshot.Name, snapshot.ID)
			}

			// TODO(irfansharif): Also prune out unattached disks. Use something
			// like:
			//
			//   gcloud compute --project $project disks list --filter="-users:*"
		},
	})
}