-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: run multiple backup types in backup/mixed-version #100755
roachtest: run multiple backup types in backup/mixed-version #100755
Conversation
f5a1c2d
to
effb462
Compare
This is a stacked PR, please review them in order (but feel free to review the second or third ones before the first one is merged):
To be reviewed here: 2nd and 3rd commits. |
6b3859b
to
10a2e67
Compare
10a2e67
to
1696465
Compare
1696465
to
5d1da19
Compare
5d1da19
to
9dd1c3c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 83 at r3 (raw file):
// are included as part of a full cluster backup. It should include // every table that opts-in to cluster backup (see `system_schema.go`). // It should be updated as system tables are added or removed from
Nit: I'm always a bit dubious of scenarios where somebody needs to remember to update something somewhere. I did note this comes from a ccl
package though - GetSystemTablesToIncludeInClusterBackup
- which definitely makes it harder to get to. I'm curious as I noticed cdc
"breaks the rule" by importing from ccl
. Would this also make the exceptions list?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 227 at r3 (raw file):
tableContents interface { // Load computes the contents of a table with the given name // (passed as argument) and stores it representation internally.
Nit: 'its'
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 293 at r3 (raw file):
nonce string // tables is the list of tables that were are going to verify once
Nit: we are
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 330 at r3 (raw file):
// // tableNamesWithDB("db1", []string{"foo.bar, baz"}) // => []string{"db1.foo", "db1.baz"}
The code doesn't seem to match the example here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1, 1 of 1 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @renatolabs)
pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go
line 229 at r1 (raw file):
// cancel the underlying context. // // There should be one channel in `bgChans` for each hook in
Maybe make this stronger? E.g., "Invariant: there is exactly one channel in bgChans
for each hook in Test.hooks.background
."
pkg/cmd/roachtest/tests/backup.go
line 166 at r1 (raw file):
for j, ip := range ips { if err := retry.WithMaxAttempts(ctx, retryConfig, 10, func() error { addr := net.JoinHostPort(ip, fmt.Sprintf("%d", port))
Nice!
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 149 at r2 (raw file):
// tables, we need to quote the column names to avoid SQL syntax // errors. func quoteColumnsForSelect(columns []string) string {
Seems like a reusable (i.e., common) helper func.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 450 at r2 (raw file):
// Load computes the fingerprints for the underlying table and stores // the contents in the `fingeprrints` field.
fingerprrints
-> fingerprints
Why is this func called Load
? It's not really loading data. Maybe rename to ComputeFingerprints
?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 512 at r2 (raw file):
// in this system table row, in which case the error should be // returned to the caller. func (sr *systemTableRow) skip() bool {
skipColumn
?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 773 at r2 (raw file):
} sc.rows = rowSet
Curious how large of a memory footprint do we expect?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 1101 at r2 (raw file):
if strings.HasPrefix(table, "system.") { node := h.RandomNode(rng, mvb.roachNodes) contents, err = newSystemTableContents(ctx, mvb.cluster, node, db, table, timestamp)
Remind me why we can't fingerprint system tables?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 1439 at r2 (raw file):
// resetCluster wipes the entire cluster and starts it again with the // current latest binary. This is done prior we attempt restoring a
prior
-> before
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 330 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
The code doesn't seem to match the example here.
It does, but it's rather strange. We drop foo
on the floor, which is the strange part, i.e., counter-intuitive.
9dd1c3c
to
90553ff
Compare
Rebased:
To be reviewed here: the entire diff. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
pkg/cmd/roachtest/roachtestutil/mixedversion/mixedversion.go
line 229 at r1 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Maybe make this stronger? E.g., "Invariant: there is exactly one channel in
bgChans
for each hook inTest.hooks.background
."
I had slightly updated this comment (Herko had a similar suggestion). Updated to include the "Invariant" part too.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 149 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Seems like a reusable (i.e., common) helper func.
Yes, which is why this function is not tied to any structs used by this test. Since all roachtests share the same package at the moment, they can use this function. Or did you mean I should move the function to an util.go
file to make its reusability more explicit?
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 450 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
fingerprrints
->fingerprints
Why is this func called
Load
? It's not really loading data. Maybe rename toComputeFingerprints
?
Typo fixed.
As for the Load
function: this is implementing an interface. I chose Load
in the interface to be the more generic term to describe what happens across user and system tables. IMO, a valid interpretation of this function is that we are loading their contents in different ways: for user tables, the representation of the loaded data is a series of fingerprints. For system tables, we do load the entire contents of the table (as JSON encoded arrays -- implementation detail). I tried to convey that in the comments, but let me know if there's more I can do to make the point clearer.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 512 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
skipColumn
?
While currently this function is only called in the context of processColumn
, it doesn't need to be so. skip
indicates that any operation on systemTableRow
should be skipped (if there was a function that processed entire rows at a time, they should also be skipped if skip()
returns true.)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 773 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Curious how large of a memory footprint do we expect?
Not much at all; a lot of these system tables are either empty or have very few rows in them (order of couple dozen, max).
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 1101 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Remind me why we can't fingerprint system tables?
Because the test is running an upgrade, and then all bets are off. System tables could have new columns, new rows, etc... Even if there wasn't an upgrade going on, restore logic for system tables can sometimes be quite custom, including "rekeying" of certain keys, which make direct comparison as provided by fingerprint matching not possible. Not to mention impossible to avoid (at roachtest level) races for scheduled jobs. There's probably more reasons I'm not remembering.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 1439 at r2 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
prior
->before
Fixed, thanks.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 83 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: I'm always a bit dubious of scenarios where somebody needs to remember to update something somewhere. I did note this comes from a
ccl
package though -GetSystemTablesToIncludeInClusterBackup
- which definitely makes it harder to get to. I'm curious as I noticedcdc
"breaks the rule" by importing fromccl
. Would this also make the exceptions list?
I kind of like the explicitness of having the entire list spelled out in the test.
To verify the contents of system tables, especially in the context of an upgrade, the test is required to be aware of certain special behavior associated with the restoring of these tables (such as ignoring theversion
row in system.settings
, for example). In addition, we also need to ignore the system.zones
table at the moment (see associated comment in this file). So even if we imported the list of names from the backup package, we'd still need some manually written logic to deal with these special cases. And then at that point, it's arguable whether the benefit is still there. If, at some point, we stop backing up the system.settings
table, as an arbitrary example, the test would continue to pass but we now have dead code in the test dealing with special cases in the restore that no longer apply.
If there are changes in the system tables we backup, I'd see a failure in this test as a "feature"; we would look at the failure and acknowledge it's expected. The fix would likely be straightforward.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 227 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: 'its'
Fixed, thanks.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 293 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Nit: we are
Fixed, thanks.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 330 at r3 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
It does, but it's rather strange. We drop
foo
on the floor, which is the strange part, i.e., counter-intuitive.
Yeah, it does match the example. I updated the documentation for this function, let me know if it's clearer now.
Pushed a temporary commit (from #101786) to see if this fixes a deadlock in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @renatolabs and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 330 at r3 (raw file):
Previously, renatolabs (Renato Costa) wrote…
Yeah, it does match the example. I updated the documentation for this function, let me know if it's clearer now.
Thanks, but what I meant, as with the original example, is that the input tableNamesWithDB("restored_db", []string{"bank.bank, stock"})
seems to suggest that it can support a comma separated string as well as a slice.
In other words shouldn't it be tableNamesWithDB("restored_db", []string{"bank.bank", "stock"})
? In the current state the example would output: "restored_db.bank, stock"
8e27836
to
a08ea52
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 330 at r3 (raw file):
Previously, herkolategan (Herko Lategan) wrote…
Thanks, but what I meant, as with the original example, is that the input
tableNamesWithDB("restored_db", []string{"bank.bank, stock"})
seems to suggest that it can support a comma separated string as well as a slice.
In other words shouldn't it betableNamesWithDB("restored_db", []string{"bank.bank", "stock"})
? In the current state the example would output:"restored_db.bank, stock"
Oh, I see, I didn't notice that! I often make that typo, thanks for catching. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Temporary commit dropped, this is back to what should be merged.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan and @srosenberg)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 9 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @renatolabs, and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 220 at r4 (raw file):
// tableContents is an interface to be implemented by different // methods of storing and comparing the contents of a table. A lot // of backup-related tests use the `EXPERIMENTAL_FINGEPRINTS`
typo: EXPERIMENTAL_FINGE[R]PRINTS
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 239 at r4 (raw file):
// fingerprintContents implements the `tableContents` interface // using the `EXPERIMENTAL_FINGEPRINTS` functionality. This should
typo: as above
Is there a better way to organise the code in |
This adds support to database and full cluster backups in the `backup/mixed-version` roachtest. When a cluster is taken in mixed-version state, a randomly chosen backup type is performed. Since cluster backups also backup some system tables, this commit also adds logic related to verifying that the contents of system tables are properly preserved when these cluster backups are restored. Epic: CRDB-19321 Release note: None
This commit adds a second workload to the `backup/mixed-versions` test, namely `tpcc` with 100 warehouses. Previously, we would only run the `bank` workload; by running TPCC, we verify backups of databases that includes multiple tables, and the backup of tables that includes foreign key constraints. It should also put the cluster under more load than just `bank`, potentially exposing interesting edge cases that wouldn't come up before. Epic: CRDB-19321 Release note: None
a08ea52
to
932b02b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 220 at r4 (raw file):
Previously, smg260 (Miral Gadani) wrote…
typo:
EXPERIMENTAL_FINGE[R]PRINTS
Fixed, thanks.
pkg/cmd/roachtest/tests/mixed_version_backup.go
line 239 at r4 (raw file):
Previously, smg260 (Miral Gadani) wrote…
typo: as above
Fixed, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it is becoming quite long. I'm not sure how much can be moved to a different file, especially given the current roachtest organization where every test is under the same package (does it add more confusion if we split a test in multiple files?).
I've talked to DR about making some of the functionality in this test more generally applicable to non-mixed-version backup/restore tests. I think it makes sense to defer any moving of code to different files at that point.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @smg260, and @srosenberg)
I expect this to flake due to connection warmup hanging (to be fixed by #101786). I'm going to merge it anyway so that we get some runs in; flakes should be pretty easy to identify. TFTRs! bors r=herkolategan |
Build succeeded: |
roachtest: run multiple backup types in backup/mixed-version
This adds support to database and full cluster backups in the
backup/mixed-version
roachtest. When a cluster is taken inmixed-version state, a randomly chosen backup type is performed. Since
cluster backups also backup some system tables, this commit also adds
logic related to verifying that the contents of system tables are
properly preserved when these cluster backups are restored.
roachtest: run tpcc workload in backup/mixed-versions
This commit adds a second workload to the
backup/mixed-versions
test, namely
tpcc
with 100 warehouses. Previously, we would only runthe
bank
workload; by running TPCC, we verify backups of databasesthat includes multiple tables, and the backup of tables that includes
foreign key constraints. It should also put the cluster under more
load than just
bank
, potentially exposing interesting edge casesthat wouldn't come up before.
Epic: CRDB-19321
Release note: None