roachtest: backport improvements to mixedversion and backup test #103912

renatolabs
Contributor

Backports PRs:

Please see individual PRs for details.

This commit updates the signature of functions in the `mixedversion`
package that create steps that run in the background; this includes
functions in the `Helper` struct, as well as in the main
`mixedversion.Test` struct. These functions now all return a
`mixedversion.StopFunc` that test authors can call when they wish to
stop a background function without causing the test to fail.

When the stop function is called, the context passed to the background
functions is canceled; however, that context cancelation is captured
by the framework and logged as an expected termination. If the context
is canceled through other means, the test will fail as usual.
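The distinction between an expected stop and any other cancelation can be sketched roughly as follows. This is a standalone illustration, not the `mixedversion` implementation: `runInBackground`, `outcome`, `waitForCancel`, and `demo` are hypothetical names, and the `StopFunc` type here is only assumed to be a plain `func()`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
)

// StopFunc is assumed to be a plain function; the real type in the
// mixedversion package may be defined differently.
type StopFunc func()

// outcome is a hypothetical record of how a background function ended.
type outcome struct {
	err      error
	expected bool // true only if the StopFunc caused the cancelation
}

// runInBackground (hypothetical) runs fn in a goroutine and returns a
// StopFunc plus a channel that reports how fn terminated.
func runInBackground(parent context.Context, fn func(context.Context) error) (StopFunc, <-chan outcome) {
	ctx, cancel := context.WithCancel(parent)
	var stopRequested int32
	done := make(chan outcome, 1)

	go func() {
		err := fn(ctx)
		// A cancelation is expected only when the stop function was called.
		expected := errors.Is(err, context.Canceled) &&
			atomic.LoadInt32(&stopRequested) == 1
		done <- outcome{err: err, expected: expected}
	}()

	stop := func() {
		atomic.StoreInt32(&stopRequested, 1)
		cancel()
	}
	return stop, done
}

// waitForCancel blocks until its context is canceled.
func waitForCancel(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// demo reports whether the termination was classified as expected when
// stopping via the StopFunc, and when the parent context is canceled.
func demo() (viaStop, viaParent bool) {
	stop, done := runInBackground(context.Background(), waitForCancel)
	stop()
	viaStop = (<-done).expected

	parent, cancelParent := context.WithCancel(context.Background())
	stop2, done2 := runInBackground(parent, waitForCancel)
	cancelParent()
	viaParent = (<-done2).expected
	stop2() // release the derived context
	return viaStop, viaParent
}

func main() {
	viaStop, viaParent := demo()
	fmt.Println("stopped via StopFunc, expected:", viaStop)
	fmt.Println("parent canceled, expected:", viaParent)
}
```

In this sketch, a caller that sees `expected == false` alongside a non-nil error would treat it as a test failure, while `expected == true` would only be logged.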

Epic: CRDB-19321

Release note: None

This commit adds support for database and full cluster backups in the
`backup/mixed-version` roachtest. When a backup is taken in
mixed-version state, a randomly chosen backup type is performed. Since
cluster backups also back up some system tables, this commit also adds
logic to verify that the contents of system tables are
properly preserved when these cluster backups are restored.
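As a rough illustration of the random choice described above, the sketch below picks a backup type and builds the corresponding `BACKUP` statement. The names `backupScope`, `chooseScope`, `backupStatement`, and `needsSystemTableCheck` are hypothetical, the `bank.bank` table and the `nodelocal://1/example` destination are placeholders, and the inclusion of a table-level backup alongside the two types named in the commit is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// backupScope is a hypothetical enumeration of backup types.
type backupScope int

const (
	tableBackup backupScope = iota // assumed to exist before this commit
	databaseBackup
	clusterBackup
	numScopes
)

// chooseScope picks one of the backup types uniformly at random.
func chooseScope(rng *rand.Rand) backupScope {
	return backupScope(rng.Intn(int(numScopes)))
}

// backupStatement builds the SQL for the chosen scope. The database and
// table names are placeholders for illustration only.
func backupStatement(scope backupScope, dest string) string {
	switch scope {
	case tableBackup:
		return fmt.Sprintf("BACKUP TABLE bank.bank INTO '%s'", dest)
	case databaseBackup:
		return fmt.Sprintf("BACKUP DATABASE bank INTO '%s'", dest)
	default:
		// A full cluster backup names no target.
		return fmt.Sprintf("BACKUP INTO '%s'", dest)
	}
}

// needsSystemTableCheck reports whether restoring this backup should also
// verify system table contents; only cluster backups include them.
func needsSystemTableCheck(scope backupScope) bool {
	return scope == clusterBackup
}

func main() {
	rng := rand.New(rand.NewSource(1))
	for i := 0; i < 3; i++ {
		scope := chooseScope(rng)
		fmt.Println(backupStatement(scope, "nodelocal://1/example"),
			"| verify system tables:", needsSystemTableCheck(scope))
	}
}
```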

Epic: CRDB-19321

Release note: None

This commit adds a second workload to the `backup/mixed-version`
test, namely `tpcc` with 100 warehouses. Previously, we would only run
the `bank` workload; by running TPCC, we verify backups of databases
that include multiple tables, and backups of tables that include
foreign key constraints. It should also put the cluster under more
load than `bank` alone, potentially exposing interesting edge cases
that would not have come up before.

Epic: CRDB-19321

Release note: None

This commit updates the `backup/mixed-version` test (now
`backup-restore/mixed-version`) to also perform mixed-version
restores. At a high level, it introduces a new function that runs in
mixed-version state and randomly attempts to restore a subset of
the backups taken up to that point in the test.

The test will verify that we are able to restore, in mixed version,
backups taken in the previous version _and_ in mixed version.
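One way to pick such a random subset is sketched below. It assumes a non-empty subset of random size drawn without repetition; the actual selection logic in the test may differ, and `takenBackup`, `pickRestoreSubset`, and `pickWithSeed` are hypothetical names.

```go
package main

import (
	"fmt"
	"math/rand"
)

// takenBackup is a hypothetical record of a backup taken during the test.
type takenBackup struct {
	name    string
	takenIn string // "previous" or "mixed"
}

// pickRestoreSubset returns a random, non-empty subset of backups without
// repetition, or nil if no backups have been taken yet.
func pickRestoreSubset(rng *rand.Rand, backups []takenBackup) []takenBackup {
	if len(backups) == 0 {
		return nil
	}
	k := 1 + rng.Intn(len(backups)) // subset size in [1, len(backups)]
	subset := make([]takenBackup, 0, k)
	// Perm yields distinct indices, so no backup is chosen twice.
	for _, idx := range rng.Perm(len(backups))[:k] {
		subset = append(subset, backups[idx])
	}
	return subset
}

// pickWithSeed is a convenience wrapper for reproducible runs.
func pickWithSeed(seed int64, backups []takenBackup) []takenBackup {
	return pickRestoreSubset(rand.New(rand.NewSource(seed)), backups)
}

func main() {
	backups := []takenBackup{
		{name: "backup-1", takenIn: "previous"},
		{name: "backup-2", takenIn: "mixed"},
		{name: "backup-3", takenIn: "mixed"},
	}
	for _, b := range pickWithSeed(42, backups) {
		fmt.Printf("would restore %s (taken in %s version)\n", b.name, b.takenIn)
	}
}
```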

Resolves: cockroachdb#96367.

Epic: CRDB-19321

Release note: None

This commit makes some final (for now) changes to the
`backup-restore/mixed-version` roachtest. Specifically:

* set some backup/restore-related cluster settings. These are
publicly documented settings and should help expose corner cases that
are harder to hit naturally with the default settings. This
is an area known to need more tests, as described in a recent
postmortem [1].

* introduce a background function that executes statements that lead
to rows being inserted into system tables, particularly those that are
generally empty in most tests.

* simplify the workload setup in the test: the `bank` workload is
responsible for testing edge cases, while `tpcc` is a workload that
should better represent customer workloads.

* verify that backups taken in mixed-version can be restored both in
the previous version and in the next version. Previously, we were only
testing the next version.

Note that most of these changes are not specifically related to the
mixed-version context this test is in. In the future, these features
should be packaged in a format that is easier for other
tests to consume.

[1] https://cockroachlabs.atlassian.net/wiki/spaces/ENG/pages/3013804060/Postmortem+101963+revision+history+backups

Epic: none

Release note: None

When a mixed-version test (using the `mixedversion` package) starts,
the `cockroach` binary is uploaded to every node, and then the cluster
is started. If, for some reason, the cockroach process crashed during
this startup phase, the `mixedversion` package would panic while
generating an error message. The panic was caused by an assumption
that the connection cache was already initialized at that point, which
does not hold when the test fails during setup.

This commit fixes the issue by checking the status of the
connection cache when generating error messages. We also make sure
concurrent accesses to the connection cache are safe; while this is
not strictly necessary (there are no concurrent reads and writes to it
right now), it will likely help in the future as this code changes.
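The general shape of such a fix might look like the sketch below: a mutex-protected cache whose readers check whether it has been initialized before using it. The names `connCache`, `initialize`, `snapshot`, and `failureMessage` are hypothetical, and plain strings stand in for real database connections.

```go
package main

import (
	"fmt"
	"sync"
)

// connCache is a hypothetical stand-in for the framework's connection
// cache; strings take the place of real database connections.
type connCache struct {
	mu    sync.Mutex
	conns []string // nil until the cluster has started
}

// initialize populates the cache once the cluster is up.
func (c *connCache) initialize(conns []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.conns = conns
}

// snapshot returns a copy of the cached connections and whether the cache
// has been initialized, so callers never read the slice without the lock.
func (c *connCache) snapshot() ([]string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.conns == nil {
		return nil, false
	}
	return append([]string(nil), c.conns...), true
}

// failureMessage builds an error message without assuming the cache is
// ready, which matters when the failure happens during test setup.
func failureMessage(c *connCache, cause string) string {
	conns, ok := c.snapshot()
	if !ok {
		return fmt.Sprintf("test failed during setup: %s (no connections available)", cause)
	}
	return fmt.Sprintf("test failed: %s (%d connections available)", cause, len(conns))
}

func main() {
	cache := &connCache{}
	fmt.Println(failureMessage(cache, "cockroach process crashed"))

	cache.initialize([]string{"n1", "n2", "n3", "n4"})
	fmt.Println(failureMessage(cache, "workload error"))
}
```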

Epic: none

Release note: None

@renatolabs renatolabs requested a review from a team as a code owner May 25, 2023 18:56
@renatolabs renatolabs requested review from herkolategan and smg260 and removed request for a team May 25, 2023 18:56
@srosenberg srosenberg self-requested a review May 26, 2023 03:46
@renatolabs
Contributor Author

Ran the backup-restore/mixed-version roachtest on this branch to make sure things are good. Needless to say, it failed. 🙂

Build: https://teamcity.cockroachdb.com/viewLog.html?buildId=10249433&buildTypeId=Cockroach_Nightlies_RoachtestNightlyGceBazel&tab=buildResultsDiv&branch_Cockroach_Nightlies=103912.

The failure is a documented bug -- #103597.

Merging, TFTR!

@renatolabs renatolabs merged commit aedfd62 into cockroachdb:release-23.1 May 26, 2023