backupccl: avoid RAID0ing local NVMe and GP3 storage in restore roachtests #99723

Merged

msbutler

A long perf investigation into the restore roachtests revealed that roachprod can RAID0 local NVMe storage and AWS GP3 storage, a configuration that does not mix well with CRDB and does not reflect a reasonable customer environment. This patch avoids RAID0ing these devices in the restore roachtests, which stabilizes test performance.

Informs #98783

Fixes #97019

Release note: none

@msbutler msbutler self-assigned this Mar 27, 2023
@msbutler msbutler requested a review from a team as a code owner March 27, 2023 20:13
@msbutler msbutler requested review from smg260 and removed request for a team March 27, 2023 20:13

@srosenberg srosenberg left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @lidorcarmel, @msbutler, and @smg260)


pkg/cmd/roachtest/tests/restore.go line 420 at r1 (raw file):

		// https://github.com/cockroachdb/cockroach/issues/98783.
		//
		// This should be removed once we have found a real solution that avoids

Could you prefix the comment with TODO; it will be easier to find.

@msbutler msbutler force-pushed the butler-deflake-8tb-restore-roachtest branch from 998cd43 to e1a81bf Compare March 29, 2023 13:21

@msbutler msbutler left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @lidorcarmel, @smg260, and @srosenberg)


pkg/cmd/roachtest/tests/restore.go line 420 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Could you prefix the comment with TODO; it will be easier to find.

Done.

@msbutler msbutler force-pushed the butler-deflake-8tb-restore-roachtest branch from e1a81bf to 64edfd1 Compare March 30, 2023 12:01
@msbutler

TFTR!

bors r=srosenberg

@craig

craig bot commented Mar 30, 2023

Build failed (retrying...):

@msbutler msbutler added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 30, 2023
@craig craig bot merged commit 476cd85 into cockroachdb:master Mar 30, 2023
@craig

craig bot commented Mar 30, 2023

Build succeeded:

@rail

rail commented Mar 31, 2023

Looks like this PR made Roachtest AWS panic: `panic: low memory per CPU not available for AWS`

@msbutler

I know what's going on. It's sad. Easy fix coming shortly.

msbutler added a commit to msbutler/cockroach that referenced this pull request Mar 31, 2023
After cockroachdb#99723 merged as a bandaid for cockroachdb#98783, the AWS roachtest nightly began to
panic because of a different roachtest papercut, cockroachdb#96655. Specifically, because
roachtest filters which tests run on which cloud within the evaluation of the
test closure, tests meant to run on GCE still get registered in an AWS
run. During the registration of the GCE test
`restore/tpce/400GB/gce/nodes=4/cpus=8/lowmem` _on AWS_, the AWS test harness
panics because the AWS roachprod implementation does not have a
low-memory-per-CPU configuration. This patch prevents this panic and should be
reverted once PR cockroachdb#99402 merges.

Epic: None

Release note: None
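
The failure mode described in the commit message above can be illustrated with a small, self-contained Go sketch. It is not the roachtest code: every name in it (`hardware`, `machineType`, `makeSpecUnguarded`, `makeSpecGuarded`, `panics`) is hypothetical, and the guard shown is only one plausible way to avoid the panic, not necessarily what the patch does. The sketch assumes only what the message states: specs are resolved at registration time, cloud filtering happens later, and the AWS provider has no low-memory-per-CPU machine type.

```go
package main

import (
	"errors"
	"fmt"
)

// Cloud names used by this sketch (illustrative only).
const (
	gce = "gce"
	aws = "aws"
)

// hardware is a hypothetical stand-in for a restore test's hardware spec.
type hardware struct {
	cloud  string // cloud the test is meant to run on
	cpus   int
	lowMem bool // request a low memory-per-CPU machine type
}

var errLowMemUnsupported = errors.New("low memory per CPU not available for AWS")

// machineType mimics a provider lookup. runningOn is the cloud of the current
// nightly run; in this sketch AWS cannot satisfy a lowMem request.
func machineType(runningOn string, hw hardware) (string, error) {
	if runningOn == aws && hw.lowMem {
		return "", errLowMemUnsupported
	}
	suffix := ""
	if hw.lowMem {
		suffix = "-lowmem"
	}
	return fmt.Sprintf("%s-%dcpu%s", runningOn, hw.cpus, suffix), nil
}

// makeSpecUnguarded resolves the machine type at registration time, so a
// gce-only test can panic while being registered in an AWS run.
func makeSpecUnguarded(runningOn string, hw hardware) string {
	mt, err := machineType(runningOn, hw)
	if err != nil {
		panic(err)
	}
	return mt
}

// makeSpecGuarded drops the lowMem request when the test targets a different
// cloud than the current run. The test is filtered out later anyway, so the
// resolved machine type is never used.
func makeSpecGuarded(runningOn string, hw hardware) string {
	if hw.cloud != runningOn {
		hw.lowMem = false
	}
	return makeSpecUnguarded(runningOn, hw)
}

// panics reports whether calling f panics.
func panics(f func()) (p bool) {
	defer func() { p = recover() != nil }()
	f()
	return false
}

func main() {
	hw := hardware{cloud: gce, cpus: 8, lowMem: true}
	fmt.Println("unguarded panics on aws:", panics(func() { makeSpecUnguarded(aws, hw) }))
	fmt.Println("guarded panics on aws:", panics(func() { makeSpecGuarded(aws, hw) }))
}
```

Registering only the tests that match the current cloud would remove the need for such a guard, which is presumably why the message asks for the patch to be reverted once the referenced PR merges.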
craig bot pushed a commit that referenced this pull request Mar 31, 2023
99312: sqlsmith: add DEFAULT expressions to newly added columns r=mgartner a=mgartner

Sqlsmith now builds `ALTER TABLE .. ADD COLUMN .. DEFAULT` statements
with default expressions that have different types than the column type.
This is allowed if the default expression type can be assignment-casted
to the column's type.

Fixes #98133

Release note: None


99348: testutils: move default test tenant message r=rharding6373 a=herkolategan

In order to reduce logging noise but still inform test authors of the default test tenant, the message has been moved to where there is a `testing.TB` interface.

Epic: CRDB-18499

99835: opt/execbuilder: add panic catching to buildRoutinePlanGenerator r=mgartner a=mgartner

This commit adds a panic catcher to callback functions created in
execbuilder and invoked during evaluation of UDFs and correlated
subqueries. It matches the panic catcher logic in `buildApplyJoin`.

Fixes #98786

Release note: None


100267: roachtest: own autoupgrade to TestEng r=renatolabs a=tbg

Discussed in #99479.

Epic: none
Release note: None


100286: roachtest: prevent aws roachtest panic r=rail a=msbutler

After #99723 merged as a bandaid for #98783, the AWS roachtest nightly began to panic because of a different roachtest papercut, #96655. Specifically, because roachtest filters which tests run on which cloud within the evaluation of the test closure, tests meant to run on GCE still get registered in an AWS run. During the registration of the GCE test
`restore/tpce/400GB/gce/nodes=4/cpus=8/lowmem` _on AWS_, the AWS test harness panics because the AWS roachprod implementation does not have a low-memory-per-CPU configuration. This patch prevents this panic and should be reverted once PR #99402 merges.

Epic: None

Release note: None

100294: tenantcapabilitiestestutils: add a missing default case r=ajwerner a=ajwerner

The test should fail if we ever add a new type of capability and use it in the data driven test but don't update the test to handle it.

Epic: none

Follow-up from #100217 (review)

Release note: None

100296: rpc: correctly check for nil before cast r=ajwerner a=andrewbaptist

As part of the fix of #99104, a cast without a nil check was introduced. This PR addresses that by only casting if it is known to be not nil.

Epic: none
Fixes: #100275
Release note: None

Co-authored-by: Marcus Gartner
Co-authored-by: Herko Lategan
Co-authored-by: Tobias Grieger
Co-authored-by: Michael Butler
Co-authored-by: ajwerner
Co-authored-by: Andrew Baptist
blathers-crl bot pushed a commit that referenced this pull request Mar 31, 2023
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1 T-disaster-recovery
Development

Successfully merging this pull request may close these issues.

roachtest: restore/tpce/8TB/aws/nodes=10/cpus=8 failed
4 participants