-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod,roachtest: backport recent improvements #105831
Merged
renatolabs
merged 6 commits into
cockroachdb:release-23.1
from
renatolabs:rc/backport-100403-101222-104343-104777-105514
Jul 4, 2023
Merged
roachprod,roachtest: backport recent improvements #105831
renatolabs
merged 6 commits into
cockroachdb:release-23.1
from
renatolabs:rc/backport-100403-101222-104343-104777-105514
Jul 4, 2023
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
renatolabs
requested review from
srosenberg and
smg260
and removed request for
a team
June 29, 2023 17:11
…es fails. Previously, when encountering an error on a node, we would wait until the command finished on all the specified nodes until returning to the caller. This meant long-running workloads would continue until completion despite having already failed on at least one of the nodes. The context is now threaded through the call stack, allowing the c.Parallel to issue a cancellation for early termination. Resolves: cockroachdb#98520 Release note: none Epic: none
as we no longer wait for all the nodes to complete in the event of an error on any node.
however sometimes it is desirable for the command to complete running on all nodes despite failures. e.g. `roachprod run <cluster> 'ls /mnt/data1/cockroach'` or when collecting dmesg logs after a test. This commit has also switched to functional options for c.Parallel for more flexibility and allow calls to specify WithWaitOnFail(). Epic: none Fixes: cockroachdb#101150 Release note: None
Previously, we added fail fast behaviour when executing commands across nodes using roachtest/roachprod. There are some instances, specifically from the CLI, that should wait for all results to be returned. This PR adds `WithWaitOnFail()` to `ExecSQL()` in roachprod. Epic: none Fixes: cockroachdb#104342 Release note: None
Starting a cluster locally does a check for certificates. In the event the certificates are not found, which is a valid case, an error is printed. The start command works correctly, but the error can cause confusion: ```bash /bin/roachprod start local --secure 12:47:56 cluster_synced.go:2475: 0: COMMAND_PROBLEM: exit status 1 (1) COMMAND_PROBLEM Wraps: (2) exit status 1 Error types: (1) errors.Cmd (2) *exec.ExitError: local: initializing certs 1/1 / local: distributing certs 2/2 ``` This change modifies the command to do a check without causing an error, and also propagates any real errors that could occur while doing the check.
For some reason, the current form of `fileExistsOnFirstNode` can return `found=true` when it should return `found=false`. This can be reproduced by running the `multitenant-upgrade` roachtest and seeing it hang at: ``` multitenant_upgrade.go:154: test status: checking the pre-upgrade sql server still works after the system tenant binary upgrade ``` because of a `TLS handshake error`. Note: the test is also broken because of another issue so with this fix it should now fail with: ``` (assertions.go:333).Fail: Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_upgrade.go:390 github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_upgrade.go:189 github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_upgrade.go:38 main/pkg/cmd/roachtest/test_runner.go:1060 GOROOT/src/runtime/asm_arm64.s:1172 Error: Not equal: expected: [][]string{[]string{"23.1"}} actual : [][]string{[]string{"22.2"}} Diff: --- Expected +++ Actual @@ -2,3 +2,3 @@ ([]string) (len=1) { - (string) (len=4) "23.1" + (string) (len=4) "22.2" } Test: multitenant-upgrade ``` Release note: None Epic: none
renatolabs
force-pushed
the
rc/backport-100403-101222-104343-104777-105514
branch
from
June 29, 2023 18:45
e93daf5
to
3a12d96
Compare
srosenberg
approved these changes
Jun 29, 2023
Nightly build with |
Failures on the build above also happen on master and are unrelated to the changes introduced here. Merging. TFTR! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This backports more recent improvements to the roachtest/roachprod infrastructure, specifically:
c.ParallelE
can be made optionally made to fail slow #101222fileExistsOnFirstNode
check #105514This will make it much easier to backport other changes that build on top of these improvements.
Epic: None
Release notes: None
Release justification: test-only changes.