-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachprod: audit and fix incorrect usages of c.Parallel wrt error handling #104312
Labels
A-testing
Testing tools and infrastructure
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-testeng
TestEng Team
Comments
srosenberg
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
A-testing
Testing tools and infrastructure
T-testeng
TestEng Team
labels
Jun 5, 2023
cc @cockroachdb/test-eng |
srosenberg
changed the title
roachtest: audit and fix incorrect usages of c.Parallel wrt error handling
roachprod: audit and fix incorrect usages of c.Parallel wrt error handling
Jun 5, 2023
srosenberg
added a commit
to srosenberg/cockroach
that referenced
this issue
Jun 8, 2023
Support for disk snapshots has been recently added in [1]. This change adds minor fixes which were necessary to run in a different environment which was previously tested. Specifically, several commands were missing `--project` which resulted in (silent) failures. Error propagation from `c.Parallel` was signaling an internal error instead of passing command's error. (Granted, the existing mechanism for error propagation is to be revisited in [2].) [1] cockroachdb#103757 [2] cockroachdb#104312 Epic: none Release note: None
craig bot
pushed a commit
that referenced
this issue
Jun 9, 2023
104573: roachprod: minor fixes for snapshots r=srosenberg a=srosenberg Support for disk snapshots has been recently added in [1]. This change adds minor fixes which were necessary to run in a different environment which was previously tested. Specifically, several commands were missing `--project` which resulted in (silent) failures. Error propagation from `c.Parallel` was signaling an internal error instead of passing command's error. (Granted, the existing mechanism for error propagation is to be revisited in [2].) [1] #103757 [2] #104312 Epic: none Release note: None Co-authored-by: Stan Rosenberg <[email protected]>
craig bot
pushed a commit
that referenced
this issue
Aug 17, 2023
104915: roachprod: roachtest: refactoring r=renatolabs,herkolategan a=smg260 This commit addresses some usability issues within roachprod/roachtest, and does some minor clean up. Summary: `roachprod` - `c.Parallel` now accepts `nodes Nodes` and `func(ctx context.Context, node Node)`. Previously, these was an `int n` representing the first `n` nodes to operate on, and `func(ctx context.Context, int n)`, where second arg would confusingly represented the index of the node being operated on, which was wholly dependent on a captured-by-reference slice at the call site. This change makes it explicit to the caller and the function, which exact node is targeted. - `c.Parallel` now returns `[]*RunResultDetails, boolean, error`. The slice will contain results, successful and failed, for each node (or thus far, depending on fail-fast behaviour). The boolean is a signal to the caller that at least one of the results in the slice suffered from a command error, which can be inspected in the result itself. The `error` now more idiomatically represents an unrecoverable error encountered during `Parallel`; i.e. an un retryable error in `roachprod`. In the latter case, results will be `nil`. - All calls to execute a remote command on a node are all funnelled to `c.runCmdOnSingleNode`, ensuring consistency, and reducing duplicated code. This function accepts a `RunCmdOptions` struct to configure it's behaviour, including `stdin/out` - `runCmdOnSingleNode` : in error cases, include stdout/err output (truncated after n bytes - effectively `tail`). This output will show in the `failure_n.log` and give convenient insight. Full output is always retained in the run log. - `RunResultDetails.{Stdout,Stderr,CombinedOut}` are now all `string` to represent the most common use case, and a convenience `r.Output()` added to print out the contents of `Stderr,Stdout,CombinedOut` - Remove complex legacy pipe processing logic from `roachtest` (`execCmd,execCmdEx`), previously used when `roachprod` was not a library of `roachtest` - When showing the result of a command on each node on the CLI, explicitly display `<ok>` or `<err>` followed by output/error `roachtest` - skip post-test assertions on a test failure - removal of `execCmd/execCmdRes` which were doing some complex pipe processing before calling roachprod.Run (not required since roachprod is a lib of roachtest) Minor - Clean up various redundant/unused variables - Typos - Command formatting Informs: #104312 Informs: #104316 (TBD) Epic: none Release note: none 107354: asim: add randomized range generation r=kvoli a=wenyihu6 This patch enables random range configuration to be generated. TestRandomized can now take another setting parameter rangeGen (default: uniform rangeGenType, uniform keySpaceGenType, empty weightedRand). These generators are part of the framework fields which persist across iterations. The numbers produced by the generator shape the distribution across iterations. - rangeKeyGenType: determines range generator type across iterations (default: uniformGenerator, min = 1, max = 1000) - keySpaceGenType: determines key space generator type across iterations (default: uniformGenerator, min = 1000, max = 200000) - weightedRand: if non-empty, enables weighted randomization for range distribution This provides three modes for range generation: 1. Default: currently set to uniform distribution 2. Random: randomly generates range distribution across stores 3. Weighted Randomization: enables weighted randomization for range distribution if and only if given weightedRand is non-empty Part of: #106311 Release note: None 108906: changefeedccl: make tests work offline r=miretskiy a=jayshrivastava This change updates two unit tests to pass when running offline. Fixes: #108867 Release note: None Epic: None Co-authored-by: Miral Gadani <[email protected]> Co-authored-by: wenyihu6 <[email protected]> Co-authored-by: Jayant Shrivastava <[email protected]>
Fixed by #104915 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-testing
Testing tools and infrastructure
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-enhancement
Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
T-testeng
TestEng Team
After implementing SSH retries and refactoring
c.Parallel
, the new implementation expects a user-defined function to return noerror
unless there was an internal (roachprod) error; instead, if the executed function resulted in an error, it's expected to be inRunResultDetails.Err
.The above (tacit) contract is violated by several existing commands, e.g.,
Wipe
,Stop
. When either of these functions errors out, the actual details of the error are not written to the logs. E.g., an attempt to reuse a cluster failed to execute eitherWipe
orStop
, but the actual details are unknown; all we get is the following message,The ask is as follows,
c.Parallel
and fix contract violationserror
should benil
unless there was an internal errorreturn res, nil
; perhaps they could use a different API, one which has a single return of type*RunResultDetails
?[1] #90451
The text was updated successfully, but these errors were encountered: