-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest: remove direct go calls #134644
roachtest: remove direct go calls #134644
Conversation
118a08e
to
b9e52e5
Compare
Previously, tests had to use bare goroutine calls to start a goroutine. This change removes all bare goroutine calls and replaces it with the new task APIs. Most of the tests already handle errors from goroutines via a channel. This logic is best left in its current state. New implementations can choose whether they want to fail a test by returning an error from a managed goroutine, or would rather handle errors some other way. The use of `errgroup.Group` and `Monitor` still remains and will be addressed in a different PR. Informs: cockroachdb#118214 Epic: None Release note: None
b9e52e5
to
716be3f
Compare
@@ -81,18 +82,10 @@ func registerAllocator(r registry.Registry) { | |||
c.Start(ctx, t.L(), startOpts, install.MakeClusterSettings(), c.Range(start+1, nodes)) | |||
c.Run(ctx, option.WithNodes(c.Node(1)), "./cockroach workload init kv --drop {pgurl:1}") | |||
for node := 1; node <= nodes; node++ { | |||
node := node | |||
// TODO(dan): Ideally, the test would fail if this queryload failed, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto!
defer wg.Done() | ||
<-upgradeFinished | ||
l.Printf("tenant upgrades finished") | ||
}() | ||
return nil | ||
}) | ||
|
||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think there's value in having a t.Wait
that blocks until all goroutines return? Similar to how m.Wait
functions? I imagine we will see more need of this when the monitor refactor happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! I do see value in it, hence a follow-up PR, that adds group management to the task manager: #135831
if _, err := runnerConn.Exec(setupQuery); err != nil { | ||
errCh <- err | ||
close(sem) | ||
return | ||
// Errors are handled in the main goroutine. | ||
return nil //nolint:returnerrcheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it's not immediately obvious that returning errors in t.Go
fails the test. I think it makes sense for that to be the behavior but you have to go a couple layers and peek at the implementation to confirm that is the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Darryl's comment; we should make this explicit in the interface doc. Also, this test could technically be refactored further to explicitly return err
instead of passing it into errCh
, but I also understand if you want to reduce the footprint of all changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought, will need to make it clear that returning an error from a task will result in a test failure. We could alternatively provide a version of the Go/GoWithCancel methods that does not take an error return.
From the footprint perspective, I didn't want to fiddle too much with already working code. But we could definitely make some of these more ergonomic. I'll create an issue for it, since engineers tend to borrow from existing code, it would be better if the examples are improved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could alternatively provide a version of the Go/GoWithCancel methods that does not take an error return.
No bother, it's not that hard to add return nil
:)
cmd := fmt.Sprintf("./cockroach workload run kv --tolerate-errors --min-block-bytes=8 --max-block-bytes=127 {pgurl%s}", c.Node(node)) | ||
l, err := t.L().ChildLogger(fmt.Sprintf(`kv-%d`, node)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this logger was previously unused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, had to look twice before removing, and in actual fact was dead code.
@@ -87,11 +88,11 @@ func runClusterInit(ctx context.Context, t test.Test, c cluster.Cluster) { | |||
t.L().Printf("checking that the SQL conns are not failing immediately") | |||
errCh := make(chan error, len(dbs)) | |||
for _, db := range dbs { | |||
db := db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good riddance :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for refactoring all "naked" go calls! LGTM. We should probably do SELECT_PROBABILITY=0.6 to make sure we hit enough tests to shake out any potential regressions.
Definitely, I ran a targeted job on all the affected tests with a rather large regex. But I think a 0.6 probability would also make for a good final check. Will do one before merging today/tomorrow. |
Added a comment to inform test implementors, using the task API, that returning an error from a Tasker goroutine will fail a test. Informs: cockroachdb#118214 Epic: None Release note: None
TFTRs! bors r=srosenberg,DarrylWong |
Previously, tests had to use bare goroutine calls to start a goroutine. This change removes all bare goroutine calls and replaces it with the new task APIs.
The use of
errgroup.Group
andMonitor
still remains and will be addressed in a different PR.Informs: #118214
Epic: None
Release note: None