Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: remove direct go calls #134644

Merged

Conversation

herkolategan
Copy link
Collaborator

Previously, tests had to use bare goroutine calls to start a goroutine. This change removes all bare goroutine calls and replaces it with the new task APIs.

The use of errgroup.Group and Monitor still remains and will be addressed in a different PR.

Informs: #118214

Epic: None
Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@herkolategan herkolategan force-pushed the hbl/roachtest-remove-bare-go-routines branch 4 times, most recently from 118a08e to b9e52e5 Compare November 18, 2024 17:03
Previously, tests had to use bare goroutine calls to start a goroutine. This
change removes all bare goroutine calls and replaces it with the new task APIs.
Most of the tests already handle errors from goroutines via a channel. This
logic is best left in its current state. New implementations can choose whether
they want to fail a test by returning an error from a managed goroutine, or
would rather handle errors some other way.

The use of `errgroup.Group` and `Monitor` still remains and will be addressed in
a different PR.

Informs: cockroachdb#118214

Epic: None
Release note: None
@herkolategan herkolategan force-pushed the hbl/roachtest-remove-bare-go-routines branch from b9e52e5 to 716be3f Compare November 19, 2024 14:23
@herkolategan herkolategan marked this pull request as ready for review November 20, 2024 16:41
@herkolategan herkolategan requested a review from a team as a code owner November 20, 2024 16:41
@herkolategan herkolategan requested review from nameisbhaskar and vidit-bhat and removed request for a team November 20, 2024 16:41
@@ -81,18 +82,10 @@ func registerAllocator(r registry.Registry) {
c.Start(ctx, t.L(), startOpts, install.MakeClusterSettings(), c.Range(start+1, nodes))
c.Run(ctx, option.WithNodes(c.Node(1)), "./cockroach workload init kv --drop {pgurl:1}")
for node := 1; node <= nodes; node++ {
node := node
// TODO(dan): Ideally, the test would fail if this queryload failed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto!

defer wg.Done()
<-upgradeFinished
l.Printf("tenant upgrades finished")
}()
return nil
})

wg.Wait()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there's value in having a t.Wait that blocks until all goroutines return? Similar to how m.Wait functions? I imagine we will see more need of this when the monitor refactor happens.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I do see value in it, hence a follow-up PR, that adds group management to the task manager: #135831

if _, err := runnerConn.Exec(setupQuery); err != nil {
errCh <- err
close(sem)
return
// Errors are handled in the main goroutine.
return nil //nolint:returnerrcheck
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it's not immediately obvious that returning errors in t.Go fails the test. I think it makes sense for that to be the behavior but you have to go a couple layers and peek at the implementation to confirm that is the case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with Darryl's comment; we should make this explicit in the interface doc. Also, this test could technically be refactored further to explicitly return err instead of passing it into errCh, but I also understand if you want to reduce the footprint of all changes in this PR.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought, will need to make it clear that returning an error from a task will result in a test failure. We could alternatively provide a version of the Go/GoWithCancel methods that does not take an error return.

From the footprint perspective, I didn't want to fiddle too much with already working code. But we could definitely make some of these more ergonomic. I'll create an issue for it, since engineers tend to borrow from existing code, it would be better if the examples are improved.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could alternatively provide a version of the Go/GoWithCancel methods that does not take an error return.

No bother, it's not that hard to add return nil :)

cmd := fmt.Sprintf("./cockroach workload run kv --tolerate-errors --min-block-bytes=8 --max-block-bytes=127 {pgurl%s}", c.Node(node))
l, err := t.L().ChildLogger(fmt.Sprintf(`kv-%d`, node))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this logger was previously unused?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, had to look twice before removing, and in actual fact was dead code.

@@ -87,11 +88,11 @@ func runClusterInit(ctx context.Context, t test.Test, c cluster.Cluster) {
t.L().Printf("checking that the SQL conns are not failing immediately")
errCh := make(chan error, len(dbs))
for _, db := range dbs {
db := db
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good riddance :)

@srosenberg srosenberg self-requested a review November 20, 2024 19:29
Copy link
Member

@srosenberg srosenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for refactoring all "naked" go calls! LGTM. We should probably do SELECT_PROBABILITY=0.6 to make sure we hit enough tests to shake out any potential regressions.

@herkolategan
Copy link
Collaborator Author

herkolategan commented Nov 21, 2024

Thanks for refactoring all "naked" go calls! LGTM. We should probably do SELECT_PROBABILITY=0.6 to make sure we hit enough tests to shake out any potential regressions.

Definitely, I ran a targeted job on all the affected tests with a rather large regex. But I think a 0.6 probability would also make for a good final check. Will do one before merging today/tomorrow.

Added a comment to inform test implementors, using the task API, that returning
an error from a Tasker goroutine will fail a test.

Informs: cockroachdb#118214

Epic: None
Release note: None
@herkolategan
Copy link
Collaborator Author

TFTRs!

bors r=srosenberg,DarrylWong

@craig craig bot merged commit a86b127 into cockroachdb:master Nov 27, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants