roachprod: force wait=true for stop --signal=9 #78268

Merged: 1 commit, Mar 28, 2022

Conversation

Member

@tbg tbg commented Mar 22, 2022

Touches #77334.

Release note: None

@tbg tbg requested a review from a team as a code owner March 22, 2022 16:53
@tbg tbg requested review from a team and otan and removed request for a team March 22, 2022 16:53

Member

@srosenberg srosenberg left a comment

Thanks for the PR! Minor comments; otherwise it's good to go.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @tbg)


pkg/roachprod/roachprod.go, line 680 at r1 (raw file):

	// If Wait is set, roachprod waits until the PID disappears (i.e. the
	// process has terminated).
	Wait bool // forced to true when Sig == 9

Might be worthwhile to also change roachprod's DefaultStopOpts.Wait default to true.


pkg/roachprod/install/cluster_synced.go, line 228 at r1 (raw file):

func (c *SyncedCluster) Stop(ctx context.Context, l *logger.Logger, sig int, wait bool) error {
	if sig == 9 {
		// `kill -9` without wait is never what a caller wants. See #77334.

While I can't think of a specific use case that would require wait=false, it would improve the debugging experience to log when the wait value is overridden below.
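For readers following the thread, here is a minimal sketch of the override being discussed. It is not the code from the PR: `effectiveWait` and `sigKill` are hypothetical names introduced for illustration, and the second return value only shows where a log line could hook in if one were wanted.

```go
package main

import "fmt"

// sigKill is the numeric value of SIGKILL, i.e. the 9 in `kill -9`.
const sigKill = 9

// effectiveWait is a hypothetical helper (not part of roachprod). It returns
// the wait value that would actually be used, plus a flag reporting whether
// the caller's value was overridden. A caller could log when the flag is set.
func effectiveWait(sig int, wait bool) (effective bool, overridden bool) {
	if sig == sigKill && !wait {
		// `kill -9` without wait is forced to wait=true.
		return true, true
	}
	return wait, false
}

func main() {
	cases := []struct {
		sig  int
		wait bool
	}{
		{sig: 9, wait: false},
		{sig: 9, wait: true},
		{sig: 2, wait: false},
	}
	for _, tc := range cases {
		w, o := effectiveWait(tc.sig, tc.wait)
		fmt.Printf("sig=%d wait=%t -> effective wait=%t, overridden=%t\n", tc.sig, tc.wait, w, o)
	}
}
```

Only the combination of signal 9 and wait=false is changed; every other combination passes through untouched.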

@tbg tbg requested a review from srosenberg March 28, 2022 08:11
Member Author

@tbg tbg left a comment

TFTR! No changes, but would like you to take a look at my responses before I merge.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @otan and @srosenberg)


pkg/roachprod/roachprod.go, line 680 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

Might be worthwhile to also change roachprod's DefaultStopOpts.Wait default to true.

I intentionally didn't do that, or things like this will rot:

	stopOpts := option.DefaultStopOpts()
	stopOpts.RoachprodOpts.Sig = 2

DefaultStopOpts() has 50 usages. I would like to minimize potential disruption during the stability period (and would like to backport this without sweating). Generally, I still like the plan to move from Wait to NoWait, but I think that should be done separately from fixing the missing wait in kill -9, which exists pervasively across the test suite and is never desirable.
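The concern about flipping the default can be sketched as follows. This is an illustration under stated assumptions, not roachprod code: the `StopOpts` type here is a flattened stand-in (the snippet above goes through `RoachprodOpts`), `defaultStopOpts` takes the default as a parameter purely so both variants can be compared, and the assumption that the default signal is 9 is ours.

```go
package main

import "fmt"

// StopOpts is a simplified, hypothetical stand-in for the real options type.
type StopOpts struct {
	Sig  int
	Wait bool
}

// defaultStopOpts is hypothetical: it takes the Wait default as a parameter
// so the current behavior (false) and the proposed one (true) can be compared.
// The default signal of 9 is an assumption made for this sketch.
func defaultStopOpts(waitDefault bool) StopOpts {
	return StopOpts{Sig: 9, Wait: waitDefault}
}

// resolveWait models the rule discussed in this PR: signal 9 always waits,
// any other signal honors the caller's Wait value.
func resolveWait(o StopOpts) bool {
	return o.Wait || o.Sig == 9
}

func main() {
	for _, waitDefault := range []bool{false, true} {
		// A caller that only overrides the signal, as in the snippet above.
		opts := defaultStopOpts(waitDefault)
		opts.Sig = 2
		fmt.Printf("default Wait=%t: caller sending signal 2 waits=%t\n",
			waitDefault, resolveWait(opts))
	}
}
```

With the default left at false, a caller that only changes the signal keeps its existing non-waiting behavior. Flipping the default would silently change what every such caller does, which is the disruption the comment wants to avoid during the stability period.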


pkg/roachprod/install/cluster_synced.go, line 228 at r1 (raw file):

Previously, srosenberg (Stan Rosenberg) wrote…

While I can't think of a specific use case that would require wait=false, it would improve the debugging experience to log when the wait value is overridden below.

It won't be useful to log this. Tests call c.Stop all the time, and since I am averse to changing the default for the Wait flag at this moment (see other comment), it would just get spammed all over the place (into already spammy logs :-) ).

Member

@srosenberg srosenberg left a comment

:lgtm:

Reviewed 2 of 2 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @otan and @tbg)


pkg/roachprod/roachprod.go, line 680 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I intentionally didn't do that, or things like this will rot:

	stopOpts := option.DefaultStopOpts()
	stopOpts.RoachprodOpts.Sig = 2

DefaultStopOpts() has 50 usages. I would like to minimize potential disruption during the stability period (and would like to backport this without sweating). Generally, I still like the plan to move from Wait to NoWait, but I think that should be done separately from fixing the missing wait in kill -9, which exists pervasively across the test suite and is never desirable.

Sounds good; would you mind leaving a TODO for a follow-up refactoring?


pkg/roachprod/install/cluster_synced.go, line 228 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

It won't be useful to log this. Tests call c.Stop all the time, and since I am averse to changing the default for the Wait flag at this moment (see other comment), it would just get spammed all over the place (into already spammy logs :-) ).

Got it; logs are way too spammy as is :(

Member Author

tbg commented Mar 28, 2022

bors r=srosenberg

Not going to put a TODO in the code but instead leaving #77334 open.

Contributor

craig bot commented Mar 28, 2022

Build failed (retrying...):

Contributor

craig bot commented Mar 28, 2022

Build succeeded:

@craig craig bot merged commit 7d941bb into cockroachdb:master Mar 28, 2022