Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: enhance comments surrounding draining #76172

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

cameronnunez
Copy link
Contributor

This patch improves the comments related to the draining process,
fixing incorrect statements and providing more details to make
the process more easily understood.

Release note: None

@cameronnunez cameronnunez requested a review from a team as a code owner February 7, 2022 17:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cameronnunez
Copy link
Contributor Author

cameronnunez commented Feb 7, 2022

@ZhouXing19 we can be a bit more precise about how we think about the drain sequence. While we have been thinking about it as:

  1. Wait as "not ready"
  2. (WIP) Wait for SQL connections to close
  3. Drain SQL connections
  4. Drain range leases & leadership

Really it's better to think about phase 3 as: Drain the SQL layer. This includes draining SQL connections, draining distributed SQL execution flows, and draining SQL table leases.

@cameronnunez
Copy link
Contributor Author

cc @taroface

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch 2 times, most recently from 0f46dd8 to f69fee2 Compare February 7, 2022 17:24
server.shutdown.lease_transfer_wait duration 5s the amount of time a server waits to transfer range leases before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.query_wait duration 10s the server will wait for at least this amount of time for active queries to finish (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.drain_wait duration 0s the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.lease_transfer_wait duration 5s the timeout for a single iteration of the range lease transfer phase of draining(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
Copy link
Collaborator

@taroface taroface Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing space between "drain"/"draining" and "(note" in the two above settings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed thanks.

server.shutdown.drain_wait duration 0s the amount of time a server waits in an unready state before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.lease_transfer_wait duration 5s the amount of time a server waits to transfer range leases before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.query_wait duration 10s the server will wait for at least this amount of time for active queries to finish (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
server.shutdown.drain_wait duration 0s the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that the unready state is the first step of a drain. So should "before proceeding with a drain" be e.g. "at the beginning of a drain"?

Copy link
Contributor Author

@cameronnunez cameronnunez Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of the first step is to wait the duration of drainWait before we start draining. So while it is technically a part of the drain sequence, it is not a part of the actual draining phases.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks

Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @ZhouXing19)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

server.oidc_authentication.scopes	string	openid	sets OIDC scopes to include with authentication request (space delimited list of strings, required to start with `openid`)
server.rangelog.ttl	duration	720h0m0s	if nonzero, range log entries older than this duration are deleted every 10m0s. Should not be lowered below 24 hours.
server.shutdown.drain_wait	duration	0s	the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)

why "minimum" here? Is there a way for the initial delay to be longer than the configured value?


pkg/kv/kvserver/store.go, line 1358 at r1 (raw file):

//
// Note: this code represents ONE round of draining. This code is iterated on
// indefinitely until all leases are transferred away.

perhaps you can point the reader to where this iteration can be found.


pkg/server/drain.go, line 174 at r1 (raw file):

// Drain idempotently activates the draining mode.
// Note: this represents ONE round of draining. This code is iterated on
// indefinitely until all range leases have been drained.

ditto


pkg/server/drain.go, line 230 at r1 (raw file):

		return err
	}
	// Finally, mark the node as draining in liveness and drain the

We need to explain the bit about liveness too.

Copy link
Collaborator

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:! My opinions are just for minor changes.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez, @knz, and @taroface)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, knz (kena) wrote…

why "minimum" here? Is there a way for the initial delay to be longer than the configured value?

I think it was trying to say that this is a "hard wait" without any early exit.


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, taroface (Ryan Kuo) wrote…

My understanding is that the unready state is the first step of a drain. So should "before proceeding with a drain" be e.g. "at the beginning of a drain"?

It seems that we try to say this period is not a draining phase, but more of a "warm-up" that waits for the draining to starts.


pkg/server/drain.go, line 248 at r1 (raw file):

	// Set the gRPC mode of the node to "draining" and mark the node as "not ready".
	// Probes to /health?ready=1 will now notice the change in the node's readiness.

nit: "will now notice the node is in a "shutting-down" status, and is unready for new SQL connection."


pkg/server/drain.go, line 261 at r1 (raw file):

	// Drain all SQL connections.
	// The queryWait duration is used to wait on clients to self-disconnect.

Nit: Though this may have been conveyed by the words "wait" and "timeout", maybe we can explicitly add that "Once all existing SQL queries finish before the queryWait timeout, the server early exits this waiting phase and proceeds to the rest of the drain".

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez, @taroface, and @ZhouXing19)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

I think it was trying to say that this is a "hard wait" without any early exit.

then the good word is "fixed", not "minimum".

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch from f69fee2 to 2d430bd Compare February 9, 2022 16:55
@cameronnunez cameronnunez requested a review from a team as a code owner February 9, 2022 16:55
Copy link
Contributor Author

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @taroface, and @ZhouXing19)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, knz (kena) wrote…

then the good word is "fixed", not "minimum".

Fixed.


docs/generated/settings/settings-for-tenants.txt, line 64 at r1 (raw file):

Previously, taroface (Ryan Kuo) wrote…

Nit: missing space between "drain"/"draining" and "(note" in the two above settings

Noted fixed.


pkg/kv/kvserver/store.go, line 1358 at r1 (raw file):

Previously, knz (kena) wrote…

perhaps you can point the reader to where this iteration can be found.

Done.


pkg/server/drain.go, line 174 at r1 (raw file):

Previously, knz (kena) wrote…

ditto

Done.


pkg/server/drain.go, line 230 at r1 (raw file):

Previously, knz (kena) wrote…

We need to explain the bit about liveness too.

Done.


pkg/server/drain.go, line 248 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

nit: "will now notice the node is in a "shutting-down" status, and is unready for new SQL connection."

since the mode is modeDraining I think saying "draining" is more appropriate. Also the node can still accept new SQL connections here.


pkg/server/drain.go, line 261 at r1 (raw file):

Previously, ZhouXing19 (Jane Xing) wrote…

Nit: Though this may have been conveyed by the words "wait" and "timeout", maybe we can explicitly add that "Once all existing SQL queries finish before the queryWait timeout, the server early exits this waiting phase and proceeds to the rest of the drain".

More specific about it being a timeout; timeout implies a max duration.

@cameronnunez cameronnunez requested a review from knz February 9, 2022 16:59
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez, @taroface, and @ZhouXing19)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, cameronnunez (Cameron Nunez) wrote…

Fixed.

i don't see it

also there's a space missing before a parenthesis

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch from 2d430bd to 9c671b5 Compare February 9, 2022 22:46
Copy link
Contributor Author

@cameronnunez cameronnunez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @taroface, and @ZhouXing19)


docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):

Previously, knz (kena) wrote…

i don't see it

also there's a space missing before a parenthesis

ah I was having issues with docs gen. Fixed.

@cameronnunez cameronnunez requested a review from knz February 9, 2022 22:47
Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @taroface and @ZhouXing19)

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch from 9c671b5 to d95bef7 Compare February 10, 2022 15:44
@cameronnunez
Copy link
Contributor Author

thanks!

bors r=knz

@craig
Copy link
Contributor

craig bot commented Feb 10, 2022

Build failed:

@ajwerner
Copy link
Contributor

bors r-

CI failure

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch 3 times, most recently from f5e98d7 to 147552f Compare February 14, 2022 15:44
@cameronnunez
Copy link
Contributor Author

bors r=knz

@craig
Copy link
Contributor

craig bot commented Feb 14, 2022

Build failed (retrying...):

@cameronnunez
Copy link
Contributor Author

bors r-

@craig
Copy link
Contributor

craig bot commented Feb 14, 2022

Canceled.

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch from 147552f to 74e60bd Compare February 14, 2022 20:24
@irfansharif
Copy link
Contributor

PS, I think this is tripping up the merge queue due due to unexpected diffs: https://teamcity.cockroachdb.com/viewLog.html?buildId=4372990&problemId=2701

Have you tried dev generate bazel to see if it suggests changes?

@cameronnunez
Copy link
Contributor Author

yeah I've noticed but the regeneration of that file seems odd to me, but I'll add the lines and see what happens with CI

@cameronnunez
Copy link
Contributor Author

looks like I'm running into problems with node_modules because I switched between make build and dev build

@cameronnunez cameronnunez force-pushed the improve-drain-comments branch 2 times, most recently from f7bae40 to 5f33e8f Compare February 15, 2022 14:21
This patch improves the comments related to the draining process,
fixing incorrect statements and providing more details to make
the process more easily understood.

Release note: None
@cameronnunez cameronnunez force-pushed the improve-drain-comments branch from 5f33e8f to b7dfd62 Compare February 15, 2022 15:37
@cameronnunez
Copy link
Contributor Author

bors r=knz

@craig craig bot merged commit 1452123 into cockroachdb:master Feb 15, 2022
@craig
Copy link
Contributor

craig bot commented Feb 15, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants