-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: enhance comments surrounding draining #76172
Conversation
@ZhouXing19 we can be a bit more precise about how we think about the drain sequence. While we have been thinking about it as:
Really it's better to think about phase 3 as: Drain the SQL layer. This includes draining SQL connections, draining distributed SQL execution flows, and draining SQL table leases. |
cc @taroface |
0f46dd8
to
f69fee2
Compare
server.shutdown.lease_transfer_wait duration 5s the amount of time a server waits to transfer range leases before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.query_wait duration 10s the server will wait for at least this amount of time for active queries to finish (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.drain_wait duration 0s the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.lease_transfer_wait duration 5s the timeout for a single iteration of the range lease transfer phase of draining(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing space between "drain"/"draining" and "(note" in the two above settings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed thanks.
server.shutdown.drain_wait duration 0s the amount of time a server waits in an unready state before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.lease_transfer_wait duration 5s the amount of time a server waits to transfer range leases before proceeding with the rest of the shutdown process (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.query_wait duration 10s the server will wait for at least this amount of time for active queries to finish (note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) | ||
server.shutdown.drain_wait duration 0s the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that the unready state is the first step of a drain. So should "before proceeding with a drain" be e.g. "at the beginning of a drain"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of the first step is to wait the duration of drainWait
before we start draining. So while it is technically a part of the drain sequence, it is not a part of the actual draining phases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, thanks
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @cameronnunez and @ZhouXing19)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
server.oidc_authentication.scopes string openid sets OIDC scopes to include with authentication request (space delimited list of strings, required to start with `openid`) server.rangelog.ttl duration 720h0m0s if nonzero, range log entries older than this duration are deleted every 10m0s. Should not be lowered below 24 hours. server.shutdown.drain_wait duration 0s the minimum amount of time a server waits in an unready state before proceeding with a drain(note that the --drain-wait parameter for cockroach node drain may need adjustment after changing this setting)
why "minimum" here? Is there a way for the initial delay to be longer than the configured value?
pkg/kv/kvserver/store.go, line 1358 at r1 (raw file):
// // Note: this code represents ONE round of draining. This code is iterated on // indefinitely until all leases are transferred away.
perhaps you can point the reader to where this iteration can be found.
pkg/server/drain.go, line 174 at r1 (raw file):
// Drain idempotently activates the draining mode. // Note: this represents ONE round of draining. This code is iterated on // indefinitely until all range leases have been drained.
ditto
pkg/server/drain.go, line 230 at r1 (raw file):
return err } // Finally, mark the node as draining in liveness and drain the
We need to explain the bit about liveness too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
! My opinions are just for minor changes.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez, @knz, and @taroface)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, knz (kena) wrote…
why "minimum" here? Is there a way for the initial delay to be longer than the configured value?
I think it was trying to say that this is a "hard wait" without any early exit.
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
My understanding is that the unready state is the first step of a drain. So should "before proceeding with a drain" be e.g. "at the beginning of a drain"?
It seems that we try to say this period is not a draining phase, but more of a "warm-up" that waits for the draining to starts.
pkg/server/drain.go, line 248 at r1 (raw file):
// Set the gRPC mode of the node to "draining" and mark the node as "not ready". // Probes to /health?ready=1 will now notice the change in the node's readiness.
nit: "will now notice the node is in a "shutting-down" status, and is unready for new SQL connection."
pkg/server/drain.go, line 261 at r1 (raw file):
// Drain all SQL connections. // The queryWait duration is used to wait on clients to self-disconnect.
Nit: Though this may have been conveyed by the words "wait" and "timeout", maybe we can explicitly add that "Once all existing SQL queries finish before the queryWait
timeout, the server early exits this waiting phase and proceeds to the rest of the drain".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @cameronnunez, @taroface, and @ZhouXing19)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
I think it was trying to say that this is a "hard wait" without any early exit.
then the good word is "fixed", not "minimum".
f69fee2
to
2d430bd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @taroface, and @ZhouXing19)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, knz (kena) wrote…
then the good word is "fixed", not "minimum".
Fixed.
docs/generated/settings/settings-for-tenants.txt, line 64 at r1 (raw file):
Previously, taroface (Ryan Kuo) wrote…
Nit: missing space between "drain"/"draining" and "(note" in the two above settings
Noted fixed.
pkg/kv/kvserver/store.go, line 1358 at r1 (raw file):
Previously, knz (kena) wrote…
perhaps you can point the reader to where this iteration can be found.
Done.
pkg/server/drain.go, line 174 at r1 (raw file):
Previously, knz (kena) wrote…
ditto
Done.
pkg/server/drain.go, line 230 at r1 (raw file):
Previously, knz (kena) wrote…
We need to explain the bit about liveness too.
Done.
pkg/server/drain.go, line 248 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
nit: "will now notice the node is in a "shutting-down" status, and is unready for new SQL connection."
since the mode is modeDraining
I think saying "draining" is more appropriate. Also the node can still accept new SQL connections here.
pkg/server/drain.go, line 261 at r1 (raw file):
Previously, ZhouXing19 (Jane Xing) wrote…
Nit: Though this may have been conveyed by the words "wait" and "timeout", maybe we can explicitly add that "Once all existing SQL queries finish before the
queryWait
timeout, the server early exits this waiting phase and proceeds to the rest of the drain".
More specific about it being a timeout; timeout implies a max duration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @cameronnunez, @taroface, and @ZhouXing19)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, cameronnunez (Cameron Nunez) wrote…
Fixed.
i don't see it
also there's a space missing before a parenthesis
2d430bd
to
9c671b5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @taroface, and @ZhouXing19)
docs/generated/settings/settings-for-tenants.txt, line 63 at r1 (raw file):
Previously, knz (kena) wrote…
i don't see it
also there's a space missing before a parenthesis
ah I was having issues with docs gen. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @taroface and @ZhouXing19)
9c671b5
to
d95bef7
Compare
thanks! bors r=knz |
Build failed: |
bors r- CI failure |
f5e98d7
to
147552f
Compare
bors r=knz |
Build failed (retrying...): |
bors r- |
Canceled. |
147552f
to
74e60bd
Compare
PS, I think this is tripping up the merge queue due due to unexpected diffs: https://teamcity.cockroachdb.com/viewLog.html?buildId=4372990&problemId=2701 Have you tried |
yeah I've noticed but the regeneration of that file seems odd to me, but I'll add the lines and see what happens with CI |
looks like I'm running into problems with |
f7bae40
to
5f33e8f
Compare
This patch improves the comments related to the draining process, fixing incorrect statements and providing more details to make the process more easily understood. Release note: None
5f33e8f
to
b7dfd62
Compare
bors r=knz |
Build succeeded: |
This patch improves the comments related to the draining process,
fixing incorrect statements and providing more details to make
the process more easily understood.
Release note: None