
Update 'Stop a Node' with more draining info #2671

Merged
merged 13 commits into master from fixitday-node-draining-update on Mar 19, 2018

Conversation

rmloveland
Contributor

@rmloveland rmloveland commented Mar 8, 2018

Fixes #2436
Fixes #2620

@cockroach-teamcity
Member

This change is Reviewable

@rmloveland rmloveland requested a review from asubiotto March 8, 2018 21:15
@rmloveland rmloveland requested review from a-robinson and removed request for asubiotto March 8, 2018 21:28
@rmloveland
Contributor Author

@a-robinson - Alfonso is on holiday for 2 weeks and suggested you as a potential reviewer. Mind giving these changes a look?

@a-robinson
Contributor

Reviewed 3 of 3 files at r1.
Review status: 2 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


v2.0/cluster-settings.md, line 25 at r1 (raw file):

The following settings can be configured without further input from Cockroach Labs:

{% include settings/v2.0/settings.md %}

We shouldn't replace these with the entire list -- they were curated to be non-experimental settings that end users could reasonably understand and use on their own without additional documentation or handholding from us. I know it's a pain to keep this up-to-date, but we should either continue trying to do so or change the sentence above to not claim that they can all be configured without our advice.


v2.0/stop-a-node.md, line 19 at r1 (raw file):

When you stop a node, it performs the following steps:

- Finishes in-flight requests. Note that this is a best effort that times out at the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).

I'd phrase this as "that times out after the duration specified by the server.shutdown.query_wait cluster setting"


v2.0/stop-a-node.md, line 21 at r1 (raw file):

- Finishes in-flight requests. Note that this is a best effort that times out at the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).
- Transfers all *range leases* and Raft leadership to other nodes.
- Gossips its draining state to the cluster, so that other nodes do not try to distribute query planning to the draining node, and no leases are transferred to the draining node.  Note that this is best effort that times out at the `server.shutdown.drain_wait` [cluster setting](cluster-settings.html), so other nodes may not receive the gossip info in time.

s/best effort/a best effort/

And a similar rephrasing as above would make sense here, too.
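
Both timeouts discussed in these comments are ordinary cluster settings, so they can be inspected or changed from any SQL shell connected to the cluster. A minimal sketch for reference (the `10s` value is illustrative, not a recommendation from this thread):

```sql
-- Inspect the drain timeouts discussed above (v2.0 settings).
SHOW CLUSTER SETTING server.shutdown.query_wait;
SHOW CLUSTER SETTING server.shutdown.drain_wait;

-- Illustrative only: allow in-flight queries more time to finish on shutdown.
SET CLUSTER SETTING server.shutdown.query_wait = '10s';
```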


v2.0/stop-a-node.md, line 21 at r1 (raw file):

- Finishes in-flight requests. Note that this is a best effort that times out at the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).
- Transfers all *range leases* and Raft leadership to other nodes.
- Gossips its draining state to the cluster, so that other nodes do not try to distribute query planning to the draining node, and no leases are transferred to the draining node.  Note that this is best effort that times out at the `server.shutdown.drain_wait` [cluster setting](cluster-settings.html), so other nodes may not receive the gossip info in time.

I'm not positive how effective the "distribute query planning" part actually is (particularly given cockroachdb/cockroach#23601), but that's certainly how we'd like it to work so including it here is probably fine.

Out of curiosity, @solongordon do you know whether other nodes attempt to avoid scheduling distsql processing on draining nodes? Is it purely based on the leaseholder cache?



@jseldess
Contributor

jseldess commented Mar 9, 2018

v2.0/cluster-settings.md, line 25 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

We shouldn't replace these with the entire list -- they were curated to be non-experimental settings that end users could reasonably understand and use on their own without additional documentation or handholding from us. I know it's a pain to keep this up-to-date, but we should either continue trying to do so or change the sentence above to not claim that they can all be configured without our advice.

cc @dt, who did the work to auto-generate this markdown.



@solongordon
Contributor

Review status: 2 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


v2.0/stop-a-node.md, line 21 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

I'm not positive how effective the "distribute query planning" part actually is (particularly given cockroachdb/cockroach#23601), but that's certainly how we'd like it to work so including it here is probably fine.

Out of curiosity, @solongordon do you know whether other nodes attempt to avoid scheduling distsql processing on draining nodes? Is it purely based on the leaseholder cache?

I haven't seen it in action but it does look like there's logic for avoiding processing on draining nodes: https://github.com/cockroachdb/cockroach/blob/afdf5e12bfdd11054582be7e33280fa185974d9e/pkg/sql/distsql_physical_planner.go#L540-L556



@dt
Member

dt commented Mar 9, 2018

Review status: 2 of 3 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.


v2.0/cluster-settings.md, line 25 at r1 (raw file):

Previously, jseldess wrote…

cc @dt, who did the work to auto-generate this markdown.

I think we should put the whole list here and we should remove the sentence above.

Note that above that sentence is a call-out box saying that you should know what you're doing, or talk to us, before changing settings, which I believe adequately sets expectations around these.

For settings that are riskier or whatever, I think we should mention that in their defined description (we might want to add a longer-form description field).

For extreme cases, we also have the ability to define settings as Hidden, which excludes them from SHOW ALL CLUSTER SETTINGS and this list, though I'm generally in favor of documenting over hiding.
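
For context, the generated list under discussion mirrors what users see from a SQL shell, and per the comment above, settings defined as Hidden are excluded from both. A minimal example (output elided):

```sql
-- Hidden settings do not appear in this output or in the generated docs list.
SHOW ALL CLUSTER SETTINGS;
```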



@a-robinson
Contributor

:lgtm:


Reviewed 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.



@jseldess
Contributor

LGTM, @rmloveland, with one minor request.

The new node shutdown description is a big improvement. Thank you!

Are you sure you don't want to handle the other versions in this PR?


- Finishes in-flight requests. Note that this is a best effort that times out at the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).
- Transfers all *range leases* and Raft leadership to other nodes.
- Gossips its draining state to the cluster, so that other nodes do not try to distribute query planning to the draining node, and no leases are transferred to the draining node. Note that this is best effort that times out at the `server.shutdown.drain_wait` [cluster setting](cluster-settings.html), so other nodes may not receive the gossip info in time.
@jseldess
Contributor

nit: Use only one space after the first period. I know you prefer 2, @rmloveland, but convention in our docs is 1.

| `sql.metrics.statement_details.dump_to_logs` | On each node, also copy collected per-statement statistics to the [logging output](debug-and-error-logs.html) when automatic reporting is enabled. | Boolean | `false` |
| `sql.metrics.statement_details.threshold` | Only collect per-statement statistics for statements that run longer than this threshold. | Interval | 0 seconds (all statements) |
| `sql.trace.log_statement_execute` | On each node, copy all executed statements to the [logging output](debug-and-error-logs.html). | Boolean | `false` |
{% include settings/v2.0/settings.md %}

<!-- Add this section back in once `system.settings` has been fleshed out.
@jseldess
Contributor

Please remove this old commented-out text.

@rmloveland
Contributor Author

Review status: 1 of 3 files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.


v2.0/cluster-settings.md, line 25 at r1 (raw file):

Previously, dt (David Taylor) wrote…

I think we should put the whole list here and we should remove the sentence above.

Note that above that sentence is a call-out box saying that you should know what you're doing, or talk to us, before changing settings, which I believe adequately sets expectations around these.

For settings that are riskier or whatever, I think we should mention that it in their defined description (we might want to add a longer-form description field).

For extreme cases, we also have the ability to define settings as Hidden which excludes them from SHOW ALL CLUSTER SETTINGS and this list, though i'm generally in favor of documenting over hiding.

Removed the sentence above the list, as recommended, in 898fa26


v2.0/cluster-settings.md, line 25 at r2 (raw file):

Previously, jseldess wrote…

Please remove this old commented-out text.

Removed the commented-out text in 94b0f11


v2.0/stop-a-node.md, line 19 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

I'd phrase this as "that times out after the duration specified by the server.shutdown.query_wait cluster setting"

Updated as recommended in 7fc7a2f


v2.0/stop-a-node.md, line 21 at r1 (raw file):

Previously, a-robinson (Alex Robinson) wrote…

s/best effort/a best effort/

And a similar rephrasing as above would make sense here, too.

Thanks - fixed the "a" best effort thing in 94f88ce

Updated the cluster setting language in 253660f


v2.0/stop-a-node.md, line 21 at r2 (raw file):

Previously, jseldess wrote…

nit: Use only one space after the first period. I know you prefer 2, @rmloveland, but convention in our docs is 1.

Oops, sorry - fixed in 94f88ce



@rmloveland
Contributor Author

rmloveland commented Mar 14, 2018

Hey @a-robinson, at @jseldess' recommendation I'm going to try to add the 1.1.{5,6} changes to this PR as well. Do you mind answering some additional questions? I'm a lot less clear on these versions, as you will see from the below. :-)

I have some text below (which I also pushed in 64e718f so you can more easily comment) that I prepared in person with Alfonso on FixitDay before he left for holiday. Each bullet is followed by the version in which Alfonso said the changes occurred.

I made it a numbered list here so I can ask related questions for each item below.

First, the draft text from 64e718f:

1. Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `???` cluster setting (1.1.5)
2. Transfers all *range leases* and Raft leadership to other nodes. (1.1.6)
3. Gossips its draining state to the cluster so that no leases are transferred to the draining node. Note that this is a best effort that times out after the duration specified by the `???` cluster setting, so other nodes may not receive the gossip info in time. (1.1.6)
4. No new ranges are transferred to the draining node, to avoid a possible loss of quorum after the node shuts down. (1.1.5)

My questions:

  1. Is there a cluster setting for this in 1.1.6? I'm not seeing one (at least with the server.* prefix) in the cluster settings SQL output below. If no setting, is there a default we can document?
  2. Is this accurate?
  3. Same as question 1 re: is there a cluster setting?
  4. Is this accurate?

The binary I'm running to look up these cluster settings is v1.1.6:

build:      CCL v1.1.6 @ 2018/03/12 17:55:09 (go1.8.3)

Cluster settings don't seem to allow for control of the features in items 1 and 3 above, but I'm not very familiar:

=> SHOW ALL CLUSTER SETTINGS;

...
| server.consistency_check.interval   | 24h0m0s  | d | the time between range consistency checks; set to 0 to disable consistency checking                    |
| server.declined_reservation_timeout | 1s       | d | the amount of time to consider the store throttled for up-replication after a reservation was declined |
| server.failed_reservation_timeout   | 5s       | d | the amount of time to consider the store throttled for up-replication after a failed reservation call  |
| server.remote_debugging.mode        | local    | s | set to enable remote debugging, localhost-only or disable (any, local, off)                            |
| server.time_until_store_dead        | 5m0s     | d | the time after which if there is no new gossiped information about a store, it is considered dead      |
| server.web_session_timeout          | 168h0m0s | d | the duration that a newly created web session will be valid                                            |
...

@rmloveland rmloveland force-pushed the fixitday-node-draining-update branch from 4f5db79 to 64e718f on March 14, 2018 15:19
@@ -14,6 +14,11 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

- Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `???` cluster setting (1.1.5)
Contributor

This setting is not in v1.1. The server just cancels all current sessions without waiting.

Contributor Author

Thanks, updated to say that in 14139e9

@@ -14,6 +14,11 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

- Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `???` cluster setting (1.1.5)
- Transfers all *range leases* and Raft leadership to other nodes. (1.1.6)
- Gossips its draining state to the cluster so that no leases are transferred to the draining node. Note that this is a best effort that times out after the duration specified by the `???` cluster setting, so other nodes may not receive the gossip info in time. (1.1.6)
Contributor

This still happens as of 1.1.6. In 1.1.5 and earlier, this part was broken.

Contributor Author

Thanks for confirming. Do you know if there is a cluster setting for this in 1.1.6? None of the documented cluster settings for 1.1.6 look to be the one. And it wasn't clear from a quick SHOW ALL CLUSTER SETTINGS on a 1.1.6 binary (though I may have missed it).

@@ -14,6 +14,11 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

- Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `???` cluster setting (1.1.5)
- Transfers all *range leases* and Raft leadership to other nodes. (1.1.6)
Contributor

This still happens as of 1.1.6. In 1.1.5 and earlier, these parts didn't always happen as intended.

Contributor Author

Great, thanks! Since the stable web docs show 1.1.6 now, I'll leave this in (minus the version number).

- Finishes in-flight requests. Note that this is a best effort that times out after the duration specified by the `???` cluster setting (1.1.5)
- Transfers all *range leases* and Raft leadership to other nodes. (1.1.6)
- Gossips its draining state to the cluster so that no leases are transferred to the draining node. Note that this is a best effort that times out after the duration specified by the `???` cluster setting, so other nodes may not receive the gossip info in time. (1.1.6)
- No new ranges are transferred to the draining node, to avoid a possible loss of quorum after the node shuts down. (1.1.5)
Contributor

This is true for all 1.1.x versions, as far as I'm aware.

Contributor Author

Thanks! Again, leaving in but removing the version number.

@a-robinson
Contributor

:lgtm_strong:


Reviewed 1 of 1 files at r5, 1 of 1 files at r8, 1 of 1 files at r13.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


v1.1/stop-a-node.md, line 19 at r9 (raw file):

Previously, rmloveland (Rich Loveland) wrote…

Thanks for confirming. Do you know if there is a cluster setting for this in 1.1.6? None of the documented cluster settings for 1.1.6 look to be the one. And it wasn't clear from a quick SHOW ALL CLUSTER SETTINGS on a 1.1.6 binary (though I may have missed it).

Nope, no cluster setting in any of the v1.1 releases. It'd have to be something pretty critical for us to introduce/backport a new cluster setting in a patch release.



@rmloveland
Contributor Author

@jseldess do you want a final editorial look at the v1.1 changes before merge?

@jseldess
Contributor

LGTM, with one nit.

@@ -14,7 +14,12 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

When you stop a node, CockroachDB lets the node finish in-flight requests and transfers all **range leases** off the node before shutting it down. If the node then stays offline for a certain amount of time (5 minutes by default), the cluster considers the node dead and starts to transfer its **range replicas** to other nodes as well.
- Cancels all current sessions without waiting.
- Transfers all *range leases* and Raft leadership to other nodes.
Contributor

nit: bold instead of italics.

Contributor Author

Thanks, fixed in bb14e1b
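
One last pointer for readers of the quoted passage: the "5 minutes by default" window appears to correspond to the `server.time_until_store_dead` setting shown earlier in this conversation (default `5m0s`). A sketch of adjusting it; the `10m0s` value is illustrative:

```sql
-- Time without gossiped store info before the cluster considers a store dead.
SHOW CLUSTER SETTING server.time_until_store_dead;

-- Illustrative only: give a stopped node more time before re-replication starts.
SET CLUSTER SETTING server.time_until_store_dead = '10m0s';
```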

@rmloveland rmloveland merged commit 9d34ae6 into master Mar 19, 2018
@rmloveland rmloveland deleted the fixitday-node-draining-update branch March 19, 2018 18:02