Update 'Stop a Node' with more draining info #2671

Merged · 13 commits · Mar 19, 2018
52 changes: 52 additions & 0 deletions _includes/settings/v2.0/settings.md
@@ -0,0 +1,52 @@
| SETTING | TYPE | DEFAULT | DESCRIPTION |
|-----------------------------------------------------|-------------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| `cloudstorage.gs.default.key` | string | `` | if set, JSON key to use during Google Cloud Storage operations |
| `cloudstorage.http.custom_ca` | string | `` | custom root CA (appended to system's default CAs) for verifying certificates when interacting with HTTPS storage |
| `cluster.organization` | string | `` | organization name |
| `debug.panic_on_failed_assertions` | boolean | `false` | panic when an assertion fails rather than reporting |
| `diagnostics.reporting.enabled` | boolean | `true` | enable reporting diagnostic metrics to cockroach labs |
| `diagnostics.reporting.interval` | duration | `1h0m0s` | interval at which diagnostics data should be reported |
| `diagnostics.reporting.send_crash_reports` | boolean | `true` | send crash and panic reports |
| `kv.allocator.lease_rebalancing_aggressiveness` | float | `1E+00` | set greater than 1.0 to rebalance leases toward load more aggressively, or between 0 and 1.0 to be more conservative about rebalancing leases |
| `kv.allocator.load_based_lease_rebalancing.enabled` | boolean | `true` | set to enable rebalancing of range leases based on load and latency |
| `kv.allocator.range_rebalance_threshold` | float | `5E-02` | minimum fraction away from the mean a store's range count can be before it is considered overfull or underfull |
| `kv.allocator.stat_based_rebalancing.enabled` | boolean | `false` | set to enable rebalancing of range replicas based on write load and disk usage |
| `kv.allocator.stat_rebalance_threshold` | float | `2E-01` | minimum fraction away from the mean a store's stats (like disk usage or writes per second) can be before it is considered overfull or underfull |
| `kv.bulk_io_write.max_rate` | byte size | `8.0 EiB` | the rate limit (bytes/sec) to use for writes to disk on behalf of bulk io ops |
| `kv.bulk_sst.sync_size` | byte size | `2.0 MiB` | threshold after which non-Rocks SST writes must fsync (0 disables) |
| `kv.raft.command.max_size` | byte size | `64 MiB` | maximum size of a raft command |
| `kv.raft_log.synchronize` | boolean | `true` | set to true to synchronize on Raft log writes to persistent storage |
| `kv.range.backpressure_range_size_multiplier` | float | `2E+00` | multiple of range_max_bytes that a range is allowed to grow to without splitting before writes to that range are blocked, or 0 to disable |
| `kv.range_descriptor_cache.size` | integer | `1000000` | maximum number of entries in the range descriptor and leaseholder caches |
| `kv.snapshot_rebalance.max_rate` | byte size | `2.0 MiB` | the rate limit (bytes/sec) to use for rebalance snapshots |
| `kv.snapshot_recovery.max_rate` | byte size | `8.0 MiB` | the rate limit (bytes/sec) to use for recovery snapshots |
| `kv.transaction.max_intents_bytes` | integer | `256000` | maximum number of bytes used to track write intents in transactions |
| `kv.transaction.max_refresh_spans_bytes` | integer | `256000` | maximum number of bytes used to track refresh spans in serializable transactions |
| `rocksdb.min_wal_sync_interval` | duration | `0s` | minimum duration between syncs of the RocksDB WAL |
| `server.consistency_check.interval` | duration | `24h0m0s` | the time between range consistency checks; set to 0 to disable consistency checking |
| `server.declined_reservation_timeout` | duration | `1s` | the amount of time to consider the store throttled for up-replication after a reservation was declined |
| `server.failed_reservation_timeout` | duration | `5s` | the amount of time to consider the store throttled for up-replication after a failed reservation call |
| `server.remote_debugging.mode` | string | `local` | set to enable remote debugging, localhost-only or disable (any, local, off) |
| `server.shutdown.drain_wait` | duration | `0s` | the amount of time a server waits in an unready state before proceeding with the rest of the shutdown process |
| `server.shutdown.query_wait` | duration | `10s` | the server will wait for at least this amount of time for active queries to finish |
| `server.time_until_store_dead` | duration | `5m0s` | the time after which if there is no new gossiped information about a store, it is considered dead |
| `server.web_session_timeout` | duration | `168h0m0s` | the duration that a newly created web session will be valid |
| `sql.defaults.distsql` | enumeration | `1` | Default distributed SQL execution mode [off = 0, auto = 1, on = 2] |
| `sql.distsql.distribute_index_joins` | boolean | `true` | if set, for index joins we instantiate a join reader on every node that has a stream; if not set, we use a single join reader |
| `sql.distsql.interleaved_joins.enabled` | boolean | `true` | if set we plan interleaved table joins instead of merge joins when possible |
| `sql.distsql.merge_joins.enabled` | boolean | `true` | if set, we plan merge joins when possible |
| `sql.distsql.temp_storage.joins` | boolean | `true` | set to true to enable use of disk for distributed sql joins |
| `sql.distsql.temp_storage.sorts` | boolean | `true` | set to true to enable use of disk for distributed sql sorts |
| `sql.distsql.temp_storage.workmem` | byte size | `64 MiB` | maximum amount of memory in bytes a processor can use before falling back to temp storage |
| `sql.metrics.statement_details.dump_to_logs` | boolean | `false` | dump collected statement statistics to node logs when periodically cleared |
| `sql.metrics.statement_details.enabled` | boolean | `true` | collect per-statement query statistics |
| `sql.metrics.statement_details.threshold` | duration | `0s` | minimum execution time to cause statistics to be collected |
| `sql.trace.log_statement_execute` | boolean | `false` | set to true to enable logging of executed statements |
| `sql.trace.session_eventlog.enabled` | boolean | `false` | set to true to enable session tracing |
| `sql.trace.txn.enable_threshold` | duration | `0s` | duration beyond which all transactions are traced (set to 0 to disable) |
| `timeseries.resolution_10s.storage_duration` | duration | `720h0m0s` | the amount of time to store timeseries data |
| `timeseries.storage.enabled` | boolean | `true` | if set, periodic timeseries data is stored within the cluster; disabling is not recommended unless you are storing the data elsewhere |
| `trace.debug.enable` | boolean | `false` | if set, traces for recent requests can be seen in the /debug page |
| `trace.lightstep.token` | string | `` | if set, traces go to Lightstep using this token |
| `trace.zipkin.collector` | string | `` | if set, traces go to the given Zipkin instance (example: '127.0.0.1:9411'); ignored if trace.lightstep.token is set. |
| `version` | custom validation | `2.0` | set the active cluster version in the format '<major>.<minor>'. |
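
For illustration, a minimal sketch of reading and changing two of the settings above from a SQL shell (not part of the original file; assumes a connection as the `root` user):

```sql
-- Inspect the current value of a single setting.
SHOW CLUSTER SETTING diagnostics.reporting.enabled;

-- Boolean settings accept true/false.
SET CLUSTER SETTING diagnostics.reporting.enabled = false;

-- Duration settings accept Go-style values such as '30m' or '1h0m0s'.
SET CLUSTER SETTING diagnostics.reporting.interval = '2h';
```
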
7 changes: 6 additions & 1 deletion v1.1/stop-a-node.md
@@ -14,7 +14,12 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

When you stop a node, CockroachDB lets the node finish in-flight requests and transfers all **range leases** off the node before shutting it down. If the node then stays offline for a certain amount of time (5 minutes by default), the cluster considers the node dead and starts to transfer its **range replicas** to other nodes as well.
When you stop a node, it performs the following steps:

- Cancels all current sessions without waiting.
- Transfers all **range leases** and Raft leadership to other nodes.
- Gossips its draining state to the cluster so that no leases are transferred to the draining node. This is a best-effort process, so other nodes may not receive the gossip info in time.
- Accepts no new ranges, to avoid a possible loss of quorum after the node shuts down.

If the node then stays offline for a certain amount of time (5 minutes by default), the cluster considers the node dead and starts to transfer its **range replicas** to other nodes as well.
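
The 5-minute window mentioned above corresponds to the `server.time_until_store_dead` cluster setting. A minimal sketch of adjusting it, assuming a SQL shell connected as `root`:

```sql
-- Default is 5m0s; raising it gives a stopped node more time to come
-- back before the cluster declares it dead and re-replicates its ranges.
SET CLUSTER SETTING server.time_until_store_dead = '10m';

-- Verify the change.
SHOW CLUSTER SETTING server.time_until_store_dead;
```
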

After that, if the node comes back online, its range replicas will determine whether or not they are still valid members of replica groups. If a range replica is still valid and any data in its range has changed, it will receive updates from another replica in the group. If a range replica is no longer valid, it will be removed from the node.

31 changes: 1 addition & 30 deletions v2.0/cluster-settings.md
@@ -20,36 +20,7 @@ They can be updated anytime after a cluster has been started, but only by the `r

{{site.data.alerts.callout_danger}}Many cluster settings are intended for tuning CockroachDB internals. Before changing these settings, we strongly encourage you to discuss your goals with Cockroach Labs; otherwise, you use them at your own risk.{{site.data.alerts.end}}

The following settings can be configured without further input from Cockroach Labs:

| Setting | Description | Value type | Default value |
|---------|-------------|---------------|---------------|
| `diagnostics.reporting.enabled` | Enable automatic reporting of usage data to Cockroach Labs. | Boolean | `true` |
| `diagnostics.reporting.interval` | Interval between automatic reports. **Note that increasing this value will also cause memory usage per node to increase, as the reporting data is collected into RAM.** | Interval | 1 hour |
| `diagnostics.reporting.report_metrics` | Enable collection and reporting of diagnostic metrics. Only applicable if `diagnostics.reporting.enabled` is `true`. | Boolean | `true` |
| `diagnostics.reporting.send_crash_reports` | Enable collection and reporting of node crashes. Only applicable if `diagnostics.reporting.enabled` is `true`. | Boolean | `true` |
| `sql.defaults.distsql` | Define whether new client sessions try to [distribute query execution](https://www.cockroachlabs.com/blog/local-and-distributed-processing-in-cockroachdb/) by default. | Integer | 1 (automatic) |
| `sql.metrics.statement_details.enabled` | Collect per-node, per-statement query statistics, visible in the virtual table `crdb_internal.node_statement_statistics`. | Boolean | `true` |
| `sql.metrics.statement_details.dump_to_logs` | On each node, also copy collected per-statement statistics to the [logging output](debug-and-error-logs.html) when automatic reporting is enabled. | Boolean | `false` |
| `sql.metrics.statement_details.threshold` | Only collect per-statement statistics for statements that run longer than this threshold. | Interval | 0 seconds (all statements) |
| `sql.trace.log_statement_execute` | On each node, copy all executed statements to the [logging output](debug-and-error-logs.html). | Boolean | `false` |

<!-- Add this section back in once `system.settings` has been fleshed out.

## Settings

types:

settings-registry.go

s = string
b = boolean
i = int
f = float
d = duration
z = byte-size (can set them with set cluster setting = 32 MiB)

-->
{% include settings/v2.0/settings.md %}

## View Current Cluster Settings
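
A minimal sketch from a SQL shell (statement forms assumed from standard CockroachDB SQL):

```sql
-- List every setting with its current value.
SHOW ALL CLUSTER SETTINGS;

-- Show a single setting.
SHOW CLUSTER SETTING server.time_until_store_dead;
```
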

9 changes: 8 additions & 1 deletion v2.0/stop-a-node.md
@@ -14,7 +14,14 @@ For information about permanently removing nodes to downsize a cluster or react

### How It Works

When you stop a node, CockroachDB lets the node finish in-flight requests and transfers all **range leases** off the node before shutting it down. If the node then stays offline for a certain amount of time (5 minutes by default), the cluster considers the node dead and starts to transfer its **range replicas** to other nodes as well.
When you stop a node, it performs the following steps:

- Finishes in-flight requests. This is a best-effort process that times out after the duration specified by the `server.shutdown.query_wait` [cluster setting](cluster-settings.html).
- Transfers all **range leases** and Raft leadership to other nodes.
- Gossips its draining state to the cluster, so that other nodes do not try to distribute query planning to the draining node and no leases are transferred to it. This is also best effort and times out after the duration specified by the `server.shutdown.drain_wait` [cluster setting](cluster-settings.html), so other nodes may not receive the gossip info in time. A sketch of tuning both timeouts appears below.
- Accepts no new ranges, to avoid a possible loss of quorum after the node shuts down.

If the node then stays offline for a certain amount of time (5 minutes by default), the cluster considers the node dead and starts to transfer its **range replicas** to other nodes as well.
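
A minimal sketch of tuning the two shutdown timeouts named in the list above, assuming a SQL shell connected as `root`:

```sql
-- Give other nodes more time to observe the draining state before
-- shutdown proceeds (default 0s, per the settings table).
SET CLUSTER SETTING server.shutdown.drain_wait = '5s';

-- Give active queries up to a minute to finish (default 10s).
SET CLUSTER SETTING server.shutdown.query_wait = '1m';
```
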

After that, if the node comes back online, its range replicas will determine whether or not they are still valid members of replica groups. If a range replica is still valid and any data in its range has changed, it will receive updates from another replica in the group. If a range replica is no longer valid, it will be removed from the node.
