diff --git a/_includes/v20.1/prod-deployment/node-shutdown.md b/_includes/v20.1/prod-deployment/node-shutdown.md index e3c9f03bd33..9e00d5b8f19 100644 --- a/_includes/v20.1/prod-deployment/node-shutdown.md +++ b/_includes/v20.1/prod-deployment/node-shutdown.md @@ -1,3 +1,8 @@ -- If the node was started with a process manager like [systemd](https://www.freedesktop.org/wiki/Software/systemd/), stop the node using the process manager. The process manager should be configured to send `SIGTERM` and then, after about 1 minute, `SIGKILL`. -- If the node was started using [`cockroach start`](cockroach-start.html) and is running in the foreground, press `ctrl-c` in the terminal. -- If the node was started using [`cockroach start`](cockroach-start.html) and the `--background` and `--pid-file` flags, run `kill `, where `` is the process ID of the node. \ No newline at end of file + \ No newline at end of file diff --git a/_includes/v20.2/prod-deployment/node-shutdown.md b/_includes/v20.2/prod-deployment/node-shutdown.md index e3c9f03bd33..9e00d5b8f19 100644 --- a/_includes/v20.2/prod-deployment/node-shutdown.md +++ b/_includes/v20.2/prod-deployment/node-shutdown.md @@ -1,3 +1,8 @@ -- If the node was started with a process manager like [systemd](https://www.freedesktop.org/wiki/Software/systemd/), stop the node using the process manager. The process manager should be configured to send `SIGTERM` and then, after about 1 minute, `SIGKILL`. -- If the node was started using [`cockroach start`](cockroach-start.html) and is running in the foreground, press `ctrl-c` in the terminal. -- If the node was started using [`cockroach start`](cockroach-start.html) and the `--background` and `--pid-file` flags, run `kill `, where `` is the process ID of the node. \ No newline at end of file +
    +
  • If the node was started with a process manager, gracefully stop the node by sending `SIGTERM` with the process manager. If the node has not shut down after 1 minute, send `SIGKILL` to terminate the process. When using systemd, for example, set `TimeoutStopSec=60` in your configuration template and run `systemctl stop <systemd config filename>` to stop the node without systemd restarting it.
  • +
    Note:
    +

    The amount of time you should wait before sending `SIGKILL` can vary depending on your cluster configuration and workload, which affects how long it takes your nodes to complete a graceful shutdown. In certain edge cases, forcefully terminating the process before the node has completed shutdown can result in temporary data unavailability, latency spikes, uncertainty errors, ambiguous commit errors, or query timeouts. If you need maximum cluster availability, you can run `cockroach node drain` prior to node shutdown and actively monitor the draining process instead of automating it.

    +
    +
  • If the node was started using `cockroach start` and is running in the foreground, press `ctrl-c` in the terminal.
  • +
  • If the node was started using `cockroach start` and the `--background` and `--pid-file` flags, run `kill <pid>`, where `<pid>` is the process ID of the node.
  • +
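For nodes that are not run under a process manager, the "send `SIGTERM`, wait, then send `SIGKILL`" sequence described above could be scripted roughly as follows. This is only a sketch, not part of CockroachDB: the function names `is_running` and `stop_gracefully` are hypothetical, and the 60-second default simply mirrors the 1-minute guidance, which may need to be longer for your cluster and workload.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of CockroachDB) illustrating a
# SIGTERM-then-SIGKILL shutdown with a configurable grace period.

# Succeeds if the process exists and has not already exited.
is_running() {
  kill -0 "$1" 2>/dev/null || return 1
  # A zombie (state Z) has already exited and is only waiting to be reaped.
  case "$(ps -o stat= -p "$1" 2>/dev/null)" in
    *Z*) return 1 ;;
  esac
  return 0
}

# stop_gracefully <pid> [timeout_seconds]
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed.
stop_gracefully() {
  local pid="$1"
  local timeout="${2:-60}"   # assumed default; tune for your workload
  local waited=0

  # Request a graceful shutdown; if the signal cannot be delivered,
  # the process is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0

  while is_running "$pid"; do
    if [ "$waited" -ge "$timeout" ]; then
      # Grace period expired: forcefully terminate the process.
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

With a node started using `--background` and `--pid-file`, this might be invoked as `stop_gracefully "$(cat <pid file>)" 60`, where `<pid file>` is whatever path you passed to `--pid-file`. A non-zero return value signals that the node had to be killed, which is the case the note above warns about.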
\ No newline at end of file diff --git a/v19.1/remove-nodes.md b/v19.1/remove-nodes.md index d53dfba2e21..ebdf934ce63 100644 --- a/v19.1/remove-nodes.md +++ b/v19.1/remove-nodes.md @@ -23,7 +23,7 @@ A node is considered to be decommissioned when it meets two criteria: The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. However, note that the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) considers the node "unready" and returns a `503 Service Unavailable` status response code so load balancers stop directing traffic to the node. In v20.1, the health endpoint correctly considers the node "ready". -After all range replicas have been transferred, it's typical to use [`cockroach node drain`](view-node-details.html) to drain the node of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries. The node can then be stopped via a process manager or orchestration tool, or by sending `SIGTERM` manually. You can also use [`cockroach quit`](stop-a-node.html) to drain and shut down the node. When stopped, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. +After all range replicas have been transferred, the node can be drained of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases, and then stopped. This can be done with a process manager or orchestration tool, or by sending `SIGTERM` manually. You can also use [`cockroach quit`](stop-a-node.html) to drain and shut down the node. 
When stopped, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. You can [check the status of node decommissioning](#check-the-status-of-decommissioning-nodes) with the CLI. diff --git a/v19.1/view-node-details.md b/v19.1/view-node-details.md index ae8cba2b02f..d7083b9b1cf 100644 --- a/v19.1/view-node-details.md +++ b/v19.1/view-node-details.md @@ -16,7 +16,7 @@ Subcommand | Usage `status` | View the status of one or all nodes, excluding nodes that have been decommissioned and taken offline. Depending on flags used, this can include details about range/replicas, disk usage, and decommissioning progress. `decommission` | Decommission nodes for removal from the cluster. See [Decommission Nodes](remove-nodes.html) for more details. `recommission` | Recommission nodes that have been decommissioned. See [Recommission Nodes](remove-nodes.html#recommission-nodes) for more details. -`drain` | Drain nodes of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries, and prevent ranges from rebalancing onto the node. This is usually done prior to [stopping the node](stop-a-node.html). +`drain` | Drain nodes of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases, and prevent ranges from rebalancing onto the node. This is normally done during [node shutdown](stop-a-node.html), but the `drain` subcommand provides operators an option to interactively monitor, and if necessary intervene in, the draining process. ## Synopsis diff --git a/v19.2/cockroach-node.md b/v19.2/cockroach-node.md index a23d412387b..cf48ad9847f 100644 --- a/v19.2/cockroach-node.md +++ b/v19.2/cockroach-node.md @@ -18,7 +18,7 @@ Subcommand | Usage `status` | View the status of one or all nodes, excluding nodes that have been decommissioned and taken offline. 
Depending on flags used, this can include details about range/replicas, disk usage, and decommissioning progress. `decommission` | Decommission nodes for removal from the cluster. See [Decommission Nodes](remove-nodes.html) for more details. `recommission` | Recommission nodes that have been decommissioned. See [Recommission Nodes](remove-nodes.html#recommission-nodes) for more details. -`drain` | Drain nodes of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries, and prevent ranges from rebalancing onto the node. This is usually done prior to [stopping the node](cockroach-quit.html). +`drain` | Drain nodes of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases, and prevent ranges from rebalancing onto the node. This is normally done during [node shutdown](cockroach-quit.html), but the `drain` subcommand provides operators an option to interactively monitor, and if necessary intervene in, the draining process. ## Synopsis diff --git a/v19.2/remove-nodes.md b/v19.2/remove-nodes.md index e8659fc0650..3bca2253636 100644 --- a/v19.2/remove-nodes.md +++ b/v19.2/remove-nodes.md @@ -21,9 +21,9 @@ A node is considered to be decommissioned when it meets two criteria: 1. The node has completed the decommissioning process. 2. The node has been stopped and has not [updated its liveness record](architecture/replication-layer.html#epoch-based-leases-table-data) for the duration configured via [`server.time_until_store_dead`](cluster-settings.html), which defaults to 5 minutes. -The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. 
However, note that the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) considers the node "unready" and returns a `503 Service Unavailable` status response code so load balancers stop directing traffic to the node. In v20.1, the health endpoint correctly considers the node "ready". +The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. However, note that the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) considers the node "unready" and returns a `503 Service Unavailable` status response code so load balancers stop directing traffic to the node. In v20.1, the health endpoint correctly considers the node "ready" during decommissioning. -After all range replicas have been transferred, it's typical to use [`cockroach node drain`](cockroach-node.html) to drain the node of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries. The node can then be stopped via a process manager or orchestration tool, or by sending `SIGTERM` manually. You can also use [`cockroach quit`](cockroach-quit.html) to drain and shut down the node. When stopped, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. +After all range replicas have been transferred, a graceful shutdown is initiated by sending `SIGTERM` or running [`cockroach quit`](cockroach-quit.html), during which the node is drained of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases. 
Once draining completes and the process is terminated, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. You can [check the status of node decommissioning](#check-the-status-of-decommissioning-nodes) with the CLI. diff --git a/v20.1/cockroach-node.md b/v20.1/cockroach-node.md index eaa3fa28a88..fcfa2072292 100644 --- a/v20.1/cockroach-node.md +++ b/v20.1/cockroach-node.md @@ -18,7 +18,7 @@ Subcommand | Usage `status` | View the status of one or all nodes, excluding nodes that have been decommissioned and taken offline. Depending on flags used, this can include details about range/replicas, disk usage, and decommissioning progress. `decommission` | Decommission nodes for removal from the cluster. See [Decommission Nodes](remove-nodes.html) for more details. `recommission` | Recommission nodes that have been decommissioned. See [Recommission Nodes](remove-nodes.html#recommission-nodes) for more details. -`drain` | Drain nodes of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries, and prevent ranges from rebalancing onto the node. This is usually done prior to [stopping the node](cockroach-quit.html). +`drain` | Drain nodes of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases, and prevent ranges from rebalancing onto the node. This is normally done by sending `SIGTERM` during [node shutdown](cockroach-quit.html), but the `drain` subcommand provides operators an option to interactively monitor, and if necessary intervene in, the draining process. ## Synopsis diff --git a/v20.1/cockroach-quit.md b/v20.1/cockroach-quit.md index 3a0e1f065ec..36e08452747 100644 --- a/v20.1/cockroach-quit.md +++ b/v20.1/cockroach-quit.md @@ -7,7 +7,7 @@ key: stop-a-node.html --- {{site.data.alerts.callout_danger}} -`cockroach quit` is no longer recommended, and will be deprecated in v20.2. 
To stop a node, it's best to first run [`cockroach node drain`](cockroach-node.html) and then do one of the following: +`cockroach quit` is no longer recommended, and will be deprecated in v20.2. To stop a node, do one of the following: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} {{site.data.alerts.end}} diff --git a/v20.1/common-errors.md b/v20.1/common-errors.md index aef397800a1..7f009115a0b 100644 --- a/v20.1/common-errors.md +++ b/v20.1/common-errors.md @@ -31,10 +31,9 @@ To resolve this issue, do one of the following: If you're not sure what the IP address/hostname and port values might have been, you can look in the node's [logs](debug-and-error-logs.html). If necessary, you can also end the `cockroach` process, and then restart the node: -{% include copy-clipboard.html %} -~~~ shell -$ pkill cockroach -~~~ +{% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} + +Then restart the node: {% include copy-clipboard.html %} ~~~ shell diff --git a/v20.1/remove-nodes.md b/v20.1/remove-nodes.md index 6a9ebcf58cb..75e2060765d 100644 --- a/v20.1/remove-nodes.md +++ b/v20.1/remove-nodes.md @@ -23,7 +23,7 @@ A node is considered to be decommissioned when it meets two criteria: The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. For this reason, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) continues to consider the node "ready" so load balancers can continue directing traffic to the node. -After all range replicas have been transferred, it's typical to drain the node of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries. 
The node can then be stopped via a process manager or orchestration tool, or by sending `SIGTERM` manually. When stopped, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) starts returning a `503 Service Unavailable` status response code so that load balancers stop directing traffic to the node. At this point the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. +After all range replicas have been transferred, a graceful shutdown is initiated by sending `SIGTERM`, during which the node is drained of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases. Meanwhile, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) starts returning a `503 Service Unavailable` status response code so that load balancers stop directing traffic to the node. Once draining completes and the process is terminated, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. You can [check the status of node decommissioning](#check-the-status-of-decommissioning-nodes) with the CLI. @@ -160,33 +160,7 @@ Even with zero replicas on a node, its [status](admin-ui-cluster-overview-page.h ### Step 5. Stop the decommissioning node -A node should be drained of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries before being shut down. - -Run the [`cockroach node drain`](cockroach-node.html) command with the address of the node to drain: - -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --certs-dir=certs --host=
-~~~ -
- -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --insecure --host=
-~~~ -
- -Once the node has been drained, you'll see a confirmation: - -~~~ -node is draining... remaining: 1 -node is draining... remaining: 0 (complete) -ok -~~~ - -Stop the node using one of the following methods: +Drain and stop the node using one of the following methods: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} @@ -339,33 +313,7 @@ Even with zero replicas on a node, its [status](admin-ui-cluster-overview-page.h ### Step 5. Stop the decommissioning nodes -Nodes should be drained of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries before being shut down. - -For each node, run the [`cockroach node drain`](cockroach-node.html) command with the address of the node to drain: - -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --certs-dir=certs --host=
-~~~ -
- -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --insecure --host=
-~~~ -
- -Once each node has been drained, you'll see a confirmation: - -~~~ -node is draining... remaining: 1 -node is draining... remaining: 0 (complete) -ok -~~~ - -Stop each node using one of the following methods: +Drain and stop each node using one of the following methods: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} diff --git a/v20.1/upgrade-cockroach-version.md b/v20.1/upgrade-cockroach-version.md index 50f64c9b340..556b3e123ab 100644 --- a/v20.1/upgrade-cockroach-version.md +++ b/v20.1/upgrade-cockroach-version.md @@ -121,25 +121,11 @@ Note that this behavior is specific to upgrades from v19.2 to v20.1; it does not We recommend creating scripts to perform these steps instead of performing them manually. Also, if you are running CockroachDB on Kubernetes, see our documentation on [single-cluster](orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster) and/or [multi-cluster](orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#upgrade-the-cluster) orchestrated deployments for upgrade guidance instead. {{site.data.alerts.end}} -1. Connect to the node. - -2. Stop the `cockroach` process. - - Without a process manager like `systemd`, use this command: - - {% include copy-clipboard.html %} - ~~~ shell - $ pkill cockroach - ~~~ - - If you are using `systemd` as the process manager, use this command to stop a node without `systemd` restarting it: - - {% include copy-clipboard.html %} - ~~~ shell - $ systemctl stop - ~~~ - - Then verify that the process has stopped: +1. Drain and stop the node using one of the following methods: + + {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} + + Verify that the process has stopped: {% include copy-clipboard.html %} ~~~ shell @@ -148,7 +134,7 @@ We recommend creating scripts to perform these steps instead of performing them Alternately, you can check the node's logs for the message `server drained and shutdown completed`. -3. 
Download and install the CockroachDB binary you want to use: +1. Download and install the CockroachDB binary you want to use:
@@ -180,7 +166,7 @@ We recommend creating scripts to perform these steps instead of performing them ~~~
-4. If you use `cockroach` in your `$PATH`, rename the outdated `cockroach` binary, and then move the new one into its place: +1. If you use `cockroach` in your `$PATH`, rename the outdated `cockroach` binary, and then move the new one into its place:
@@ -212,7 +198,7 @@ We recommend creating scripts to perform these steps instead of performing them ~~~
-5. Start the node to have it rejoin the cluster. +1. Start the node to have it rejoin the cluster. Without a process manager like `systemd`, re-run the [`cockroach start`](cockroach-start.html) command that you used to start the node initially, for example: @@ -231,13 +217,13 @@ We recommend creating scripts to perform these steps instead of performing them $ systemctl start ~~~ -6. Verify the node has rejoined the cluster through its output to `stdout` or through the [Admin UI](admin-ui-overview.html). +1. Verify the node has rejoined the cluster through its output to `stdout` or through the [Admin UI](admin-ui-overview.html). {{site.data.alerts.callout_info}} To access the Admin UI for a secure cluster, [create a user with a password](create-user.html#create-a-user-with-a-password). Then open a browser and go to `https://:8080`. On accessing the Admin UI, you will see a Login screen, where you will need to enter your username and password. {{site.data.alerts.end}} -7. If you use `cockroach` in your `$PATH`, you can remove the old binary: +1. If you use `cockroach` in your `$PATH`, you can remove the old binary: {% include copy-clipboard.html %} ~~~ shell @@ -246,7 +232,7 @@ We recommend creating scripts to perform these steps instead of performing them If you leave versioned binaries on your servers, you do not need to do anything. -8. Wait at least one minute after the node has rejoined the cluster, and then repeat these steps for the next node. +1. Wait at least one minute after the node has rejoined the cluster, and then repeat these steps for the next node. ## Step 5. Finish the upgrade diff --git a/v20.2/cockroach-node.md b/v20.2/cockroach-node.md index eaa3fa28a88..fcfa2072292 100644 --- a/v20.2/cockroach-node.md +++ b/v20.2/cockroach-node.md @@ -18,7 +18,7 @@ Subcommand | Usage `status` | View the status of one or all nodes, excluding nodes that have been decommissioned and taken offline. 
Depending on flags used, this can include details about range/replicas, disk usage, and decommissioning progress. `decommission` | Decommission nodes for removal from the cluster. See [Decommission Nodes](remove-nodes.html) for more details. `recommission` | Recommission nodes that have been decommissioned. See [Recommission Nodes](remove-nodes.html#recommission-nodes) for more details. -`drain` | Drain nodes of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries, and prevent ranges from rebalancing onto the node. This is usually done prior to [stopping the node](cockroach-quit.html). +`drain` | Drain nodes of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases, and prevent ranges from rebalancing onto the node. This is normally done by sending `SIGTERM` during [node shutdown](cockroach-quit.html), but the `drain` subcommand provides operators an option to interactively monitor, and if necessary intervene in, the draining process. ## Synopsis diff --git a/v20.2/cockroach-quit.md b/v20.2/cockroach-quit.md index 756bd5edb43..70cb577dd76 100644 --- a/v20.2/cockroach-quit.md +++ b/v20.2/cockroach-quit.md @@ -7,7 +7,7 @@ key: stop-a-node.html --- {{site.data.alerts.callout_danger}} -`cockroach quit` is deprecated. To stop a node, it's best to first run [`cockroach node drain`](cockroach-node.html) and then do one of the following: +`cockroach quit` is deprecated. To stop a node, do one of the following: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} {{site.data.alerts.end}} diff --git a/v20.2/common-errors.md b/v20.2/common-errors.md index 42b87444c8a..b215819ec96 100644 --- a/v20.2/common-errors.md +++ b/v20.2/common-errors.md @@ -31,10 +31,9 @@ To resolve this issue, do one of the following: If you're not sure what the IP address/hostname and port values might have been, you can look in the node's [logs](debug-and-error-logs.html). 
If necessary, you can also stop the `cockroach` process, and then restart the node: -{% include copy-clipboard.html %} -~~~ shell -$ pkill cockroach -~~~ +{% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} + +Then restart the node: {% include copy-clipboard.html %} ~~~ shell diff --git a/v20.2/remove-nodes.md b/v20.2/remove-nodes.md index 84d4a868eb5..3234c209476 100644 --- a/v20.2/remove-nodes.md +++ b/v20.2/remove-nodes.md @@ -23,7 +23,7 @@ A node is considered to be decommissioned when it meets two criteria: The decommissioning process transfers all range replicas on the node to other nodes. During and after this process, the node is considered "decommissioning" and continues to accept new SQL connections. Even without replicas, the node can still function as a gateway to route connections to relevant data. For this reason, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) continues to consider the node "ready" so load balancers can continue directing traffic to the node. -After all range replicas have been transferred, it's typical to drain the node of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries. The node can then be stopped via a process manager or orchestration tool, or by sending `SIGTERM` manually. When stopped, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) starts returning a `503 Service Unavailable` status response code so that load balancers stop directing traffic to the node. At this point the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. +After all range replicas have been transferred, a graceful shutdown is initiated by sending `SIGTERM`, during which the node is drained of SQL clients, [distributed SQL](architecture/sql-layer.html#distsql) queries, and range leases. 
Meanwhile, the [`/health?ready=1` monitoring endpoint](monitoring-and-alerting.html#health-ready-1) starts returning a `503 Service Unavailable` status response code so that load balancers stop directing traffic to the node. Once draining completes and the process is terminated, the node stops updating its liveness record, and after the duration configured via [`server.time_until_store_dead`](cluster-settings.html) is considered to be decommissioned. You can [check the status of node decommissioning](#check-the-status-of-decommissioning-nodes) with the CLI. @@ -160,33 +160,7 @@ Even with zero replicas on a node, its [status](admin-ui-cluster-overview-page.h ### Step 5. Stop the decommissioning node -A node should be drained of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries before being shut down. - -Run the [`cockroach node drain`](cockroach-node.html) command with the address of the node to drain: - -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --certs-dir=certs --host=
-~~~ -
- -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --insecure --host=
-~~~ -
- -Once the node has been drained, you'll see a confirmation: - -~~~ -node is draining... remaining: 1 -node is draining... remaining: 0 (complete) -ok -~~~ - -Stop the node using one of the following methods: +Drain and stop the node using one of the following methods: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} @@ -339,33 +313,7 @@ Even with zero replicas on a node, its [status](admin-ui-cluster-overview-page.h ### Step 5. Stop the decommissioning nodes -Nodes should be drained of SQL clients and [distributed SQL](architecture/sql-layer.html#distsql) queries before being shut down. - -For each node, run the [`cockroach node drain`](cockroach-node.html) command with the address of the node to drain: - -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --certs-dir=certs --host=
-~~~ -
- -
-{% include copy-clipboard.html %} -~~~ shell -cockroach node drain --insecure --host=
-~~~ -
- -Once each node has been drained, you'll see a confirmation: - -~~~ -node is draining... remaining: 1 -node is draining... remaining: 0 (complete) -ok -~~~ - -Stop each node using one of the following methods: +Drain and stop each node using one of the following methods: {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} diff --git a/v20.2/upgrade-cockroach-version.md b/v20.2/upgrade-cockroach-version.md index d76b1db2f22..6b4868a90a9 100644 --- a/v20.2/upgrade-cockroach-version.md +++ b/v20.2/upgrade-cockroach-version.md @@ -80,26 +80,12 @@ For each node in your cluster, complete the following steps. Be sure to upgrade {{site.data.alerts.callout_success}} We recommend creating scripts to perform these steps instead of performing them manually. Also, if you are running CockroachDB on Kubernetes, see our documentation on [single-cluster](orchestrate-cockroachdb-with-kubernetes.html#upgrade-the-cluster) and/or [multi-cluster](orchestrate-cockroachdb-with-kubernetes-multi-cluster.html#upgrade-the-cluster) orchestrated deployments for upgrade guidance instead. {{site.data.alerts.end}} + +1. Drain and stop the node using one of the following methods: + + {% include {{ page.version.version }}/prod-deployment/node-shutdown.md %} -1. Connect to the node. - -2. Stop the `cockroach` process. - - Without a process manager like `systemd`, use this command: - - {% include copy-clipboard.html %} - ~~~ shell - $ pkill cockroach - ~~~ - - If you are using `systemd` as the process manager, use this command to stop a node without `systemd` restarting it: - - {% include copy-clipboard.html %} - ~~~ shell - $ systemctl stop - ~~~ - - Then verify that the process has stopped: + Verify that the process has stopped: {% include copy-clipboard.html %} ~~~ shell @@ -108,7 +94,7 @@ We recommend creating scripts to perform these steps instead of performing them Alternately, you can check the node's logs for the message `server drained and shutdown completed`. -3. 
Download and install the CockroachDB binary you want to use: +1. Download and install the CockroachDB binary you want to use:
@@ -140,7 +126,7 @@ We recommend creating scripts to perform these steps instead of performing them ~~~
-4. If you use `cockroach` in your `$PATH`, rename the outdated `cockroach` binary, and then move the new one into its place: +1. If you use `cockroach` in your `$PATH`, rename the outdated `cockroach` binary, and then move the new one into its place:
@@ -172,7 +158,7 @@ We recommend creating scripts to perform these steps instead of performing them ~~~
-5. Start the node to have it rejoin the cluster. +1. Start the node to have it rejoin the cluster. Without a process manager like `systemd`, re-run the [`cockroach start`](cockroach-start.html) command that you used to start the node initially, for example: @@ -191,13 +177,13 @@ We recommend creating scripts to perform these steps instead of performing them $ systemctl start ~~~ -6. Verify the node has rejoined the cluster through its output to `stdout` or through the [Admin UI](admin-ui-overview.html). +1. Verify the node has rejoined the cluster through its output to `stdout` or through the [Admin UI](admin-ui-overview.html). {{site.data.alerts.callout_info}} To access the Admin UI for a secure cluster, [create a user with a password](create-user.html#create-a-user-with-a-password). Then open a browser and go to `https://:8080`. On accessing the Admin UI, you will see a Login screen, where you will need to enter your username and password. {{site.data.alerts.end}} -7. If you use `cockroach` in your `$PATH`, you can remove the old binary: +1. If you use `cockroach` in your `$PATH`, you can remove the old binary: {% include copy-clipboard.html %} ~~~ shell @@ -206,7 +192,7 @@ We recommend creating scripts to perform these steps instead of performing them If you leave versioned binaries on your servers, you do not need to do anything. -8. Wait at least one minute after the node has rejoined the cluster, and then repeat these steps for the next node. +1. Wait at least one minute after the node has rejoined the cluster, and then repeat these steps for the next node. ## Step 5. Finish the upgrade