kubernetes-remove-nodes-secure.md

Before removing a node from your cluster, you must first decommission the node. Decommissioning lets the node finish in-flight requests, reject any new requests, and transfer all range replicas and range leases to other nodes.

{{site.data.alerts.callout_danger}} If you remove nodes without first telling CockroachDB to decommission them, you may cause data or even cluster unavailability. For more details about how this works and what to consider before removing nodes, see Decommission Nodes. {{site.data.alerts.end}}

{{site.data.alerts.callout_danger}} Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on CockroachDB and will cause errors. {{site.data.alerts.end}}
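
If you script the scale-down, the three-node minimum can be checked before anything is changed. The following is a minimal sketch, not part of CockroachDB or `kubectl`; the function name `check_scale_target` is hypothetical:

```shell
# Hypothetical helper: refuse a scale-down target below the 3-node minimum.
check_scale_target() {
  target="$1"
  # Reject empty or non-numeric input before comparing.
  case "$target" in
    ''|*[!0-9]*)
      echo "target must be a non-negative integer" >&2
      return 2
      ;;
  esac
  if [ "$target" -lt 3 ]; then
    echo "refusing to scale below 3 nodes (requested: $target)" >&2
    return 1
  fi
  return 0
}
```

A wrapper could then run `check_scale_target 3` and continue to the scale-down step only if it succeeds.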

<section class="filter-content" markdown="1" data-scope="operator">
1. Get a shell into one of the pods and use the [`cockroach node status`](cockroach-node.html) command to get the internal IDs of nodes:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-2 \
    -- ./cockroach node status \
    --certs-dir cockroach-certs
    ~~~

    ~~~
      id |                 address                 |               sql_address               |  build  |            started_at            |            updated_at            | locality | is_available | is_live
    -----+-----------------------------------------+-----------------------------------------+---------+----------------------------------+----------------------------------+----------+--------------+----------
       1 | cockroachdb-0.cockroachdb.default:26257 | cockroachdb-0.cockroachdb.default:26257 | v20.1.4 | 2020-10-22 23:02:10.084425+00:00 | 2020-10-27 20:18:22.117115+00:00 |          | true         | true
       2 | cockroachdb-1.cockroachdb.default:26257 | cockroachdb-1.cockroachdb.default:26257 | v20.1.4 | 2020-10-22 23:02:46.533911+00:00 | 2020-10-27 20:18:22.558333+00:00 |          | true         | true
       3 | cockroachdb-2.cockroachdb.default:26257 | cockroachdb-2.cockroachdb.default:26257 | v20.1.4 | 2020-10-26 21:46:38.90803+00:00  | 2020-10-27 20:18:22.601021+00:00 |          | true         | true
       4 | cockroachdb-3.cockroachdb.default:26257 | cockroachdb-3.cockroachdb.default:26257 | v20.1.4 | 2020-10-27 19:54:04.714241+00:00 | 2020-10-27 20:18:22.74559+00:00  |          | true         | true
    (4 rows)
    ~~~

1. Use the [`cockroach node decommission`](cockroach-node.html) command to decommission the node with the highest number in its address (in this case, the address including `cockroachdb-3`):

    {{site.data.alerts.callout_info}}
    It's important to decommission the node with the highest number in its address because, when you reduce the replica count, Kubernetes will remove the pod for that node.
    {{site.data.alerts.end}}

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-3 \
    -- ./cockroach node decommission \
    --self \
    --certs-dir cockroach-certs \
    --host=<address of node to decommission>
    ~~~

    You'll then see the decommissioning status print to `stderr` as it changes:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |       73 |        true        |    false     
    (1 row)
    ~~~

    Once all replicas have been moved off the node and it is fully decommissioned, you'll see a confirmation:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |        0 |        true        |    false     
    (1 row)

    No more data reported on target nodes. Please verify cluster health before removing the nodes.
    ~~~

1. Once the node has been decommissioned, open and edit `example.yaml`.

    {% include copy-clipboard.html %}
    ~~~ shell
    $ vi example.yaml
    ~~~

1. In `example.yaml`, update the number of `nodes`:

    ~~~
    nodes: 3
    ~~~

1. Apply `example.yaml` with the new configuration:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl apply -f example.yaml
    ~~~

    The Operator will remove the node with the highest number in its address (in this case, the address including `cockroachdb-3`) from the cluster.

1. Verify that the pod was successfully removed:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get pods
    ~~~

    ~~~
    NAME                        READY     STATUS    RESTARTS   AGE
    cockroachdb-0               1/1       Running   0          51m
    cockroachdb-1               1/1       Running   0          47m
    cockroachdb-2               1/1       Running   0          3m
    ...
    ~~~

1. You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get pvc
    ~~~

    ~~~
    NAME                    STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    datadir-cockroachdb-0   Bound    pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-1   Bound    pvc-75e143ca-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-2   Bound    pvc-75ef409a-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-cockroachdb-3   Bound    pvc-75e561ba-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    ~~~

1. Verify that the PVC with the highest number in its name is no longer mounted to a pod:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl describe pvc datadir-cockroachdb-3
    ~~~

    ~~~
    Name:          datadir-cockroachdb-3
    ...
    Mounted By:    <none>
    ~~~

1. Remove the persistent volume by deleting the PVC:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl delete pvc datadir-cockroachdb-3
    ~~~

    ~~~
    persistentvolumeclaim "datadir-cockroachdb-3" deleted
    ~~~
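
The "no longer mounted" check before deleting the PVC can also be scripted. The sketch below is illustrative only: `pvc_is_unmounted` is a hypothetical helper that reads `kubectl describe pvc` output on standard input. It assumes the field is labeled `Mounted By:` as in the sample output above; it also accepts `Used By:`, on the assumption that some `kubectl` versions print that label instead.

```shell
# Hypothetical helper: read `kubectl describe pvc` output on stdin and succeed
# only if the claim reports that no pod is using it.
pvc_is_unmounted() {
  awk '
    # Match the field as labeled in the sample output; "Used By:" is accepted
    # as an assumed alternative label.
    /^[ \t]*(Mounted|Used) By:/ {
      found = 1
      sub(/^[^:]*:[ \t]*/, "")   # drop the label and the padding after it
      sub(/[ \t]+$/, "")         # drop trailing whitespace
      if ($0 == "<none>") unmounted = 1
    }
    # Fail when the field is missing or names a pod.
    END { exit (found && unmounted) ? 0 : 1 }
  '
}
```

For example, piping `kubectl describe pvc datadir-cockroachdb-3` into `pvc_is_unmounted` would succeed for the output shown above, and a script could run `kubectl delete pvc datadir-cockroachdb-3` only in that case.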
</section>

<section class="filter-content" markdown="1" data-scope="manual">
1. Get a shell into the `cockroachdb-client-secure` pod you created earlier and use the [`cockroach node status`](cockroach-node.html) command to get the internal IDs of nodes:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node status \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public
    ~~~

    ~~~
      id |                          address                          | build  |            started_at            |            updated_at            | is_available | is_live
    +----+-----------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
       1 | cockroachdb-0.cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true         | true
       2 | cockroachdb-2.cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true         | true
       3 | cockroachdb-1.cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true         | true
       4 | cockroachdb-3.cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true         | true
    (4 rows)
    ~~~

    The pod uses the `root` client certificate created earlier to initialize the cluster, so there's no CSR approval required.

1. Note the ID of the node with the highest number in its address (in this case, the address including `cockroachdb-3`) and use the [`cockroach node decommission`](cockroach-node.html) command to decommission it:

    {{site.data.alerts.callout_info}}
    It's important to decommission the node with the highest number in its address because, when you reduce the replica count, Kubernetes will remove the pod for that node.
    {{site.data.alerts.end}}

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node decommission <node ID> \
    --certs-dir=/cockroach-certs \
    --host=cockroachdb-public
    ~~~

    You'll then see the decommissioning status print to `stderr` as it changes:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |       73 |        true        |    false     
    (1 row)
    ~~~

    Once all replicas have been moved off the node and it is fully decommissioned, you'll see a confirmation:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |        0 |        true        |    false     
    (1 row)

    No more data reported on target nodes. Please verify cluster health before removing the nodes.
    ~~~

1. Once the node has been decommissioned, scale down your StatefulSet:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl scale statefulset cockroachdb --replicas=3
    ~~~

    ~~~
    statefulset.apps/cockroachdb scaled
    ~~~

1. Verify that the pod was successfully removed:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get pods
    ~~~

    ~~~
    NAME                        READY     STATUS    RESTARTS   AGE
    cockroachdb-0               1/1       Running   0          51m
    cockroachdb-1               1/1       Running   0          47m
    cockroachdb-2               1/1       Running   0          3m
    cockroachdb-client-secure   1/1       Running   0          15m
    ...
    ~~~
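
Node IDs do not necessarily follow pod ordinals (in the sample output above, node `2` runs on `cockroachdb-2` but node `3` runs on `cockroachdb-1`), so picking the ID by eye is easy to get wrong. The following sketch shows one way to extract it. `highest_ordinal_node_id` is a hypothetical helper, and it assumes the `id | address | ...` column layout shown in the sample `cockroach node status` output.

```shell
# Hypothetical helper: read `cockroach node status` table output on stdin and
# print the ID of the node whose pod name carries the highest ordinal.
highest_ordinal_node_id() {
  awk -F'|' '
    # Data rows start with a numeric node ID in the first column.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      id = $1; gsub(/[ \t]/, "", id)
      pod = $2; gsub(/[ \t]/, "", pod)
      sub(/\..*$/, "", pod)        # keep the pod name, drop the domain and port
      ordinal = pod
      sub(/^.*-/, "", ordinal)     # trailing number after the last hyphen
      if (ordinal ~ /^[0-9]+$/ && (best == "" || ordinal + 0 > best + 0)) {
        best = ordinal
        best_id = id
      }
    }
    # Fail if no data row was recognized.
    END { if (best_id != "") print best_id; else exit 1 }
  '
}
```

Piping the `cockroach node status` output from the first step into `highest_ordinal_node_id` prints the value to use as `<node ID>` in the decommission command.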
</section>

<section class="filter-content" markdown="1" data-scope="helm">
1. Get a shell into the `cockroachdb-client-secure` pod you created earlier and use the [`cockroach node status`](cockroach-node.html) command to get the internal IDs of nodes:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node status \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
    ~~~

    ~~~
      id |                                     address                                     | build  |            started_at            |            updated_at            | is_available | is_live
    +----+---------------------------------------------------------------------------------+--------+----------------------------------+----------------------------------+--------------+---------+
       1 | my-release-cockroachdb-0.my-release-cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:04:36.486082+00:00 | 2018-11-29 18:24:24.587454+00:00 | true         | true
       2 | my-release-cockroachdb-2.my-release-cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:55:03.880406+00:00 | 2018-11-29 18:24:23.469302+00:00 | true         | true
       3 | my-release-cockroachdb-1.my-release-cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 16:04:41.383588+00:00 | 2018-11-29 18:24:25.030175+00:00 | true         | true
       4 | my-release-cockroachdb-3.my-release-cockroachdb.default.svc.cluster.local:26257 | {{page.release_info.version}} | 2018-11-29 17:31:19.990784+00:00 | 2018-11-29 18:24:26.041686+00:00 | true         | true
    (4 rows)
    ~~~

    The pod uses the `root` client certificate created earlier to initialize the cluster, so there's no CSR approval required.

1. Note the ID of the node with the highest number in its address (in this case, the address including `my-release-cockroachdb-3`) and use the [`cockroach node decommission`](cockroach-node.html) command to decommission it:

    {{site.data.alerts.callout_info}}
    It's important to decommission the node with the highest number in its address because, when you reduce the replica count, Kubernetes will remove the pod for that node.
    {{site.data.alerts.end}}

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl exec -it cockroachdb-client-secure \
    -- ./cockroach node decommission <node ID> \
    --certs-dir=/cockroach-certs \
    --host=my-release-cockroachdb-public
    ~~~

    You'll then see the decommissioning status print to `stderr` as it changes:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |       73 |        true        |    false     
    (1 row)
    ~~~

    Once all replicas have been moved off the node and it is fully decommissioned, you'll see a confirmation:

    ~~~
     id | is_live | replicas | is_decommissioning | is_draining  
    +---+---------+----------+--------------------+-------------+
      4 |  true   |        0 |        true        |    false     
    (1 row)

    No more data reported on target nodes. Please verify cluster health before removing the nodes.
    ~~~

1. Once the node has been decommissioned, scale down your StatefulSet:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ helm upgrade \
    my-release \
    cockroachdb/cockroachdb \
    --set statefulset.replicas=3 \
    --reuse-values
    ~~~

1. Verify that the pod was successfully removed:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get pods
    ~~~

    ~~~
    NAME                        READY     STATUS    RESTARTS   AGE
    my-release-cockroachdb-0    1/1       Running   0          51m
    my-release-cockroachdb-1    1/1       Running   0          47m
    my-release-cockroachdb-2    1/1       Running   0          3m
    cockroachdb-client-secure   1/1       Running   0          15m
    ...
    ~~~

1. You should also remove the persistent volume that was mounted to the pod. Get the persistent volume claims for the volumes:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl get pvc
    ~~~

    ~~~
    NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    datadir-my-release-cockroachdb-0   Bound    pvc-75dadd4c-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-1   Bound    pvc-75e143ca-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-2   Bound    pvc-75ef409a-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    datadir-my-release-cockroachdb-3   Bound    pvc-75e561ba-01a1-11ea-b065-42010a8e00cb   100Gi      RWO            standard       17m
    ~~~

1. Verify that the PVC with the highest number in its name is no longer mounted to a pod:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl describe pvc datadir-my-release-cockroachdb-3
    ~~~

    ~~~
    Name:          datadir-my-release-cockroachdb-3
    ...
    Mounted By:    <none>
    ~~~

1. Remove the persistent volume by deleting the PVC:

    {% include copy-clipboard.html %}
    ~~~ shell
    $ kubectl delete pvc datadir-my-release-cockroachdb-3
    ~~~

    ~~~
    persistentvolumeclaim "datadir-my-release-cockroachdb-3" deleted
    ~~~
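
When automating these steps, the scale-down should run only after the decommission status reports `0` replicas for the node. The sketch below is one way to read that value. `replicas_remaining` is a hypothetical helper, and it assumes the `id | is_live | replicas | ...` column order shown in the sample decommission output above.

```shell
# Hypothetical helper: read decommission status output on stdin and print the
# most recent replica count reported for the given node ID.
replicas_remaining() {
  awk -F'|' -v want="$1" '
    # Data rows start with a numeric node ID in the first column.
    $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
      id = $1; gsub(/[ \t]/, "", id)
      if (id == want) {
        n = $3; gsub(/[ \t]/, "", n)
        last = n                   # later status tables overwrite earlier ones
      }
    }
    # Fail if the node never appeared in the output.
    END { if (last != "") print last; else exit 1 }
  '
}
```

Because the status is printed to `stderr`, captured output needs a redirect such as `2>&1` before it is piped into `replicas_remaining 4`. A result of `0` corresponds to the confirmation table shown above.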
</section>