diff --git a/docs_user/assemblies/assembly_migrating-ceph-cluster.adoc b/docs_user/assemblies/assembly_migrating-ceph-cluster.adoc
new file mode 100644
index 000000000..5e5708396
--- /dev/null
+++ b/docs_user/assemblies/assembly_migrating-ceph-cluster.adoc
@@ -0,0 +1,44 @@
+ifdef::context[:parent-context: {context}]
+
+[id="ceph-migration_{context}"]
+
+= Migrating the {CephCluster} cluster
+
+:context: migrating-ceph
+
+:toc: left
+:toclevels: 3
+
+In the context of data plane adoption, where the {rhos_prev_long}
+({OpenStackShort}) services are redeployed in {OpenShift}, you migrate a
+{OpenStackPreviousInstaller}-deployed {CephCluster} cluster by using a process
+called “externalizing” the {CephCluster} cluster.
+
+There are two deployment topologies that include an internal {CephCluster}
+cluster:
+
+* {OpenStackShort} includes dedicated {CephCluster} nodes to host object
+  storage daemons (OSDs)
+
+* Hyperconverged Infrastructure (HCI), where Compute and Storage services are
+  colocated on hyperconverged nodes
+
+In either scenario, there are some {Ceph} processes that are deployed on
+{OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW),
+Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS
+Ganesha. To migrate your {CephCluster} cluster, you must decommission the
+Controller nodes and move the {Ceph} daemons to a set of target nodes that are
+already part of the {CephCluster} cluster.
+
+include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1]
+
+include::assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1]
+
+include::../modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]
+
+include::assembly_migrating-ceph-rgw.adoc[leveloffset=+1]
+
+include::assembly_migrating-ceph-rbd.adoc[leveloffset=+1]
+
+ifdef::parent-context[:context: {parent-context}]
+ifndef::parent-context[:!context:]
diff --git a/docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc b/docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc
index fad8e444e..4f7d3d14e 100644
--- a/docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc
+++ b/docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc
@@ -4,13 +4,10 @@
 
 = Migrating the monitoring stack component to new nodes within an existing {Ceph} cluster
 
-In the context of data plane adoption, where the {rhos_prev_long} ({OpenStackShort}) services are
-redeployed in {OpenShift}, a {OpenStackPreviousInstaller}-deployed {CephCluster} cluster will undergo a migration in a process we are calling “externalizing” the {CephCluster} cluster.
-There are two deployment topologies, broadly, that include an “internal” {CephCluster} cluster today: one is where {OpenStackShort} includes dedicated {CephCluster} nodes to host object storage daemons (OSDs), and the other is Hyperconverged Infrastructure (HCI) where Compute nodes
-double up as {CephCluster} nodes. In either scenario, there are some {Ceph} processes that are deployed on {OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW), Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha.
 The Ceph Dashboard module adds web-based monitoring and administration to the Ceph Manager.
 
-With {OpenStackPreviousInstaller}-deployed {Ceph} this component is enabled as part of the overcloud deploy and it’s composed by:
+With {OpenStackPreviousInstaller}-deployed {Ceph}, this component is enabled as
+part of the overcloud deploy and it is composed of the following:
 
 - Ceph Manager module
 - Grafana
@@ -18,16 +15,21 @@ With {OpenStackPreviousInstaller}-deployed {Ceph} this component is enabled as p
 - Alertmanager
 - Node exporter
 
-The Ceph Dashboard containers are included through `tripleo-container-image-prepare` parameters and the high availability relies on `Haproxy` and `Pacemaker` deployed on the {OpenStackShort} front.
-For an external {CephCluster} cluster, high availability is not supported.
-The goal of this procedure is to migrate and relocate the Ceph Monitoring
-components to free Controller nodes.
-
-For this procedure, we assume that we are beginning with a {OpenStackShort} based on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment managed by {OpenStackPreviousInstaller}.
-We assume that:
-
-* {Ceph} has been upgraded to {CephRelease} and is managed by cephadm/orchestrator
-* Both the {Ceph} public and cluster networks are propagated, through{OpenStackPreviousInstaller}, to the target nodes
+The Ceph Dashboard containers are included through
+`tripleo-container-image-prepare` parameters, and high availability relies on
+`Haproxy` and `Pacemaker`, which are deployed on the {OpenStackShort} front
+end. For an external {CephCluster} cluster, high availability is not supported.
+The goal of this procedure is to migrate and relocate the Ceph Monitoring
+components to free Controller nodes.
+
+This procedure assumes that you begin with a {OpenStackShort} environment based
+on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment that is managed by
+{OpenStackPreviousInstaller}, and that the following conditions are met:
+
+* {Ceph} has been upgraded to {CephRelease} and is managed by
+  cephadm/orchestrator
+* Both the {Ceph} public and cluster networks are propagated,
+  through {OpenStackPreviousInstaller}, to the target nodes
 
 include::../modules/proc_completing-prerequisites-for-migrating-ceph-monitoring-stack.adoc[leveloffset=+1]
 
diff --git a/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc b/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc
index 9ddf940d7..994d9cd43 100644
--- a/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc
+++ b/docs_user/assemblies/assembly_migrating-ceph-rbd.adoc
@@ -4,13 +4,40 @@
 
 = Migrating Red Hat Ceph Storage RBD to external RHEL nodes
 
-For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running {Ceph} version 6 or later, you must migrate the daemons that are included in the {rhos_prev_long} control plane into the existing external Red Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include the Compute nodes for an HCI environment or dedicated storage nodes.
+For Hyperconverged Infrastructure (HCI) or dedicated Storage nodes that are
+running {Ceph} version 6 or later, you must migrate the daemons that are
+included in the {rhos_prev_long} control plane into the existing external Red
+Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include
+the Compute nodes for an HCI environment or dedicated storage nodes.
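+
+Before you plan the migration, it can be useful to review which {Ceph} daemons
+currently run on the Controller nodes and which hosts are enrolled in the
+cluster. The following commands are a minimal sketch; run them from a node that
+has the {Ceph} admin keyring, and note that the `grep` filter assumes that the
+Controller host names contain `controller`:
+
+----
+$ sudo cephadm shell -- ceph orch host ls
+$ sudo cephadm shell -- ceph orch ps | grep -i controller
+----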
-To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must meet the following requirements:
+To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must
+meet the following requirements:
 
-* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator.
-* NFS (ganesha) is migrated from a {OpenStackPreviousInstaller}-based deployment to cephadm. For more information, see xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha cluster].
-* Both the {Ceph} public and cluster networks are propagated, with {OpenStackPreviousInstaller}, to the target nodes.
-* Ceph Monitors need to keep their IPs to avoid cold migration.
+* {Ceph} is running version 6 or later and is managed by cephadm.
+* NFS Ganesha is migrated from a {OpenStackPreviousInstaller}-based
+  deployment to cephadm. For more information, see
+  xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha
+  cluster].
+* Both the {Ceph} public and cluster networks are propagated, with
+  {OpenStackPreviousInstaller}, to the target nodes.
+* The Ceph MDS, Ceph Monitoring stack, Ceph RGW, and other services are
+  migrated to the target nodes.
+ifeval::["{build}" != "upstream"]
+* The daemon distribution follows the cardinality constraints that are
+  described in link:https://access.redhat.com/articles/1548993[Red Hat Ceph
+  Storage: Supported configurations].
+endif::[]
+* The {Ceph} cluster is healthy, and the `ceph -s` command returns `HEALTH_OK`.
 
-include::../modules/proc_migrating-mon-and-mgr-from-controller-nodes.adoc[leveloffset=+1]
+During the procedure to migrate the Ceph Mon daemons, the following actions
+occur:
+
+* The mon IP addresses are moved to the target {Ceph} nodes.
+* The existing Controller nodes are drained and decommissioned.
+* Additional monitors are deployed to the target nodes, and they are promoted
+  to `_admin` nodes that you can use to manage the {CephCluster} cluster and
+  perform day 2 operations.
+
+include::../modules/proc_migrating-mgr-from-controller-nodes.adoc[leveloffset=+1]
+
+include::../modules/proc_migrating-mon-from-controller-nodes.adoc[leveloffset=+1]
diff --git a/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc b/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc
index 3116e242a..4b7cd198f 100644
--- a/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc
+++ b/docs_user/assemblies/assembly_migrating-ceph-rgw.adoc
@@ -11,8 +11,6 @@ To migrate Ceph Object Gateway (RGW), your environment must meet the following r
 
 * {Ceph} is running version 6 or later and is managed by cephadm/orchestrator.
 * An undercloud is still available, and the nodes and networks are managed by {OpenStackPreviousInstaller}.
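+
+Before you start, you can confirm that the cluster is managed by cephadm and
+that it reports a suitable version and health state. This is a minimal sketch
+of the kind of check to run from a node that has the {Ceph} admin keyring; the
+output differs in each environment:
+
+----
+$ sudo cephadm shell -- ceph orch status
+$ sudo cephadm shell -- ceph versions
+$ sudo cephadm shell -- ceph -s
+----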
-include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1] - include::../modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc[leveloffset=+1] include::../modules/proc_migrating-the-rgw-backends.adoc[leveloffset=+1] diff --git a/docs_user/assemblies/ceph_migration.adoc b/docs_user/assemblies/ceph_migration.adoc deleted file mode 100644 index e2882c25c..000000000 --- a/docs_user/assemblies/ceph_migration.adoc +++ /dev/null @@ -1,15 +0,0 @@ -ifdef::context[:parent-context: {context}] - -[id="ceph-migration_{context}"] - -= Ceph migration - -:context: ceph-migration - -:toc: left -:toclevels: 3 - -include::../modules/ceph-monitoring_migration.adoc[leveloffset=+1] - -ifdef::parent-context[:context: {parent-context}] -ifndef::parent-context[:!context:] diff --git a/docs_user/main.adoc b/docs_user/main.adoc index ac3dd21c1..d633b00e2 100644 --- a/docs_user/main.adoc +++ b/docs_user/main.adoc @@ -22,12 +22,6 @@ include::assemblies/assembly_adopting-openstack-control-plane-services.adoc[leve include::assemblies/assembly_adopting-the-data-plane.adoc[leveloffset=+1] -include::assemblies/assembly_migrating-ceph-rbd.adoc[leveloffset=+1] - -include::assemblies/assembly_migrating-ceph-rgw.adoc[leveloffset=+1] - -include::modules/proc_migrating-ceph-mds.adoc[leveloffset=+1] - -include::assemblies/assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1] - include::assemblies/assembly_migrating-the-object-storage-service.adoc[leveloffset=+1] + +include::assemblies/assembly_migrating-ceph-cluster.adoc[leveloffset=+1] diff --git a/docs_user/modules/con_ceph-daemon-cardinality.adoc b/docs_user/modules/con_ceph-daemon-cardinality.adoc index 8ed18b3ff..50aaa2afe 100644 --- a/docs_user/modules/con_ceph-daemon-cardinality.adoc +++ b/docs_user/modules/con_ceph-daemon-cardinality.adoc @@ -2,19 +2,19 @@ = {Ceph} daemon cardinality -{Ceph} 6 and later applies strict constraints in the way daemons can be colocated within the same node. +{Ceph} 6 and later applies strict constraints in the way daemons can be +colocated within the same node. ifeval::["{build}" != "upstream"] For more information, see link:https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations]. endif::[] -The resulting topology depends on the available hardware, as well as the amount of {Ceph} services present in the Controller nodes which are going to be retired. -ifeval::["{build}" != "upstream"] -For more information about the procedure that is required to migrate the RGW component and keep an HA model using the Ceph ingress daemon, see link:{defaultCephURL}/object_gateway_guide/index#high-availability-for-the-ceph-object-gateway[High availability for the Ceph Object Gateway] in _Object Gateway Guide_. -endif::[] -ifeval::["{build}" != "downstream"] -The following document describes the procedure required to migrate the RGW component (and keep an HA model using the https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw[Ceph Ingress daemon] in a common {OpenStackPreviousInstaller} scenario where Controller nodes represent the -https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[spec placement] where the service is deployed. -endif::[] -As a general rule, the number of services that can be migrated depends on the number of available nodes in the cluster. 
The following diagrams cover the distribution of the {Ceph} daemons on the {Ceph} nodes where at least three nodes are required in a scenario that sees only RGW and RBD, without the {dashboard_first_ref}:
+The resulting topology depends on the available hardware, as well as the number
+of {Ceph} services present in the Controller nodes that are going to be
+retired.
+As a general rule, the number of services that can be migrated depends on the
+number of available nodes in the cluster. The following diagrams cover the
+distribution of the {Ceph} daemons on the {Ceph} nodes where at least three
+nodes are required in a scenario that includes only RGW and RBD, without the
+{Ceph} Dashboard:
 
 ----
 |    |                     |             |
@@ -24,7 +24,8 @@ As a general rule, the number of services that can be migrated depends on the nu
 | osd | mon/mgr/crash      | rgw/ingress |
 ----
 
-With the {dashboard}, and without {rhos_component_storage_file_first_ref} at least four nodes are required. The {dashboard} has no failover:
+With the {dashboard}, and without {rhos_component_storage_file_first_ref}, at
+least four nodes are required. The {Ceph} Dashboard has no failover:
 
 ----
 |    |                     |             |
@@ -35,7 +36,8 @@ With the {dashboard}, and without {rhos_component_storage_file_first_ref} at lea
 | osd | rgw/ingress        | (free)      |
 ----
 
-With the {dashboard} and the {rhos_component_storage_file}, 5 nodes minimum are required, and the {dashboard} has no failover:
+With the {Ceph} Dashboard and the {rhos_component_storage_file}, at least five
+nodes are required, and the {Ceph} Dashboard has no failover:
 
 ----
 |    |                     |             |
diff --git a/docs_user/modules/proc_migrating-mgr-from-controller-nodes.adoc b/docs_user/modules/proc_migrating-mgr-from-controller-nodes.adoc
new file mode 100644
index 000000000..05a198d07
--- /dev/null
+++ b/docs_user/modules/proc_migrating-mgr-from-controller-nodes.adoc
@@ -0,0 +1,95 @@
+[id="migrating-mgr-from-controller-nodes_{context}"]
+
+= Migrating Ceph Manager daemons to {Ceph} nodes
+
+The following section describes how to move Ceph Manager daemons from the
+{rhos_prev_long} Controller nodes to a set of target nodes. Target nodes might
+be pre-existing {Ceph} nodes, or {OpenStackShort} Compute nodes if {Ceph} is
+deployed by {OpenStackPreviousInstaller} with an HCI topology.
+This procedure assumes that cephadm and the {Ceph} Orchestrator are the tools
+that drive the Ceph Manager migration. As with the other Ceph daemons
+(MDS, Monitoring, and RGW), the procedure uses the Ceph spec to modify the
+placement and reschedule the daemons. Ceph Manager runs in an active/passive
+fashion, and it also provides many modules, including the {Ceph} Orchestrator.
+
+.Prerequisites
+
+* Configure the target nodes (CephStorage or ComputeHCI) to have both `storage`
+and `storage_mgmt` networks to ensure that you can use both {Ceph} public and
+cluster networks from the same node. This step requires you to interact with
+{OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and later
+you do not have to run a stack update.
+
+.Procedure
+
+. SSH into the target node and enable the firewall rules that are required to
+  reach a Manager service:
++
+----
+dports="6800:7300"
+ssh heat-admin@<target_node> sudo iptables -I INPUT \
+    -p tcp --match multiport --dports $dports -j ACCEPT;
+----
++
+Repeat this step for each `<target_node>`.
+
+. Check that the rules are properly applied and persist them:
++
+----
+$ sudo iptables-save
+$ sudo systemctl restart iptables
+----
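++
+Optionally, confirm that the new rule survived the restart by searching the
+saved rule set for the Ceph Manager port range that you opened above. This is
+a minimal check, not a full validation of the firewall configuration:
++
+----
+$ sudo iptables-save | grep "6800:7300"
+----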
+. Prepare the target node to host the new Ceph Manager daemon, and add the
+  `mgr` label to the target node:
++
+----
+ceph orch host label add <target_node> mgr
+----
++
+* Replace `<target_node>` with the hostname of a host that is listed in the
+  {Ceph} cluster by the `ceph orch host ls` command.
++
+Repeat this action for each `<target_node>` that will host a Ceph Manager
+daemon.
+
+. Get the Ceph Manager spec:
++
+----
+sudo cephadm shell -- ceph orch ls --export mgr > mgr.yaml
+----
+
+. Edit the retrieved spec and add the `label: mgr` section to the `placement`
+  section:
++
+[source,yaml]
+----
+service_type: mgr
+service_id: mgr
+placement:
+  label: mgr
+----
+
+. Save the spec in the `/tmp/mgr.yaml` file.
+. Apply the spec with cephadm by using the orchestrator:
++
+----
+sudo cephadm shell -m /tmp/mgr.yaml -- ceph orch apply -i /mnt/mgr.yaml
+----
++
+As a result of this procedure, you see a Ceph Manager daemon count that matches
+the number of hosts where the `mgr` label is added.
+
+. Verify that the new Ceph Manager daemons are created on the target nodes:
++
+----
+ceph orch ps | grep -i mgr
+ceph -s
+----
++
+[NOTE]
+This procedure does not shrink the number of Ceph Manager daemons. The count
+grows by the number of target nodes, and migrating the Ceph Monitor daemons to
+the {Ceph} nodes decommissions the stand-by Ceph Manager instances. For more
+information, see
+xref:migrating-mon-from-controller-nodes_migrating-ceph-rbd[Migrating Ceph Monitor
+daemons to {Ceph} nodes].
diff --git a/docs_user/modules/proc_migrating-mon-from-controller-nodes.adoc b/docs_user/modules/proc_migrating-mon-from-controller-nodes.adoc
new file mode 100644
index 000000000..0bc0c6d27
--- /dev/null
+++ b/docs_user/modules/proc_migrating-mon-from-controller-nodes.adoc
@@ -0,0 +1,403 @@
+[id="migrating-mon-from-controller-nodes_{context}"]
+
+= Migrating Ceph Monitor daemons to {Ceph} nodes
+
+The following section describes how to move Ceph Monitor daemons from the
+{rhos_prev_long} Controller nodes to a set of target nodes. Target nodes might
+be pre-existing {Ceph} nodes, or {OpenStackShort} Compute nodes if {Ceph} is
+deployed by {OpenStackPreviousInstaller} with an HCI topology.
+This procedure assumes that some of the steps are run on the source node that
+you want to decommission, while other steps are run on the target node that
+hosts the redeployed daemon.
+
+.Prerequisites
+
+Configure the target nodes (CephStorage or ComputeHCI) to have both `storage`
+and `storage_mgmt` networks to ensure that you can use both {Ceph} public and
+cluster networks from the same node. This step requires you to interact with
+{OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and later
+you do not have to run a stack update. However, you must run commands to
+execute `os-net-config` on the bare metal node and configure additional
+networks.
+
+. If the target nodes are `CephStorage` nodes, ensure that the networks are
+  defined in the `metalsmith.yaml` file for the `CephStorage` nodes:
++
+[source,yaml]
+----
+  - name: CephStorage
+    count: 2
+    instances:
+      - hostname: oc0-ceph-0
+        name: oc0-ceph-0
+      - hostname: oc0-ceph-1
+        name: oc0-ceph-1
+    defaults:
+      networks:
+        - network: ctlplane
+          vif: true
+        - network: storage_cloud_0
+          subnet: storage_cloud_0_subnet
+        - network: storage_mgmt_cloud_0
+          subnet: storage_mgmt_cloud_0_subnet
+      network_config:
+        template: templates/single_nic_vlans/single_nic_vlans_storage.j2
+----
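++
+Optionally, before you run the provisioning command in the next step, confirm
+from the undercloud that the storage networks referenced in the file exist.
+This is a minimal, hedged check that assumes the undercloud credentials are
+already sourced; the network names are illustrative:
++
+----
+(undercloud) $ openstack network list | grep -i storage
+----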
+. Run the provisioning command:
++
+----
+$ openstack overcloud node provision \
+  -o overcloud-baremetal-deployed-0.yaml --stack overcloud-0 \
+  --network-config -y --concurrency 2 /home/stack/metalsmith-0.yaml
+----
+
+. Verify that the storage network is configured on the target nodes:
++
+----
+(undercloud) [stack@undercloud ~]$ ssh heat-admin@192.168.24.14 ip -o -4 a
+1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever
+5: br-storage    inet 192.168.24.14/24 brd 192.168.24.255 scope global br-storage\       valid_lft forever preferred_lft forever
+6: vlan1    inet 192.168.24.14/24 brd 192.168.24.255 scope global vlan1\       valid_lft forever preferred_lft forever
+7: vlan11    inet 172.16.11.172/24 brd 172.16.11.255 scope global vlan11\       valid_lft forever preferred_lft forever
+8: vlan12    inet 172.16.12.46/24 brd 172.16.12.255 scope global vlan12\       valid_lft forever preferred_lft forever
+----
+
+.Procedure
+
+. SSH into the target node and enable the firewall rules that are required to
+  reach a Mon service:
++
+----
+$ for port in 3300 6789; do
+    ssh heat-admin@<target_node> sudo iptables -I INPUT \
+    -p tcp -m tcp --dport $port -m conntrack --ctstate NEW \
+    -j ACCEPT;
+done
+----
++
+* Replace `<target_node>` with the hostname of the node that hosts the new
+  mon.
+
+. Check that the rules are properly applied and persist them:
++
+----
+sudo iptables-save
+sudo systemctl restart iptables
+----
+
+. To migrate the existing Mons to the target {Ceph} nodes, create the following
+  {Ceph} spec from the first mon (or the first Controller node) and modify the
+  placement based on the appropriate label:
++
+[source,yaml]
+----
+service_type: mon
+service_id: mon
+placement:
+  label: mon
+----
+
+. Save the spec in the `/tmp/mon.yaml` file.
+. Apply the spec with cephadm by using the orchestrator:
++
+----
+$ sudo cephadm shell -m /tmp/mon.yaml
+$ ceph orch apply -i /mnt/mon.yaml
+----
++
+[NOTE]
+Applying the `mon.yaml` spec allows the existing strategy to use `labels`
+instead of `hosts`. As a result, any node with the `mon` label can host a Ceph
+mon daemon.
+Perform this step once to avoid multiple iterations when multiple Ceph Mons are
+migrated.
+
+. Check the status of the {CephCluster} cluster and the Ceph orchestrator
+  daemons list. Make sure that the three mons are in quorum and listed by the
+  `ceph orch` command:
++
+----
+# ceph -s
+  cluster:
+    id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+
+  services:
+    mon: 3 daemons, quorum oc0-controller-0,oc0-controller-1,oc0-controller-2 (age 19m)
+    mgr: oc0-controller-0.xzgtvo(active, since 32m), standbys: oc0-controller-1.mtxohd, oc0-controller-2.ahrgsk
+    osd: 8 osds: 8 up (since 12m), 8 in (since 18m); 1 remapped pgs
+
+  data:
+    pools:   1 pools, 1 pgs
+    objects: 0 objects, 0 B
+    usage:   43 MiB used, 400 GiB / 400 GiB avail
+    pgs:     1 active+clean
+----
++
+----
+[ceph: root@oc0-controller-0 /]# ceph orch host ls
+HOST              ADDR           LABELS          STATUS
+oc0-ceph-0        192.168.24.14  osd
+oc0-ceph-1        192.168.24.7   osd
+oc0-controller-0  192.168.24.15  _admin mgr mon
+oc0-controller-1  192.168.24.23  _admin mgr mon
+oc0-controller-2  192.168.24.13  _admin mgr mon
+----
+
+. On the source node, back up the `/etc/ceph/` directory. The backup allows you
+  to execute cephadm and get a shell to the {Ceph} cluster from the source node:
++
+----
+$ mkdir -p $HOME/ceph_client_backup
+$ sudo cp -R /etc/ceph $HOME/ceph_client_backup
+----
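++
+Optionally, confirm that the backup contains the cluster configuration and an
+admin keyring before you continue. This is a minimal sketch; the file names can
+differ in your environment:
++
+----
+$ ls -l $HOME/ceph_client_backup/ceph/
+----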
+. Before draining the source node and relocating the IP address of the storage
+  network to the target node, fail the ceph-mgr if it is active on the
+  source node:
++
+----
+$ ceph mgr fail
+----
+
+. Drain the source node and start the mon migration. From the cephadm shell,
+  remove the labels on the source node:
++
+----
+for label in mon mgr _admin; do
+    ceph orch host label rm <source_node> $label;
+done
+----
+
+. Remove the running mon daemon from the source node:
++
+----
+$ cephadm shell -- ceph orch daemon rm mon.<source_node> --force
+----
+
+. Run the drain command:
++
+----
+$ cephadm shell -- ceph orch host drain <source_node>
+----
+
+. Remove the `<source_node>` host from the {CephCluster} cluster:
++
+----
+$ cephadm shell -- ceph orch host rm <source_node> --force
+----
++
+* Replace `<source_node>` with the hostname of the source node.
+
+[NOTE]
+The source node is not part of the cluster anymore, and should not appear in
+the {Ceph} host list when `cephadm shell -- ceph orch host ls` is run.
+However, a `sudo podman ps` on the `<source_node>` might list both mon and mgr
+still up and running.
+
+----
+[root@oc0-controller-1 ~]# sudo podman ps
+CONTAINER ID  IMAGE                                                                                        COMMAND               CREATED         STATUS             PORTS  NAMES
+ifeval::["{build}" != "downstream"]
+5c1ad36472bc  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
+3b14cc7bf4dd  quay.io/ceph/daemon@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd
+endif::[]
+ifeval::["{build}" == "downstream"]
+5c1ad36472bc  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mon.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mon-oc0-controller-1
+3b14cc7bf4dd  registry.redhat.io/ceph/rhceph@sha256:320c364dcc8fc8120e2a42f54eb39ecdba12401a2546763b7bef15b02ce93bc4  -n mgr.oc0-contro...  35 minutes ago  Up 35 minutes ago         ceph-f6ec3ebe-26f7-56c8-985d-eb974e8e08e3-mgr-oc0-controller-1-mtxohd
+endif::[]
+----
+
+To clean up the source node before moving to the next phase, clean up the
+existing containers and remove the cephadm-related data from the node.
+// fpantano: there's an automated procedure run through cephadm but it's too
+// risky. If the user doesn't perform it properly the cluster can be affected.
+// We can put a downstream comment to contact the RH support to clean the source
+// node up in case of leftovers, and open a bug for cephadm.
+
+//. ssh into one of the existing Ceph mons (usually controller-1 or controller-2)
+. Prepare the target node to host the new mon and add the `mon` label to the
+  target node:
++
+----
+for label in mon mgr _admin; do
+    ceph orch host label add <target_node> $label;
+done
+----
++
+* Replace `<target_node>` with the hostname of the host that is listed in the
+  {CephCluster} cluster by the `ceph orch host ls` command.
+
+[NOTE]
+At this point the cluster is running with only two mons, but a third mon is
+scheduled and deployed on the target node.
+However, the third mon might be deployed on a different IP address that is
+available on the node, and you need to redeploy it when the IP migration is
+concluded.
+Even though the mon is deployed on the wrong IP address, keeping the quorum at
+three mons avoids the risk of losing the cluster because the two remaining mons
+enter a split-brain condition.
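+
+Before you continue, you can check which IP address the newly scheduled mon is
+bound to, and therefore whether it needs to be redeployed after the IP
+migration. This is an optional, minimal check; the monmap output lists each mon
+with its v2 and v1 addresses:
+
+----
+$ sudo cephadm shell -- ceph mon dump
+----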
+. Confirm that the cluster has three mons and they are in quorum:
++
+----
+$ cephadm shell -- ceph -s
+$ cephadm shell -- ceph orch ps | grep -i mon
+----
+
+It is now possible to migrate the original mon IP address to the target node
+and redeploy the existing mon on it.
+The following IP address migration procedure assumes that the target nodes have
+been originally deployed by {OpenStackPreviousInstaller} and the network
+configuration is managed by `os-net-config`.
+
+// NOTE (fpantano): we need to document the same ip address migration procedure
+// w/ an EDPM node that has already been adopted.
+
+. Get the mon IP address from the existing `/etc/ceph/ceph.conf` file (check
+  the `mon_host` line), for example:
++
+----
+mon_host = [v2:172.17.3.60:3300/0,v1:172.17.3.60:6789/0] [v2:172.17.3.29:3300/0,v1:172.17.3.29:6789/0] [v2:172.17.3.53:3300/0,v1:172.17.3.53:6789/0]
+----
+
+. Confirm that the mon IP address is present in the `os-net-config`
+  configuration of the source node, which is located in `/etc/os-net-config`:
++
+----
+[tripleo-admin@controller-0 ~]$ grep "172.17.3.60" /etc/os-net-config/config.yaml
+    - ip_netmask: 172.17.3.60/24
+----
+
+. Edit `/etc/os-net-config/config.yaml` and remove the `ip_netmask` line.
+
+. Save the file and refresh the node network configuration:
++
+----
+$ sudo os-net-config -c /etc/os-net-config/config.yaml
+----
+
+. Verify, by using the `ip` command, that the IP address is no longer present
+  on the source node.
+
+. SSH into the target node, for example `cephstorage-0`, and add the IP address
+  for the new mon.
+
+. On the target node, edit `/etc/os-net-config/config.yaml` and add the
+  `- ip_netmask: 172.17.3.60` line that you removed from the source node.
+
+. Save the file and refresh the node network configuration:
++
+----
+$ sudo os-net-config -c /etc/os-net-config/config.yaml
+----
+
+. Verify, by using the `ip` command, that the IP address is present on the
+  target node.
+
+. Get the Ceph mon spec:
++
+----
+ceph orch ls --export mon > mon.yaml
+----
+
+. Edit the retrieved spec and add the `unmanaged: true` keyword:
++
+[source,yaml]
+----
+service_type: mon
+service_id: mon
+placement:
+  label: mon
+unmanaged: true
+----
+
+. Save the spec in the `/tmp/mon.yaml` file.
+. Apply the spec with cephadm by using the orchestrator:
++
+----
+$ sudo cephadm shell -m /tmp/mon.yaml
+$ ceph orch apply -i /mnt/mon.yaml
+----
++
+The mon daemons are marked as `unmanaged`, and it is now possible to redeploy
+the existing daemon and bind it to the migrated IP address.
+
+. Delete the existing mon on the target node:
++
+----
+$ ceph orch daemon rm mon.<target_node> --force
+----
+
+. Redeploy the new mon on the target node by using the old IP address:
++
+----
+$ ceph orch daemon add mon <target_node>:<ip_address>
+----
++
+* Replace `<target_node>` with the hostname of the target node that is enrolled
+  in the {Ceph} cluster.
+* Replace `<ip_address>` with the IP address that you migrated to the target
+  node.
+
+. Get the Ceph mon spec:
++
+----
+$ ceph orch ls --export mon > mon.yaml
+----
+
+. Edit the retrieved spec and set the `unmanaged` keyword to `false`:
++
+[source,yaml]
+----
+service_type: mon
+service_id: mon
+placement:
+  label: mon
+unmanaged: false
+----
+
+. Save the spec in the `/tmp/mon.yaml` file.
+. Apply the spec with cephadm by using the Ceph Orchestrator:
++
+----
+$ sudo cephadm shell -m /tmp/mon.yaml
+$ ceph orch apply -i /mnt/mon.yaml
+----
++
+The new mon runs on the target node with the original IP address.
+
+. Identify the running `mgr`:
++
+----
+$ sudo cephadm shell -- ceph -s
+----
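++
+If you only need the name of the active Ceph Manager rather than the full
+cluster status, `ceph mgr stat` gives a compact view. This is an optional,
+minimal check; the output format can vary between releases:
++
+----
+$ sudo cephadm shell -- ceph mgr stat
+----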
+. Refresh the mgr information by force-failing it:
++
+----
+$ ceph mgr fail
+----
+
+. Refresh the `OSD` information:
++
+----
+$ ceph orch reconfig osd.default_drive_group
+----
+
+. Verify that the {CephCluster} cluster is healthy:
++
+----
+[ceph: root@oc0-controller-0 specs]# ceph -s
+  cluster:
+    id: f6ec3ebe-26f7-56c8-985d-eb974e8e08e3
+    health: HEALTH_OK
+...
+...
+----
+
+. Repeat this procedure for any additional Controller node that hosts a mon
+  until you have migrated all the Ceph Mon daemons to the target nodes.
diff --git a/docs_user/modules/proc_relocating-one-instance-of-a-monitoring-stack-to-migrate-daemons-to-target-nodes.adoc b/docs_user/modules/proc_relocating-one-instance-of-a-monitoring-stack-to-migrate-daemons-to-target-nodes.adoc
index 361c4b0ef..9e910c4c1 100644
--- a/docs_user/modules/proc_relocating-one-instance-of-a-monitoring-stack-to-migrate-daemons-to-target-nodes.adoc
+++ b/docs_user/modules/proc_relocating-one-instance-of-a-monitoring-stack-to-migrate-daemons-to-target-nodes.adoc
@@ -104,9 +104,8 @@ With the procedure described above we lose High Availability: the monitoring
 stack daemons have no VIP and haproxy anymore; Node exporters are still
 running on all the nodes: instead of using labels we keep the current
 approach as we want to not reduce the monitoring space covered.
-
++
 //kgilliga: What does "the procedure described above" refer to?
-
 . Update the Ceph Dashboard Manager configuration. An important aspect that should be
 considered at this point is to replace and verify that the {Ceph} configuration
 is aligned with the relocation you just made. Run the `ceph config dump` command
 and review the current config. In particular, focus on the following configuration entries:
@@ -122,7 +121,7 @@ mgr advanced mgr/dashboard/controller-0.ycokob/server_addr 172.17.3.33
 mgr advanced mgr/dashboard/controller-1.lmzpuc/server_addr 172.17.3.147
 mgr advanced mgr/dashboard/controller-2.xpdgfl/server_addr 172.17.3.138
 ----
-
++
 . Verify that `grafana`, `alertmanager` and `prometheus` `API_HOST/URL` point
 to the IP addresses (on the storage network) of the node where each daemon has
 been relocated. This should be automatically addressed by cephadm and it shouldn’t
@@ -149,7 +148,7 @@ mgr advanced mgr/dashboard/ALERTMANAGER_API_HOST http://172.17.3.83:9093
 mgr advanced mgr/dashboard/PROMETHEUS_API_HOST http://172.17.3.83:9092
 mgr advanced mgr/dashboard/GRAFANA_API_URL https://172.17.3.144:3100
 ----
-
++
 . The Ceph Dashboard (mgr module plugin) has not been impacted at all by this
 relocation. The service is provided by the Ceph Manager daemon, hence we might
 experience an impact when the active mgr is migrated or is force-failed.
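+
+After the relocation, you can confirm where the monitoring and dashboard
+endpoints are currently served by asking the active Ceph Manager for its
+service map. This is an optional, minimal check; the exact URLs depend on your
+environment:
+
+----
+$ sudo cephadm shell -- ceph mgr services
+----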