Merge pull request #461 from fmount/ceph_doc
Rework Ceph RBD migration documentation
klgill authored Jun 7, 2024
2 parents b73a146 + 04e42d9 commit bb5aef5
Showing 10 changed files with 612 additions and 63 deletions.
44 changes: 44 additions & 0 deletions docs_user/assemblies/assembly_migrating-ceph-cluster.adoc
@@ -0,0 +1,44 @@
ifdef::context[:parent-context: {context}]

[id="ceph-migration_{context}"]

= Migrating the {CephCluster} Cluster

:context: migrating-ceph

:toc: left
:toclevels: 3

ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]

In the context of data plane adoption, where the {rhos_prev_long}
({OpenStackShort}) services are redeployed in {OpenShift}, you migrate a
{OpenStackPreviousInstaller}-deployed {CephCluster} cluster by using a process
called “externalizing” the {CephCluster} cluster.

There are two deployment topologies that include an internal {CephCluster}
cluster:

* {OpenStackShort} includes dedicated {CephCluster} nodes to host object
storage daemons (OSDs)

* Hyperconverged Infrastructure (HCI), where Compute and Storage services are
colocated on hyperconverged nodes

In either scenario, there are some {Ceph} processes that are deployed on
{OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW),
Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS
Ganesha. To migrate your {CephCluster} cluster, you must decommission the
Controller nodes and move the {Ceph} daemons to a set of target nodes that are
already part of the {CephCluster} cluster.
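
Before you begin, it can help to inventory which daemons currently run on each node. The following sketch is illustrative only: the here-string stands in for real `sudo cephadm shell -- ceph orch ps` output, and the host names are assumptions.

```shell
#!/bin/sh
# Illustrative: count Ceph daemons per host from `ceph orch ps`-style output.
# The sample variable stands in for `sudo cephadm shell -- ceph orch ps`.
sample='mon.controller-0      controller-0  running
mgr.controller-0      controller-0  running
rgw.rgw.controller-1  controller-1  running
osd.0                 ceph-0        running'

# Field 2 is the host name; tally daemons per host and print sorted.
printf '%s\n' "$sample" \
  | awk '{count[$2]++} END {for (h in count) print h, count[h]}' \
  | sort
```

Hosts with a nonzero count other than the target {Ceph} nodes indicate daemons that still have to be moved off the Controllers.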

include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1]

include::assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

include::../modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]

include::assembly_migrating-ceph-rgw.adoc[leveloffset=+1]

include::assembly_migrating-ceph-rbd.adoc[leveloffset=+1]
32 changes: 17 additions & 15 deletions docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc
@@ -4,30 +4,32 @@

= Migrating the monitoring stack component to new nodes within an existing {Ceph} cluster

The Ceph Dashboard module adds web-based monitoring and administration to the
Ceph Manager.
With {OpenStackPreviousInstaller}-deployed {Ceph}, this component is enabled as
part of the overcloud deploy and it is composed of the following:

- Ceph Manager module
- Grafana
- Prometheus
- Alertmanager
- Node exporter

The Ceph Dashboard containers are included through
`tripleo-container-image-prepare` parameters and the high availability relies
on `Haproxy` and `Pacemaker` deployed on the {OpenStackShort} front. For an
external {CephCluster} cluster, high availability is not supported. The goal of
this procedure is to migrate and relocate the Ceph Monitoring components to
free Controller nodes.

For this procedure, we assume that we are beginning with a {OpenStackShort}
based on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment managed by
{OpenStackPreviousInstaller}. We assume that:

* {Ceph} has been upgraded to {CephRelease} and is managed by
cephadm/orchestrator
* Both the {Ceph} public and cluster networks are propagated,
through {OpenStackPreviousInstaller}, to the target nodes
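
These assumptions can be spot-checked before you start. The following sketch is illustrative only: the sample JSON stands in for real `sudo cephadm shell -- ceph versions` output, and the release string is an assumption. It checks that every daemon class reports a single {Ceph} release.

```shell
#!/bin/sh
# Illustrative: verify that all daemons report the same Ceph release.
# The sample stands in for `sudo cephadm shell -- ceph versions` output.
versions='{"mon":{"ceph version 18.2.0 (reef)":3},"mgr":{"ceph version 18.2.0 (reef)":3},"osd":{"ceph version 18.2.0 (reef)":12}}'

# Extract every "ceph version X.Y.Z" token and count distinct releases.
distinct=$(printf '%s\n' "$versions" \
  | grep -o 'ceph version [0-9][0-9.]*' \
  | sort -u | wc -l | tr -d ' ')

if [ "$distinct" -eq 1 ]; then
  echo "single release across all daemons"
else
  echo "mixed releases detected" >&2
fi
```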

include::../modules/proc_completing-prerequisites-for-migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

41 changes: 34 additions & 7 deletions docs_user/assemblies/assembly_migrating-ceph-rbd.adoc
@@ -4,13 +4,40 @@

= Migrating Red Hat Ceph Storage RBD to external RHEL nodes

For Hyperconverged Infrastructure (HCI) or dedicated Storage nodes that are
running {Ceph} version 6 or later, you must migrate the daemons that are
included in the {rhos_prev_long} control plane into the existing external Red
Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include
the Compute nodes for an HCI environment or dedicated storage nodes.

To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must
meet the following requirements:

* {Ceph} is running version 6 or later and is managed by cephadm.
* NFS Ganesha is migrated from a {OpenStackPreviousInstaller}-based
deployment to cephadm. For more information, see
xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha
cluster].
* Both the {Ceph} public and cluster networks are propagated, with
{OpenStackPreviousInstaller}, to the target nodes.
* The Ceph MDS, Ceph Monitoring stack, Ceph RGW, and other services are
migrated to the target nodes.
ifeval::["{build}" != "upstream"]
* The daemons distribution follows the cardinality constraints that are
described in link:https://access.redhat.com/articles/1548993[Red Hat Ceph
Storage: Supported configurations].
endif::[]
* The {Ceph} cluster is healthy, and the `ceph -s` command returns `HEALTH_OK`.
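
The health requirement can be expressed as a simple pre-flight gate. The following sketch is illustrative only: the variable stands in for real `sudo cephadm shell -- ceph -s` output, and the cluster ID shown is a made-up placeholder.

```shell
#!/bin/sh
# Illustrative pre-flight gate: proceed only when the cluster reports
# HEALTH_OK. The variable stands in for `sudo cephadm shell -- ceph -s`.
status='  cluster:
    id:     00000000-0000-0000-0000-000000000000
    health: HEALTH_OK'

if printf '%s\n' "$status" | grep -q 'health: HEALTH_OK'; then
  echo "cluster healthy: safe to start the migration"
else
  echo "cluster not healthy: resolve warnings first" >&2
  exit 1
fi
```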

During the procedure to migrate the Ceph Mon daemons, the following actions
occur:

* The mon IP addresses are moved to the target {Ceph} nodes.
* The existing Controller nodes are drained and decommissioned.
* Additional monitors are deployed to the target nodes, and they are promoted
as `_admin` nodes that can be used to manage the {CephCluster} cluster and
perform day 2 operations.
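
After the migration you can confirm which hosts carry the `_admin` label. The following sketch is illustrative only: the here-string stands in for real `sudo cephadm shell -- ceph orch host ls` output, and the host names and addresses are assumptions.

```shell
#!/bin/sh
# Illustrative: list hosts carrying the _admin label from
# `ceph orch host ls`-style output (host names are assumptions).
hosts='HOST          ADDR           LABELS
ceph-0        192.168.24.11  mon mgr _admin
ceph-1        192.168.24.12  mon mgr _admin
controller-0  192.168.24.21  mon'

# Skip the header row, keep lines mentioning _admin, print the host name.
printf '%s\n' "$hosts" | awk 'NR > 1 && /_admin/ {print $1}'
```

Any host printed here can be used to run `ceph` commands and perform day 2 operations against the cluster.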

include::../modules/proc_migrating-mgr-from-controller-nodes.adoc[leveloffset=+1]

include::../modules/proc_migrating-mon-from-controller-nodes.adoc[leveloffset=+1]
2 changes: 0 additions & 2 deletions docs_user/assemblies/assembly_migrating-ceph-rgw.adoc
@@ -11,8 +11,6 @@ To migrate Ceph Object Gateway (RGW), your environment must meet the following r
* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator.
* An undercloud is still available, and the nodes and networks are managed by {OpenStackPreviousInstaller}.

include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1]

include::../modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc[leveloffset=+1]

include::../modules/proc_migrating-the-rgw-backends.adoc[leveloffset=+1]
15 changes: 0 additions & 15 deletions docs_user/assemblies/ceph_migration.adoc

This file was deleted.

10 changes: 2 additions & 8 deletions docs_user/main.adoc
@@ -22,12 +22,6 @@ include::assemblies/assembly_adopting-openstack-control-plane-services.adoc[leve

include::assemblies/assembly_adopting-the-data-plane.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-rbd.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-rgw.adoc[leveloffset=+1]

include::modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-the-object-storage-service.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-cluster.adoc[leveloffset=+1]
26 changes: 14 additions & 12 deletions docs_user/modules/con_ceph-daemon-cardinality.adoc
@@ -2,19 +2,19 @@

= {Ceph} daemon cardinality

{Ceph} 6 and later applies strict constraints in the way daemons can be
colocated within the same node.
ifeval::["{build}" != "upstream"]
For more information, see link:https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations].
endif::[]
ifeval::["{build}" != "upstream"]
For more information about the procedure that is required to migrate the RGW component and keep an HA model using the Ceph ingress daemon, see link:{defaultCephURL}/object_gateway_guide/index#high-availability-for-the-ceph-object-gateway[High availability for the Ceph Object Gateway] in _Object Gateway Guide_.
endif::[]
ifeval::["{build}" != "downstream"]
The following document describes the procedure required to migrate the RGW
component (and keep an HA model by using the
https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw[Ceph Ingress daemon])
in a common {OpenStackPreviousInstaller} scenario where the Controller nodes
represent the
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[spec placement]
where the service is deployed.
endif::[]
The resulting topology depends on the available hardware, as well as the amount
of {Ceph} services present in the Controller nodes that are going to be
retired.
As a general rule, the number of services that can be migrated depends on the
number of available nodes in the cluster. The following diagrams cover the
distribution of the {Ceph} daemons on the {Ceph} nodes where at least three
nodes are required in a scenario that includes only RGW and RBD, without the
{Ceph} Dashboard:

----
| | | |
| osd | mon/mgr/crash | rgw/ingress |
----
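
The label-driven layout above can be expressed as cephadm placement specs. The following fragment is a hedged sketch under stated assumptions — it is not the exact spec that {OpenStackPreviousInstaller} generates, and the service IDs and label names are illustrative:

```yaml
# Sketch: label-based placement for the minimal three-node RGW + RBD layout.
# Each daemon lands only on hosts carrying the matching label.
service_type: mon
placement:
  label: mon
---
service_type: mgr
placement:
  label: mgr
---
service_type: rgw
service_id: rgw
placement:
  label: rgw
```

With label-based placement, adding or removing a daemon from a node is a matter of adding or removing the corresponding host label.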

With the {dashboard}, and without {rhos_component_storage_file_first_ref}, at
least four nodes are required. The {dashboard} has no failover:

----
| | | |
| osd | rgw/ingress | (free) |
----

With the {dashboard} and the {rhos_component_storage_file}, at least five
nodes are required, and the {dashboard} has no failover:

----
| | | |
----
95 changes: 95 additions & 0 deletions docs_user/modules/proc_migrating-mgr-from-controller-nodes.adoc
@@ -0,0 +1,95 @@
= Migrating Ceph Manager daemons to {Ceph} nodes

The following section describes how to move Ceph Manager daemons from the
{rhos_prev_long} Controller nodes to a set of target nodes. Target nodes might
be pre-existing {Ceph} nodes, or {OpenStackShort} Compute nodes if {Ceph} is
deployed by {OpenStackPreviousInstaller} with an HCI topology.
This procedure assumes that Cephadm and the {Ceph} Orchestrator are the tools
that drive the Ceph Manager migration. As is done with the other Ceph daemons
(MDS, Monitoring, and RGW), the procedure uses the Ceph spec to modify the
placement and reschedule the daemons. Ceph Manager is run in an active/passive
fashion, and it also provides many modules, including the Ceph orchestrator.


.Prerequisites

* Configure the target nodes (CephStorage or ComputeHCI) to have both `storage`
and `storage_mgmt` networks to ensure that you can use both {Ceph} public and
cluster networks from the same node. This step requires you to interact with
{OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and later,
you do not need to run a stack update.

.Procedure

. SSH into the target node and enable the firewall rules that are required to
reach the Ceph Manager service:
+
----
dports="6800:7300"
ssh heat-admin@<target_node> sudo iptables -I INPUT \
    -p tcp --match multiport --dports $dports -j ACCEPT
----
+
Repeat this step for each `<target_node>`.
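
Rather than repeating the command by hand, the per-node step can be scripted. The following dry-run sketch only prints the commands it would run; the node names and the `heat-admin` user are assumptions carried over from the example above.

```shell
#!/bin/sh
# Dry run: print the firewall command for each target node instead of
# executing it over SSH. Drop the echo to actually run the commands.
dports="6800:7300"
for node in ceph-0 ceph-1 ceph-2; do
  echo ssh heat-admin@"$node" sudo iptables -I INPUT \
    -p tcp --match multiport --dports "$dports" -j ACCEPT
done
```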

. Check that the rules are properly applied and persist them:
+
----
$ sudo iptables-save > /etc/sysconfig/iptables
$ sudo systemctl restart iptables
----
+
. Prepare the target node to host the new Ceph Manager daemon, and add the `mgr`
label to the target node:
+
----
ceph orch host label add <target_node> mgr
----
+
Replace `<target_node>` with the hostname of a host listed in the output of
the `ceph orch host ls` command.
+
Repeat this step for each `<target_node>` that will host a Ceph Manager
daemon.
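
As with the firewall step, labeling several nodes can be scripted. The following dry-run sketch prints one label command per target node; the node names are assumptions.

```shell
#!/bin/sh
# Dry run: print the `ceph orch host label add` command for each target
# node (names are assumptions). Drop the echo to run the commands inside
# a cephadm shell.
for node in ceph-0 ceph-1 ceph-2; do
  echo "ceph orch host label add $node mgr"
done
```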
. Get the Ceph Manager spec:
+
----
sudo cephadm shell -- ceph orch ls --export mgr > mgr.yaml
----

. Edit the retrieved spec and add the `label: mgr` section to the `placement`
section:
+
[source,yaml]
----
service_type: mgr
service_id: mgr
placement:
label: mgr
----
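
If you prefer to pin explicit hosts instead of relying on a label, the placement section can also enumerate them. This variant is a sketch only; the host names are assumptions:

```yaml
# Sketch: explicit host list as an alternative to the mgr label.
service_type: mgr
service_id: mgr
placement:
  hosts:
    - ceph-0
    - ceph-1
    - ceph-2
```

The label-based form is usually easier to maintain, because adding a Manager host then requires only a label change rather than a spec edit.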

. Save the spec in the `/tmp/mgr.yaml` file.
. Apply the spec with cephadm by using the orchestrator:
+
----
sudo cephadm shell -m /tmp/mgr.yaml -- ceph orch apply -i /mnt/mgr.yaml
----
+
As a result of this procedure, you see a Ceph Manager daemon count that matches
the number of hosts where the `mgr` label is added.

. Verify that the new Ceph Manager daemons are created in the target nodes:
+
----
ceph orch ps | grep -i mgr
ceph -s
----
+
[NOTE]
The procedure does not shrink the number of Ceph Manager daemons. The count
grows by the number of target nodes, and migrating the Ceph Monitor daemons to
{Ceph} nodes decommissions the stand-by Ceph Manager instances. For more
information, see
xref:migrating-mon-from-controller-nodes_migrating-ceph-rbd[Migrating Ceph Monitor
daemons to {Ceph} nodes].