Reorganize Ceph assemblies
The Ceph assemblies are reorganized to follow a simple structure.
A main ceph-cluster migration assembly is included in main, and it contains
a quick intro and the ordered list of procedures (including the cardinality
section, which is critical here and will be improved in a follow-up patch).
This way the Ceph documentation is easy to access and maintain. There are also
fixes to wrong references (e.g. Horizon != Ceph dashboard).

Signed-off-by: Francesco Pantano <[email protected]>
fmount committed Jun 5, 2024
1 parent 48f764f commit 62babe2
Showing 10 changed files with 252 additions and 224 deletions.
44 changes: 44 additions & 0 deletions docs_user/assemblies/assembly_migrating-ceph-cluster.adoc
@@ -0,0 +1,44 @@
ifdef::context[:parent-context: {context}]

[id="ceph-migration_{context}"]

= Migrating the {CephCluster} cluster

:context: migrating-ceph

:toc: left
:toclevels: 3


In the context of data plane adoption, where the {rhos_prev_long}
({OpenStackShort}) services are redeployed in {OpenShift}, you migrate a
{OpenStackPreviousInstaller}-deployed {CephCluster} cluster by using a process
called “externalizing” the {CephCluster} cluster.

There are two deployment topologies that include an internal {CephCluster}
cluster:

* {OpenStackShort} includes dedicated {CephCluster} nodes to host object
storage daemons (OSDs)

* Hyperconverged Infrastructure (HCI), where Compute and Storage services are
colocated on hyperconverged nodes

In either scenario, there are some {Ceph} processes that are deployed on
{OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW),
Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS
Ganesha. To migrate your {CephCluster} cluster, you must decommission the
Controller nodes and move the {Ceph} daemons to a set of target nodes that are
already part of the {CephCluster} cluster.
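
Before you start, it can help to review where the {Ceph} daemons currently run.
The following commands are a sketch only; the Controller hostname is an example:

[source,bash]
----
# List the hosts that cephadm manages and the daemons placed on each host
sudo cephadm shell -- ceph orch host ls
# Review the daemons that are colocated on a Controller node (example hostname)
sudo cephadm shell -- ceph orch ps | grep controller-0
----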

include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1]

include::assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

include::../modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]

include::assembly_migrating-ceph-rgw.adoc[leveloffset=+1]

include::assembly_migrating-ceph-rbd.adoc[leveloffset=+1]

ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]
32 changes: 17 additions & 15 deletions docs_user/assemblies/assembly_migrating-ceph-monitoring-stack.adoc
@@ -4,30 +4,32 @@

= Migrating the monitoring stack component to new nodes within an existing {Ceph} cluster

In the context of data plane adoption, where the {rhos_prev_long} ({OpenStackShort}) services are
redeployed in {OpenShift}, a {OpenStackPreviousInstaller}-deployed {CephCluster} cluster will undergo a migration in a process we are calling “externalizing” the {CephCluster} cluster.
There are two deployment topologies, broadly, that include an “internal” {CephCluster} cluster today: one is where {OpenStackShort} includes dedicated {CephCluster} nodes to host object storage daemons (OSDs), and the other is Hyperconverged Infrastructure (HCI) where Compute nodes
double up as {CephCluster} nodes. In either scenario, there are some {Ceph} processes that are deployed on {OpenStackShort} Controller nodes: {Ceph} monitors, Ceph Object Gateway (RGW), Rados Block Device (RBD), Ceph Metadata Server (MDS), Ceph Dashboard, and NFS Ganesha.
The Ceph Dashboard module adds web-based monitoring and administration to the
Ceph Manager.
With {OpenStackPreviousInstaller}-deployed {Ceph} this component is enabled as part of the overcloud deploy and it’s composed by:
With {OpenStackPreviousInstaller}-deployed {Ceph}, this component is enabled as
part of the overcloud deployment, and it is composed of the following:

- Ceph Manager module
- Grafana
- Prometheus
- Alertmanager
- Node exporter

The Ceph Dashboard containers are included through `tripleo-container-image-prepare` parameters and the high availability relies on `Haproxy` and `Pacemaker` deployed on the {OpenStackShort} front.
For an external {CephCluster} cluster, high availability is not supported.
The goal of this procedure is to migrate and relocate the Ceph Monitoring
components to free Controller nodes.

For this procedure, we assume that we are beginning with a {OpenStackShort} based on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment managed by {OpenStackPreviousInstaller}.
We assume that:

* {Ceph} has been upgraded to {CephRelease} and is managed by cephadm/orchestrator
* Both the {Ceph} public and cluster networks are propagated, through {OpenStackPreviousInstaller}, to the target nodes
The Ceph Dashboard containers are included through
`tripleo-container-image-prepare` parameters, and high availability relies
on `Haproxy` and `Pacemaker`, which are deployed on the {OpenStackShort} side.
For an external {CephCluster} cluster, high availability is not supported. The
goal of this procedure is to migrate and relocate the Ceph Monitoring
components to the target nodes, which frees the Controller nodes.

This procedure assumes that you begin with a {OpenStackShort} environment based
on {rhos_prev_ver} and a {Ceph} {CephRelease} deployment that is managed by
{OpenStackPreviousInstaller}. The procedure also assumes the following:

* {Ceph} has been upgraded to {CephRelease} and is managed by
cephadm/orchestrator
* Both the {Ceph} public and cluster networks are propagated,
through {OpenStackPreviousInstaller}, to the target nodes
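
As a sketch of how you can confirm these assumptions, the following commands
assume access to a node that holds the {Ceph} admin keyring:

[source,bash]
----
# Confirm that the cluster is managed by cephadm and the orchestrator backend is active
sudo cephadm shell -- ceph orch status
# Review where the monitoring stack daemons currently run
sudo cephadm shell -- ceph orch ps | grep -iE 'grafana|prometheus|alertmanager|node-exporter'
----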

include::../modules/proc_completing-prerequisites-for-migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

41 changes: 29 additions & 12 deletions docs_user/assemblies/assembly_migrating-ceph-rbd.adoc
@@ -4,22 +4,39 @@

= Migrating Red Hat Ceph Storage RBD to external RHEL nodes

For hyperconverged infrastructure (HCI) or dedicated Storage nodes that are running {Ceph} version 6 or later, you must migrate the daemons that are included in the {rhos_prev_long} control plane into the existing external Red Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include the Compute nodes for an HCI environment or dedicated storage nodes.
For Hyperconverged Infrastructure (HCI) or dedicated Storage nodes that are
running {Ceph} version 6 or later, you must migrate the daemons that are
included in the {rhos_prev_long} control plane into the existing external Red
Hat Enterprise Linux (RHEL) nodes. The external RHEL nodes typically include
the Compute nodes for an HCI environment or dedicated storage nodes.

To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must meet the following requirements:
To migrate Red Hat Ceph Storage Rados Block Device (RBD), your environment must
meet the following requirements:

* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator.
* NFS (ganesha) is migrated from a {OpenStackPreviousInstaller}-based deployment to cephadm. For more information, see xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha cluster].
* Both the {Ceph} public and cluster networks are propagated, with {OpenStackPreviousInstaller}, to the target nodes.
* Ceph MDS, Ceph Monitoring stack, Ceph MDS, Ceph RGW and other services have been migrated already to the target nodes;
* {Ceph} is running version 6 or later and is managed by cephadm.
* NFS Ganesha is migrated from a {OpenStackPreviousInstaller}-based
deployment to cephadm. For more information, see
xref:creating-a-ceph-nfs-cluster_migrating-databases[Creating a NFS Ganesha
cluster].
* Both the {Ceph} public and cluster networks are propagated, with
{OpenStackPreviousInstaller}, to the target nodes.
* Ceph MDS, the Ceph Monitoring stack, Ceph RGW, and other services are already
migrated to the target nodes.
ifeval::["{build}" != "upstream"]
* The daemons distribution follows the cardinality constraints described in the doc link:https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations]
* The daemons distribution follows the cardinality constraints that are
described in link:https://access.redhat.com/articles/1548993[Red Hat Ceph
Storage: Supported configurations].
endif::[]
* The Ceph cluster is healthy, and the `ceph -s` command returns `HEALTH_OK`
* The procedure keeps the mon IP addresses by moving them to the {Ceph} nodes
* Drain the existing Controller nodes
* Deploy additional monitors to the existing nodes, and promote them as
_admin nodes that administrators can use to manage the {CephCluster} cluster and perform day 2 operations against it.
* The {Ceph} cluster is healthy, and the `ceph -s` command returns `HEALTH_OK`.
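
For example, a quick health check before you start might look like the
following sketch, run from a node that has the admin keyring:

[source,bash]
----
# The cluster must report HEALTH_OK before you start the migration
sudo cephadm shell -- ceph -s
# Show details if the cluster reports warnings or errors
sudo cephadm shell -- ceph health detail
----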

During the procedure to migrate the Ceph Mon daemons, the following actions
occur:

* The mon IP addresses are moved to the target {Ceph} nodes.
* The existing Controller nodes are drained and decommissioned.
* Additional monitors are deployed to the target nodes, and they are promoted
as `_admin` nodes that can be used to manage the {CephCluster} cluster and
perform day 2 operations.

include::../modules/proc_migrating-mgr-from-controller-nodes.adoc[leveloffset=+1]

2 changes: 0 additions & 2 deletions docs_user/assemblies/assembly_migrating-ceph-rgw.adoc
@@ -11,8 +11,6 @@ To migrate Ceph Object Gateway (RGW), your environment must meet the following r
* {Ceph} is running version 6 or later and is managed by cephadm/orchestrator.
* An undercloud is still available, and the nodes and networks are managed by {OpenStackPreviousInstaller}.
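
As a sketch, you can confirm the {Ceph} release and the orchestrator backend as
follows:

[source,bash]
----
# Report the version that each daemon runs
sudo cephadm shell -- ceph versions
# Confirm that the cephadm orchestrator backend is active
sudo cephadm shell -- ceph orch status
----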

include::../modules/con_ceph-daemon-cardinality.adoc[leveloffset=+1]

include::../modules/proc_completing-prerequisites-for-migrating-ceph-rgw.adoc[leveloffset=+1]

include::../modules/proc_migrating-the-rgw-backends.adoc[leveloffset=+1]
15 changes: 0 additions & 15 deletions docs_user/assemblies/ceph_migration.adoc

This file was deleted.

8 changes: 1 addition & 7 deletions docs_user/main.adoc
@@ -24,10 +24,4 @@ include::assemblies/assembly_adopting-the-data-plane.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-the-object-storage-service.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-monitoring-stack.adoc[leveloffset=+1]

include::modules/proc_migrating-ceph-mds.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-rgw.adoc[leveloffset=+1]

include::assemblies/assembly_migrating-ceph-rbd.adoc[leveloffset=+1]
include::assemblies/assembly_migrating-ceph-cluster.adoc[leveloffset=+1]
26 changes: 14 additions & 12 deletions docs_user/modules/con_ceph-daemon-cardinality.adoc
@@ -2,19 +2,19 @@

= {Ceph} daemon cardinality

{Ceph} 6 and later applies strict constraints in the way daemons can be colocated within the same node.
{Ceph} 6 and later applies strict constraints in the way daemons can be
colocated within the same node.
ifeval::["{build}" != "upstream"]
For more information, see link:https://access.redhat.com/articles/1548993[Red Hat Ceph Storage: Supported configurations].
endif::[]
The resulting topology depends on the available hardware, as well as the amount of {Ceph} services present in the Controller nodes which are going to be retired.
ifeval::["{build}" != "upstream"]
For more information about the procedure that is required to migrate the RGW component and keep an HA model using the Ceph ingress daemon, see link:{defaultCephURL}/object_gateway_guide/index#high-availability-for-the-ceph-object-gateway[High availability for the Ceph Object Gateway] in _Object Gateway Guide_.
endif::[]
ifeval::["{build}" != "downstream"]
The following document describes the procedure required to migrate the RGW component (and keep an HA model using the https://docs.ceph.com/en/latest/cephadm/services/rgw/#high-availability-service-for-rgw[Ceph Ingress daemon] in a common {OpenStackPreviousInstaller} scenario where Controller nodes represent the
https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/rgw.yaml#L26-L30[spec placement] where the service is deployed.
endif::[]
As a general rule, the number of services that can be migrated depends on the number of available nodes in the cluster. The following diagrams cover the distribution of the {Ceph} daemons on the {Ceph} nodes where at least three nodes are required in a scenario that sees only RGW and RBD, without the {dashboard_first_ref}:
The resulting topology depends on the available hardware, as well as the number
of {Ceph} services present in the Controller nodes that are going to be
retired.
As a general rule, the number of services that can be migrated depends on the
number of available nodes in the cluster. The following diagrams show the
distribution of the {Ceph} daemons on the {Ceph} nodes. At least three nodes
are required in a scenario that includes only RGW and RBD, without the
{Ceph} Dashboard:

----
| | | |
@@ -24,7 +24,8 @@ As a general rule, the number of services that can be migrated depends on the nu
| osd | mon/mgr/crash | rgw/ingress |
----
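
As a sketch, you can inspect how the placement of each service is currently
expressed, and compare it with the hosts and labels that are available:

[source,bash]
----
# Export the current service specifications to review their placement sections
sudo cephadm shell -- ceph orch ls --export
# List the hosts and labels that are available for placement
sudo cephadm shell -- ceph orch host ls
----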

With the {Ceph} Dashboard, and without {rhos_component_storage_file_first_ref},
at least four nodes are required. The {Ceph} Dashboard has no failover:

----
| | | |
@@ -35,7 +36,8 @@ With the {dashboard}, and without {rhos_component_storage_file_first_ref} at lea
| osd | rgw/ingress | (free) |
----

With the {dashboard} and the {rhos_component_storage_file}, 5 nodes minimum are required, and the {dashboard} has no failover:
With the {Ceph} Dashboard and the {rhos_component_storage_file}, at least five
nodes are required, and the {Ceph} Dashboard has no failover:

----
| | | |
85 changes: 40 additions & 45 deletions docs_user/modules/proc_migrating-mgr-from-controller-nodes.adoc
@@ -1,73 +1,66 @@
[id="migrating-mgr-from-controller-nodes_{context}"]
= Migrating Ceph Manager daemons to {Ceph} nodes

= Migrating Ceph Mgr daemons to {Ceph} nodes
The following section describes how to move Ceph Manager daemons from the
{rhos_prev_long} Controller nodes to a set of target nodes. Target nodes might
be pre-existing {Ceph} nodes, or {OpenStackShort} Compute nodes if {Ceph} is
deployed by {OpenStackPreviousInstaller} with an HCI topology.
This procedure assumes that Cephadm and the {Ceph} Orchestrator are the tools
that drive the Ceph Manager migration. As is done with the other Ceph daemons
(MDS, Monitoring, and RGW), the procedure uses the Ceph spec to modify the
placement and reschedule the daemons. Ceph Manager is run in an active/passive
fashion, and it also provides many modules, including the Ceph orchestrator.

The following section describes how to move Ceph Mgr daemons from the
OpenStack controller nodes to a set of target nodes. Target nodes might be
pre-existing {Ceph} nodes, or OpenStack Compute nodes if Ceph is deployed by
{OpenStackPreviousInstaller} with an HCI topology.

.Prerequisites

Configure the target nodes (CephStorage or ComputeHCI) to have both `storage`
* Configure the target nodes (CephStorage or ComputeHCI) to have both `storage`
and `storage_mgmt` networks to ensure that you can use both the {Ceph} public
and cluster networks from the same node. This step requires you to interact
with {OpenStackPreviousInstaller}. From {rhos_prev_long} {rhos_prev_ver} and
later, you do not have to run a stack update.
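
A quick way to check the networks on a target node might look like the
following sketch; interface and network names vary by deployment:

[source,bash]
----
# Both the storage and storage_mgmt networks must be configured on the target node
ssh heat-admin@<target_node> ip -brief addr
----
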
.Procedure

This procedure assumes that cephadm and the orchestrator are the tools that
drive the Ceph Mgr migration. As done with the other Ceph daemons (MDS,
Monitoring and RGW), the procedure uses the Ceph spec to modify the placement
and reschedule the daemons. Ceph Mgr is run in an active/passive fashion, and
it's also responsible to provide many modules, including the orchestrator.

. Before start the migration, ssh into the target node and enable the firewall
rules required to reach a Mgr service.
[source,bash]
. SSH into the target node and enable the firewall rules that are required to
reach a Ceph Manager service:
+
----
dports="6800:7300"
ssh heat-admin@<target_node> sudo iptables -I INPUT \
-p tcp --match multiport --dports $dports -j ACCEPT;
----
+
Repeat this step for each `<target_node>`.
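+
Optionally, as a sketch, you can confirm that the rule is in place on a target
node before you continue:
+
----
# The Ceph Manager ports (6800-7300/tcp) must be accepted on the target node
ssh heat-admin@<target_node> sudo iptables -L INPUT -n | grep '6800:7300'
----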

[NOTE]
Repeat the previous action for each target_node.

. Check the rules are properly applied and persist them:
. Check that the rules are properly applied and persist them:
+
[source,bash]
----
sudo iptables-save
sudo systemctl restart iptables
$ sudo iptables-save
$ sudo systemctl restart iptables
----

. Prepare the target node to host the new Ceph Mgr daemon, and add the `mgr`
+
. Prepare the target node to host the new Ceph Manager daemon, and add the `mgr`
label to the target node:
+
[source,bash]
----
ceph orch host label add <target_node> mgr
----
+
* Replace `<target_node>` with the hostname of a host that is listed in the
{Ceph} cluster, as reported by the `ceph orch host ls` command.
+
Repeat this action for each `<target_node>` that will host a Ceph Manager
daemon.
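+
As an illustrative sketch, you can label all of the target nodes in one pass;
the hostnames are examples:
+
[source,bash]
----
# Example only: label every target node that will host a Ceph Manager daemon
for node in ceph-0 ceph-1 ceph-2; do
    sudo cephadm shell -- ceph orch host label add "$node" mgr
done
----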
- Replace <target_node> with the hostname of the hosts listed in the {Ceph}
through the `ceph orch host ls` command.

Repeat this action for each node that will be host a Ceph Mgr daemon.

Get the Ceph Mgr spec and update the `placement` section to use `label` as the
main scheduling strategy.

. Get the Ceph Mgr spec:
. Get the Ceph Manager spec:
+
[source,bash]
----
sudo cephadm shell -- ceph orch ls --export mgr > mgr.yaml
----

.Edit the retrieved spec and add the `label: mgr` section:
. Edit the retrieved spec and add `label: mgr` to the `placement` section:
+
[source,yaml]
----
@@ -77,24 +70,26 @@ placement:
label: mgr
----

. Save the spec in `/tmp/mgr.yaml`
. Apply the spec with cephadm using the orchestrator:
. Save the spec in the `/tmp/mgr.yaml` file.
. Apply the spec with cephadm by using the orchestrator:
+
----
sudo cephadm shell -m /tmp/mgr.yaml -- ceph orch apply -i /mnt/mgr.yaml
----
+
As a result of this procedure, you see a Ceph Manager daemon count that matches
the number of hosts where the `mgr` label is added.

According to the numner of nodes where the `mgr` label is added, you will see a
Ceph Mgr daemon count that matches the number of hosts.

. Verify new Ceph Mgr have been created in the target_nodes:
. Verify that the new Ceph Manager daemons are created in the target nodes:
+
----
ceph orch ps | grep -i mgr
ceph -s
----
+
[NOTE]
The procedure does not shrink the Ceph Mgr daemons: the count is grown by the
number of target nodes, and the xref:migrating-mon-from-controller-nodes[Ceph Mon migration procedure]
will decommission the stand-by Ceph Mgr instances.
The procedure does not shrink the number of Ceph Manager daemons. The count
grows by the number of target nodes, and the procedure to migrate Ceph Monitor
daemons to {Ceph} nodes decommissions the standby Ceph Manager instances. For
more information, see
xref:migrating-mon-from-controller-nodes_migrating-ceph-rbd[Migrating Ceph Monitor
daemons to {Ceph} nodes].
