From d3494de7057bbbb16d68f3e033ec0dc2b0cdb121 Mon Sep 17 00:00:00 2001
From: Balazs Gibizer
Date: Tue, 13 Jun 2023 06:28:20 +0200
Subject: [PATCH 1/2] [spec] Propose EDPM OpenStack service architecture

---
 spec/edpm-openstack-service-architecture.md | 351 ++++++++++++++++++++
 1 file changed, 351 insertions(+)
 create mode 100644 spec/edpm-openstack-service-architecture.md

diff --git a/spec/edpm-openstack-service-architecture.md b/spec/edpm-openstack-service-architecture.md
new file mode 100644
index 0000000..d16d554
--- /dev/null
+++ b/spec/edpm-openstack-service-architecture.md
@@ -0,0 +1,351 @@
+# OpenStack service deployment architecture on EDPM Node
+
+## Problem description
+
+Various OpenStack components have agents that need to run on the External
+DataPlane Node. For example OpenStack Nova's nova-api, nova-conductor
+and nova-scheduler services can run in kubernetes pods independently from the
+hypervisors they manage. On the other hand the nova-compute service is designed
+to be collocated with the hypervisor it manages in case the libvirt driver is
+in use. In the current design the EDPM Node serves as a hypervisor for Nova,
+therefore the nova-compute service needs to be deployed on EDPM nodes.
+
+Similar examples from other OpenStack components are ovn-metadata-agent,
+neutron-nicswitch agent, and in certain deployment scenarios the cinder-volume
+service.
+
+### The deployment dependency
+
+The deployment of an EDPM node needs to consider multiple phases:
+
+* **phase -1: Bare metal provisioning**. This step transforms an empty bare
+  metal machine to a running server with a host OS.
+* **phase 0: Infrastructure deployment and configuration** of storage,
+  networking, and computing resources. This step transforms a generic running
+  server to a more specific storage node, networking node, computing
+  hypervisor, or a mix of these.
+* **phase 1: OpenStack service deployment**.
+  This transforms the server to act
+  as an OpenStack storage node, networking node, or compute node (or a mix of
+  them) by deploying and configuring the specific OpenStack services that
+  connect the server to the OpenStack control plane.
+
+Both phase -1 and 0 can be performed independently from the rest of the
+OpenStack control plane deployment. However phase 1 needs to be coordinated
+with the podified infrastructure and OpenStack service deployment. For example
+to generate the configuration of the nova-compute service on an EDPM node the
+deployment engine needs to know how to connect the nova-compute service to the
+podified infrastructure, e.g. to which nova cell and therefore via which message
+bus. Handling this dependency is one of the goals of this specification.
+
+### The OpenStack service configuration interface
+
+Regardless of whether an OpenStack service is running in a
+kubernetes pod or in a container on the EDPM Node, the service has a well
+defined and widely known configuration interface implemented by oslo.config
+configuration options in one or more *.ini like* files.
+
+The configuration options can be divided into two major parts:
+* configuration that is generated by the deployment engine. E.g. database
+  hostnames and message bus transport URLs
+* configuration that is provided by the human deployer. E.g. CPU allocation
+  ratio, PCI device spec and alias
+
+The existing podified control plane API, the OpenStackControlPlane CRD and the
+service sub CRDs, provides a simple way for the human deployer to pass
+configuration parameters to the OpenStack services. Each CRD that represents
+an OpenStack service type provides a `CustomServiceConfig` field that is a
+plain text field carrying oslo.config formatted configuration.
+Then the
+deployment engine makes sure that this configuration snippet is provided
+to the deployed OpenStack service binary in a way that the config options
+defined in it have higher priority than the automatically generated, default
+service configuration.
+
+On the podified control plane side kubernetes APIs are used to generate, store,
+and transfer the OpenStack service configuration, and to deploy, configure, and
+start the OpenStack service binary. While parts of this are visible to the human
+deployer (e.g. kubernetes Pods, Deployments, ConfigMaps, VolumeMounts), the
+human deployer only needs to understand the `CustomServiceConfig` text fields
+to be able to pass through the desired configuration to an OpenStack service.
+
+As OpenStack services also run on the EDPM Nodes, there is a need
+to pass through human defined configuration to these services as well.
+
+On the EDPM side Ansible roles are used to provision and configure services.
+Ansible has its own configuration passing mechanism, Ansible variables. It is
+technically possible to pass through oslo.config options as Ansible variables
+to configure the OpenStack services.
+
+However it means that the same oslo.config configuration interface of the
+OpenStack service is represented in two different formats. On the podified
+control plane side it is an oslo.config snippet in the `CustomServiceConfig`
+field, while on the EDPM side it is a set of Ansible variables. The former does
+not involve any translation. The naming and structure of the config options are
+exactly the same in the `CustomServiceConfig` field as in the actual oslo.config
+file of the OpenStack binary. On the other hand the latter applies name mangling
+to avoid name clashes between different oslo.config sections and between
+different services using the same Ansible inventory to pass through variables.
+Moreover the latter requires explicit work in the Ansible roles every time a new
+config variable is added.
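
The precedence rule described above (options from `CustomServiceConfig` override the generated defaults) can be sketched with Python's standard `configparser`, used here only as a stand-in for oslo.config. The generated values, the transport URL, and the function name `effective_config` are illustrative assumptions, not what the operators actually produce.

```python
import configparser

# Hypothetical generated defaults; real values come from the deployment engine.
GENERATED = """
[DEFAULT]
debug = False
transport_url = rabbit://user@rabbitmq.example:5672/
"""

# Snippet a human deployer could put into CustomServiceConfig.
CUSTOM = """
[DEFAULT]
debug = True
"""


def effective_config(generated: str, custom: str) -> configparser.ConfigParser:
    """Layer the custom snippet on top of the generated configuration."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(generated)  # lower priority
    parser.read_string(custom)     # read last, so its values win
    return parser


config = effective_config(GENERATED, CUSTOM)
print(config.get("DEFAULT", "debug"))          # the deployer's value
print(config.get("DEFAULT", "transport_url"))  # untouched generated value
```

The deployer writes option names exactly as they appear in the service's own config file, so no per-option translation layer is involved.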
+
+For example, enabling debug logging for the OVN metadata agent on the EDPM node
+requires the following oslo.config file to be provided to the service:
+```ini
+[DEFAULT]
+debug = True
+```
+
+With the use of the `CustomServiceConfig` the following needs to be added to
+the CR representing the given OVN metadata agent:
+```yaml
+customServiceConfig: |
+  [DEFAULT]
+  debug = True
+```
+
+With ansible variables the following needs to be added to the CR that triggers
+deployment of the given OVN metadata agent:
+```yaml
+ansibleVars: |
+  edpm_ovn_metadata_agent_metadata_agent_DEFAULT_debug: true
+```
+
+## Proposed change
+
+Keep the existing DataPlane, DataPlaneRole, and DataPlaneNode CRDs to represent
+and coordinate phase -1 and phase 0 of the EDPM deployment. But decouple
+phase 1 from the existing DataPlane model. This means that the DataPlane CRDs
+will not represent and therefore won't directly deploy any OpenStack services
+on the EDPM node. These CRDs still expose the status of the phase -1 and
+phase 0 deployment, so other operators can decide when to start deploying
+OpenStack services to the EDPM node.
+
+Define CRDs representing each OpenStack service type that needs to be run on
+the EDPM node (e.g. NovaExternalCompute, OVNMetadataAgent, etc). These CRDs
+are defined in and therefore reconciled by the respective service operator
+(e.g. nova-operator, neutron-operator). These OpenStack service CRs will be
+created automatically based on the existence of DataPlaneRole and Node CRs and
+the user specified annotations on these CRs. The service operator
+will ensure that both the podified control plane dependencies
+(e.g. message bus) are deployed and the EDPM node phase 0 is finished before
+it generates the service configuration and creates AnsibleEE CRs to deploy
+the OpenStack service on the EDPM node.
+
+The EDPM related OpenStack service CRDs will have `CustomServiceConfig` fields
+that will have the same syntax and semantics as the `CustomServiceConfig` fields
+in the existing podified OpenStack service CRDs, while the implementation of
+such configuration will use Ansible to transfer the generated configuration to
+the service running on the EDPM node.
+
+Note that this spec proposes aligning the `CustomServiceConfig` field for
+OpenStack service CRDs regardless of where the modelled OpenStack service runs
+(k8s pod or EDPM podman container). But at the same time this spec does not
+propose that these CRDs are aligned in any other ways. E.g. this spec does not
+propose that other fields like `NodeSelector`, which are meaningful and
+exposed on CRDs of OpenStack services running in k8s pods, should be defined for
+CRDs of OpenStack services that are running on the EDPM nodes.
+
+This implies that an OpenStack service that is deployed in both pods and
+EDPM containers cannot be represented with the same CRD. For example the
+nova-compute service with libvirt driver will always run on EDPM nodes, while
+nova-compute with the ironic driver will run in k8s pods. Therefore these two
+scenarios need to be represented by two CRDs, `NovaExternalCompute` and
+`NovaCompute`. The `NovaCompute` CRD will have a `NodeSelector` field while the
+`NovaExternalCompute` will not have that field. But both CRDs will have a
+`CustomServiceConfig` field.
+
+**Benefits**:
+
+* The DataPlane CRDs and therefore the dataplane-operator implementation are
+  kept OpenStack service agnostic. Adding support for deploying a new OpenStack
+  or partner service on the EDPM node can be done without changing the
+  DataPlane CRDs or the dataplane-operator implementation.
+
+* The service operators encapsulate all the service specific logic. Every
+  deployment, configuration or upgrade concern for an OpenStack component can
+  be handled within a single, cohesive operator implementation.
+
+* The `CustomServiceConfig` field in the service CRDs simplifies and unifies
+  the OpenStack service configuration for the human deployer regardless of where
+  the service is being deployed (i.e. podified or EDPM).
+
+**Drawbacks**:
+
+* The mechanism to watch DataPlane CRs from service operators and auto create
+  service CRs adds implementation complexity and introduces possible
+  performance impacts that could be avoided if the dataplane-operator created
+  the service CRs.
+
+* The Ready condition of the DataPlaneRole and the DataPlaneNode can only
+  represent the phase -1 and phase 0 readiness of the EDPM node. The phase 1
+  readiness can only be collected by checking the Ready condition of every
+  service CR related to this EDPM node.
+
+
+### Example
+
+NovaCell/nova-cell1
+```yaml
+apiVersion: nova.openstack.org/v1beta1
+kind: NovaCell
+metadata:
+  name: nova-cell1
+...
+```
+No changes in NovaCell.
+
+OpenStackDataPlaneRole/edpm-compute
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneRole
+metadata:
+  name: edpm-compute
+  annotations:
+    nova.openstack.org/cell: nova-cell1
+spec:
+  dataPlane: openstack-edpm
+  nodeTemplate:
+    ansiblePort: 22
+...
+```
+The `novaTemplate` is removed from the `nodeTemplate` and a new user specified
+annotation `nova.openstack.org/cell` is added so the human deployer can
+define that this Role describes a set of EDPM nodes that are compute nodes
+connected to cell1.
+
+OpenStackDataPlaneNode/edpm-compute-0
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneNode
+metadata:
+  name: edpm-compute-0
+spec:
+  ansibleHost: 192.168.122.100
+  hostName: edpm-compute-0
+  node:
+    ansibleSSHPrivateKeySecret: dataplane-ansible-ssh-private-key-secret
+  role: edpm-compute
+...
+```
+The `novaTemplate` is removed from the `node`. As this DataPlaneNode refers
+to a DataPlaneRole that has a `nova.openstack.org/cell` annotation, this
+DataPlaneNode is a compute node connected to cell1.
+
+OpenStackDataPlaneNode/edpm-compute-1
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneNode
+metadata:
+  name: edpm-compute-1
+  annotations:
+    nova.openstack.org/cell: nova-cell2
+spec:
+  ansibleHost: 192.168.122.101
+  hostName: edpm-compute-1
+  node:
+    ansibleSSHPrivateKeySecret: dataplane-ansible-ssh-private-key-secret
+  role: edpm-compute
+...
+```
+This DataPlaneNode has a `nova.openstack.org/cell` annotation that overrides
+the cell annotation of its role. So this EDPM node is a compute node connected
+to cell2.
+
+NovaExternalCompute/edpm-compute-0
+```yaml
+apiVersion: nova.openstack.org/v1beta1
+kind: NovaExternalCompute
+metadata:
+  name: edpm-compute-0
+spec:
+  dataplaneNodeName: edpm-compute-0
+  cellName: nova-cell1
+  customServiceConfig: |
+    # human deployer can set this to pass through service specific config
+  sshKeySecretName: dataplane-ansible-ssh-private-key-secret
+...
+```
+This CR is auto created by the nova-operator based on the corresponding
+DataPlaneNode CR and the cell annotation set on it or inherited from its Role.
+The spec fields, except `CustomServiceConfig`, are populated by the
+nova-operator. The `CustomServiceConfig` field is never set by the
+nova-operator; it can be used by the human deployer.
+
+The following fields will be copied over from the DataPlaneNode (or Role due to
+inheritance) and changes there will be propagated to the NovaExternalCompute CR:
+* InventoryConfigMapName
+* NetworkAttachments
+* SshKeySecretName
+
+The AnsibleEEContainerImage, NovaComputeContainerImage, and
+NovaLibvirtContainerImage will be set based on the respective ENV value via the
+defaulting webhook in nova-operator during NovaExternalCompute auto creation
+but these fields can be changed later independently from their defaults.
+
+The name of the NovaExternalCompute CR will also be aligned with the name of
+the DataPlaneNode CR it is related to.
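
As a rough illustration of the auto creation described above, the following Python sketch derives a NovaExternalCompute manifest from a DataPlaneNode and its DataPlaneRole, both modelled as plain dictionaries. The function name `build_external_compute` and the assumption that the Role carries the SSH key secret under `spec.nodeTemplate` are hypothetical; the real nova-operator is not implemented this way.

```python
from typing import Optional

CELL_ANNOTATION = "nova.openstack.org/cell"


def build_external_compute(node: dict, role: dict) -> Optional[dict]:
    """Return a NovaExternalCompute manifest, or None for non-compute nodes."""
    # Annotations on the Node take precedence over those on the Role.
    annotations = {
        **role["metadata"].get("annotations", {}),
        **node["metadata"].get("annotations", {}),
    }
    cell = annotations.get(CELL_ANNOTATION)
    if cell is None:
        return None

    # Copied field: prefer the Node value, fall back to the Role template
    # (the Role-side location of this field is an assumption).
    node_spec = node["spec"].get("node", {})
    role_template = role["spec"].get("nodeTemplate", {})
    ssh_secret = node_spec.get(
        "ansibleSSHPrivateKeySecret",
        role_template.get("ansibleSSHPrivateKeySecret"),
    )

    name = node["metadata"]["name"]
    return {
        "apiVersion": "nova.openstack.org/v1beta1",
        "kind": "NovaExternalCompute",
        "metadata": {"name": name},  # aligned with the DataPlaneNode name
        "spec": {
            "dataplaneNodeName": name,
            "cellName": cell,
            "sshKeySecretName": ssh_secret,
            # customServiceConfig is left unset; it belongs to the deployer.
        },
    }


role = {
    "metadata": {"name": "edpm-compute",
                 "annotations": {CELL_ANNOTATION: "nova-cell1"}},
    "spec": {"nodeTemplate": {"ansiblePort": 22}},
}
secret = "dataplane-ansible-ssh-private-key-secret"
node0 = {"metadata": {"name": "edpm-compute-0"},
         "spec": {"node": {"ansibleSSHPrivateKeySecret": secret},
                  "role": "edpm-compute"}}
node1 = {"metadata": {"name": "edpm-compute-1",
                      "annotations": {CELL_ANNOTATION: "nova-cell2"}},
         "spec": {"node": {"ansibleSSHPrivateKeySecret": secret},
                  "role": "edpm-compute"}}

compute0 = build_external_compute(node0, role)
compute1 = build_external_compute(node1, role)
print(compute0["spec"]["cellName"], compute1["spec"]["cellName"])
```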
+
+The NovaInstance field is removed and the CellName field is redefined to hold
+the name of the NovaCell CR. The name of the NovaCell CR is a combination of
+the name of the cell from the NovaCellTemplate in the Nova CR and the name of
+the Nova CR. So it uniquely identifies a cell even if multiple Nova
+CRs exist in the namespace.
+
+The Deploy field of the NovaExternalCompute CR will be filled based on the
+DataPlaneNode Deploy field.
+
+The upstream nova project does not support moving computes between cells.
+Therefore the CellName field, and the cell annotation it is derived from, can
+only be set at creation and cannot be updated later. The dataplane-operator
+should not know about the meaning of the cell annotation so it cannot directly
+enforce this immutability. Therefore the nova-operator needs to register a
+validation webhook on the DataPlaneRole and DataPlaneNode CR to implement this
+rule.
+
+
+## Alternatives
+
+* Continue the pattern we have today. The DataPlane CRDs would embed
+  OpenStack specific service templates and the dataplane-operator would create
+  service specific CRs based on that to trigger OpenStack service deployment
+  on the EDPM node after phase 0 has succeeded. This alternative can support
+  `CustomServiceConfig` based OpenStack service configuration.
+
+  It is not the preferred alternative as it would cause the
+  dataplane-operator to gather OpenStack service specific implementations. And
+  therefore OpenStack component specific knowledge would be split across
+  multiple operators.
+
+* Do not represent OpenStack service types as CRDs. Keep DataPlane CRDs as
+  generic Ansible interfaces to trigger service deployment.
+
+  It is not the preferred alternative as it would duplicate the service
+  configuration syntax and semantics for the human deployer.
+  Also it would move OpenStack service specific logic to the
+  dataplane-operator, and therefore OpenStack component specific knowledge
+  would be split across multiple operators.
+
+* Decouple the service CRDs from the DataPlane CRDs as in the main proposal
+  but instead of watching the DataPlaneNode CRs to trigger the auto creation
+  of the relevant service CR, the DataPlaneNode - Service CR connection is
+  defined by DataPlaneNode CR names that are part of the OpenStackControlPlane
+  CR. E.g. the `OpenStackControlPlane.Spec.Nova.Template.CellTemplate` can
+  contain a list of DataPlaneNode names to define that the given DataPlaneNodes
+  are compute nodes and therefore NovaExternalCompute CR creation is needed.
+
+  This alternative would remove the need for the watch in the service operator
+  and therefore remove some of the complexity and performance overhead of
+  the main proposal. However it would mean that the OpenStackControlPlane CR
+  contains data plane related information.
+
+## Implementation considerations
+
+The nova-operator will have a new controller that will watch DataPlaneNode CRs,
+therefore the controller will Reconcile when a DataPlaneNode CR is created or
+updated. The controller will look up the CR and check for the
+`nova.openstack.org/cell` annotation. If it is not present then the reconcile
+ends. If the annotation is present then the controller will CreateOrPatch a
+NovaExternalCompute CR based on the fields in the DataPlaneNode CR and the
+fields of the NovaCell CR pointed to by the `nova.openstack.org/cell`
+annotation.
+
+The existing controller for NovaExternalCompute will reconcile the CR as today
+but it will wait for the ReadyCondition of the DataPlaneNode CR before creating
+the AnsibleEE CRs to deploy compute services on the EDPM node.
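
The two reconcile decisions described in the implementation considerations can be outlined as follows. This is a minimal Python sketch over plain dictionaries, not operator code: the function names and returned action strings are hypothetical, the condition layout (`type`, `status`) follows the common kubernetes convention, and inheritance of the annotation from the Role is left out for brevity.

```python
CELL_ANNOTATION = "nova.openstack.org/cell"


def is_ready(conditions: list) -> bool:
    """True if the condition list contains Ready with status "True"."""
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in conditions
    )


def reconcile_dataplane_node(node: dict) -> str:
    """New controller: runs when a DataPlaneNode is created or updated."""
    annotations = node.get("metadata", {}).get("annotations", {})
    if CELL_ANNOTATION not in annotations:
        return "done"  # not a compute node, nothing to create
    return "create-or-patch NovaExternalCompute"


def reconcile_external_compute(node: dict) -> str:
    """Existing controller: gate the deployment on DataPlaneNode readiness."""
    conditions = node.get("status", {}).get("conditions", [])
    if not is_ready(conditions):
        return "wait"  # phase -1 and phase 0 have not finished yet
    return "create AnsibleEE CRs"


pending_node = {
    "metadata": {"name": "edpm-compute-0",
                 "annotations": {CELL_ANNOTATION: "nova-cell1"}},
    "status": {"conditions": [{"type": "Ready", "status": "False"}]},
}
ready_node = {
    "metadata": pending_node["metadata"],
    "status": {"conditions": [{"type": "Ready", "status": "True"}]},
}

print(reconcile_dataplane_node(pending_node))
print(reconcile_external_compute(pending_node))
print(reconcile_external_compute(ready_node))
```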
From 32e957dd9ed6a272b0ef2acd33cb5be96d225e6a Mon Sep 17 00:00:00 2001
From: Balazs Gibizer
Date: Tue, 13 Jun 2023 06:29:46 +0200
Subject: [PATCH 2/2] [spec]Grouping EDPM nodes

---
 spec/grouping-nodes.md | 155 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 spec/grouping-nodes.md

diff --git a/spec/grouping-nodes.md b/spec/grouping-nodes.md
new file mode 100644
index 0000000..9a394de
--- /dev/null
+++ b/spec/grouping-nodes.md
@@ -0,0 +1,155 @@
+# Grouping nodes
+
+## Problem description
+
+Today a DataPlaneRole CR defines a grouping of OpenStackDataPlaneNode CRs to
+make it possible to define configuration once but apply it to each Node having
+the same Role. A DataPlaneNode can only belong to a single Role. There is
+CRD field level inheritance defined between Role and Node: if a CRD
+field is defined both in the Role and the Node then the value from the Node
+will take precedence. With today's model the DataPlaneRole has a direct
+grouping effect for the NovaExternalCompute CRs as well.
+
+However if we move the EDPM related OpenStack service creation from the
+dataplane-operator to the respective service-operator [as proposed in a
+separate spec](https://github.com/openstack-k8s-operators/docs/pull/41), then
+DataPlaneRole based grouping will not be automatically applied. So we need a
+different solution.
+
+Moreover the current restriction that a DataPlaneNode can only belong to a
+single DataPlaneRole is limiting. From a nova perspective a set of compute nodes
+can be configured with CPU pinning while from a neutron perspective a set of
+compute nodes can be configured with DPDK. If the two sets of nodes overlap but
+are not equal then there will be a need for additional roles to be defined for
+each combination of the configuration (pinning with DPDK, pinning without
+DPDK, sharing with DPDK, sharing without DPDK).
+This shows that the current
+restriction can lead to an explosion of the number of groups necessary to
+model the different combinations of service configs.
+
+## Proposed change
+
+Allow an EDPM node to belong to multiple groups by defining OpenStack service
+specific profiles. The DataPlaneRole could remain as it is today but it would
+represent the grouping of the EDPM nodes only from the generic
+hardware, host OS, infrastructure configuration perspective.
+
+Each service operator that has an EDPM specific service CRD (e.g.
+NovaExternalCompute, OVNMetadataAgent) can define a Profile CRD (e.g.
+ComputeProfile, NetworkProfile) to describe the service specific configuration
+of a set of EDPM nodes. Then the human deployer can use annotations on the
+DataPlaneRole or DataPlaneNode to select a single Profile per service type.
+
+### Example
+
+ComputeProfile with shared CPUs
+```yaml
+apiVersion: nova.openstack.org/v1beta1
+kind: ComputeProfile
+metadata:
+  name: dell-r740-shared-cpus
+spec:
+  customComputeServiceConfig: |
+    [compute]
+    cpu_shared_set = 4-12,^8,15
+    [DEFAULT]
+    cpu_allocation_ratio = 4.0
+```
+
+ComputeProfile with pinned CPUs
+```yaml
+apiVersion: nova.openstack.org/v1beta1
+kind: ComputeProfile
+metadata:
+  name: dell-r740-dedicated-cpus
+spec:
+  customComputeServiceConfig: |
+    [compute]
+    cpu_dedicated_set = 4-12,^8,15
+    [DEFAULT]
+    cpu_allocation_ratio = 1.0
+```
+
+OpenStackDataPlaneRole/edpm-compute
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneRole
+metadata:
+  name: edpm-compute
+  annotations:
+    nova.openstack.org/cell: nova-cell1
+    nova.openstack.org/compute-profile: dell-r740-shared-cpus
+    telemetry.openstack.org/telemetry-profile: collector
+spec:
+...
+```
+This Role defines a set of EDPM compute nodes with shared CPUs.
+
+
+OpenStackDataPlaneNode/edpm-compute-0
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneNode
+metadata:
+  name: edpm-compute-0
+spec:
+  role: edpm-compute
+...
+```
+This EDPM node is a compute that has shared CPUs as the Role selected the
+dell-r740-shared-cpus ComputeProfile.
+
+OpenStackDataPlaneNode/edpm-compute-1
+```yaml
+apiVersion: dataplane.openstack.org/v1beta1
+kind: OpenStackDataPlaneNode
+metadata:
+  name: edpm-compute-1
+  annotations:
+    nova.openstack.org/compute-profile: dell-r740-dedicated-cpus
+    neutron.openstack.org/network-profile: dpdk
+spec:
+  role: edpm-compute
+...
+```
+This EDPM node is a compute with pinned CPUs and DPDK data path as the
+compute-profile annotation on the Node overrides the same annotation from the
+Role. But this EDPM node still has the collector telemetry-profile inherited
+from the Role.
+
+NovaExternalCompute/edpm-compute-0
+```yaml
+apiVersion: nova.openstack.org/v1beta1
+kind: NovaExternalCompute
+metadata:
+  name: edpm-compute-0
+spec:
+  dataplaneNodeName: edpm-compute-0
+  computeProfileName: dell-r740-shared-cpus
+...
+```
+This CR is automatically created by nova-operator and the new
+`ComputeProfileName` field is filled based on the
+`nova.openstack.org/compute-profile` annotation of the corresponding
+DataPlaneNode.
+
+The `ComputeProfileName` field only holds a single profile name. So this
+proposal does not allow composing ComputeProfiles. Composable ComputeProfiles
+would require a detailed definition of the precedence between the profiles and
+we want to avoid that complexity here.
+
+## Alternatives
+
+
+## Implementation considerations
+
+The dataplane-operator does not need to understand and handle the profile
+annotations. Those will be set by the human deployer and read by the service
+operator the annotation refers to.
+
+The new ComputeProfileName field will point to the ComputeProfile CR for the
+generic compute config.
+Changes in the DataPlaneNode or Role compute-profile
+annotation will be propagated to this field. So assigning a different compute
+profile to a Node is possible and it will mean a reconfiguration of the compute
+service(s) on the EDPM node.
+
+There is no need for a controller reconciling ComputeProfile CRs as those CRs
+are purely data stores.
\ No newline at end of file
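
The per-annotation inheritance used in the examples of this spec can be sketched in a few lines of Python. This is only an illustration under the assumption that annotations are merged key by key with the Node winning; the helper name `effective_annotations` is hypothetical, while the annotation keys and profile names are taken from the examples above.

```python
COMPUTE_PROFILE = "nova.openstack.org/compute-profile"


def effective_annotations(role_annotations: dict, node_annotations: dict) -> dict:
    """Merge annotations key by key; a Node value overrides the Role value."""
    return {**role_annotations, **node_annotations}


role_annotations = {
    "nova.openstack.org/cell": "nova-cell1",
    COMPUTE_PROFILE: "dell-r740-shared-cpus",
    "telemetry.openstack.org/telemetry-profile": "collector",
}
node0_annotations = {}  # edpm-compute-0 sets nothing itself
node1_annotations = {   # edpm-compute-1 overrides one key and adds another
    COMPUTE_PROFILE: "dell-r740-dedicated-cpus",
    "neutron.openstack.org/network-profile": "dpdk",
}

merged0 = effective_annotations(role_annotations, node0_annotations)
merged1 = effective_annotations(role_annotations, node1_annotations)

# Each service operator reads only the key it owns; for nova this value
# would become the computeProfileName of the NovaExternalCompute CR.
print(merged0[COMPUTE_PROFILE])
print(merged1[COMPUTE_PROFILE])
```

Because each service type selects exactly one profile, no precedence between profiles of the same type has to be defined.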