
Make Galera service A/P #229

Merged

Conversation

Contributor

@dciabrin dciabrin commented Jun 12, 2024

The service created for a Galera cluster always balances traffic to all available Galera replicas, which means the cluster is effectively configured as an A/A service.

In order to limit the burden on clients of supporting A/A semantics, change the way the service is configured to act as A/P instead.

A new callback script is introduced in pods. It is called by mysqld automatically every time the Galera library detects a change in the cluster (node join, crash, network partition...). This callback script reconfigures the service CR via label selectors to drive traffic to a single pod at a time.

Jira: OSPRH-7405
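
For illustration only, a minimal Go sketch of the Service-side effect the callback aims for. The actual in-pod callback is a script invoked by mysqld; the namespace, Service and pod names below, and the exact selector key, are assumptions, not the operator's real values:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// setActivePod narrows the Service selector so that only the elected "active"
// Galera pod receives client traffic (A/P) instead of all replicas (A/A).
// "statefulset.kubernetes.io/pod-name" is the per-pod label Kubernetes sets on
// StatefulSet pods; whether the operator uses this exact key is an assumption.
func setActivePod(ctx context.Context, cs kubernetes.Interface, ns, svc, activePod string) error {
	patch := fmt.Sprintf(`{"spec":{"selector":{"statefulset.kubernetes.io/pod-name":%q}}}`, activePod)
	_, err := cs.CoreV1().Services(ns).Patch(ctx, svc, types.StrategicMergePatchType,
		[]byte(patch), metav1.PatchOptions{})
	return err
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	// Hypothetical names: drive the "openstack" Service to the first replica.
	if err := setActivePod(context.Background(), cs, "openstack", "openstack", "openstack-galera-0"); err != nil {
		panic(err)
	}
}
```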

Contributor

@zzzeek zzzeek left a comment

Were we going to do a CRD change to make this configurable? It would be handy for us to be able to deploy A/A so that we can in fact have CI / tempest jobs that exercise it and slowly try to improve the codebases.

@dciabrin
Contributor Author

@zzzeek my plan is to do that in a subsequent PR, to avoid any CRD change that would force a bump in openstack-operator.

@stuggi
Contributor

stuggi commented Jun 12, 2024

@zzzeek my plan is to do that in a subsequent PR, to avoid any CRD change that would force a bump in openstack-operator.

Depending on the CRD change, I think we should do it now.

service.Spec = pkgsvc.Spec
if present {
	service.Spec.Selector[ActivePodSelectorKey] = activePod
}
err := controllerutil.SetOwnerReference(instance, service, r.Client.Scheme())
Contributor

Is there a reason why the instance is not set as the Controller (same for L483)? Currently the controller won't reconcile if the service gets deleted.

I guess we need it now for this PR, as otherwise the controller would reconcile on the selector update and return it to what it thinks it should be?

With this, there is a scenario where we'd lose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb-operator, or make some other change to trigger a reconcile, but in this scenario we don't have the active pod selector, as it is a new service and there won't be a change to the galera cluster, so the script won't run. With this we are back in the multi-master config.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset recording which one is the master (I guess on a pod restart it will always run at least once?); the controller would then reconcile and can update the service.
There is also a downside in this approach: the operator needs to be running. If the mariadb-operator is down when there is a galera change, it would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how long it takes to re-schedule the operator, the db would not be available ... so ... maybe it can be a mix, like checking if there is already a statefulset and taking the info from there.
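
A hedged sketch of that alternative, purely for illustration (the annotation key and object names are hypothetical; it reuses the imports and client wiring of the sketch shown after the PR description):

```go
// recordActivePod stores the currently active pod in a StatefulSet annotation,
// so the information survives a Service deletion and a later reconcile can
// restore the selector from it. The annotation key is hypothetical.
func recordActivePod(ctx context.Context, cs kubernetes.Interface, ns, sts, activePod string) error {
	patch := fmt.Sprintf(`{"metadata":{"annotations":{"mariadb.openstack.org/active-pod":%q}}}`, activePod)
	_, err := cs.AppsV1().StatefulSets(ns).Patch(ctx, sts, types.StrategicMergePatchType,
		[]byte(patch), metav1.PatchOptions{})
	return err
}
```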

Contributor Author

Is there a reason why the instance is not set as the Controller (same for L483)? Currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't that what lines 509 and 483 are supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

I guess we need it now for this PR, as otherwise the controller would reconcile on the selector update and return it to what it thinks it should be?

Hmmm you're correct we'd lose this label in this case.

With this, there is a scenario where we'd lose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb-operator, or make some other change to trigger a reconcile, but in this scenario we don't have the active pod selector, as it is a new service and there won't be a change to the galera cluster, so the script won't run. With this we are back in the multi-master config.

You're correct, the label won't be recreated automatically. Restarting a pod would be sufficient for galera to notice a cluster change and recreate the right label, although that would be a manual action.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset recording which one is the master (I guess on a pod restart it will always run at least once?); the controller would then reconcile and can update the service. There is also a downside in this approach: the operator needs to be running. If the mariadb-operator is down when there is a galera change, it would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how long it takes to re-schedule the operator, the db would not be available ... so ... maybe it can be a mix, like checking if there is already a statefulset and taking the info from there.

So I'm against trying to duplicate the state in the CR's status, because I don't want to have to ensure that the two states stay in sync. However, we could handle this deletion case by selecting any one of the healthy pods as the current active endpoint. This wouldn't conflict with the script, as it only performs conditional writes.

Contributor

Is there a reason why the instance is not set as the Controller (same for L483)? Currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't that what lines 509 and 483 are supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

They are adding an owner, but not as the controller. Owns(&corev1.Service{}) in NewControllerManagedBy() will only trigger reconciles for objects where the instance is also the controller, like in https://github.com/openstack-k8s-operators/mariadb-operator/blob/main/controllers/galera_controller.go#L447.

I guess we need it now for this PR, as otherwise the controller would reconcile on the selector update and return it to what it thinks it should be?

Hmmm you're correct we'd lose this label in this case.

With this, there is a scenario where we'd lose the a-p selector. If someone deletes the service (for whatever reason), first of all the controller currently won't reconcile and re-create it. So a user would have to restart the mariadb-operator, or make some other change to trigger a reconcile, but in this scenario we don't have the active pod selector, as it is a new service and there won't be a change to the galera cluster, so the script won't run. With this we are back in the multi-master config.

You're correct, the label won't be recreated automatically. Restarting a pod would be sufficient for galera to notice a cluster change and recreate the right label, although that would be a manual action.

Right, restarting one of the pods would also work.

Another approach could be (not sure yet which one is better) that the script adds an annotation to the statefulset recording which one is the master (I guess on a pod restart it will always run at least once?); the controller would then reconcile and can update the service. There is also a downside in this approach: the operator needs to be running. If the mariadb-operator is down when there is a galera change, it would not be reflected properly. This could happen e.g. if the operator and the deployment pod which was the master are located on a worker which failed. Depending on how long it takes to re-schedule the operator, the db would not be available ... so ... maybe it can be a mix, like checking if there is already a statefulset and taking the info from there.

So I'm against trying to duplicate the state in the CR's status, because I don't want to have to ensure that the two states stay in sync. However, we could handle this deletion case by selecting any one of the healthy pods as the current active endpoint. This wouldn't conflict with the script, as it only performs conditional writes.

No, I am not saying to store it in the CR status. Right now the script updates the service/endpoint; it could also add an annotation to the statefulset object with the information, and the controller here would take it from there instead of the service. Since the statefulset remains in the mentioned scenario, the information would survive.

But we could handle this in a follow-up. Just wanted to bring up that there are situations where we may get back to the A/A config.

Contributor Author

Is there a reason why the instance is not set as the Controller (same for L483)? Currently the controller won't reconcile if the service gets deleted.

I'm not sure I got this one. Isn't that what lines 509 and 483 are supposed to do? When I do e.g. oc describe service openstack, I can see a label owner=mariadb-operator.

They are adding an owner, but not as the controller. Owns(&corev1.Service{}) in NewControllerManagedBy() will only trigger reconciles for objects where the instance is also the controller, like in https://github.com/openstack-k8s-operators/mariadb-operator/blob/main/controllers/galera_controller.go#L447.

Oh, I see now... Thanks for pointing that out. I don't see why we shouldn't own it; I am going to update the PR.

No, I am not saying to store it in the CR status. Right now the script updates the service/endpoint; it could also add an annotation to the statefulset object with the information, and the controller here would take it from there instead of the service. Since the statefulset remains in the mentioned scenario, the information would survive.

But we could handle this in a follow-up. Just wanted to bring up that there are situations where we may get back to the A/A config.

I think it's sufficient to always set the first pod as the current active endpoint, as: 1) if that information becomes incorrect, it is guaranteed to get updated by galera as soon as it detects that this galera node is unresponsive; 2) if it is a healthy pod, it can be used as the active endpoint anyway, and the script will still be able to perform a failover next time it is needed.
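
A minimal sketch of that fallback, under the assumption that the surrounding reconcile code already reads the recorded active pod (the "-galera-0" pod naming is hypothetical):

```go
// activePodOrDefault keeps the pod recorded in the existing Service selector
// when there is one, and otherwise falls back to the first replica; the in-pod
// callback script moves the selector later if that pod turns out unhealthy.
func activePodOrDefault(recorded string, instanceName string) string {
	if recorded != "" {
		return recorded
	}
	return instanceName + "-galera-0" // hypothetical pod naming
}
```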

@@ -800,6 +814,7 @@ func (r *GaleraReconciler) SetupWithManager(mgr ctrl.Manager) error {
		Owns(&corev1.Endpoints{}).
		Owns(&corev1.ConfigMap{}).
		Owns(&corev1.ServiceAccount{}).
		Owns(&corev1.Service{}).
Contributor

already in L813

@dciabrin
Contributor Author

/retest

@dciabrin dciabrin force-pushed the galera-a-p branch 2 times, most recently from c7af585 to 5fb18af on June 13, 2024 at 09:59
@dciabrin
Contributor Author

As discussed with @stuggi, we now do SetControllerReference on all Service CRs to get notified of deletions and react appropriately.
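
Roughly, the difference this makes (a sketch, not the exact diff): with only SetOwnerReference the Service is garbage-collected along with the Galera CR, but its deletion does not trigger a reconcile, whereas a controller reference makes the Owns(&corev1.Service{}) watch in SetupWithManager fire for it:

```go
import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// ownServiceAsController marks the Galera CR as the controller of the Service,
// so the Owns(&corev1.Service{}) watch registered in SetupWithManager
// reconciles the CR whenever the Service is modified or deleted; a plain
// SetOwnerReference only covers garbage collection.
func ownServiceAsController(owner client.Object, svc *corev1.Service, scheme *runtime.Scheme) error {
	return controllerutil.SetControllerReference(owner, svc, scheme)
}
```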

Contributor

@stuggi stuggi left a comment

/lgtm

Contributor

openshift-ci bot commented Jun 13, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dciabrin, stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 7d997dc into openstack-k8s-operators:main Jun 13, 2024
6 checks passed
gibizer added a commit to gibizer/nova-operator that referenced this pull request Jun 14, 2024
This reverts commit ab95f15.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
gibizer added a commit to gibizer/placement-operator that referenced this pull request Jun 14, 2024
This reverts commit a5b0bf4.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/placement-operator that referenced this pull request Jun 14, 2024
This reverts commit a5b0bf4.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
openshift-merge-bot bot pushed a commit to openstack-k8s-operators/nova-operator that referenced this pull request Jun 17, 2024
This reverts commit ab95f15.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to Active/Passive mode instead of
multimaster. This means we don't need to force synchronization from the
client config any more.
stuggi added a commit to stuggi/glance-operator that referenced this pull request Jun 17, 2024
This reverts commit 980c318.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to deploy in Active/Passive mode
by default, instead of multimaster.
stuggi added a commit to stuggi/cinder-operator that referenced this pull request Jun 18, 2024
This reverts commit 67b2877.

The openstack-k8s-operators/mariadb-operator#229
switched the mariadb-operator to deploy in Active/Passive mode
by default, instead of multimaster.
Akrog added a commit to Akrog/openstack-k8s-operators-architecture that referenced this pull request Jun 18, 2024
The openstack-k8s-operators/mariadb-operator#229 switched the
mariadb-operator to deploy in Active/Passive mode by default, instead
of multimaster.

So we should no longer set `mysql_wsrep_sync_wait`.

This reverts commit cf4f3f6.
softwarefactory-project-zuul bot added a commit to openstack-k8s-operators/architecture that referenced this pull request Jun 18, 2024
Revert "Cinder wait for DB writes on reads"

The openstack-k8s-operators/mariadb-operator#229 switched the mariadb-operator to deploy in Active/Passive mode by default, instead of multimaster.
So we should no longer set mysql_wsrep_sync_wait.
Related PR: openstack-k8s-operators/cinder-operator#401
This reverts commit cf4f3f6.

Reviewed-by: John Fulton <[email protected]>
gthiemonge added a commit to gthiemonge/octavia-operator that referenced this pull request Jun 19, 2024