[18.0-fr1] Improve the failover of galera service #290

openshift-cherrypick-robot

This is an automated cherry-pick of #289

/assign stuggi

When a galera node is shutting down (e.g. during a rolling restart caused by a
minor update), it can no longer serve SQL queries, yet it remains connected to its
clients. Those clients receive an unexpected SQL status [1], which prevents them
from retrying their queries and causes unexpected errors down the road.

Improve the pod pre-stop hook to fail over the active endpoint to another pod before
shutting down the galera server, and to kill connected clients so that they reconnect
to the new active endpoint. At that point, the galera server can be safely shut down,
as no client will see its WSREP state change.
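
The ordering described above can be sketched as follows. This is only an illustration of the sequence, not the hook shipped in this PR: `pre_stop` and its three callables are hypothetical names standing in for whatever the real hook does at each step.

```python
from typing import Callable, List


def pre_stop(
    failover_endpoint: Callable[[], None],
    kill_client_connections: Callable[[], None],
    shutdown_server: Callable[[], None],
) -> List[str]:
    """Run the three pre-stop steps in order and return the steps performed.

    All three callables are hypothetical placeholders:
    - failover_endpoint: point the active endpoint at another pod
    - kill_client_connections: drop clients still connected to this node
    - shutdown_server: stop the local galera server
    """
    steps = (
        ("failover", failover_endpoint),
        ("kill_clients", kill_client_connections),
        ("shutdown", shutdown_server),
    )
    done: List[str] = []
    for name, action in steps:
        # Each step must complete before the next one starts, so that
        # clients have moved away before the server state changes.
        action()
        done.append(name)
    return done
```

The point of the ordering is that the server is stopped last, once the endpoint has moved and no client remains connected to observe the node leaving the cluster.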

Also update the failover script: 1) when no endpoint is available, ensure no traffic
goes through any pod; 2) do not trigger an endpoint failover as long as the current
endpoint targets a galera node that is still part of the primary partition (i.e. it
is still able to serve traffic).
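
The two rules above amount to a small decision function. The sketch below is an assumption-laden illustration, not the actual failover script: `choose_endpoint`, its parameters, and the "pick the first primary member" fallback are all hypothetical.

```python
from typing import Optional, Sequence


def choose_endpoint(
    current: Optional[str], primary_members: Sequence[str]
) -> Optional[str]:
    """Return the pod that should receive traffic, or None for no pod.

    current: pod currently targeted by the active endpoint (hypothetical input)
    primary_members: pods that are part of the primary partition (hypothetical input)
    """
    if not primary_members:
        # Rule 1: no endpoint is available, so no pod should receive traffic.
        return None
    if current in primary_members:
        # Rule 2: the current target can still serve traffic, so keep it.
        return current
    # Otherwise fail over; choosing the first member is an arbitrary
    # assumption made for this sketch.
    return primary_members[0]
```

For example, `choose_endpoint("galera-0", ["galera-0", "galera-1"])` keeps `galera-0`, while `choose_endpoint("galera-0", [])` returns `None`.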

[1] 'WSREP has not yet prepared node for application use'

Jira: OSPRH-11488
@dciabrin

/test mariadb-operator-build-deploy

openshift-ci bot commented Nov 21, 2024

@dciabrin: No presubmit jobs available for openstack-k8s-operators/[email protected]

In response to this:

/test mariadb-operator-build-deploy

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@dciabrin

/retest-required

@olliewalsh olliewalsh left a comment

/lgtm

openshift-ci bot commented Nov 21, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: olliewalsh, openshift-cherrypick-robot

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit c6fdbd1 into openstack-k8s-operators:18.0-fr1 Nov 21, 2024
7 checks passed

4 participants