Regression: PR 2253 causes all-at-once tenant update #2332

Closed
mmulvanny opened this issue Oct 8, 2024 · 5 comments

@mmulvanny

Expected Behavior

Updating a tenant should cause pods to update in a rolling fashion, and the MinIO service should remain available at all times.

Current Behavior

Updating to 6.0.3 causes the operator to delete all tenant pods at once, resulting in a MinIO outage.

Possible Solution

Steps to Reproduce (for bugs)

  1. Deploy MinIO tenant with multiple pods managed by operator version 5.0.15
  2. Upgrade operator to 6.0.3

Context

We upgraded the MinIO operator from 5.0.15 to 6.0.3.

Regression

This was caused by the combination of PR 2221 (which moved environment configuration to a sidecar) and PR 2253 (which deleted pods on configuration changes). Was PR 2253 intended to remove rolling updates?

Your Environment

  • Version used (minio-operator): This occurred immediately after an upgrade to 6.0.3.
  • Environment name and version (e.g. kubernetes v1.17.2): Kubernetes 1.28.7
  • Server type and version:
  • Operating System and version: Ubuntu 20.04
  • Link to your deployment file:
@harshavardhana
Member

There is no such thing as rolling updates in our operator, @mmulvanny. We always perform in-place updates of the container binary, and the StatefulSet then rolls out the changes.

However, the cluster itself must be online well before this. Please share the operator logs so we can make the right assessment of what happened.

Thanks
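
For context, the rolling behaviour described here comes from Kubernetes itself: when only the StatefulSet spec changes, the default RollingUpdate strategy replaces pods one at a time, highest ordinal first. A minimal client-go sketch of that kind of update (placeholder namespace, StatefulSet name, and image tag; not the operator's actual code):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// In-cluster config, as an operator-style controller would use.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Placeholder namespace and StatefulSet name.
	sts, err := client.AppsV1().StatefulSets("minio-tenant").Get(ctx, "myminio-pool-0", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Changing only the pod template (here: the image, with a placeholder tag)
	// leaves the default RollingUpdate strategy in charge, so Kubernetes
	// replaces the pods sequentially from the highest ordinal down.
	sts.Spec.Template.Spec.Containers[0].Image = "minio/minio:RELEASE.PLACEHOLDER"

	if _, err := client.AppsV1().StatefulSets("minio-tenant").Update(ctx, sts, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("StatefulSet updated; Kubernetes rolls the pods one at a time")
}
```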

@cesnietor
Contributor

There are breaking changes when upgrading to version 6, so we have documentation for it. Please see:
https://min.io/docs/minio/kubernetes/eks/operations/install-deploy-manage/upgrade-minio-operator.html#upgrade-minio-operator-5-0-15-to-operator-version-stable

@cesnietor
Contributor

closing, please reopen if the docs don't help.

@mmulvanny
Author

We performed an upgrade of another instance today and ran into the same issue. We use Flux to manage our Helm releases, but our upgrade steps were equivalent to the Helm upgrade steps on the page @cesnietor linked. We upgraded the tenant's and the operator's Helm charts from 5.0.15 to 6.0.3 simultaneously.

Our operator log is here:

minio-operator-6.0.3-upgrade.log

@harshavardhana we always see exactly the behavior you described, where the controller updates the StatefulSet and the pods restart one by one. That started to happen in this case too, but then the operator deleted the pods and forced them to start up together.

I tried modifying the environment variables of the tenant StatefulSet in our test environment to see if I could get the 6.0.3 operator to delete pods and wasn't able to. Was there a particular condition that would invoke the code path of PR 2253 that I failed to reproduce by doing that?
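
A sketch of that kind of experiment with client-go, for reference; the namespace, StatefulSet name, and variable name are placeholders rather than the manifests actually used:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Local kubeconfig, as when poking at a test cluster from a workstation.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Placeholder namespace and StatefulSet name.
	sts, err := client.AppsV1().StatefulSets("minio-tenant").Get(ctx, "myminio-pool-0", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Add a MINIO_-prefixed variable to the minio container only.
	for i, c := range sts.Spec.Template.Spec.Containers {
		if c.Name == "minio" {
			sts.Spec.Template.Spec.Containers[i].Env = append(c.Env,
				corev1.EnvVar{Name: "MINIO_TEST_FLAG", Value: "on"})
		}
	}

	// Push the change and observe whether the operator reacts by deleting pods
	// or whether the StatefulSet simply performs its usual rolling restart.
	if _, err := client.AppsV1().StatefulSets("minio-tenant").Update(ctx, sts, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
}
```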

@ramondeklein
Contributor

@mmulvanny Unfortunately, Kubernetes doesn't allow an STS to restart all pods at once (it can do this for a Deployment using the Recreate strategy). If there are updates to the STS, Kubernetes will initiate a rolling restart, so if the operator updates the STS with a new image, a rolling update will probably already be underway.

The operator will force all pods to terminate (and thus restart all at once) when it detects a change to any of the environment variables (starting with MINIO_) of the minio container in the StatefulSet.
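
A simplified sketch of the check described above (hypothetical helpers, not the operator's actual implementation): compare the MINIO_-prefixed variables of the minio container between the existing and desired StatefulSet and, on any difference, delete the pool's pods in one go.

```go
package sketch

import (
	"context"
	"reflect"
	"strings"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// minioEnv collects the MINIO_-prefixed variables of the minio container.
func minioEnv(sts *appsv1.StatefulSet) map[string]string {
	env := map[string]string{}
	for _, c := range sts.Spec.Template.Spec.Containers {
		if c.Name != "minio" {
			continue
		}
		for _, e := range c.Env {
			if strings.HasPrefix(e.Name, "MINIO_") {
				env[e.Name] = e.Value
			}
		}
	}
	return env
}

// restartPoolIfEnvChanged compares MINIO_ variables and, on any difference,
// deletes every pod selected by the StatefulSet's label selector at once,
// so they all come back together with the new configuration.
func restartPoolIfEnvChanged(ctx context.Context, client kubernetes.Interface, existing, desired *appsv1.StatefulSet) error {
	if reflect.DeepEqual(minioEnv(existing), minioEnv(desired)) {
		return nil // no MINIO_ changes: leave the normal rolling update alone
	}
	selector := metav1.FormatLabelSelector(existing.Spec.Selector)
	return client.CoreV1().Pods(existing.Namespace).DeleteCollection(ctx,
		metav1.DeleteOptions{},
		metav1.ListOptions{LabelSelector: selector},
	)
}
```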
