K8SSAND-483 ⁃ Updating Statefulsets is broken when upgrading to 1.7.0 #103
Comments
As I mentioned in the description, users upgrading to 1.7.0 will hit this issue. If we revert the change, users who have created new CassandraDatacenters with 1.7.0 will hit the same issue when they upgrade. Either way there is an upgrade problem. We can provide a script to resolve the upgrade issue. The script will do the following:
I can work on creating this script, and we can use it whether we decide to revert #18 and do a 1.7.1 release or just provide it as a workaround for users upgrading to 1.7.0.
Just to add some context: The operator makes two services for dse pods, one with Setting the This can be useful in a number of scenarios, one of which is having the pods on an overlay network with stable IPs, where pod to pod communication happens via dns names: https://www.datastax.com/blog/2021/05/how-connect-stateful-workloads-across-kubernetes-clusters (Also added the above to the original PR #18)
I am still encountering this with 1.7.1.
Hi @talonx, we're working right now on a script that can remedy the problem on systems that were previously upgraded from 1.6 -> 1.7.0. Unfortunately, just upgrading to 1.7.1 won't address the issue. We're hoping to have the script and a blog post with some details on the issue available very soon.
Hi @jdonenine, in my case I upgraded from my previous version (not 1.6 or 1.7) to 1.7.1 and started seeing this in the logs. Is that expected?
No, I would not have expected that if you hadn't already gone to 1.7.0, @talonx. A couple of questions...
@talonx Can you show the output of
And would it be possible to see the labels of the CassandraDatacenter object?
What happened?
I created a CassandraDatacenter with cass-operator 1.6.0. I then updated to 1.7.0. cass-operator fails to apply StatefulSet changes. cass-operator logs this error:
{"level":"error","ts":1621959974.8570168,"logger":"controller-runtime.controller","msg":"Reconciler error","controller":"cassandradatacenter-controller","request":"default/labels","error":"StatefulSet.apps \"labels-labels-default-sts\" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:258\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88"}
Note this part in particular:
This regression is due to the changes in #18, which change the `ServiceName` property of the StatefulSet. cass-operator logs this error and does not continue with the reconciliation process. The Cassandra pods will remain running. If you try to change the CassandraDatacenter spec in a way that would result in a change to the StatefulSet, the changes won't be applied.
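To illustrate why a changed `ServiceName` blocks reconciliation, here is a minimal sketch in Go. It does not use the Kubernetes client libraries: `statefulSetSpec` is a hypothetical, trimmed-down stand-in for the real StatefulSet spec, and `immutableFieldChanges` and `describeUpdate` are illustrative helpers, not cass-operator code. The only rule it encodes is the one quoted in the error message above: fields other than `replicas`, `template`, and `updateStrategy` cannot be updated in place.

```go
package main

import "fmt"

// statefulSetSpec is a hypothetical, simplified stand-in for the real
// Kubernetes StatefulSet spec. It carries only the fields needed here.
type statefulSetSpec struct {
	Replicas       int32
	ServiceName    string
	UpdateStrategy string
	TemplateHash   string // assumption: a hash standing in for the pod template
}

// immutableFieldChanges lists changed fields that fall outside the
// mutable set ('replicas', 'template', 'updateStrategy') named in the
// API server's error message. In this trimmed-down type, that is only
// the service name.
func immutableFieldChanges(current, desired statefulSetSpec) []string {
	var changed []string
	if current.ServiceName != desired.ServiceName {
		changed = append(changed, "serviceName")
	}
	return changed
}

// describeUpdate reports whether an in-place update could be accepted.
func describeUpdate(current, desired statefulSetSpec) string {
	if fields := immutableFieldChanges(current, desired); len(fields) > 0 {
		return fmt.Sprintf("forbidden: immutable fields changed: %v", fields)
	}
	return "allowed"
}
```

Calling `describeUpdate` with two specs that differ only in `Replicas` returns `allowed`, while two specs that differ in `ServiceName` return a `forbidden` message. That mirrors the situation described above: a StatefulSet created under 1.6.0 keeps its original service name, so the update generated by 1.7.0 is rejected.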
Did you expect to see something different?
How to reproduce it (as minimally and precisely as possible):
Environment
Cass Operator version: v1.7.0
**Anything else we need to know?**: The error occurs in the `CheckRackPodTemplate` function in `reconcile_racks.go`. This will impact any existing CassandraDatacenter that upgrades cass-operator. The bug will not impact new CassandraDatacenters installed with 1.7.0. I am inclined to say that we need to revert the changes in "Allow dns lookup by pod name for all pods" (#18); however, doing so will introduce this problem for users who created new CassandraDatacenters with 1.7.0 and then go to upgrade. Given that, we need to carefully consider how best to resolve this.
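For anyone checking whether an existing CassandraDatacenter is affected, the operator log can be scanned for this specific error. The following is a rough Go sketch, assuming the JSON log format shown in the log line above; `logEntry`, `forbiddenMarker`, and `isForbiddenStatefulSetUpdate` are hypothetical names introduced for illustration and are not part of cass-operator.

```go
package main

import (
	"encoding/json"
	"strings"
)

// logEntry mirrors the subset of fields used from the operator's JSON
// log line (field names taken from the log output quoted in this issue).
type logEntry struct {
	Level   string `json:"level"`
	Msg     string `json:"msg"`
	Request string `json:"request"`
	Error   string `json:"error"`
}

// forbiddenMarker is the fragment of the API server error that
// identifies a rejected StatefulSet spec update.
const forbiddenMarker = "updates to statefulset spec for fields other than"

// isForbiddenStatefulSetUpdate parses one log line and reports whether
// it records the forbidden-update error. When it does, the affected
// request ("namespace/name") is returned as well.
func isForbiddenStatefulSetUpdate(line string) (request string, ok bool) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return "", false // not a JSON log line
	}
	if e.Level == "error" && strings.Contains(e.Error, forbiddenMarker) {
		return e.Request, true
	}
	return "", false
}
```

Fed the log line from this issue, the function would report the request `default/labels`. Lines that are not JSON, or that record other errors, are ignored.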
┆Issue is synchronized with this Jira Bug by Unito
┆fixVersions: k8ssandra-1.2.0,cass-operator-1.7.1
┆friendlyId: K8SSAND-483
┆priority: Highest